After years of developing ETL applications, I can say that they are generally tested less rigorously than transactional systems. A number of technical factors contribute to this, including:
- difficulty in preparing adequate test data
- limited availability of test environments
- long run times
In addition to these technical factors, there is an organizational difficulty: the traditional separation of developers and testers does not work well for ETL projects, because the testers would have to be extremely well qualified. Every tester would need to:
- be able to write complex SQL queries (see the example query below)
- understand the peculiarities of working with large data volumes
- be aware of the commercial value of the data
These are in fact the qualifications typically required of a mid-level developer, and it is almost impossible to find software testers who possess them. This is often down to an unwillingness to work as a "mere" tester.
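To give a sense of the skill involved, here is the kind of reconciliation query such a tester would routinely have to write. This is a minimal sketch, assuming hypothetical tables stg_orders (source staging) and fact_sales (target), each with a load_date column:

```sql
-- Hypothetical tables: stg_orders (staging) and fact_sales (target).
-- Reconcile row counts and a key aggregate per load date;
-- any row returned indicates a discrepancy to investigate.
SELECT COALESCE(s.load_date, t.load_date) AS load_date,
       s.src_rows,
       t.tgt_rows,
       s.src_amount,
       t.tgt_amount
FROM (SELECT load_date,
             COUNT(*)          AS src_rows,
             SUM(order_amount) AS src_amount
      FROM stg_orders
      GROUP BY load_date) s
FULL OUTER JOIN
     (SELECT load_date,
             COUNT(*)    AS tgt_rows,
             SUM(amount) AS tgt_amount
      FROM fact_sales
      GROUP BY load_date) t
  ON s.load_date = t.load_date
WHERE s.load_date IS NULL
   OR t.load_date IS NULL
   OR s.src_rows <> t.tgt_rows
   OR s.src_amount <> t.tgt_amount;
```

Writing and, more importantly, interpreting such queries requires exactly the blend of SQL fluency and business understanding described above.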
Peer review techniques can therefore be considered, since developer and reviewer can be expected to possess similar qualifications. In my experience, however, peer reviews do not work either, mainly for the following reasons:
- Developers are not motivated to pay much attention to what their colleagues are doing; they tend to concentrate on their own work above all else.
- Managers tend to be keen on "saving resources" by not allowing sufficient time for reviews.
- Although the reviewer receives input from the developer, they receive none from the business department. This makes it difficult to know whether the task has been correctly understood.
With these problems in mind, I would like to suggest a different method, which seeks to improve testing quality by separating the tasks of development and testing. The method requires two roles: let's call them analyst and developer. The developer is responsible for the actual development of the deliverables, for integration, performance tuning, etc. The analyst provides input for the developer and – here is the fundamental difference – checks the results of the developer's work. The main idea is to make the analyst responsible for the data tests (a sketch of typical checks follows the list below). This works well because:
- Analysts are highly motivated to be critical about data quality because they are ultimately responsible for presenting the solution to the customer.
- They are the ones in the project team who know best what the data should look like.
- They are sufficiently qualified to be able to carry out tests properly.
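As an illustration of what the analyst's data tests might look like, here is a minimal sketch of two common checks, assuming hypothetical tables fact_sales and dim_customer, with dim_customer carrying a business key customer_no and a current-version flag is_current:

```sql
-- Check 1: no fact row may reference a customer missing from the dimension.
SELECT f.customer_id,
       COUNT(*) AS orphan_rows
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.customer_id = d.customer_id
WHERE d.customer_id IS NULL
GROUP BY f.customer_id;

-- Check 2: the business key must be unique among current dimension rows
-- (per current version, if slowly changing dimensions are used).
SELECT customer_no,
       COUNT(*) AS duplicates
FROM dim_customer
WHERE is_current = 1
GROUP BY customer_no
HAVING COUNT(*) > 1;
```

Both queries should return no rows; the analyst runs such checks against the developer's load results before anything is shown to the customer.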
This method does not require any additional hierarchy level within the project: analysts and developers are usually on the same hierarchical level. In my experience, a ratio of one analyst to two or three developers is about right.
There are other benefits too:
- Developers can concentrate on technology; their business knowledge and communication skills do not need to be as strong, which means you are not obliged to look for employees with a strong skillset in both areas.
- Relaxing the requirement for developers to be commercially aware makes it easier to outsource development tasks.
One of the fundamental principles of software testing is that no one person should be responsible for both implementation and testing. This method complies with that principle on DWH projects by giving analysts responsibility for checking the data.