|
Refer Data Quality in Data Warehouse for more context.
For even a medium size data warehouse, there can be thousands of check you
have to run to assess the accuracy of ETL for the previous night data. The
most common approach is to run some reports on source systems and Data-warehouse
and compare the two. The level at which you will do depend upon the reliability
experience of the ETL. For example, you may like to do it for overall sales ' For
each sales office' for each sales officer in each sales office..
Typically you can fix the level of deviation which you will allow. You may:
- Sset the tolerance range of say +/
- X% to that of value in the source system, depending upon the level of
data quality and complexity of the data transformation done. This delta
can be for the same day or the average across certain number of days in
the past (wit maximum cap for any single day)
- Set-up the maximum standard deviation allowed across multiple instances
(multiple sales offices) for a given day.
- Set-up the maximum standard deviation allowed across the multiple days
for the same instance (for last 30 days for the same office)
If the data is not meeting the standards, one then need to drill down further
and investigate. Experience will let you optimize on the level of checks you
should apply. |