Subject of 'Data Quality is covered comprehensively in earlier sections as a 'Foundation Element' equally applicable across Data-Warehouse, Source Systems, Destination Systems OR all other Data Repositories.
'Data Quality' section provides 90% of the input you need for Data-Warehouse. In this page, we are looking at specific items in context of Data Warehouse. The Data Quality Assurance is taken care of during Extraction, Transformation process as well as improvement in source systems for upstream quality. The Data Quality assurance actions mentioned here are mainly the multi-cornered reconciliation routines.
There are no specific rules OR guidelines on when you should do reconciliation and checking. It depends on the level of complexity and confidence of staging process and criticality of data. The quality assurance steps include: Data Quality Assurance between Source systems and staging areas–
This validates that the facts and dimensions values have not changed at the base level between the source systems and the post-Transformation. The typical recon is the aggregate checks on all tables and sample checks on specific values.
Completion of Data Transformation in terms of Data-Sets, Attributes and the facts-
Many people question the need of doing this, when the scripts run do take care of this. However, if you have a good metadata, you have one more reference to check the creation of all Dimensions, Attributes and Facts.
Data Quality Assurance between the staging area and loaded area.This part takes care of the completeness of Loading. One can run a script to test the aggregate values across the data sets and loaded area.
Data Quality Assurance between the Loaded Area and Destination Systems.Check the data congruency between the desktop level cubes, the reports, Data modeling systems databases. Data Quality Assurance between the Destination Systems and Source Systems.Check the Data congruency between the source production systems and desktop level cubes, the reports, Data modeling systems databases. The quality assurance part of data Warehouse is generally the biggest component of the Data warehouse testing. Data warehouse is not a transactional system. Therefore, comparing aggregates across the stages of processing will mostly test its operations. |