Data Mapping Individual Values - Single Column analysis The most basic, yet powerful, data quality analysis consists of querying each and every individual column in a database. Each column is examined in order to find and sort all the individual values, together with their frequency.
Single column analysis, primarily detects data format errors, valid set errors, valid range errors, null/non null errors and so on and above all primary key validation. (refer domain value controls)
For example- PIN codes (many a times used for location based analysis and also used for mailing lists). ZIP codes are something that usually has a specific format for a given country, for example- five digits. In that case, they should not contain four digits and certainly not letters. A comprehensive single column analysis can validate the ZIP+4+2 format | ZIP | +4 | +2 | %age values | | Valid | Valid | Valid | 5% | | Valid | Valid | Not Null, Invalid | 10% | | Valid | Valid | Null | 5% | | Valid | Not Null, Invalid | NA | 10% | | Valid | Null | NA | 25% | | Not Null, Invalid | NA | NA | 40% | | Null | NA | NA | 5% | Table shows an example of PIN codes, that have been sorted, and their frequency. Non-existing, suspicious OR the outright faulty values can be found.
The verification of address fields in a database is an example- of a data format analysis. Customer code verification is the format as well as the coding logic validation. Transaction value field verification can be valid range validation (one cannot have home water purifier selling for USD 1 million) etc. Null/Non null validation is generally common across all verifications. All master table primary keys go through ‘being unique’ verification.
Multiple column Analysis/Coupled Values analysis
Once the basic single column analysis has been done, it is time to analyze relations between different columns containing related information. This analysis, also called the multiple/double column analysis, verifies the data model linkages like foreign key match, multiple column primary key verification. Please also refer and data model controls. Specific examples of multiple column analysis are:
- The foreign key referential integrity check (A cheque code in the collection table should have an equivalent record in the ‘Financial instrument detail table’. Wondering why we need to normalize?? This will be in a case, when a single cheque is collected against a multiple sales).
- Multiple column primary key uniqueness validation- (‘Flight code+ sequence number+date’ needs to be unique in the ‘flight details’ master of an airline operations management system)
Business rules validation (If the manufactured item status is ‘WIP’, no record should be found in ‘billing’ table). - Completeness verification (at least one contact detail telephone 1, telephone 2, mobile no.. is available).
- Transaction Consistency verification (If a card status is ‘billed’ in core card processing system, it should also be ‘billed’ in card statement system).
- Master table to master table consistency verification (Customer description in the CRM system is congruent to the customer description in core system customer master AND the all ‘Customer master’ tables should be in synch for the common information fields).
- Master table to transaction table consistency verification (similar to foreign key check. The sales should not be selling the products which are not existing in product master. Billing should not be sent to the customers, which are not existing OR are inactive in customer master)
TIP- This assessment can be done in two stages- One can first use the data definition language and the program specs to understand the business rules associated with the data. In the second stage, one can then run the discovery process on the data itself, using these business rules as a reference. This is different from a single stage pure discovery (reverse engineering) process.
PLEASE REFER Execution-MiHPractice Tool Data Mapping & Assessment Report |