Data Profiling
To learn more about data profiling, refer to the Data Mapping and Assessment chapter. Data model analysis and column analysis fall under the purview of data profiling. This includes single-column analysis, multi-column analysis, data value distributions, and the rate of data build-up.
Ability to profile data for various levels and scopes:
You should be able to define the target data for profiling at various levels. The target can be:
- A given database
- A given entity (for example, all customer-related data in the database)
- A data set (for example, all data in the database created in the last 3 months)
- Data filtered through complex conditions (for example, data updated by a given feeding system)
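The scope levels above can be thought of as progressively narrower filters over the same records. The following is a minimal sketch only: the field names (`entity`, `created`, `source`), the sample rows, and the `select_scope` helper are all hypothetical, and an in-memory list stands in for the database.

```python
from datetime import date, timedelta

# Hypothetical records; a real tool would read these from the database.
RECORDS = [
    {"entity": "customer", "created": date(2024, 1, 10), "source": "crm"},
    {"entity": "customer", "created": date(2024, 5, 2), "source": "web"},
    {"entity": "order", "created": date(2024, 5, 20), "source": "crm"},
]

def select_scope(records, entity=None, created_since=None, condition=None):
    """Return the records that fall inside the requested profiling scope.

    With no arguments the whole database is in scope; each argument narrows it.
    """
    selected = []
    for rec in records:
        if entity is not None and rec["entity"] != entity:
            continue
        if created_since is not None and rec["created"] < created_since:
            continue
        if condition is not None and not condition(rec):
            continue
        selected.append(rec)
    return selected

today = date(2024, 6, 1)  # fixed reference date so the example is repeatable
whole_db = select_scope(RECORDS)                                  # database level
customers = select_scope(RECORDS, entity="customer")              # entity level
recent = select_scope(RECORDS, created_since=today - timedelta(days=90))  # data set
from_crm = select_scope(RECORDS, condition=lambda r: r["source"] == "crm")  # complex condition
```

Keeping the scope definition separate from the profiling logic means the same checks can be pointed at any of these targets.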
An easy-to-use interface to find the areas of poor quality
You should be able to supply conditions and find the areas of poor quality. Moreover, the data quality system should come with pre-defined conditions for detecting poor quality, for example rules that define what counts as an incorrect telephone number, address, PIN code, or name.
Ability to profile the value distributions
A data profiling tool not only checks the quality of data, it also finds the value distributions of the data. For example, you should be able to find how many instances of 'Joe' and 'Joseph' exist, how many records have faulty telephone numbers, or how many records fall into each range of customer income (the income distribution).
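As an illustration of value distributions, the sketch below counts exact values in one column and buckets a numeric column into ranges. The sample rows, column names, and income boundaries are assumptions made for the example, not part of any particular tool.

```python
from collections import Counter

# Hypothetical sample data.
customers = [
    {"first_name": "Joe", "income": 18000},
    {"first_name": "Joseph", "income": 42000},
    {"first_name": "Joe", "income": 75000},
    {"first_name": "Maria", "income": 39000},
]

def value_distribution(rows, column):
    """Count how often each distinct value occurs in a column."""
    return Counter(row[column] for row in rows)

def range_distribution(rows, column, bounds):
    """Count values per half-open range [low, high); bounds must be ascending."""
    counts = Counter()
    for row in rows:
        value = row[column]
        for low, high in zip(bounds, bounds[1:]):
            if low <= value < high:
                counts[(low, high)] += 1
                break
    return counts

names = value_distribution(customers, "first_name")
incomes = range_distribution(customers, "income", [0, 25000, 50000, 100000])
```

Here `names["Joe"]` gives the number of 'Joe' records, and `incomes` maps each income range to its record count.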
Ability to do single column and multi-column analysis
Through single-column analysis, the tool should be able to detect data format errors, valid-set errors, valid-range errors, null/non-null errors and so on and, above all, perform primary key validation. Through multi-column analysis, the system should be able to perform:
- The foreign key referential integrity check
- Multiple column primary key uniqueness validation
- Business rules validation
- Completeness verification
- Transaction Consistency verification
- Master table to master table consistency verification
- Master table to transaction table consistency verification
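A few of these checks can be sketched in a handful of lines. The example below is only an illustration under stated assumptions: the two tables, their column names, and the phone format are hypothetical, and the helpers cover null checks, format checks, single or composite key uniqueness, and foreign key referential integrity.

```python
import re
from collections import Counter

# Hypothetical master and transaction tables.
customers = [
    {"id": 1, "phone": "555-0100"},
    {"id": 2, "phone": None},
    {"id": 2, "phone": "55501x0"},
]
orders = [
    {"order_no": 10, "line": 1, "customer_id": 1},
    {"order_no": 10, "line": 1, "customer_id": 3},
]

PHONE_FORMAT = re.compile(r"\d{3}-\d{4}")  # assumed format, for illustration only

def null_rows(rows, column):
    """Single-column check: row positions where the column is null."""
    return [i for i, r in enumerate(rows) if r[column] is None]

def format_errors(rows, column, pattern):
    """Single-column check: non-null values that do not match the format."""
    return [i for i, r in enumerate(rows)
            if r[column] is not None and not pattern.fullmatch(r[column])]

def duplicate_keys(rows, columns):
    """Key values occurring more than once; works for single or composite keys."""
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    return [key for key, n in counts.items() if n > 1]

def orphan_rows(child, fk, parent, pk):
    """Multi-column check: child rows whose foreign key has no parent row."""
    parent_keys = {r[pk] for r in parent}
    return [i for i, r in enumerate(child) if r[fk] not in parent_keys]

missing_phone = null_rows(customers, "phone")
bad_phone = format_errors(customers, "phone", PHONE_FORMAT)
duplicate_ids = duplicate_keys(customers, ["id"])
duplicate_lines = duplicate_keys(orders, ["order_no", "line"])
orphans = orphan_rows(orders, "customer_id", customers, "id")
```

Business rule, completeness, and consistency checks follow the same pattern: a predicate over one or more columns, possibly across two tables.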
Data Quality rules should be able to invoke custom exits
This is a typical feature of any good development architecture. You should be able to step out of your data quality software, invoke an external data quality validation routine, and return to the DQ program with the results. One example is PIN code standardization, where standard off-the-shelf programs exist and you should be able to use them through these custom hooks.
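One common way to provide such hooks is a registry of named callables. The sketch below is an assumption about how a custom exit could look, not a description of any specific product: `register_exit`, `run_rule`, and the six-digit PIN code rule are hypothetical, and `standardize_pin_code` is a stand-in for an off-the-shelf routine.

```python
# Registry mapping an exit name to an external validation routine.
CUSTOM_EXITS = {}

def register_exit(name):
    """Decorator that registers a routine under the given exit name."""
    def decorator(func):
        CUSTOM_EXITS[name] = func
        return func
    return decorator

@register_exit("pin_code")
def standardize_pin_code(value):
    """Stand-in for an off-the-shelf routine; assumes a six-digit PIN code."""
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return {"valid": len(digits) == 6, "standardized": digits}

def run_rule(exit_name, value):
    """Leave the DQ program, call the external routine, return with its result."""
    return CUSTOM_EXITS[exit_name](value)

result = run_rule("pin_code", "560 001")
```

The DQ program only needs to know the exit name and the shape of the returned result, so the routine behind the name can be swapped without changing the rules.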
Data Monitoring
We recommend that you refer to Data Quality Monitoring and What is Data Quality if you need additional subject matter expertise.
Database Monitoring
The DQ tool should be able to perform database-level monitoring, by which we mean post-fact monitoring at the database level. A DQ tool should be able to monitor the quality of data as defined by business rules. This can include:
- Dirty data check: garbled telephone numbers, PIN codes, etc.
- Data Completeness check
- Data Consistency check
The key factors here are the complexity of the business rules you can define and the speed with which the tool completes the monitoring tasks and generates the monitoring report. If database monitoring is done at a detailed transaction level, it may add significant load on your system. One option is to run it on an offline replica.
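A post-fact monitoring pass can be pictured as running a set of named business rules over a snapshot and collecting the violations. This is a minimal sketch: the snapshot rows, the three rules (one each for dirty data, completeness, and consistency), and the `monitor` function are hypothetical.

```python
import re

# Hypothetical snapshot taken from an offline replica.
snapshot = [
    {"id": 1, "phone": "555-0100", "email": "a@example.com",
     "status": "closed", "closed_on": "2024-05-01"},
    {"id": 2, "phone": "55#-01??", "email": None,
     "status": "open", "closed_on": None},
    {"id": 3, "phone": "555-0199", "email": "c@example.com",
     "status": "closed", "closed_on": None},
]

# Each rule returns True when a row violates it.
RULES = {
    # Dirty data: phone does not match the assumed format.
    "dirty_phone": lambda r: re.fullmatch(r"\d{3}-\d{4}", r["phone"] or "") is None,
    # Completeness: a mandatory field is missing.
    "missing_email": lambda r: r["email"] is None,
    # Consistency: a closed record must carry a closing date.
    "closed_without_date": lambda r: r["status"] == "closed" and r["closed_on"] is None,
}

def monitor(rows, rules):
    """Return, per rule, the ids of the rows that violate it."""
    return {name: [r["id"] for r in rows if violates(r)]
            for name, violates in rules.items()}

report = monitor(snapshot, RULES)
```

Because the rules only read the snapshot, the same pass can run against a replica without touching the production system.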
Transaction Monitoring
Transaction monitoring is closer to real-time monitoring: it is done while processing is going on in your system. This part of the DQ tool sits on top of the target system. You should be able to:
- Define the transaction monitoring business rules
- Define the business rules for the action to be taken once an exception is trapped
- Execute the action (for example, sending a message to the target system to roll back, sending alerts, or committing with a flag)
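The pairing of a monitoring rule with an action can be sketched as follows. The rule names, thresholds, and the `Action` values are hypothetical, and the sketch only decides which action applies; carrying it out on the target system is outside its scope.

```python
from enum import Enum

class Action(Enum):
    ROLLBACK = "rollback"
    ALERT = "alert"
    COMMIT_WITH_FLAG = "commit_with_flag"

# Each entry pairs a monitoring rule with the action taken when it traps
# an exception. Rules and thresholds are hypothetical.
RULES = [
    ("negative_amount", lambda t: t["amount"] < 0, Action.ROLLBACK),
    ("large_amount", lambda t: t["amount"] > 10_000, Action.COMMIT_WITH_FLAG),
]

def check_transaction(txn):
    """Return (rule_name, action) for the first rule the transaction violates, else None."""
    for name, violates, action in RULES:
        if violates(txn):
            return name, action
    return None

outcome = check_transaction({"amount": -5})
```

A transaction that passes every rule yields `None`, so the target system can proceed without intervention.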
Batch Monitoring
While transaction monitoring can be done for both batch and online processing, batch monitoring is a combination of database monitoring and transaction monitoring. In batch processing, you should be able to check transaction by transaction, by groups of transactions, or after the completion of a specific job.
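The three granularities can be expressed with a single chunk size parameter. This is a sketch under assumptions: `monitor_batch`, the validity rule, and the sample batch are hypothetical.

```python
def monitor_batch(transactions, is_valid, chunk_size=None):
    """Check a batch at a chosen granularity.

    chunk_size=1 checks transaction by transaction, a larger value checks
    groups of transactions, and None checks once after the whole job.
    Returns a list of (start_index, failed_count), one entry per checkpoint.
    """
    size = chunk_size or len(transactions) or 1
    checkpoints = []
    for start in range(0, len(transactions), size):
        chunk = transactions[start:start + size]
        failed = sum(1 for t in chunk if not is_valid(t))
        checkpoints.append((start, failed))
    return checkpoints

batch = [5, -1, 7, -2, 3]           # hypothetical transaction amounts

def non_negative(amount):           # hypothetical business rule
    return amount >= 0

per_transaction = monitor_batch(batch, non_negative, chunk_size=1)
per_group = monitor_batch(batch, non_negative, chunk_size=2)
after_job = monitor_batch(batch, non_negative)
```

Smaller chunks catch problems earlier at the cost of more checkpoints; a single post-job check is cheapest but reports only after the batch has finished.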
In-built data profiling tools or ability to integrate with one
The data quality monitoring capability of a DQ tool should be well linked to its data profiling capability. Data profiling studies the actual state of data, and data monitoring performs a similar function. The key difference is that data monitoring is highly targeted.
Ability to generate data monitoring logs and reports:
A data monitoring tool should keep a trail of which data monitoring activities were performed at what time, the resources used, and the time spent. Data monitoring reports are an obvious requirement. These reports should be configurable and definable in the DQ tool.
Ability to generate real-time or post-facto alerts
A good data monitoring tool will generate pager, mobile, SMS, and email alerts to multiple recipients, based on the business rules defined.
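Routing alerts to multiple recipients per business rule can be sketched with a simple lookup table. The routing table, addresses, and `build_alerts` function below are hypothetical, and the sketch only builds the messages; actual delivery over email, SMS, or pager is left to whatever gateway is in use.

```python
# Hypothetical routing table: rule name -> list of (channel, recipient).
ROUTES = {
    "dirty_phone": [
        ("email", "dq-team@example.com"),
        ("sms", "+10000000000"),
    ],
}

def build_alerts(rule_name, detail, routes):
    """Create one alert message per configured recipient; sending is left to the caller."""
    return [
        {"channel": channel, "to": recipient, "text": f"[{rule_name}] {detail}"}
        for channel, recipient in routes.get(rule_name, [])
    ]

alerts = build_alerts("dirty_phone", "2 records failed", ROUTES)
```

A rule with no entry in the routing table produces no alerts, which keeps unrouted rules silent rather than failing.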