Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline

Data Quality through Data Integration Tools

Data quality is core to data integration. A data integration (DI) tool should support embedded data quality controls in both batch and real-time processes. Its data quality features should include extensive data searching, matching, cleansing and augmentation capabilities, combined with extensive reporting and exception handling features.

Data quality is generally one of the biggest reasons for the success or failure of a data integration platform. All the benefits of DI are nullified if it cannot provide the expected or acceptable (though not perfect) level of data quality. Because all data integration happens at the back end, the depth of the data quality features is a key determinant in selecting your DI tool. We recommend going through the Data Quality section if you need subject matter knowledge around data quality.

  • Data quality controls in batch processes: You should be able to incorporate data quality controls and assurance methods into batch processes, and to do so through the batch process/routine design wizard.
  • Data quality controls in near-time/real-time processes: These are the same kinds of controls as in batch processes, except that they are applied in near-time/real-time. This means that you would define the validation rules differently from those in a batch process. For example, in a batch process you may place data quality validations over a batch of records, whereas in real-time you may have to validate every record that passes through the DI environment.
  • Ability to configure data quality controls and business rules through a wizard or programming: You should be able to design the data quality controls, assurance methods and linked business rules through a wizard. The wizard should allow you to design:
    • Controls for a specific data integration task
    • Re-usable library of data quality assurance controls (like duplicate record check, duplicate file check, header and footer controls...).
    • The actions to be taken in case of a data quality issue during data integration, including the alerts to be generated.
  • Data quality rules should be able to invoke custom exits: This is a typical feature of any good development architecture. You should be able to step out of your data integration program, invoke a data quality validation routine, and return to the DI program with the results. One example is pin-code sanitation, where standard off-the-shelf programs exist and you should be able to use them through these custom hooks.
  • Data quality rules should be callable through SOA: This means that you can call data quality routines or functions as a service. This is again a typical feature of any sophisticated application/architecture.
  • Extensive data cleansing capabilities: A data integration tool may be able to call data cleansing routines as a service or through custom exits. However, most cleansing activities are fairly specific to the ETL you are doing, so there is not much scope for meeting all your cleansing needs through off-the-shelf packages/functions. A DI tool needs built-in (or add-on) data cleansing capability that is fully integrated with its other DI functions.
  • Data cleansing with localizations for a wide range of languages and locations: This applies to names, locations, addresses etc.
  • Ability to build customized data quality rules through wizards: Just like other transformation business rules, you should be able to build and incorporate data quality rules (data-exchange checks, cleansing, standardization...).
  • Standard databases available for locations, names, pin-codes, geo-codes etc.: This is linked to data quality rules and cleansing capabilities. For the purpose of standardization, a DI tool will have databases of standard names, locations and addresses. For example, it will have a mapping which says that 'NY, N.Y., Newyork' will be standardized to 'New York'.
  • In-built data profiling tools or ability to integrate with one: Data profiling tools help not only with data quality but also with designing your overall integration routines. Data profiling essentially analyzes the actual data structure, data quality and data value distributions within a database. You may refer to the chapter on data mapping and assessment (DMA), where some (not all) of the DMA activities are done through data profiling tools.
  • Post-facto or real-time data monitoring: You may refer to the data monitoring chapter for more details. The DI tool should be able to perform all the activities mentioned in that chapter for data integration tasks.
  • Ability to keep an audit trail of all data integration activities: The DI tool should attach transaction codes, source system codes etc. to all the initial, intermediate and final activities, so that you can trace back to the original source and all the intermediate steps that led to the current state.
  • Ability to generate real-time or post-facto alerts: This is applicable to any IT platform. You should be able to configure rules for which alerts are sent to whom and under what conditions.
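To make the difference between record-level and batch-level controls concrete, here is a minimal Python sketch. It is not taken from any particular DI tool: every function name, field name (`id`, `city`) and the contents of the lookup table are assumptions made for illustration. It shows a small re-usable library of controls, a standardization lookup modeled on the 'NY, N.Y., Newyork' example, and a caller-supplied alert action that fires on each data quality issue.

```python
from typing import Callable

Record = dict[str, str]
Alert = Callable[[str], None]

# Hypothetical standardization table, modeled on the 'NY, N.Y., Newyork' example.
CITY_SYNONYMS = {"ny": "New York", "n.y.": "New York", "newyork": "New York"}


def standardize_city(record: Record) -> Record:
    """Return a copy of the record with the city mapped to its standard name."""
    city = record.get("city", "").strip()
    return {**record, "city": CITY_SYNONYMS.get(city.lower(), city)}


def check_required(record: Record) -> list[str]:
    """Record-level control: can run on every record in a real-time flow."""
    return [f"missing {name}" for name in ("id", "city") if not record.get(name)]


def check_duplicates(batch: list[Record]) -> list[str]:
    """Batch-level control: needs the whole batch to find duplicate ids."""
    seen: set[str] = set()
    issues = []
    for rec in batch:
        rec_id = rec.get("id", "")
        if rec_id in seen:
            issues.append(f"duplicate id {rec_id}")
        seen.add(rec_id)
    return issues


def check_footer_count(batch: list[Record], footer_count: int) -> list[str]:
    """Batch-level control: the footer's record count must match what arrived."""
    if len(batch) != footer_count:
        return [f"footer says {footer_count} records, got {len(batch)}"]
    return []


def process_record(record: Record, alert: Alert) -> Record:
    """Real-time path: cleanse and validate one record as it passes through."""
    clean = standardize_city(record)
    for issue in check_required(clean):
        alert(issue)
    return clean


def process_batch(batch: list[Record], footer_count: int, alert: Alert) -> list[Record]:
    """Batch path: per-record controls plus controls over the batch as a whole."""
    clean = [process_record(rec, alert) for rec in batch]
    for issue in check_duplicates(clean) + check_footer_count(clean, footer_count):
        alert(issue)
    return clean


if __name__ == "__main__":
    collected: list[str] = []
    sample = [{"id": "1", "city": "N.Y."}, {"id": "1", "city": "Newyork"}]
    print(process_batch(sample, footer_count=2, alert=collected.append))
    print(collected)
```

In this sketch the alert action is passed in as a plain callable, so the same controls can feed a list, a log or a notification service. A real DI tool would typically expose these choices through its wizard or rule configuration rather than through code.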
 
