Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline

Data Cleansing and Augmentation

Data cleansing includes de-duplication and standardization. Data augmentation includes extrapolation, aggregation and similar techniques.

Data Cleansing and Correction

So far, in data profiling and data monitoring, we have covered the question 'what is the state of the data?'. Moving forward, DQ tools should also help fix the state of the data in order to improve data quality.

De-duping: This includes the ability to:

  • Merge duplicate or near-duplicate records to keep the best of multiple records. For example, you can pick up the name and address from one record and the PIN code from another. Merging can be driven automatically by business rules, and the tool should also allow it to be done manually. Merging also has to handle numeric data such as financial transaction values.
  • Select the best record: This is the opposite of merging. The tool should be able to select the most suitable record and delete the others.
  • Select the best record and fix it: After selecting the best record, the tool should be able to apply the corrections described in the following sections.
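
The merge and select-the-best-record options above can be sketched as follows. This is a minimal illustration, not how any particular DQ tool works: the field names, the "first non-empty value wins" merge rule and the "most populated record is best" selection rule are all assumptions made for the example. A real tool would drive these choices from configurable business rules.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def merge_records(records: List[Record]) -> Record:
    """Merge duplicates field by field.

    Assumed rule: for each field, keep the first non-empty value
    found while scanning the records in order.
    """
    merged: Record = {}
    for record in records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
            elif field not in merged:
                merged[field] = value
    return merged


def select_best_record(records: List[Record]) -> Record:
    """Pick one surviving record instead of merging.

    Assumed rule: the record with the most populated fields is the best;
    ties go to the record that appears first.
    """
    return max(records, key=lambda r: sum(1 for v in r.values() if v))


# Hypothetical duplicate customer records.
first = {"name": "John Smith", "address": "12 Main St", "pin_code": None}
second = {"name": None, "address": None, "pin_code": "110001"}

merged = merge_records([first, second])
best = select_best_record([second, first])
```

Here `merged` takes the name and address from the first record and the PIN code from the second, while `best` is simply the first record, because it has more populated fields.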

Standardizing

The tool should be able to refer to a standards database or use business rules to standardize the target data. This standardization can take the following forms:

  • Generic standardization of names, locations and PIN codes
  • Context-specific standardization, such as standardizing customer IDs, product IDs and product names

Spelling corrections

This is fairly simple and works like the spell checker in any productivity tool. However, as an enterprise tool, it should offer more robust capabilities, including the ability to fix spellings in batch mode.
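
As a rough sketch of batch-mode spelling correction, the example below matches each value against a reference vocabulary using `difflib.get_close_matches` from the Python standard library. The vocabulary, the sample values and the 0.8 similarity cutoff are assumptions chosen for illustration. Values with no sufficiently close match are left unchanged rather than guessed.

```python
import difflib
from typing import List


def correct_spelling(values: List[str], vocabulary: List[str],
                     cutoff: float = 0.8) -> List[str]:
    """Correct a whole batch of values against a reference vocabulary.

    Each value is replaced by its closest vocabulary entry if the
    similarity is at least `cutoff`; otherwise it is kept as-is.
    """
    corrected = []
    for value in values:
        matches = difflib.get_close_matches(value, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else value)
    return corrected


# Hypothetical reference vocabulary and a batch of raw city names.
cities = ["Chicago", "Boston", "Denver"]
batch = ["Chicgo", "Boston", "Zzzz"]
result = correct_spelling(batch, cities)
```

Note that the comparison is case-sensitive, so a production rule set would probably normalize case first.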

Standard databases available for locations, names, PIN codes, geo-codes, etc.

This is linked to data quality rules and cleansing capabilities. For the purpose of standardization, a DQ tool will have databases of standard names, locations and addresses. For example, it will have a mapping which says that 'NY', 'N.Y.' and 'Newyork' will all be standardized to 'New York'.
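
The 'New York' mapping above can be illustrated with a simple lookup table. This is only a sketch: the dictionary stands in for a vendor-supplied standards database, and the normalization step (trimming, lower-casing and collapsing whitespace) is an assumption about how raw values might be keyed.

```python
from typing import Dict

# Stand-in for a standards database: normalized variant -> standard form.
STANDARD_LOCATIONS: Dict[str, str] = {
    "ny": "New York",
    "n.y.": "New York",
    "newyork": "New York",
    "new york": "New York",
}


def standardize_location(raw: str,
                         mapping: Dict[str, str] = STANDARD_LOCATIONS) -> str:
    """Return the standard form of a location, or the input if unknown."""
    key = " ".join(raw.strip().lower().split())
    return mapping.get(key, raw)


standardized = [standardize_location(v) for v in ["NY", "N.Y.", " Newyork ", "Boston"]]
```

The three variants from the example all come back as 'New York', while 'Boston', which is not in the table, is returned unchanged.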

Data Cleansing with localization for a wide range of languages and locations

This applies to names, locations, addresses, etc.

Data Augmentation and Enrichment Capabilities

Most augmented and enriched data is not used for production processing, because it is not expected to be accurate. Augmentation and enrichment are generally used for analytics and data mining.

  • Ability to fill in missing data using extrapolation: Extrapolation updates customer data based on heuristics. An example is extrapolating a customer's current salary from their salary five years ago.
  • Ability to share householding information across the components of a BI platform: Householding is a technique by which you tag multiple customer records to a common group, for example customers belonging to the same family or the same association.
  • Applying clustering, averages and means: The tool should be able to apply different aggregation functions to estimate the values of blank fields.
  • Using the most probable value: The tool should be able to deduce the probable value based on statistical analysis or heuristics. For example, if a customer has an income above USD 25,000 and is above 30 years of age, the field 'whether taken mortgage' will have 'yes' as its most probable value.
  • Ability to derive data: The tool should be able to derive age from date of birth, and state from city.
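
Three of the capabilities above (deriving age, estimating blank fields with a mean, and assigning a most probable value) can be sketched in a few lines. The function names are hypothetical, and the rule for the mortgage field is taken directly from the example in the text. Leaving the field blank when the rule does not fire is an assumption, since the text only says when the value is 'yes'.

```python
from datetime import date
from statistics import mean
from typing import List, Optional


def derive_age(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from a date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def fill_with_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Estimate blank (None) fields using the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to estimate from
    estimate = mean(known)
    return [estimate if v is None else v for v in values]


def probable_mortgage(income_usd: float, age: int) -> Optional[str]:
    """Heuristic from the text: income above USD 25,000 and age above 30."""
    if income_usd > 25000 and age > 30:
        return "yes"
    return None  # assumption: leave the field blank otherwise


age = derive_age(date(1990, 6, 15), today=date(2020, 6, 14))
incomes = fill_with_mean([10.0, None, 30.0])
mortgage = probable_mortgage(income_usd=30000, age=35)
```

Because such values are estimates, a tool would normally flag them as augmented so they are not mistaken for source data in production processing.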
 
