
Data Cleansing and Augmentation

Data cleansing includes de-duping and standardization. Data augmentation includes extrapolation, aggregation and similar techniques.

Data Cleansing and Correction

The sections on data profiling and data monitoring covered the question 'what is the state of the data?'. Beyond that, DQ tools should also help fix the data to improve its quality.

De-duping: this includes the ability to:

  • Merge duplicate or near-duplicate records to keep the best of each. For example, the name and address can be picked up from one record and the PIN code from another. The tool should support both automatic merging driven by business rules and manual merging. Merging also has to cover numeric data, such as financial transaction values.
  • Select the best record: This is the opposite of merging. The tool should be able to select the most suitable record and delete the others.
  • Select the best record and fix it: After selecting the best record, the tool should be able to apply the corrections described in the following sections.
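The merge and select options above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DQ tool works: the field names and both rules ("first non-empty value wins" for merging, "most populated record wins" for selection) are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical record shape: field name -> value, with None for a missing value.
Record = dict[str, Optional[str]]


def merge_records(records: list[Record]) -> Record:
    """Merge duplicates by taking, per field, the first non-empty value."""
    merged: Record = {}
    for record in records:
        for field, value in record.items():
            if not merged.get(field) and value:
                merged[field] = value
            else:
                # Keep the field present even if no record has a value yet.
                merged.setdefault(field, None)
    return merged


def select_best(records: list[Record]) -> Record:
    """Select the record with the most populated fields (first one wins a tie)."""
    return max(records, key=lambda r: sum(1 for v in r.values() if v))


a = {"name": "A. Kumar", "address": "12 Park St", "pin_code": None}
b = {"name": "A Kumar", "address": None, "pin_code": "700016"}
golden = merge_records([a, b])  # name and address from a, PIN code from b
```

In practice the per-field rule would come from business rules (for example, prefer the most recently updated source), and numeric fields such as transaction values would need their own rule rather than "first non-empty".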

Standardizing

The tool should be able to refer to a standards database or use business rules to standardize the target data. This standardization comes in two forms:

  • Generic standardization of names, locations and PIN codes
  • Context-specific standardization, such as standardizing customer IDs, product IDs and product names

Spelling corrections

This is fairly simple and works like the spell check in any productivity tool. However, as an enterprise tool, it is expected to be more robust, and it should also be able to fix spellings in batch mode.
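As a rough sketch of batch-mode spelling correction, the example below uses `difflib.get_close_matches` from the Python standard library to snap each value to the closest entry in a reference vocabulary. The vocabulary, the function names and the 0.8 similarity cutoff are assumptions for illustration; an enterprise tool would use far larger dictionaries and more sophisticated matching.

```python
import difflib

# Hypothetical reference vocabulary of correctly spelled values.
KNOWN_CITIES = ["New York", "Chicago", "Boston", "Houston"]


def correct_spelling(value: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry, or the value unchanged if none is close enough."""
    matches = difflib.get_close_matches(value, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def correct_batch(values: list[str], vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Apply the same correction to a whole column of values."""
    return [correct_spelling(v, vocabulary, cutoff) for v in values]


cleaned = correct_batch(["Chicgo", "Bostn", "Qxz"], KNOWN_CITIES)
```

Values with no sufficiently close match are left untouched, so they can be routed to manual review instead of being silently changed.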

Standard databases available for locations, names, PIN codes, geo-codes etc.

This is linked to data quality rules and cleansing capabilities. For the purpose of standardization, a DQ tool will have databases of standard names, locations and addresses. For example, it will have a mapping which says that 'NY', 'N.Y.' and 'Newyork' will be standardized to 'New York'.
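The 'New York' mapping can be pictured as a simple lookup table. The sketch below assumes a small hand-written dictionary and a hypothetical `standardize_location` helper; a real standards database would be much larger and maintained by the vendor.

```python
# Hypothetical standards table: normalized variant -> standard form.
STANDARD_LOCATIONS = {
    "ny": "New York",
    "n.y.": "New York",
    "newyork": "New York",
    "new york": "New York",
}


def standardize_location(raw: str) -> str:
    """Map a known variant to its standard form; leave unknown values unchanged."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(raw.lower().split())
    return STANDARD_LOCATIONS.get(key, raw)


standardized = [standardize_location(v) for v in ["NY", " N.Y. ", "Newyork", "Paris"]]
```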

Data Cleansing with localizations for a wide range of languages and locations

This applies to names, locations, addresses etc.

Data Augmentation and Enrichment Capabilities

Most augmented and enriched data is not used for production processing, as it is not expected to be accurate. Augmentation and enrichment are generally used for analytics and data mining.

  • Ability to fill in missing data using extrapolation: Extrapolation is used to update customer data based on heuristics. An example is extrapolating the current salary of a customer from their salary five years ago.
  • Ability to share house-holding information across the components of a BI platform: House-holding is a technique by which multiple customer records are tagged to a common group. An example is customers belonging to the same family or the same association.
  • Applying clusters, averages and means: The tool should be able to apply different aggregation functions to estimate the value of blank fields.
  • Using the most probable value: The tool should be able to deduce the probable value based on statistical analysis or heuristics. For example, if a customer has an income above USD 25000 and is above 30 years of age, the field 'whether taken mortgage' will have 'yes' as its most probable value.
  • Ability to derive data: The tool should be able to derive the age from the date of birth, and the state from the city.
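Several of the capabilities listed above reduce to small functions. The sketch below is illustrative only: the function names, the constant annual growth rate used for salary extrapolation and the 'unknown' fallback for the mortgage rule are assumptions, since the text does not specify them.

```python
from datetime import date
from statistics import mean
from typing import Optional


def extrapolate_salary(past_salary: float, years: int, annual_growth: float) -> float:
    """Extrapolate a current salary from an old one, assuming constant compound growth."""
    return past_salary * (1 + annual_growth) ** years


def fill_with_mean(values: list[Optional[float]]) -> list[float]:
    """Replace blank (None) entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to average")
    average = mean(known)
    return [average if v is None else v for v in values]


def probable_mortgage(income_usd: float, age: int) -> str:
    """Heuristic from the text: income above USD 25000 and age above 30 gives 'yes'."""
    # The 'unknown' fallback is an assumption; the text only covers the 'yes' case.
    return "yes" if income_usd > 25000 and age > 30 else "unknown"


def derive_age(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from the date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


incomes = fill_with_mean([10.0, None, 30.0])
age = derive_age(date(1990, 6, 15), date(2024, 6, 14))
```

Because these values are estimates, it is usually worth flagging them as derived or imputed so that analytics users can tell them apart from captured data.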
 
