

Data Searching and Matching

Data searching and matching is the first step you take before you cleanse and augment the data. The searching and matching capabilities of a data quality (DQ) tool should include N-gram indexing, pattern matching and match-codes, among others.

Data searching and matching is done mainly to identify duplicates as well as household groups (for example, people from the same family, the same company or the same association).

Searching and matching through parsing:

Parsing is a technique that splits long strings of customer data into individual components, which are then fed into the data searching and matching routines. For example, the parsing rules tell the parsing program that wherever it finds a word that matches an entry in the 'first name' reference list, it should assign that word to the first name. The rules then tell it that wherever it finds a five-character string that is not within the reference lists of names or addresses, it should assign that string to the ZIP. A good DQ tool will have pre-defined parsing algorithms. One should be able to change those parsing algorithms (though generally this is not done, because a good DQ tool has sophisticated and statistically well-tested parsing algorithms).
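The parsing rules described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any particular DQ tool: the reference list FIRST_NAMES, the function parse_record and the sample record are all hypothetical, and the ZIP rule is simplified to "five digits".

```python
import re

# Hypothetical reference list; a real DQ tool ships far larger ones.
FIRST_NAMES = {"john", "mary", "anita", "robert"}

def parse_record(raw):
    """Split a raw customer string into tokens and label each one."""
    parsed = {"first_name": None, "zip": None, "other": []}
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue
        if parsed["first_name"] is None and token.lower() in FIRST_NAMES:
            # Rule 1: a word found in the first-name reference list.
            parsed["first_name"] = token
        elif parsed["zip"] is None and re.fullmatch(r"\d{5}", token):
            # Rule 2 (simplified assumption): a five-digit string is the ZIP.
            parsed["zip"] = token
        else:
            parsed["other"].append(token)
    return parsed

record = parse_record("Mary Smith, 12 Oak Street, Springfield 62704")
print(record["first_name"], record["zip"])
```

The tokens that no rule claims are kept in "other", so that later rules (last name, street, city) could be added in the same way.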

Searching and matching through pattern matching:

In conjunction with parsing, you can feed in all the possible patterns in which the data could be stored. The tool, or the queries that you run, should then scan the data for each pattern and produce the output in a standard pattern. For example, the sequence for dates will be:

  • Different date patterns are fed into the tool.
  • The tool scans the database and picks up the dates that match one of the fed patterns.
  • The matched dates are fed into the data correction module, which may standardize all the dates following different patterns into a single standard pattern.
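The date sequence above could look roughly like this in Python. The list DATE_PATTERNS, the function standardize_date and the sample values are assumptions made for illustration; the standard output pattern is arbitrarily chosen as YYYY-MM-DD.

```python
from datetime import datetime

# Hypothetical patterns "fed into the tool"; they are tried in this order,
# so an ambiguous value such as 05/03/2011 is read as day/month/year.
DATE_PATTERNS = ["%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%B %d, %Y"]

def standardize_date(value, patterns=DATE_PATTERNS):
    """Return the date in YYYY-MM-DD form, or None if no pattern matches."""
    for pattern in patterns:
        try:
            parsed = datetime.strptime(value.strip(), pattern)
        except ValueError:
            continue  # this pattern does not fit, try the next one
        return parsed.strftime("%Y-%m-%d")
    return None

raw_dates = ["2011-03-05", "05/03/2011", "5 Mar 2011", "March 5, 2011", "not a date"]
standardized = [standardize_date(d) for d in raw_dates]
print(standardized)
```

Values that match none of the patterns come back as None, which makes them easy to route to a manual review step instead of silently guessing.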

Searching and matching through N-gram indexing:

An n-gram is a sequence of 'n' consecutive characters extracted from a word or code. Typical values for 'n' are 2 or 3. These extracted n-grams are then indexed for all names or addresses in the database. At search time, the idea is that words or codes that are similar between the search data and the file data will have a high proportion of n-grams in common. N-gram index based searching is used for string or text matching. This is a fairly standard algorithm that comes pre-defined with good data quality tools. One should be able to change the following parameters:

  • The number of characters for N-gram indexing. For example, you can specify that you want to create a 2, 3 and/or 4 character N-gram index.
  • The level of match: you can define what percentage of n-gram match should be considered a match. For example, you can say that a 90% 2-character N-gram match together with a 75% 3-character N-gram match will be treated as a match candidate.
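A minimal sketch of n-gram matching with both parameters exposed is given below. It compares two strings directly rather than building a database index, and it measures the "proportion of n-grams in common" with the Dice coefficient, which is one common choice and an assumption here; the function names are hypothetical.

```python
def ngrams(text, n):
    """Return the set of n-character substrings of text (case-folded)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(a, b, n):
    """Dice coefficient over the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # a string shorter than n has no n-grams
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

def is_match_candidate(a, b, thresholds=None):
    """Check every (n, minimum similarity) pair; all must be satisfied."""
    if thresholds is None:
        # The example levels from the text: 90% on 2-grams, 75% on 3-grams.
        thresholds = {2: 0.90, 3: 0.75}
    return all(ngram_similarity(a, b, n) >= t for n, t in thresholds.items())

print(ngram_similarity("jonathan", "jonathon", 2))
print(is_match_candidate("jonathan", "jonathon", {2: 0.75, 3: 0.60}))
```

With the strict default levels, "jonathan" and "jonathon" are not flagged, while the looser levels in the last line do flag them. This shows why the match level is worth keeping configurable.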

Searching and matching through match-codes:

Match-code matching can be considered a form of pattern matching, with a small difference. A match code defines the sequence in which the data components could reside. For example, you can have a match code of first name + last name + address + ZIP. Unlike in pattern matching, the components of this match code do not have a standard length. Instead, each component refers to domain rules (or a list of possible values).
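One way to picture a match code is as an ordered list of components, each tied to its own domain rule. The sketch below is an assumption about how this could be coded, not the format used by any specific tool: the rules, the FIRST_NAMES lookup and build_match_code are all hypothetical.

```python
import re

# Hypothetical domain list: maps known variants to one standard first name.
FIRST_NAMES = {"bob": "robert", "rob": "robert", "robert": "robert", "mary": "mary"}

def first_name_rule(value):
    """Domain rule: the value must be in the first-name list."""
    return FIRST_NAMES.get(value.strip().lower())

def text_rule(value):
    """Domain rule: keep letters and digits only, any length."""
    cleaned = re.sub(r"[^a-z0-9]", "", value.lower())
    return cleaned or None

def zip_rule(value):
    """Domain rule (assumption): a ZIP is exactly five digits."""
    value = value.strip()
    return value if re.fullmatch(r"\d{5}", value) else None

# The match code: the sequence of components and the rule each one follows.
MATCH_CODE = [("first_name", first_name_rule), ("last_name", text_rule),
              ("address", text_rule), ("zip", zip_rule)]

def build_match_code(record):
    """Return the match code for a record, or None if a rule rejects it."""
    parts = []
    for field, rule in MATCH_CODE:
        normalized = rule(record.get(field, ""))
        if normalized is None:
            return None
        parts.append(normalized)
    return "|".join(parts)

a = {"first_name": "Bob", "last_name": "Smith", "address": "12 Oak St.", "zip": "62704"}
b = {"first_name": "Robert", "last_name": "SMITH", "address": "12 oak st", "zip": "62704"}
print(build_match_code(a) == build_match_code(b))
```

Records that produce the same match code can then be treated as duplicate candidates, even though their raw values differ in case, punctuation or name variant.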

Searching through wild-cards:

This is a strong-arm tactic, whereby you search using wild-card symbols such as ? or *, where ? typically stands for any single character and * for any run of characters.
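As a small illustration, Python's standard fnmatch module supports both symbols. The helper wildcard_search and the sample names are hypothetical; note that fnmatch also gives square brackets a special meaning, which this sketch does not use.

```python
from fnmatch import fnmatchcase

def wildcard_search(values, pattern):
    """Return the values that match the wild-card pattern, ignoring case."""
    pattern = pattern.lower()
    return [v for v in values if fnmatchcase(v.lower(), pattern)]

names = ["Smith", "Smyth", "Smithson", "Schmidt"]
one_char = wildcard_search(names, "sm?th")   # ? matches exactly one character
any_run = wildcard_search(names, "sm*")      # * matches any run of characters
print(one_char, any_run)
```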

 
