
Customer Data Searching and Matching

Before fixing or augmenting the data, it is important to identify the bad data and to assess how bad it is. This topic lists the methods for identifying matching or duplicate records.

The following methods are used to identify matching candidates in the customer database.

Customer Data Searching through Parsing

Parsing is a technique that splits long strings of customer data into individual components, which are then fed into the data searching and matching routines. For example, the parsing rules tell the parsing program that wherever it finds a word that matches an entry in the 'first name' reference list, it should assign that word to the first name. The rules then tell it that wherever it finds a character string that is five characters long and is not in the reference lists of names and addresses, it should assign it to the ZIP.
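The two rules described above can be sketched in Python. This is a minimal illustration, not how any particular parsing tool works: the reference lists, the function name parse_record and the sample record are all hypothetical, and the extra requirement that a ZIP be all digits is an assumption added to keep the example unambiguous.

```python
import re

# Hypothetical reference lists; a real tool would load far larger ones.
FIRST_NAMES = {"BETH", "JOHN", "MARY"}
NAME_ADDRESS_WORDS = {"PARKER", "SMITH", "LAKE", "STREET"}

def parse_record(text):
    """Assign tokens to fields using reference-list rules."""
    parsed = {"first_name": None, "zip": None, "unassigned": []}
    for token in re.findall(r"[A-Z0-9]+", text.upper()):
        # Rule 1: a word found in the first-name list becomes the first name.
        if parsed["first_name"] is None and token in FIRST_NAMES:
            parsed["first_name"] = token
        # Rule 2: a five-character string that is not in the name/address
        # lists becomes the ZIP (the all-digits check is an assumption).
        elif (parsed["zip"] is None and len(token) == 5 and token.isdigit()
              and token not in NAME_ADDRESS_WORDS):
            parsed["zip"] = token
        else:
            parsed["unassigned"].append(token)
    return parsed

record = parse_record("Beth Parker, 12 Lake Street, Chicago 60633")
print(record)
```

Tokens that no rule claims are kept in an 'unassigned' list, so further rules (last name, street, city) could be added in the same style.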

A very simple example of parsing is the 'Text to Columns' function in Excel. Parsing tools are far more evolved nowadays, however. For example:

Input Data

BETH CHRISTINE PARKER, SLS MGR
REGIONAL PORT AUTHORITY
FEDERAL BUILDING
12800 LAKE CALUMET
HEGEWISCH IL

Output Data

Parsed data
First Name: BETH
Middle Name: CHRISTINE
Last Name: PARKER
Title: SLS MGR
Firm: REGIONAL PORT AUTHORITY
Firm location: FEDERAL BUILDING
Range: 12800
Street: LAKE CALUMET
City: HEGEWISCH
State: IL

Customer Data Searching through Pattern Matching

In conjunction with parsing, you can feed in all the possible patterns in which the data could be stored. The tools, or the queries you run, then scan the data against each pattern and produce the following output:

  • The data parsed according to whichever pattern fits it. For example, one telephone number could be parsed using the pattern 00-XXX (country code)-YYY (area code)-ZZZ-AAAA, another using XXX-YYY-ZZZ-AAAA, and a third using (YYY)-ZZZ-AAAA.
  • A statement of which data, and how much of it, fits each kind of pattern.
  • The list of patterns in which the data is stored.
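The telephone example can be sketched with regular expressions in Python. The pattern names, the digit counts chosen for each component and the sample numbers are assumptions made for illustration; a real pattern library would be much larger.

```python
import re
from collections import Counter

# Hypothetical patterns modelled on the three telephone layouts above.
# The digit counts per component are assumptions.
PATTERNS = {
    "00-country-area-number": re.compile(
        r"^00-(?P<country>\d{1,3})-(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})$"),
    "country-area-number": re.compile(
        r"^(?P<country>\d{1,3})-(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})$"),
    "(area)-number": re.compile(
        r"^\((?P<area>\d{3})\)-(?P<exchange>\d{3})-(?P<line>\d{4})$"),
}

def match_pattern(value):
    """Return the name of the first fitting pattern and the parsed parts."""
    for name, regex in PATTERNS.items():
        found = regex.match(value.strip())
        if found:
            return name, found.groupdict()
    return "unmatched", {}

def profile(values):
    """Count how many values fit each pattern."""
    return Counter(match_pattern(value)[0] for value in values)

phones = ["00-1-312-555-0100", "1-312-555-0100", "(312)-555-0100", "555 0100"]
for pattern_name, count in profile(phones).items():
    print(pattern_name, count)
```

The profile gives the 'how much of the data fits which pattern' view, and anything reported as unmatched points to a pattern that still needs to be added or to data that needs correcting.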


Searching Customer Data through N-Gram Indexing

An n-gram is a set of 'n' consecutive characters extracted from a word or code. Typical values for 'n' are 2 or 3. The extracted n-grams are then indexed for all names or addresses in the database. At search time, the idea is that words or codes that are similar between the search data and the file data will have a high proportion of n-grams in common. N-grams are particularly well suited to string and text searching. However, unless supported by extensive rule bases for phonetic and synonym variation, as well as for noise words, they do not readily overcome the typical error and variation found in identity data, nor do they easily scale to very large data volumes.
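A minimal Python sketch of the idea, with n = 2, follows. The function names, the sample names and the 0.5 threshold are hypothetical, and the Dice coefficient is used here as one common way of expressing 'proportion of n-grams in common'; the text above does not prescribe a particular score.

```python
from collections import defaultdict

def ngrams(text, n=2):
    """Return the set of n consecutive characters found in text."""
    s = text.upper()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def build_index(names, n=2):
    """Map each n-gram to the positions of the names that contain it."""
    index = defaultdict(set)
    for pos, name in enumerate(names):
        for gram in ngrams(name, n):
            index[gram].add(pos)
    return index

def search(query, names, index, n=2, threshold=0.5):
    """Return (name, score) pairs sharing enough n-grams with the query."""
    query_grams = ngrams(query, n)
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    results = []
    for pos in candidates:
        name_grams = ngrams(names[pos], n)
        common = len(query_grams & name_grams)
        # Dice coefficient: share of n-grams the two strings have in common.
        score = 2 * common / (len(query_grams) + len(name_grams))
        if score >= threshold:
            results.append((names[pos], round(score, 2)))
    return sorted(results, key=lambda item: -item[1])

names = ["PARKER", "PARKAR", "SMITH"]
index = build_index(names)
matches = search("PARKER", names, index)
print(matches)
```

Only names that share at least one n-gram with the query are scored, which is the purpose of the index. The sketch has no phonetic or synonym rules, so it shows the limitation noted above as well as the technique.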

Match-Codes

Match-codes can be considered a form of pattern matching, with one difference. A match-code defines a sequence in which the data could reside. For example, you can have a code of first name + last name + address + ZIP. Unlike a pattern, a match-code does not fix the length of each component. Instead, each component refers to domain rules (or a list of possible values).

When the match-code program starts, it picks up the first code and applies it to the customer record. If the record fully complies with the match-code, the match is accepted; otherwise the program moves on to the next match-code.
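The loop described above might look like the following Python sketch. Everything in it is an assumption made for illustration: the domain rules, the two match-codes, the function name find_match_code and the idea that a record arrives as an ordered list of field values.

```python
import re

# Hypothetical domain rules: each component is validated by a rule or a
# list of possible values, not by a fixed length.
FIRST_NAMES = {"BETH", "JOHN"}
LAST_NAMES = {"PARKER", "SMITH"}

DOMAIN_RULES = {
    "first_name": lambda value: value.upper() in FIRST_NAMES,
    "last_name": lambda value: value.upper() in LAST_NAMES,
    "address": lambda value: bool(re.search(r"\d", value)),  # has a street number
    "zip": lambda value: bool(re.fullmatch(r"\d{5}", value)),
}

# Hypothetical match-codes: the sequences in which the data could reside.
MATCH_CODES = [
    ("first_name", "last_name", "address", "zip"),
    ("last_name", "first_name", "zip"),
]

def find_match_code(fields):
    """Try each match-code in turn; return the first one fully complied with."""
    for code in MATCH_CODES:
        if len(code) != len(fields):
            continue
        if all(DOMAIN_RULES[part](value) for part, value in zip(code, fields)):
            return code
    return None  # no match-code fits this record

print(find_match_code(["Beth", "Parker", "12800 Lake Calumet", "60633"]))
print(find_match_code(["Smith", "John", "60633"]))
print(find_match_code(["60633", "Beth"]))
```

A record for which no match-code fits returns None, which flags it for manual review or for a new match-code.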

Searching With Wildcards

This is the conventional method of searching, using the 'like' keyword with wildcard characters such as '%' or '?'. It can be used as a brute-force tactic: run broad queries, then keep narrowing the results with various filters.

There is one more way of narrowing the results. You can create a database of possible variations for names, cities, etc., and apply it as an additional condition on the query results.
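Both steps can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical. Note that in SQLite the wildcards for LIKE are '%' (any run of characters) and '_' (a single character); the '?' in the code below is only the placeholder for a query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("BETH PARKER", "CHICAGO"),
    ("BETTY PARKAR", "CHGO"),
    ("BETH PARKS", "DALLAS"),
    ("JOHN SMITH", "CHICAGO"),
])

# Hypothetical table of known variations for city names.
conn.execute("CREATE TABLE city_variation (variant TEXT, canonical TEXT)")
conn.executemany("INSERT INTO city_variation VALUES (?, ?)", [
    ("CHICAGO", "CHICAGO"),
    ("CHGO", "CHICAGO"),
])

def search(name_pattern, canonical_city=None):
    """Broad LIKE search, optionally narrowed by the variations table."""
    sql = "SELECT name, city FROM customer WHERE name LIKE ?"
    params = [name_pattern]
    if canonical_city is not None:
        sql += (" AND UPPER(city) IN (SELECT variant FROM city_variation"
                " WHERE canonical = ?)")
        params.append(canonical_city)
    sql += " ORDER BY name"
    return conn.execute(sql, params).fetchall()

broad = search("BET% PARK%")              # wildcard search only
narrow = search("BET% PARK%", "CHICAGO")  # plus the variation condition
print(broad)
print(narrow)
```

The broad query returns every name that fits the wildcard pattern; the variation table then keeps only the rows whose city is a known spelling of the requested one.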

 
