Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline

Customer Data Searching and Matching

Before we fix or augment the data, it is important to identify the bad data and to assess how bad it is. This topic lists the methods for identifying matching or duplicate records.

The following methods are used to identify matching candidates in the customer database.

Customer Data Searching through Parsing

Parsing is a technique that splits long strings of customer data into individual components, which are then fed into the data searching and matching routines. The parsing rules tell the parsing program how to label each component. For example, one rule may say that wherever the program finds a word that matches an entry in the 'first name' reference list, it should assign that word to the first name. Another may say that wherever it finds a character string that is five characters long and is not in the reference lists of names or addresses, it should assign it to the ZIP.
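The two rules described above can be sketched in Python roughly as follows. This is only an illustration: the reference lists and the function name classify_token are hypothetical, and the check that a ZIP consists of digits is an added assumption that goes beyond the rule as stated.

```python
# Hypothetical reference lists; a real parser would load much larger ones.
FIRST_NAMES = {"BETH", "JOHN", "MARY"}
ADDRESS_WORDS = {"LAKE", "ROAD", "STREET"}


def classify_token(token):
    """Label one token using reference-list rules, checked in order."""
    token = token.upper()
    # Rule 1: a word found in the first-name reference list is a first name.
    if token in FIRST_NAMES:
        return "first_name"
    # Rule 2: a five-character string that is in neither reference list
    # is treated as a ZIP. The isdigit() check is an added assumption.
    if len(token) == 5 and token not in ADDRESS_WORDS and token.isdigit():
        return "zip"
    return "unknown"


print(classify_token("Beth"))    # first_name
print(classify_token("60633"))   # zip
print(classify_token("PARKER"))  # unknown
```

Note that a rule this simple would also label a five-digit house number as a ZIP, which is why real parsing tools combine many rules with the position of each token in the record.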

A very simple example of parsing is the 'Text to Columns' function in Excel. Parsing tools today are far more evolved. For example:

Input Data

BETH CHRISTINE PARKER, SLS MGR
REGIONAL PORT AUTHORITY
FEDERAL BUILDING
12800 LAKE CALUMET
HEGEWISCH IL

Output Data

Parsed data
First Name: BETH
Middle Name: CHRISTINE
Last Name: PARKER
Title: SLS MGR
Firm: REGIONAL PORT AUTHORITY
Firm location: FEDERAL BUILDING
Range: 12800
Street: LAKE CALUMET
City: HEGEWISCH
State: IL
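A minimal sketch of how such a parse could be reproduced for this one record is shown below. It assumes the record always arrives as five lines in the order shown (name, firm, firm location, street, city and state), and the reference lists and function names are hypothetical. Commercial parsing tools use far richer rules.

```python
import re

# Hypothetical reference lists, kept tiny for illustration.
FIRST_NAMES = {"BETH", "JOHN", "MARY"}
TITLES = {"SLS MGR", "VP", "CEO"}
STATES = {"IL", "NY", "CA"}


def parse_name_line(line):
    """Split 'FIRST [MIDDLE] LAST, TITLE' into labelled components."""
    name_part, _, title_part = line.partition(",")
    tokens = name_part.split()
    parsed = {}
    if tokens and tokens[0] in FIRST_NAMES:
        parsed["first_name"] = tokens[0]
        tokens = tokens[1:]
    if tokens:
        parsed["last_name"] = tokens[-1]
        if len(tokens) > 1:
            parsed["middle_name"] = " ".join(tokens[:-1])
    title = title_part.strip()
    if title in TITLES:
        parsed["title"] = title
    return parsed


def parse_street_line(line):
    """Split '12800 LAKE CALUMET' into a house number range and a street."""
    m = re.match(r"^(\d+)\s+(.+)$", line)
    return {"range": m.group(1), "street": m.group(2)} if m else {}


def parse_city_state_line(line):
    """Treat the last token as the state if it is in the state list."""
    tokens = line.split()
    if tokens and tokens[-1] in STATES:
        return {"city": " ".join(tokens[:-1]), "state": tokens[-1]}
    return {}


def parse_record(lines):
    """Parse a five-line record; the line order is an assumption."""
    parsed = parse_name_line(lines[0])
    parsed["firm"] = lines[1]
    parsed["firm_location"] = lines[2]
    parsed.update(parse_street_line(lines[3]))
    parsed.update(parse_city_state_line(lines[4]))
    return parsed


record = [
    "BETH CHRISTINE PARKER, SLS MGR",
    "REGIONAL PORT AUTHORITY",
    "FEDERAL BUILDING",
    "12800 LAKE CALUMET",
    "HEGEWISCH IL",
]
for field, value in parse_record(record).items():
    print(f"{field}: {value}")
```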

Customer Data Searching through Pattern Matching

In conjunction with parsing, you can supply all the possible patterns in which the data could be stored. The tools or queries you run then scan the data against each pattern and produce the following output:

  • The data parsed according to whichever pattern fits it. For example, one telephone number could be parsed using the pattern 00-XXX (country code)-YYY (area code)-ZZZ-AAAA, another using XXX-YYY-ZZZ-AAAA, and a third using (YYY)-ZZZ-AAAA.
  • A statement of which records, and how much of the data, match each pattern.
  • The list of patterns in which the data is actually stored.
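The three telephone patterns above could be expressed as regular expressions along the following lines. This is a sketch under assumptions: each letter is taken to stand for one digit, the group names exchange and line for the ZZZ and AAAA parts are invented for illustration, and the sample numbers are made up.

```python
import re
from collections import Counter

# One regular expression per pattern named in the text.
PHONE_PATTERNS = {
    "00-XXX-YYY-ZZZ-AAAA": re.compile(
        r"^00-(?P<country>\d{3})-(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})$"
    ),
    "XXX-YYY-ZZZ-AAAA": re.compile(
        r"^(?P<country>\d{3})-(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})$"
    ),
    "(YYY)-ZZZ-AAAA": re.compile(
        r"^\((?P<area>\d{3})\)-(?P<exchange>\d{3})-(?P<line>\d{4})$"
    ),
}


def match_pattern(value):
    """Return (pattern name, parsed parts) for the first pattern that fits."""
    for name, regex in PHONE_PATTERNS.items():
        m = regex.match(value.strip())
        if m:
            return name, m.groupdict()
    return None, {}


def profile(values):
    """Count how many values meet each pattern."""
    return Counter(match_pattern(v)[0] or "UNMATCHED" for v in values)


phones = ["00-001-312-555-0100", "001-312-555-0101", "(312)-555-0102", "5550103"]
print(match_pattern(phones[2]))
print(profile(phones))
```

The profile function corresponds to the second and third outputs in the list: it shows which patterns occur in the data and how much of the data meets each one.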


Searching Customer Data through N-Gram Indexing

An n-gram is a set of 'n' consecutive characters extracted from a word or code. Typical values for 'n' are 2 or 3. The extracted n-grams are indexed for all names or addresses in the database. At search time, the idea is that words or codes that are similar between the search data and the file data will have a high proportion of n-grams in common. N-grams are particularly well suited to string and text searching. However, unless supported by extensive rule bases for phonetic and synonym variation, as well as for noise words, they do not readily overcome the typical errors and variations found in identity data, nor do they easily scale to very large data volumes.
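A small Python sketch of the idea follows. The text does not say how the "proportion of n-grams in common" is measured, so the Jaccard ratio (shared n-grams divided by all distinct n-grams) is used here as one possible choice. The function names, the threshold and the sample names are all illustrative assumptions.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of n consecutive-character substrings of text."""
    text = text.upper()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(names, n=3):
    """Map each n-gram to the set of names that contain it."""
    index = defaultdict(set)
    for name in names:
        for gram in ngrams(name, n):
            index[gram].add(name)
    return index


def search(query, index, n=3, threshold=0.5):
    """Return (name, score) pairs whose Jaccard score meets the threshold."""
    query_grams = ngrams(query, n)
    # Only names sharing at least one n-gram with the query are candidates.
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    results = []
    for candidate in candidates:
        candidate_grams = ngrams(candidate, n)
        score = len(query_grams & candidate_grams) / len(query_grams | candidate_grams)
        if score >= threshold:
            results.append((candidate, round(score, 2)))
    # Best score first, ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[1], r[0]))


index = build_index(["PARKER", "PARKS", "BAKER"])
print(sorted(ngrams("PARKER")))               # ['ARK', 'KER', 'PAR', 'RKE']
print(search("PARKAR", index, threshold=0.3))
```

Even in this tiny example the misspelled query "PARKAR" scores higher against "PARKS" than against "PARKER", which illustrates why n-grams alone do not readily overcome the errors found in identity data.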

Match-Codes

Match-codes can be considered a form of pattern matching, with one difference. A match-code defines a sequence in which the data components could reside. For example, you can have a match-code of First Name + Last Name + Address + ZIP. Unlike a pattern, a match-code does not fix a standard length for each component. Instead, each component refers to domain rules (or a list of possible values).

When the match-code program starts, it picks up the first match-code and applies it to the customer record. If the record fully complies with the match-code, the record is accepted; otherwise the program moves on to the next match-code.
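The loop described above might look like the following sketch. It assumes a record has already been split into an ordered list of field values, and the domain rules, the second match-code and the sample values (including the ZIP) are hypothetical and exist only for illustration.

```python
import re

# Hypothetical domain rules: each component is validated by a rule or a
# list of possible values rather than by a fixed length.
DOMAIN_RULES = {
    "first_name": lambda v: v.upper() in {"BETH", "JOHN", "MARY"},
    "last_name": lambda v: v.isalpha(),
    "address": lambda v: bool(re.match(r"^\d+\s+\S", v)),
    "zip": lambda v: v.isdigit() and len(v) == 5,
}

# Match-codes are tried in this order.
MATCH_CODES = [
    ("first_name", "last_name", "address", "zip"),
    ("last_name", "first_name", "address", "zip"),
]


def find_match_code(fields):
    """Return the first match-code the record fully complies with, else None."""
    for code in MATCH_CODES:
        if len(code) != len(fields):
            continue
        if all(DOMAIN_RULES[component](value) for component, value in zip(code, fields)):
            return code
    return None


print(find_match_code(["BETH", "PARKER", "12800 LAKE CALUMET", "60633"]))
print(find_match_code(["PARKER", "BETH", "12800 LAKE CALUMET", "60633"]))
print(find_match_code(["BETH", "PARKER", "LAKE CALUMET", "60633"]))
```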

Searching With Wildcards

This is the conventional method of searching, using the 'LIKE' keyword with wildcard characters such as '%' or '?'. It can be used as a brute-force tactic: run broad queries, then keep narrowing the results with various filters.

There is one more way of narrowing the results. You can create a database of possible variations for names, cities, etc., and apply it as an additional condition on the search results.
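Both ideas can be tried out with Python's built-in sqlite3 module, as in the sketch below. The table names, column names and sample rows are assumptions made for illustration. Note that SQLite uses '%' for any run of characters and '_' for a single character; the '?' wildcard belongs to some other tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [
        ("BETH PARKER", "CHICAGO"),
        ("ELIZABETH PARKER", "CHICAGO"),
        ("BETTY PARKS", "HEGEWISCH"),
        ("JOHN BAKER", "CHICAGO"),
    ],
)
# Hypothetical table of known variations of a first name.
conn.execute("CREATE TABLE name_variation (canonical TEXT, variant TEXT)")
conn.executemany(
    "INSERT INTO name_variation VALUES (?, ?)",
    [("ELIZABETH", "ELIZABETH"), ("ELIZABETH", "BETH"), ("ELIZABETH", "BETTY")],
)


def wildcard_search(pattern):
    """Broad search: names matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]


def variation_search(canonical, name_pattern):
    """Narrowed search: the first name must be a known variant of canonical."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM customer AS c
        JOIN name_variation AS v ON c.name LIKE v.variant || ' %'
        WHERE v.canonical = ? AND c.name LIKE ?
        ORDER BY c.name
        """,
        (canonical, name_pattern),
    )
    return [row[0] for row in rows]


print(wildcard_search("%PARK%"))
print(variation_search("ELIZABETH", "% PARKER"))
```

The first query casts a wide net, while the second adds the variation table as the extra condition and returns only the records that are plausible matches for one person.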

 
