|
The following methods are used to identify matching candidates in a customer database.
Customer Data Searching through Parsing
Parsing is a technique that splits long strings of customer data into individual components, which are then fed into the data searching and matching routines. The parsing rules tell the parsing program, for example, that wherever it finds a word matching an entry in the 'first name' reference list, it should assign that word to the first name. Another rule tells it that wherever it finds a five-character string that is not in the reference lists of names or addresses, it should assign that string to the ZIP. A very simple everyday example is the 'Text to Columns' function in Excel. Today's parsing tools, however, are far more sophisticated. For example:

Input data:
BETH CHRISTINE PARKER, SLS MGR REGIONAL PORT AUTHORITY FEDERAL BUILDING 12800 LAKE CALUMET HEGEWISCH IL

Output (parsed) data:
First Name: BETH
Middle Name: CHRISTINE
Last Name: PARKER
Title: SLS MGR
Firm: REGIONAL PORT AUTHORITY
Firm location: FEDERAL BUILDING
Range: 12800
Street: LAKE CALUMET
City: HEGEWISCH
State: IL
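The reference-list rules described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular parsing tool works: the reference lists, the field names, the `parse_record` function and the sample address (including its ZIP) are all assumptions made for the example.

```python
import re

# Hypothetical reference lists; a real parser would load far larger ones.
FIRST_NAMES = {"BETH", "JOHN", "MARY"}
STATES = {"IL", "IN", "NY"}

def parse_record(text):
    """Assign each token of `text` to a field using simple reference-list rules."""
    parsed = {}
    unassigned = []
    for token in re.split(r"[\s,]+", text.strip().upper()):
        if not token:
            continue
        if token in FIRST_NAMES and "first_name" not in parsed:
            # Rule 1: a word found in the first-name list becomes the first name.
            parsed["first_name"] = token
        elif token in STATES and "state" not in parsed:
            # Rule 2: a word found in the state list becomes the state.
            parsed["state"] = token
        elif re.fullmatch(r"\d{5}", token) and "zip" not in parsed:
            # Rule 3: a five-digit string not claimed by any list becomes the ZIP.
            parsed["zip"] = token
        else:
            unassigned.append(token)
    parsed["unassigned"] = unassigned
    return parsed

# Illustrative input; the house number and ZIP are made up for the example.
result = parse_record("Beth Parker, 42 Lake Calumet, Hegewisch IL 60633")
print(result)
```

A rule set this naive would label the range 12800 in the sample record above as a ZIP, which hints at why practical parsers combine reference lists with position and context.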
Customer Data Searching through Pattern Matching
In conjunction with parsing, you can supply all the possible patterns in which the data could be stored. The tools or queries you run then scan the data against each pattern and produce the following output:
- The data parsed according to whichever pattern fits it. For example, one telephone number could be parsed using the pattern 00-XXX (country code)-YYY (area code)-ZZZ-AAAA, another using XXX-YYY-ZZZ-AAAA, and a third using (YYY)-ZZZ-AAAA.
- A statement of which records, and how much of the data, match each pattern.
- The set of patterns in which the data is actually stored.
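The three outputs listed above can be sketched with regular expressions. The sketch assumes that every letter in a pattern stands for exactly one digit; the pattern table, the function names and the sample phone numbers are all hypothetical.

```python
import re
from collections import Counter

# Assumed translation of the three telephone patterns into regular expressions.
PHONE_PATTERNS = {
    "00-XXX-YYY-ZZZ-AAAA": re.compile(
        r"00-(?P<country>\d{3})-(?P<area>\d{3})-(?P<number>\d{3}-\d{4})"),
    "XXX-YYY-ZZZ-AAAA": re.compile(
        r"(?P<country>\d{3})-(?P<area>\d{3})-(?P<number>\d{3}-\d{4})"),
    "(YYY)-ZZZ-AAAA": re.compile(
        r"\((?P<area>\d{3})\)-(?P<number>\d{3}-\d{4})"),
}

def match_pattern(value):
    """Return (pattern name, parsed components) for the first pattern that fits."""
    for name, regex in PHONE_PATTERNS.items():
        m = regex.fullmatch(value.strip())
        if m:
            return name, m.groupdict()
    return None, {}

def profile(values):
    """Count how many values follow each pattern, or none of them."""
    return Counter(match_pattern(v)[0] or "UNMATCHED" for v in values)

# Made-up sample data.
phones = ["00-091-011-555-0100", "001-212-555-0123", "(312)-555-0188", "555 0199"]
counts = profile(phones)
for pattern, n in counts.items():
    print(f"{pattern}: {n}")
```

`match_pattern` gives the parsed components for one value, while `profile` reports how much of the data follows each pattern and, through its keys, which patterns occur at all.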
Searching Customer Data through N-Gram Indexing
An n-gram is a sequence of 'n' consecutive characters extracted from a word or code. Typical values for 'n' are 2 or 3; for example, the 3-grams of PARKER are PAR, ARK, RKE and KER. These extracted n-grams are then indexed for all names or addresses in the database. At search time, the idea is that words or codes that are similar between the search data and the file data will have a high proportion of n-grams in common. N-grams are particularly well suited to string and text searching. However, unless they are supported by extensive rule bases for phonetic and synonym variation, as well as for noise words, they do not readily overcome the typical error and variation found in identity data, nor do they scale easily to very large data volumes.
Match-Codes
A match-code can be regarded as a form of pattern matching, with one difference. A match-code defines a sequence in which the data could reside; for example, you can have a code of first name + last name + address + ZIP. Unlike a pattern, a match-code does not fix the length of each component. Instead, each component refers to domain rules (or a list of possible values).
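As a rough sketch, a match-code can be modelled as an ordered tuple of component names, with each component checked against a domain rule instead of a fixed length. Everything here is an assumption made for illustration: the rules, the two match-codes, the pipe-delimited record layout, the sample values and the `apply_match_codes` function.

```python
import re

# Hypothetical domain rules: each returns True when a value is valid for that component.
DOMAIN_RULES = {
    "first_name": lambda v: v.upper() in {"BETH", "JOHN", "MARY"},
    "last_name": lambda v: v.isalpha(),
    "address": lambda v: re.fullmatch(r"\d+ [A-Za-z ]+", v) is not None,
    "zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
}

# Each match-code is a sequence of components; none of them has a fixed length.
MATCH_CODES = [
    ("first_name", "last_name", "address", "zip"),
    ("last_name", "first_name", "zip"),
]

def apply_match_codes(parts):
    """Try each match-code in turn; return the first one the record fully complies with."""
    for code in MATCH_CODES:
        if len(code) == len(parts) and all(
            DOMAIN_RULES[component](part) for component, part in zip(code, parts)
        ):
            return dict(zip(code, parts))
    return None  # no match-code fits this record

# Made-up records, assumed to arrive as pipe-delimited strings.
matched = apply_match_codes("Parker|Beth|60633".split("|"))
full = apply_match_codes("Beth|Parker|12800 Lake Calumet|60633".split("|"))
print(matched)
print(full)
```

The first record fails the four-part code on length alone and is accepted by the second code; the second record satisfies the first code directly.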
When the match-code program starts, it picks up the first match-code and applies it to the customer record. If the record fully complies with that match-code, the record is accepted; otherwise the program moves on to the next match-code.
Searching With Wildcards
This is the conventional method of searching, in which the 'LIKE' keyword and wildcard characters such as '%' or '?' are used. It can serve as a brute-force tactic: run broad queries, then keep narrowing the results using various filters.
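The broad-then-narrow approach can be sketched with Python's built-in `sqlite3` module. The table, its columns and the rows are made up for the example. In SQLite's `LIKE`, `%` matches any run of characters and `_` matches a single character; the `?` marks in the queries below are only parameter placeholders, not wildcards.

```python
import sqlite3

# In-memory database holding a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("BETH PARKER", "HEGEWISCH", "IL"),
        ("BETHANY PARKS", "CHICAGO", "IL"),
        ("ELIZABETH PARKER", "GARY", "IN"),
        ("JOHN SMITH", "CHICAGO", "IL"),
    ],
)

# Step 1: a broad wildcard query that casts a wide net.
broad = [row[0] for row in conn.execute(
    "SELECT name FROM customer WHERE name LIKE ? ORDER BY name",
    ("%BETH%PARK%",),
)]

# Step 2: the same wildcard query narrowed with an additional filter.
narrowed = [row[0] for row in conn.execute(
    "SELECT name FROM customer WHERE name LIKE ? AND state = ? ORDER BY name",
    ("%BETH%PARK%", "IL"),
)]

print(broad)
print(narrowed)
conn.close()
```

The first query returns every name containing BETH followed somewhere by PARK; adding the state filter trims that candidate list without changing the wildcard itself.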
There is one more way to narrow the results: create a database of possible variations for names, cities and so on, and apply it as an additional condition on the results returned by the system. |