Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline

Data Mining Techniques- Propensity Modeling

Propensity modeling in data mining works by discovering a natural inclination or tendency across the variables. This group of techniques includes Cluster Analysis, Association Analysis and Sequential/Temporal Patterns.

This page is an extract from BIDS KDD Methodology authored by Kamlesh Mhashilkar-Head, Execution-MiH Services of Tata Consultancy Services

Cluster Analysis

In an unsupervised learning environment the system has to discover its own classes, and one way it does this is to cluster the data in the database, as shown in the following diagram. The first step is to discover subsets of related objects; the second is to find descriptions (e.g. D1, D2, D3) that characterize each of these subsets.

 

Figure: cluster analysis, income/debt example

Clustering and segmentation partition the database so that the members of each partition or group are similar according to some criterion or metric. Clustering by similarity is a concept that appears in many disciplines. If a measure of similarity is available, there are a number of techniques for forming clusters. Group membership can be based on the level of similarity between members, and from this the rules of membership can be defined. Another approach is to build set functions that measure some property of the partitions (i.e. the groups or subsets) as functions of some parameter of the partition. This latter approach achieves what is known as optimal partitioning.
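
As a minimal sketch of clustering by similarity, the following uses k-means (Lloyd's algorithm) with Euclidean distance as the similarity measure. The choice of k-means, the function name `k_means`, and the (income, debt) records are assumptions made for illustration; they are not taken from the BIDS KDD Methodology.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Cluster points around the given starting centroids (Lloyd's algorithm)."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assign each point to its most similar (nearest) centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid as the mean of its current members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignment, centroids


# Hypothetical (income, debt) records in thousands; illustrative only.
customers = [(20, 15), (22, 18), (25, 14), (80, 10), (85, 12), (90, 8)]
labels, centers = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)   # cluster index per customer
print(centers)  # one (income, debt) centroid per cluster
```

The starting centroids are passed in explicitly so that the run is repeatable; in practice they are often chosen at random, and the result can depend on that choice.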

Many data mining applications use clustering by similarity, for example to segment a client or customer base. Clustering by optimization of set functions is used in data analysis: when setting insurance tariffs, for example, customers can be segmented according to a number of parameters to arrive at the optimal tariff segmentation.

Clustering/segmentation in databases is the process of separating a data set into components that reflect a consistent pattern of behavior. Once the patterns have been established, they can be used to "deconstruct" the data into more understandable subsets. They also provide sub-groups of a population for further analysis or action, which is important when dealing with very large databases. For example, a database could be used to generate profiles for target marketing: previous responses to mailing campaigns yield a profile of the people who responded, and this profile can be used to predict response and filter mailing lists to achieve the best response.
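
The target marketing example can be sketched as follows, assuming each past recipient has already been assigned to a segment. The function names, the segment labels, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def response_rate_by_segment(history):
    """history: iterable of (segment, responded) pairs from past campaigns."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in history:
        mailed[segment] += 1
        responded[segment] += int(did_respond)
    return {segment: responded[segment] / mailed[segment] for segment in mailed}


def filter_mailing_list(prospects, rates, min_rate):
    """Keep prospects whose segment responded at least min_rate of the time."""
    return [
        name for name, segment in prospects
        if rates.get(segment, 0.0) >= min_rate
    ]


# Hypothetical campaign history: (segment, responded).
history = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", True), ("B", False),
]
rates = response_rate_by_segment(history)

# Hypothetical prospects: (name, segment).
prospects = [("p1", "A"), ("p2", "B"), ("p3", "A")]
selected = filter_mailing_list(prospects, rates, min_rate=0.5)
print(rates)
print(selected)
```

Here the past response rate of a segment stands in for the predicted response of a new prospect in that segment, which is the simplest possible profile; a real application would likely use richer profiles.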

Associations

Given a collection of items and a set of records, each of which contains some number of items from the collection, an association function is an operation against this set of records that returns affinities or patterns that exist among the collection of items. These patterns can be expressed by rules such as "72% of all the records that contain items A, B and C also contain items D and E." The specific percentage of occurrences (in this case 72) is called the confidence factor of the rule. In this rule, A, B and C are said to be on the opposite side of the rule from D and E. Associations can involve any number of items on either side of the rule.
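
The confidence factor described above can be computed directly from the records. The sketch below is illustrative: the function name `rule_stats` and the five sample records are assumptions, and the function also returns the rule's support (the share of all records that contain both sides), a standard companion measure that the text above does not mention.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every item on the left-hand side of the rule.
    with_antecedent = [r for r in records if antecedent <= set(r)]
    # Of those, the records that also contain every right-hand-side item.
    with_both = [r for r in with_antecedent if consequent <= set(r)]
    support = len(with_both) / len(records) if records else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical records over the item collection {A, B, C, D, E}.
records = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C"},
    {"D", "E"},
]
support, confidence = rule_stats(records, {"A", "B", "C"}, {"D", "E"})
print(f"support={support:.0%} confidence={confidence:.0%}")
```

In this sample, four records contain A, B and C, and three of those also contain D and E, so the confidence factor is 75%.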

Figure: association, grocery example

A typical application that can be built using an association function, identified by IBM, is Market Basket Analysis. Here a retailer runs an association operator over the point-of-sale transaction log, which contains, among other information, transaction identifiers and product identifiers. The set of product identifiers listed under the same transaction identifier constitutes a record. The output of the association function is, in this case, a list of product affinities. Thus, by invoking an association function, the market basket analysis application can determine affinities such as "20% of the time that a specific brand of toaster is sold, customers also buy a set of kitchen gloves and matching cover sets."
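
A minimal sketch of this pipeline, assuming the transaction log is available as (transaction identifier, product identifier) rows. The function names and the sample log are hypothetical, and for brevity the sketch only considers rules with a single product on each side.

```python
from collections import defaultdict
from itertools import permutations


def baskets_from_log(log):
    """Group (transaction_id, product_id) rows into one item set per transaction."""
    baskets = defaultdict(set)
    for transaction_id, product_id in log:
        baskets[transaction_id].add(product_id)
    return list(baskets.values())


def pair_affinities(baskets, min_confidence=0.0):
    """Return {(x, y): confidence} for rules of the form "x sold -> y also sold"."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        # Every ordered pair of distinct products bought together.
        for x, y in permutations(basket, 2):
            pair_count[(x, y)] += 1
    return {
        (x, y): n / item_count[x]
        for (x, y), n in pair_count.items()
        if n / item_count[x] >= min_confidence
    }


# Hypothetical point-of-sale log rows: (transaction_id, product_id).
log = [
    (1, "toaster"), (1, "kitchen gloves"),
    (2, "toaster"),
    (3, "toaster"), (3, "bread"),
    (4, "toaster"), (4, "bread"),
    (5, "toaster"),
    (6, "bread"),
]
affinities = pair_affinities(baskets_from_log(log))
for (x, y), conf in sorted(affinities.items()):
    print(f"{conf:.0%} of the time that {x} is sold, {y} is also sold")
```

Note that affinities are directional: in this sample, kitchen gloves appear in one of the five toaster transactions, while every transaction with kitchen gloves also contains a toaster.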

Another example of the use of associations is the analysis of the claim forms submitted by patients to a medical insurance company. Every claim form contains a set of medical procedures that were performed on a given patient during one visit. By defining the set of items to be the collection of all medical procedures that can be performed on a patient, and letting each record correspond to one claim form, the application can use the association function to find relationships among medical procedures that are often performed together.

Sequential/Temporal patterns

Sequential/temporal pattern functions analyze a collection of records over a period of time, for example to identify trends. Where the identity of the customer who made a purchase is known, an analysis can be made of the collection of related records of the same structure (i.e. each consisting of a number of items drawn from a given collection of items). The records are related by the identity of the customer who made the repeated purchases. Such a situation is typical of a direct mail application where, for example, a catalogue merchant knows, for each customer, the set of products bought in every purchase order. A sequential pattern function analyzes such collections of related records and detects frequently occurring patterns of products bought over time. A sequential pattern operator could also be used to discover, for example, the set of purchases that frequently precedes the purchase of a microwave oven.
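
The microwave oven question can be sketched with a simple frequency count over each customer's order history. This is an illustration under stated assumptions, not a full sequential pattern mining algorithm: the function name `items_preceding` and the sample histories are hypothetical, and only single items bought before the first target purchase are counted.

```python
from collections import Counter


def items_preceding(histories, target):
    """For each item, the fraction of target buyers who bought it in an earlier order.

    histories: {customer_id: [set_of_items, ...]} with orders in time order.
    """
    preceding = Counter()
    buyers = 0
    for orders in histories.values():
        # Position of the first order that contains the target product.
        first = next((i for i, order in enumerate(orders) if target in order), None)
        if first is None:
            continue  # this customer never bought the target
        buyers += 1
        # Distinct items from all orders placed before that one.
        earlier_items = set().union(*orders[:first])
        preceding.update(earlier_items - {target})
    return {item: n / buyers for item, n in preceding.items()} if buyers else {}


# Hypothetical purchase orders per customer, oldest first.
histories = {
    "c1": [{"kettle"}, {"toaster", "plates"}, {"microwave oven"}],
    "c2": [{"plates"}, {"microwave oven", "kettle"}],
    "c3": [{"kettle"}, {"plates"}],
    "c4": [{"microwave oven"}, {"plates"}],
}
before_microwave = items_preceding(histories, "microwave oven")
for item, share in sorted(before_microwave.items(), key=lambda kv: -kv[1]):
    print(f"{item}: bought earlier by {share:.0%} of microwave oven buyers")
```

Items bought in the same order as the target, or afterwards, are deliberately excluded, since the question is about what precedes the purchase.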

Sequential pattern mining functions are quite powerful and can be used to detect the set of customers associated with some frequent buying patterns. Applied to a set of insurance claims, for example, these functions can identify frequently occurring sequences of medical procedures applied to patients, which can help identify good medical practices and potentially detect some medical insurance fraud.

Note: BIDS Solutions encompass the proprietary solutions from TCS covering the Business Intelligence and Data Warehousing landscape.

 
