
Data Mining Techniques- Propensity Modeling

Propensity modeling in data mining discovers natural inclinations or tendencies across the variables in a data set. This group of techniques includes cluster analysis, association analysis, and sequential/temporal pattern analysis.

Cluster Analysis

In an unsupervised learning environment, the system has to discover its own classes. One way it does this is to cluster the data in the database, as shown in the following diagram. The first step is to discover subsets of related objects; the second is to find descriptions (e.g. D1, D2, D3) that characterize each of these subsets.

 

Figure: cluster analysis, income/debt example

Clustering and segmentation partition the database so that the members of each partition or group are similar according to some criterion or metric. Clustering according to similarity is a concept that appears in many disciplines. If a measure of similarity is available, there are a number of techniques for forming clusters: membership of a group can be based on the level of similarity between members, and from this the rules of membership can be defined. Another approach is to build set functions that measure some property of a partition (that is, of its groups or subsets) as a function of some parameter of the partition. This latter approach achieves what is known as optimal partitioning.
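As an illustration of similarity-based clustering, here is a minimal sketch using k-means with Euclidean distance. The text above does not name a specific algorithm, so k-means is an assumption chosen for brevity; the function name `kmeans`, the starting centroids, and the (income, debt) records are all hypothetical.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Partition points into len(centroids) clusters by Euclidean distance."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignment, centroids

# Hypothetical (income, debt) records, in thousands.
customers = [(20, 5), (22, 7), (25, 6), (80, 40), (85, 45), (90, 42)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
```

Here `labels` gives the cluster index of each customer and `centers` the mean income and debt of each cluster, which can serve as a simple description of the segment. The result of k-means depends on the starting centroids, so a real application would typically try several initializations.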

Many data mining applications cluster according to similarity, for example to segment a client/customer base. Clustering by optimization of set functions is used in data analysis; for example, when setting insurance tariffs, customers can be segmented according to a number of parameters so that the optimal tariff segmentation is achieved.

Clustering/segmentation in databases is the process of separating a data set into components that reflect a consistent pattern of behavior. Once the patterns have been established, they can be used to "deconstruct" data into more understandable subsets. They also provide sub-groups of a population for further analysis or action, which is important when dealing with very large databases. For example, a database could be used for profile generation in target marketing: previous responses to mailing campaigns are used to generate a profile of the people who responded, and this profile is then used to predict response and filter mailing lists to achieve the best response.

Associations

Given a collection of items and a set of records, each of which contains some number of items from the given collection, an association function is an operation against this set of records that returns affinities or patterns that exist among the collection of items. These patterns can be expressed by rules such as "72% of all the records that contain items A, B and C also contain items D and E." The specific percentage of occurrences (in this case 72) is called the confidence factor of the rule. In this rule, A, B and C are said to be on the opposite side of the rule from D and E. Associations can involve any number of items on either side of the rule.
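The confidence factor described above can be computed directly from the records. The following is a minimal sketch; the function name `confidence` and the sample records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def confidence(records, antecedent, consequent):
    """Share of records containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every item on the left-hand side of the rule.
    with_antecedent = [r for r in records if antecedent <= set(r)]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    # Of those, the records that also contain every right-hand-side item.
    with_both = [r for r in with_antecedent if consequent <= set(r)]
    return len(with_both) / len(with_antecedent)

# Hypothetical records: four contain A, B and C; three of those also contain D and E.
records = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C"},
    {"A", "D"},
]
rule_conf = confidence(records, {"A", "B", "C"}, {"D", "E"})
```

In this sample, the rule "records that contain A, B and C also contain D and E" has a confidence factor of 3 out of 4, or 75%.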

Figure: association, grocery example

A typical application that can be built using an association function, as identified by IBM, is Market Basket Analysis. Here a retailer runs an association operator over the point-of-sale transaction log, which contains, among other information, transaction identifiers and product identifiers. The set of product identifiers listed under the same transaction identifier constitutes a record. The output of the association function is, in this case, a list of product affinities. Thus, by invoking an association function, the market basket analysis application can determine affinities such as "20% of the time that a specific brand of toaster is sold, customers also buy a set of kitchen gloves and matching cover sets."
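The market basket process described above can be sketched as follows, restricted to pairs of products for simplicity. This is an illustrative sketch, not IBM's association operator; the function name `basket_affinities`, the `min_confidence` threshold, and the sample log are assumptions.

```python
from collections import defaultdict
from itertools import permutations

def basket_affinities(sales_log, min_confidence=0.5):
    """Return {(a, b): share of transactions with a that also contain b}."""
    # Group product identifiers by transaction identifier to form records.
    baskets = defaultdict(set)
    for transaction_id, product_id in sales_log:
        baskets[transaction_id].add(product_id)

    product_count = defaultdict(int)  # transactions containing each product
    pair_count = defaultdict(int)     # transactions containing both products
    for items in baskets.values():
        for item in items:
            product_count[item] += 1
        for first, second in permutations(sorted(items), 2):
            pair_count[(first, second)] += 1

    # Keep only the affinities that reach the confidence threshold.
    return {
        pair: count / product_count[pair[0]]
        for pair, count in pair_count.items()
        if count / product_count[pair[0]] >= min_confidence
    }

# Hypothetical point-of-sale log: (transaction identifier, product identifier).
sales_log = [
    (1, "toaster"), (1, "kitchen gloves"),
    (2, "toaster"), (2, "kitchen gloves"), (2, "kettle"),
    (3, "toaster"),
    (4, "kettle"),
]
affinities = basket_affinities(sales_log)
```

With this sample log, two of the three toaster transactions also include kitchen gloves, so the affinity from toaster to kitchen gloves is about 67%. Counting every pair is only practical for small catalogues; larger ones usually need pruning of infrequent items first.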

Another example of the use of associations is the analysis of the claim forms submitted by patients to a medical insurance company. Every claim form contains a set of medical procedures that were performed on a given patient during one visit. By defining the set of items to be the collection of all medical procedures that can be performed on a patient and the records to correspond to each claim form, the application can find, using the association function, relationships among medical procedures that are often performed together.

Sequential/Temporal patterns

Sequential/temporal pattern functions analyze a collection of records over a period of time, for example to identify trends. Where the identity of the customer who made a purchase is known, an analysis can be made of the collection of related records of the same structure (i.e. each consisting of a number of items drawn from a given collection of items). The records are related by the identity of the customer who made the repeated purchases. Such a situation is typical of a direct mail application where, for example, a catalogue merchant has, for each customer, the sets of products that the customer bought in every purchase order. A sequential pattern function analyzes such collections of related records and detects frequently occurring patterns of products bought over time. A sequential pattern operator could also be used to discover, for example, the set of purchases that frequently precedes the purchase of a microwave oven.
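The microwave oven example can be sketched in a simplified form: for each customer who bought the target product, collect the items bought in earlier orders, then report how often each item appears. This is a reduced illustration rather than a full sequential pattern operator; the function name `items_preceding` and the sample purchase histories are hypothetical.

```python
from collections import Counter

def items_preceding(orders_by_customer, target):
    """Share of target buyers who bought each item in an earlier order."""
    preceding = Counter()
    buyers = 0
    for orders in orders_by_customer.values():
        seen = set()
        for order in orders:  # orders are assumed to be in chronological order
            if target in order:
                buyers += 1
                preceding.update(seen)  # count each earlier item once per customer
                break
            seen.update(order)
    if not buyers:
        return {}
    return {item: n / buyers for item, n in preceding.items()}

# Hypothetical purchase histories: one list of orders per customer.
orders_by_customer = {
    "c1": [{"fridge"}, {"kettle", "toaster"}, {"microwave oven"}],
    "c2": [{"fridge"}, {"microwave oven"}, {"toaster"}],
    "c3": [{"toaster"}, {"kettle"}],
}
shares = items_preceding(orders_by_customer, "microwave oven")
```

In this sample, both customers who bought a microwave oven had bought a fridge beforehand, while the toaster preceded the purchase for only one of them. Items bought in the same order as the target, or afterwards, are not counted as preceding it.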

Sequential pattern mining functions are quite powerful and can be used to detect the set of customers associated with some frequent buying patterns. Applied to, for example, a set of insurance claims, these functions can identify frequently occurring sequences of medical procedures applied to patients. This can help identify good medical practices and potentially detect some medical insurance fraud.

 
