Knowledge Discovery in Databases Methodology

The Knowledge Discovery in Databases (KDD) methodology comprises five phases: definition (vision, business cases, tools), specification (requirements, techniques, data analysis), design (data preparation, business case model), build, and deploy.

The KDD process is aligned with the BIDS™ methodology, which provides a robust framework for delivering KDD programs. This chapter details the tasks to be executed during the various modules of the BIDS™ methodology for a KDD program.

Definition Phase

This phase focuses on analyzing the readiness of the organization and defining the data mining application framework. Defining the framework involves analyzing the current state of the business and its decision support processes and developing a blueprint for the data mining application.

Strategic Vision

During the definition phase, identify and define the strategic vision of the organization; this forms its high-level goal. While defining the strategic vision, identify the organization's current decision support processes and their benefits, and gauge the organization's awareness of data mining and its process. Then define how data mining will fulfill the strategic vision of the organization.

Business Cases/Problems

After defining the strategic vision of the organization, capture the ‘AS IS’ Business and IT processes (including the organizational structures, the business and IT architecture, operational and analytical processes, decision support programs and interfaces to the external environment). The business analysts involved in this process will also analyze the state of the organization.

The ‘TO BE’ processes will be based on the company vision and derived by gathering and consolidating the high-level business cases/problems across the organizational functions, units and departments.

The ‘AS IS’ processes will then be compared with the ‘TO BE’ processes. The comparison helps define the business cases/problems with respect to the processes, architectures and structures currently in place and the high-level requirements identified for this initiative.

Examples of business cases/problems for data mining are:

  • Increasing business unit and overall profitability
  • Understanding customer desires and needs
  • Identifying profitable customers and acquiring new ones
  • Retaining customers and increasing loyalty
  • Increasing ROI and reducing costs on promotions
  • Cross-selling and up-selling
  • Detecting fraud, waste and abuse
  • Determining credit risks
  • Increasing Web site profitability
  • Increasing store traffic and optimizing layouts for increased sales
  • Monitoring business performance

Tools and Technologies

Technology and tools evaluation and recommendation will also be taken up at this stage. A functional Proof of Concept (POC) development may optionally be performed to validate the technology recommendation.
 
A data mining roadmap will be formulated. It will identify opportunities to deliver results as a series of deliverables within 90-day cycles and will prioritize the subsequent phases of the overall initiative. The roadmap will clearly state the high-level work plan for the complete data mining project cycle.

Specifications

The Specifications phase focuses on capturing detailed requirements for the data mining application for each business case. This phase requires a clearly defined framework for the data mining application, which includes:

  • High-level business cases/problems
  • Gap analysis with respect to the current business processes and IT systems
  • Framework for the complete data mining application processes

The typical steps during the Specifications phase are shown in the following flow chart.

Requirements Analysis

The functional and business requirements will be captured for each business case/problem identified during the framework definition phase. A Business Requirements Specifications (BRS) document will be prepared for each business case, and the BRS documents will be categorized by business case/problem.

  • Visualization: The results of a data mining task need to be formatted, converted to a form the user can understand and reported to the user. The visualization specifications defined in this phase cover the user-defined formats of the output files generated by the respective data mining techniques. The user requirement specifications will capture the flexibility needed to view and analyze the data mining results, including displaying reports, charts and smart reports.
  • Exception Handling: Any error in the data mining process will be reported.

Data Mining Techniques

Based on the business case/problem, suitable data mining techniques will be identified. The data mining tasks include the following:

  • Associations
  • Segmentation (clustering)
  • Classification
  • Rule discovery
  • Regression
  • Deviation Detection

Data Analysis

  • Data Quality: Data mining depends critically on clean, well-maintained data. In the absence of a data warehouse, some pre-processing of the data is needed before deploying data mining, and the quality of the data determines how much data preparation the application requires. If the data is not extracted from a data warehouse, a data quality assessment will be conducted to identify the data quality issues.
  • Metadata Assessment: Metadata is important for data mining because data preparation for the data mining tasks is carried out based on the metadata information. A metadata assessment will be conducted to measure the completeness of the metadata information. If the metadata information is not available, a data dictionary will be prepared with high-level metadata information.
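The two assessments above can be sketched as simple measurements. The following is a minimal illustration, assuming records arrive as Python dictionaries; the function names, the choice of "type" and "description" as required metadata keys, and the sample data are all hypothetical.

```python
def assess_quality(rows, expected_fields):
    """Report, per field, the share of missing values and the distinct count."""
    report = {}
    total = len(rows)
    for field in expected_fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        report[field] = {
            "missing_rate": (total - len(present)) / total if total else 0.0,
            "distinct_values": len(set(present)),
        }
    return report


def metadata_completeness(metadata, expected_fields,
                          required_keys=("type", "description")):
    """Fraction of expected fields whose metadata has every required key filled."""
    if not expected_fields:
        return 1.0
    complete = sum(
        1 for field in expected_fields
        if all(metadata.get(field, {}).get(key) for key in required_keys)
    )
    return complete / len(expected_fields)


# Hypothetical sample data and metadata.
sample_rows = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 51, "region": ""},
    {"customer_id": 4, "age": 34, "region": "north"},
]
sample_metadata = {
    "customer_id": {"type": "int", "description": "Unique customer key"},
    "age": {"type": "int", "description": ""},
}
fields = ["customer_id", "age", "region"]

quality = assess_quality(sample_rows, fields)
completeness = metadata_completeness(sample_metadata, fields)
```

A low completeness score would be the trigger, in the terms used above, for preparing a data dictionary with high-level metadata information.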

The data mining application requires several templates to conduct data analysis. The specifications for these templates are captured in the specifications phase.

  • Data Dictionary/Template: At the outset of the data mining task, it is essential to identify the appropriate data elements that need to be analyzed. A data dictionary/template T must be made available to the user as a prerequisite for selecting data elements from the available data sources. The data dictionary/template T contains information about the available data elements in the respective databases/views. The metadata M of the selected data elements is captured from the metadata in the data server and made available to the user.
  • Data Preparation: Data preparation is the key to any data mining application. A prerequisite for any data mining technique is a set of selected, clean and transformed data. It is estimated that more than 50% of the work in data mining lies in preparing data to the point where data mining algorithms/techniques can actually start running. The line of business must understand what preparations are necessary for a data mining analysis.
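One way to picture the data dictionary/template T and the metadata M of the selected elements is sketched below. The structure, field names and sample entries are assumptions made for illustration; the methodology does not prescribe a particular representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    source: str        # database table or view that holds the element
    data_type: str
    description: str


# Hypothetical data dictionary/template T.
DATA_DICTIONARY = [
    DataElement("customer_id", "crm.customers", "integer", "Unique customer key"),
    DataElement("age", "crm.customers", "integer", "Age in years"),
    DataElement("annual_spend", "sales.v_spend", "decimal", "Spend over 12 months"),
]


def select_metadata(dictionary, selected_names):
    """Return metadata M for the elements the user selected from T."""
    by_name = {element.name: element for element in dictionary}
    unknown = [name for name in selected_names if name not in by_name]
    if unknown:
        raise KeyError(f"Not in data dictionary: {unknown}")
    return {
        name: {
            "source": by_name[name].source,
            "data_type": by_name[name].data_type,
            "description": by_name[name].description,
        }
        for name in selected_names
    }


metadata_m = select_metadata(DATA_DICTIONARY, ["age", "annual_spend"])
```

Rejecting names that are not in T keeps the user's selection limited to elements that actually exist in the available data sources.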

Based on the BRS, the data elements required for each business case/problem will be identified. The data preparation requirements will be identified based on the categorizations and classifications of the data elements for each business case/problem.

  • Specification: A specification file S is required, detailing how to transform the data as per the user requirement. The specification file will contain the selected data elements, the data mining task and the user categorization/itemization of each data element. This specification information will be sent to the data mining application.
  • Data Mining Input Parameters: Based on the data mining task to be executed, the required input parameters for that task should be defined in the Data Mining Input Parameters Specifications.
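A specification file S of the kind described above could look like the following sketch. JSON is an assumption here (the source does not name a file format), as are the key names, the `parameters` block and the binning scheme used for user categorization.

```python
import json

# Task names taken from the list of data mining tasks in this chapter.
ALLOWED_TASKS = {
    "associations", "segmentation", "classification",
    "rule discovery", "regression", "deviation detection",
}

# Hypothetical specification S: selected elements, task, categorization,
# plus the input parameters for the chosen task.
SPEC_TEXT = """
{
  "task": "segmentation",
  "elements": ["age", "annual_spend"],
  "categorization": {
    "age": [[30, "young"], [60, "middle"], [200, "senior"]],
    "annual_spend": [[1000, "low"], [1000000, "high"]]
  },
  "parameters": {"clusters": 3}
}
"""


def load_specification(text):
    """Parse S and check that it is complete before it reaches the application."""
    spec = json.loads(text)
    missing = [k for k in ("task", "elements", "categorization") if k not in spec]
    if missing:
        raise ValueError(f"Specification is missing keys: {missing}")
    if spec["task"] not in ALLOWED_TASKS:
        raise ValueError(f"Unknown data mining task: {spec['task']}")
    uncategorized = [e for e in spec["elements"] if e not in spec["categorization"]]
    if uncategorized:
        raise ValueError(f"No categorization for: {uncategorized}")
    return spec


def categorize(value, bins):
    """Return the label of the first bin whose upper bound is >= value."""
    for upper_bound, label in bins:
        if value <= upper_bound:
            return label
    raise ValueError(f"Value {value} is above every bin")


spec = load_specification(SPEC_TEXT)
label = categorize(45, spec["categorization"]["age"])
```

Validating S up front means errors in the user's selections surface before any data is extracted or transformed.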

Design

The Design phase addresses the technical design of the data mining application. Before this phase starts, the detailed business and technical specifications must be complete and approved, and the technology platform must be finalized.

Application design focuses on developing detailed System Design Specifications (SDS), comprising the following:

Data Preparation Design

  • Data Manager Layer: Once the specification S is available, the mining task can be performed. The data manager layer will be designed to extract the data from the database and make it available to the data mining task, in the format given by the Data Mining Task Input Parameter Specification.
  • Manage Data: Depending on the business case, the data may need to be extracted from the database in the format and manner that the data mining task requires. In such a scenario, a separate database will be created and the data loaded into it for the data mining application (optional).
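The role of the data manager layer can be sketched with Python's built-in `sqlite3` module. The in-memory database, the `customers` table, the CSV output format and the function name `extract_for_task` are all assumptions chosen for illustration; a real layer would target the organization's own database and the format named in the input parameter specification.

```python
import csv
import io
import sqlite3


def extract_for_task(conn, table, columns):
    """Extract the requested columns and return them as CSV text."""
    # Table and column names cannot be bound as SQL parameters,
    # so they are checked against the schema before being used.
    tables = {
        row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
        )
    }
    if table not in tables:
        raise ValueError(f"Unknown table or view: {table}")
    known = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    bad = [c for c in columns if c not in known]
    if bad:
        raise ValueError(f"Unknown columns: {bad}")

    column_sql = ", ".join(f'"{c}"' for c in columns)
    rows = conn.execute(f'SELECT {column_sql} FROM "{table}"').fetchall()

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical source data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, age INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 34, "north"), (2, 51, "south")],
)
extract = extract_for_task(conn, "customers", ["customer_id", "age"])
```

Keeping extraction behind one function means the mining task never needs to know where the data lives, which also makes the optional separate database easy to swap in.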

Business Case Model Design

Based on the business case, a process model will be designed. The process will have the following steps:

  • Manage Data Process: In this process, the design for extracting data will be defined based on the required format.
  • Data Mining Process: In this process, the design will focus on performing the data mining tasks identified for the business case.
  • Visualization Process: In this process, the primary focus will be on designing the generation of reports and, if required, alerts based on user-defined patterns.
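The three processes above can be read as a small pipeline. The sketch below is a minimal illustration under stated assumptions: the mining step is a deliberately trivial stand-in (a per-group mean, not a real data mining technique), and the function names, sample records and alert rule are hypothetical.

```python
from collections import defaultdict


def manage_data(raw_records, fields):
    """Manage data: keep the requested fields, drop records with gaps."""
    return [
        {f: record[f] for f in fields}
        for record in raw_records
        if all(record.get(f) is not None for f in fields)
    ]


def mine(records, group_field, value_field):
    """Data mining stand-in: mean of value_field per group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_field]].append(record[value_field])
    return {group: sum(vals) / len(vals) for group, vals in grouped.items()}


def visualize(result, alert_rule):
    """Visualization: plain-text report, flagging rows that match the rule."""
    lines = []
    for group in sorted(result):
        flag = "  ALERT" if alert_rule(result[group]) else ""
        lines.append(f"{group}: {result[group]:.2f}{flag}")
    return "\n".join(lines)


# Hypothetical raw data; the third record is incomplete.
raw = [
    {"region": "north", "spend": 100},
    {"region": "north", "spend": 300},
    {"region": "south", "spend": None},
    {"region": "south", "spend": 50},
]
prepared = manage_data(raw, ["region", "spend"])
result = mine(prepared, "region", "spend")
report = visualize(result, alert_rule=lambda value: value < 100)
print(report)
```

Passing the alert rule in as a function is one way to keep the user-defined pattern separate from the reporting code.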

Development/refinement of the application prototype is an optional activity within the design phase. This prototype will involve development of components, which will be reused during the development activities. The development of prototype helps in gaining appreciation from business users in functional, technical and reusability perspectives.

In preparation for the build phase, standards for various development components (coding, testing etc.) will be prepared and reviewed.

Build

The ‘Build’ phase addresses the development and testing phase of the KDD Program.

This module concentrates on development, unit testing and system testing of the data mining application. In addition to coding and unit testing, it includes quality assurance activities such as code review and test plan review.

Once the data mining application is built, its results will be validated. Depending on the outcome of the validation, the application may return to the specifications phase to revise the business and data understanding and, if required, the data mining model will subsequently be changed. The build process is iterative in nature and will go through many iterations before deployment.

Preparing, reviewing and executing the Unit Test Plan (UTP) and the System Integration Test Plan (SITP) are activities in this module. The UTP will comprise the test cases for each individual component of the application. During unit testing, it is necessary to keep the complete system integration view in mind so that errors are minimized during system integration testing. This is done by highlighting in the UTP the components and test cases that will play a role in the SITP.

The SITP will comprise the system test cases. The test cases can also be based on comparison with existing reporting/DSS systems and on reconciliation requirements with respect to the source systems. The SITP will form the basis for system testing.

User Acceptance Testing (UAT) criteria and plan will also be prepared in this module. Wherever applicable, stress testing (this includes volume testing and performance testing) will be addressed during the System Testing and UAT.

User related documentation, such as the Operations Guide and the User Manual will be prepared during this module.

A training plan covering the training modules, their contents and schedules will be prepared. The plan will also state the prerequisites for each session. Training material, including case studies, will be a further set of deliverables from this module.

The site rollout plan will be prepared to state the activities involved during the deployment of the data mining application. It will also highlight the risks involved and precautions to be taken during rollout activities.

To summarize, the key deliverable from this module will be a system-tested application that is ready for user acceptance testing.

Deployment Phase

The ‘Deployment’ phase focuses on the effective implementation of the data mining application in order to give end users easy access to information. Before this phase starts, the data mining application must have undergone system testing. The application will move from the development environment to the test (UAT) environment and then to the production environment.

The test environment will be set up and configured for UAT. The test cases in the UAT plan, prepared in the build phase along with the SITP, will be used during UAT. The UAT will comprise three main tests.

Volume Testing

This will consist of Volume Testing and Performance Testing. Volume testing will be carried out to test the scalability and durability of the data mining application, with respect to the base (historical) data volume and the growth in data volume.

Stress testing will be carried out in the testing environment provided the environment is scalable to the production environment.
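A volume test of the kind described above can be sketched as a loop that runs the same task against the base volume and against projected growth, recording the elapsed time for each run. The function names, the stand-in task and the growth factors are assumptions for illustration; real volume testing would use the actual application and representative data.

```python
import time


def run_volume_test(task, make_data, base_volume, growth_factors):
    """Time `task` at the base volume and at each projected growth level."""
    results = []
    for factor in growth_factors:
        volume = int(base_volume * factor)
        data = make_data(volume)
        start = time.perf_counter()
        task(data)
        elapsed = time.perf_counter() - start
        results.append({"volume": volume, "seconds": elapsed})
    return results


def sample_task(data):
    """Hypothetical stand-in for a mining task: a frequency count."""
    counts = {}
    for item in data:
        counts[item] = counts.get(item, 0) + 1
    return counts


def make_sample(n):
    """Generate n synthetic records."""
    return [i % 100 for i in range(n)]


results = run_volume_test(
    sample_task, make_sample, base_volume=10_000, growth_factors=[1, 2, 4]
)
for row in results:
    print(f"{row['volume']:>8} records: {row['seconds']:.4f} s")
```

Comparing how the elapsed time grows against how the volume grows gives a first indication of scalability, provided the test environment is comparable to production as noted above.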

End User Acceptance Testing

Business users will carry out the report and navigational testing. They will primarily test the data, which will be displayed in the presentation layer.
Users will also carry out performance checks and tuning to ensure the expected performance, robustness and scalability over a period of time.

Once the users accept the developed data mining application, the training programs and site rollout activities will begin in parallel. The users who conduct the acceptance testing might need to be trained before the acceptance testing. The training program will be customized as per the user requirements. It can be tool-specific training, data mining application-specific training or a mix of both. Various training sessions will be carried out for different types of users, e.g. Administrators, Operational Process Monitoring Staff, Application/Component Developers, Forecasting Reports and separate Statistical Tools.

When rolling out the data mining application to the production environment, the environment setup will be completed first to ensure that the application can be deployed effectively. The setup includes the hardware, server configurations, operating systems, and software installation and configuration. Utmost care needs to be taken if the production environment is already live with other applications. The impact of this application on the production system needs to be monitored and controlled.

Once the production environment setup is complete, the application software and code will be deployed and made operational.

Data visualization makes it possible for the analyst to gain a deeper, more intuitive understanding of the data and as such can work well alongside data mining. Data mining allows the analyst to focus on certain patterns and trends and then explore them in depth using visualization. On its own, data visualization can be overwhelmed by the volume of data in a database, but in conjunction with data mining it can help with exploration.

 
