KDD-Data Mining Issues & Challenges

The key issues in KDD-Data Mining are limited information, noisy and missing data, uncertainty, and data that is large, dynamic and fast-changing.

Data mining applications rely on databases to supply their raw input data. Problems in the databases and their data (e.g. volatility, incompleteness, noise and volume) are compounded by the time the data reaches the data mining task. Further problems arise from the adequacy and relevance of the information stored.

Limited Information

A database is often designed for purposes other than data mining, and the properties or attributes that would simplify the learning task are sometimes neither present in the data nor obtainable from the real world. Inconclusive data causes problems: if attributes essential to knowledge about the application domain are absent from the data, it may be impossible to discover significant knowledge about that domain. For example, malaria cannot be diagnosed from a patient database if that database does not contain the patients' red blood cell counts.
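Before a discovery task starts, it can be worth checking whether the attributes it depends on exist at all. The sketch below does this as a simple set difference; the task name and column names (`malaria_diagnosis`, `red_blood_cell_count` and so on) are hypothetical, chosen only to mirror the example above.

```python
# Hypothetical attributes each discovery task needs; not taken from a real schema.
REQUIRED_ATTRIBUTES = {
    "malaria_diagnosis": {"patient_id", "red_blood_cell_count", "diagnosis"},
}


def missing_attributes(task: str, available: set[str]) -> set[str]:
    """Return the attributes `task` needs that the database does not provide."""
    return REQUIRED_ATTRIBUTES[task] - available


patient_columns = {"patient_id", "age", "temperature", "diagnosis"}
gaps = missing_attributes("malaria_diagnosis", patient_columns)
if gaps:
    print(f"Cannot run task; missing attributes: {sorted(gaps)}")
```

A check like this cannot supply the missing attribute, but it makes the limitation explicit before effort is spent mining data that cannot answer the question.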

Noise and missing values

Databases are usually contaminated by errors, so it cannot be assumed that the data they contain is entirely correct. Attributes that rely on subjective or measurement judgments can give rise to errors, to the point that some examples may even be misclassified. Errors in either attribute values or class information are known as noise. Where possible it is desirable to eliminate noise from the classification information, as it affects the overall accuracy of the generated rules.
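One simple way to look for class noise, assumed here for illustration rather than prescribed by any particular method, is to find examples that have identical attribute values but different class labels. No rule over those attributes can classify both correctly. Such contradictions point to noise, although they can also signal limited information (an attribute that would separate the examples is missing). The attribute names and records are hypothetical.

```python
from collections import defaultdict


def conflicting_examples(records, attributes, label):
    """Group records by their attribute values and return the groups whose
    members carry more than one class label (a symptom of class noise)."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[a] for a in attributes)
        groups[key].add(rec[label])
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


records = [
    {"fever": "high", "chills": "yes", "class": "malaria"},
    {"fever": "high", "chills": "yes", "class": "flu"},  # contradicts the first row
    {"fever": "none", "chills": "no", "class": "healthy"},
]
print(conflicting_examples(records, ["fever", "chills"], "class"))
```

Only the ("high", "yes") group is reported; the third record is consistent and is not flagged.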

Missing data can be treated by discovery systems in a number of ways, such as:

• simply disregard missing values
• omit the corresponding records
• infer missing values from known values
• treat missing data as a special value to be included additionally in the attribute domain
• or average over the missing values using Bayesian techniques.
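The first four strategies can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: records are dictionaries, `None` marks a missing value, inference uses the simplest possible method (the mean of the known values), and the `"?"` sentinel is a hypothetical choice. The Bayesian averaging approach needs a probabilistic model and is not shown.

```python
from statistics import mean

MISSING = "?"  # hypothetical sentinel for the special-value strategy

rows = [
    {"age": 34, "region": "north"},
    {"age": None, "region": "south"},
    {"age": 51, "region": None},
]


def disregard(rows, attr):
    """Use only the known values of an attribute (e.g. when computing a statistic)."""
    return [r[attr] for r in rows if r[attr] is not None]


def omit_records(rows):
    """Drop every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def infer_mean(rows, attr):
    """Fill a numeric attribute with the mean of its known values (new records)."""
    fill = mean(disregard(rows, attr))
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]


def special_value(rows, attr):
    """Treat 'missing' as one more value in the attribute's domain (new records)."""
    return [{**r, attr: MISSING if r[attr] is None else r[attr]} for r in rows]


print(len(omit_records(rows)))             # → 1
print(infer_mean(rows, "age")[1]["age"])   # → 42.5
```

Mean imputation is easy to read but shrinks the attribute's variance; inferring the value from other attributes is usually preferable when such relationships exist.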

Noise in the sense of imprecision is characteristic of all data collection and typically fits a regular statistical distribution such as the Gaussian, whereas wrong values are data entry errors. Statistical methods can treat problems of noisy data and separate different types of noise.
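As one example of such a statistical method (an assumed choice, not the only one), the modified z-score below uses the median and the median absolute deviation (MAD), so a few gross errors do not distort the estimate of ordinary measurement spread. Values scoring above 3.5, a commonly used cutoff, are flagged as likely entry errors; the rest are treated as ordinary imprecision. The readings are made up for illustration.

```python
from statistics import median


def split_noise(values, threshold=3.5):
    """Separate values consistent with small (roughly Gaussian) measurement noise
    from likely gross errors, using the median-based modified z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values), []  # no spread to judge against
    plausible, suspect = [], []
    for v in values:
        # 0.6745 rescales MAD so it matches the standard deviation for Gaussian data.
        z = 0.6745 * (v - med) / mad
        (suspect if abs(z) > threshold else plausible).append(v)
    return plausible, suspect


# Body temperatures in °C; 375.0 looks like a misplaced decimal point.
readings = [36.6, 36.9, 37.1, 36.8, 375.0, 37.0, 36.7]
plausible, suspect = split_noise(readings)
print(suspect)  # → [375.0]
```

The small scatter around 37 °C is kept as ordinary measurement noise, while the entry error stands out by several orders of magnitude.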

Uncertainty

Uncertainty refers to the severity of the error and the degree of noise in the data. Data precision is an important consideration in a discovery system.

Size, updates, and irrelevant fields

Databases tend to be large and dynamic: their contents change continually as information is added, modified or removed. The problem from the data mining perspective is how to ensure that the discovered rules remain up to date and consistent with the most current information. The learning system also has to be time-sensitive, because some data values vary over time and the discovery system is affected by the 'timeliness' of the data.
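One way to keep a rule's statistics consistent with current information, sketched here as an assumption rather than a standard algorithm, is to maintain its counts incrementally over a sliding time window: each new record updates the counts, and records that age out of the window are subtracted again. The `WindowedRule` class and the item names are hypothetical. The sketch assumes records arrive in timestamp order and handles additions and ageing only, not in-place modification or removal of records.

```python
from collections import deque


class WindowedRule:
    """Track support and confidence of 'antecedent => consequent' over records
    whose timestamps fall in the last `window` time units (now - window, now]."""

    def __init__(self, antecedent, consequent, window):
        self.antecedent = frozenset(antecedent)
        self.consequent = frozenset(consequent)
        self.window = window
        self.records = deque()  # (timestamp, matches_antecedent, matches_both)
        self.n_ante = 0
        self.n_both = 0

    def add(self, timestamp, items):
        items = set(items)
        ante = self.antecedent <= items
        both = ante and self.consequent <= items
        self.records.append((timestamp, ante, both))
        self.n_ante += ante
        self.n_both += both
        self._expire(timestamp)

    def _expire(self, now):
        # Subtract the counts of records that have aged out of the window.
        while self.records and self.records[0][0] <= now - self.window:
            _, ante, both = self.records.popleft()
            self.n_ante -= ante
            self.n_both -= both

    def support(self):
        return self.n_both / len(self.records) if self.records else 0.0

    def confidence(self):
        return self.n_both / self.n_ante if self.n_ante else 0.0


rule = WindowedRule({"umbrella"}, {"raincoat"}, window=7)
rule.add(1, {"umbrella", "raincoat"})
rule.add(2, {"umbrella"})
print(rule.confidence())  # → 0.5
rule.add(10, {"umbrella", "raincoat"})
print(rule.confidence())  # → 1.0 (records from t=1 and t=2 have expired)
```

Because only counts are updated, the cost per new record stays small no matter how large the database grows, at the price of forgetting everything older than the window.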

Another issue is whether the fields in the database are relevant to the current focus of discovery. For example, post codes are fundamental to any study trying to establish a geographical connection to an item of interest, such as the sales of a product.
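Relevance to a given discovery focus can also be estimated from the data itself. A minimal sketch, assuming categorical fields and using mutual information (one common measure among several), scores how much knowing a field reduces uncertainty about the target. The post codes, sales labels and the unrelated field are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between a field and a target, estimated
    from observed co-occurrence frequencies."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


post_codes = ["N1", "N1", "S2", "S2"]
sales = ["high", "high", "low", "low"]
unrelated = ["a", "b", "a", "b"]

print(mutual_information(post_codes, sales))  # → 1.0
print(mutual_information(unrelated, sales))   # → 0.0
```

Mutual information estimated this way is biased upward for fields with many distinct values: a unique key such as a customer ID would score highly on any sample yet carry no generalisable knowledge.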

 
