Knowledge Discovery in Databases Program
What is KDD-Data Mining?
The term data mining is often used interchangeably with KDD. In reality, data mining is only one of the steps in the overall process of knowledge discovery in databases. It needs a well-defined business case and diligent data preparation, and it has to be followed by a detailed evaluation of the discovery results.

Overview

It was recognized that information is at the heart of business operations and that decision-makers could use stored data to gain valuable insight into the business. Database management systems gave access to the stored data, but this was only a small part of what could be gained from it. Traditional OLTP (online transaction processing) systems are good at putting data into databases quickly, safely and efficiently, but they are not good at delivering meaningful analysis in return. Analyzing the data can provide further knowledge by going beyond what is explicitly stored to derive new knowledge about the business. This is where Knowledge Discovery in Databases (KDD) has obvious benefits for any enterprise. It involves processes such as Business Case Definition, Data Preparation, Data Mining and Evaluation.
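The four KDD steps named above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a prescribed method: the transaction records, the fraud labels, the z-score threshold and the names `BusinessCase`, `prepare`, `mine` and `evaluate` are all hypothetical, and z-score outlier flagging is simply one convenient stand-in for the data mining step.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class BusinessCase:
    """Step 1, business case definition (hypothetical): the question and success criteria."""
    question: str
    z_threshold: float    # how unusual an amount must be before it is flagged
    min_precision: float  # acceptance criterion used during evaluation

def prepare(raw):
    """Step 2, data preparation: drop incomplete records and convert types."""
    return [
        {"id": r["id"], "amount": float(r["amount"]), "known_fraud": r["known_fraud"]}
        for r in raw
        if r.get("amount") is not None
    ]

def mine(rows, case):
    """Step 3, data mining: flag records whose amount has a z-score above the threshold."""
    amounts = [r["amount"] for r in rows]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return set()  # all amounts identical, so nothing stands out
    return {r["id"] for r in rows if abs(r["amount"] - mu) / sigma > case.z_threshold}

def evaluate(flagged, rows, case):
    """Step 4, evaluation: compare the discoveries with known outcomes."""
    actual = {r["id"] for r in rows if r["known_fraud"]}
    hits = len(flagged & actual)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(actual) if actual else 0.0
    return {"precision": precision, "recall": recall, "accepted": precision >= case.min_precision}

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "amount": "100", "known_fraud": False},
    {"id": 2, "amount": "110", "known_fraud": False},
    {"id": 3, "amount": None, "known_fraud": False},
    {"id": 4, "amount": "95", "known_fraud": False},
    {"id": 5, "amount": "105", "known_fraud": False},
    {"id": 6, "amount": "98", "known_fraud": False},
    {"id": 7, "amount": "102", "known_fraud": False},
    {"id": 8, "amount": "900", "known_fraud": True},
]

case = BusinessCase("Which transactions look unusual enough to review?", z_threshold=2.0, min_precision=0.5)
rows = prepare(raw)
flagged = mine(rows, case)
report = evaluate(flagged, rows, case)
print(flagged, report)
```

With these invented records, the incomplete record is dropped during preparation, only transaction 8 is flagged, and evaluation accepts the result. Keeping the acceptance criterion in the business case rather than in the mining code reflects the point that results are judged against the question defined at the start.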

The term data mining has been stretched beyond its limits to cover any form of data analysis, and it is used interchangeably with KDD. In the strict sense, however, data mining is just one step in the KDD process, focusing on data analysis with minimal user intervention. Some of the many definitions of data mining are:

  • “Data mining is the search for relationships and global patterns that exist in large databases but are ‘hidden’ among the vast amount of data, such as a relationship between patient data and their medical diagnosis. These relationships represent valuable knowledge about the database and the objects in the database and, if the database is a faithful mirror, of the real world registered by the database.” Marcel Holsheimer and Arno Siebes (1994).
  • The analogy with the mining process is captured in this definition: “Data mining refers to ‘using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in the areas such as decision support, prediction, forecasting and estimation. The data is often voluminous, but as it stands of low value as no direct use can be made of it; it is the hidden information in the data that is useful’.” Clementine User Guide (Clementine is a data mining toolkit from SPSS).
  • “Data Mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. This encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies.” William J. Frawley, Gregory Piatetsky-Shapiro and Christopher J. Matheus.

Basically, data mining is concerned with the analysis of data and the use of tools and techniques for finding patterns and regularities in sets of data. It is the computer system that is responsible for finding the patterns, by identifying the underlying rules and features in the data. The idea is that it is possible to strike gold in unexpected places as the system mines deep into the data to extract patterns that were not previously discernible, or that were so obvious that no one had noticed them before. Data mining is not a matter of simple queries that validate known facts; the objective is to find patterns and rules automatically, with minimal user input.
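The difference between validating a fact with a query and discovering rules automatically can be illustrated with a brute-force association-rule search over item pairs. This is a minimal sketch, assuming invented shopping-basket data, hypothetical helper names (`support`, `validate`, `discover_rules`) and arbitrary thresholds; practical tools use more scalable algorithms such as Apriori or FP-growth rather than this exhaustive pair enumeration.

```python
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
    {"bread", "butter", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def validate(antecedent, consequent, baskets):
    """Query style: the user supplies one rule and only its confidence is checked."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def discover_rules(baskets, min_support=0.3, min_confidence=0.7):
    """Mining style: enumerate every single-item rule X -> Y meeting both thresholds."""
    items = set().union(*baskets)
    rules = []
    for a, b in combinations(sorted(items), 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x}, baskets)
            if confidence >= min_confidence:
                rules.append((x, y, round(pair_support, 2), round(confidence, 2)))
    return rules

print(round(validate({"butter"}, {"milk"}, BASKETS), 2))  # 0.67
for rule in discover_rules(BASKETS):
    print(rule)
```

Here `validate` only answers the question the user thought to ask (butter -> milk, confidence about 0.67, below the threshold), whereas `discover_rules` also surfaces rules nobody asked about, such as beer -> chips.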

In the evolution from business data to business information to business knowledge, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user’s point of view, the four steps, listed in the table below, were revolutionary because they allowed new business questions to be answered accurately and quickly.

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | RDBMS, SQL, ODBC | Oracle, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing (1990s) | "What were unit sales in New England last March? Drill down to Boston." | Relational Data Warehouse, OLAP, MDDB | NCR, Business Objects, COGNOS, Hyperion | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (2000s) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, Very Large Databases | SAS, SPSS, IBM, Oracle, NCR | Prospective, proactive information as well as knowledge delivery |
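The retrospective and prospective rows of the table differ in whether the answer is already stored. The sketch contrasts a retrospective lookup with a prospective estimate; the Boston monthly figures and the names `BOSTON_SALES`, `retrospective` and `forecast_next` are invented for illustration, and ordinary least-squares trend extrapolation is a deliberately simple stand-in for the advanced algorithms the table mentions. It addresses only the "what is likely to happen" part of the question, not the "why".

```python
# Hypothetical monthly unit sales for Boston; the numbers are invented for illustration.
BOSTON_SALES = {"Jan": 120, "Feb": 130, "Mar": 145, "Apr": 150, "May": 165}

def retrospective(month):
    """Data access style: look up a value that is already stored."""
    return BOSTON_SALES[month]

def forecast_next(series):
    """Prospective style: fit y = a + b*t by ordinary least squares, then extrapolate one step."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # predicted value at the next time index

print(retrospective("Mar"))                        # 145
print(forecast_next(list(BOSTON_SALES.values())))  # 175.0
```

With these numbers the fitted trend rises by 11 units per month, so the extrapolated estimate for June is 175. A real forecast would also need to account for seasonality and uncertainty, which this sketch ignores.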
