|
Induction
A database is a store of information but more important is the information, which can be inferred from it. There are two main inference techniques available i.e. deduction and induction.
- Deduction is a technique to infer information that is a logical consequence of the information in the database e.g. the join operator applied to two relational tables where the first concerns employees and departments and the second departments and managers infers a relation between employee and managers.
- Induction has been described earlier as the technique to infer information that is generalized from the database as in the example mentioned above to infer that each employee has a manager. This is higher level information or knowledge in that it is a general statement about objects in the database. The database is searched for patterns or regularities.
Induction has been used in the following ways within data mining.
Decision Tree:
Decision trees are simple knowledge representation and they classify examples to a finite number of classes, the nodes are labeled with attribute names, the edges are labeled with possible values for this attribute and the leaves labeled with different classes. Objects are classified by following a path down the tree, by taking the edges, corresponding to the values of the attributes in an object.
The following is an example of objects that describe the weather at a given time. The objects contain information on the outlook, humidity etc. Some objects are positive examples denote by P and others are negative i.e. N. Classification is in this case the construction of a tree structure, illustrated in the following diagram, which can be used to classify all the objects correctly.

Rule Induction
A data mine system has to infer a model from the database that is it may define classes such that the database contains one or more attributes that denote the class of a tuple i.e. the predicted attributes while the remaining attributes are the predicting attributes. Class can then be defined by condition on the attributes. When the classes are defined the system should be able to infer the rules that govern classification, in other words the system should find the description of each class.
Production rules have been widely used to represent knowledge in expert systems and they have the advantage of being easily interpreted by human experts because of their modularity i.e. a single rule can be understood in isolation and doesn't need reference to other rules. The propositional like structure of such rules has been described earlier but can summed up as if-then rules.
Classification Analysis
Data mining tools have to infer a model from the database, and in the case of supervised learning this requires the user to define one or more classes. The database contains one or more attributes that denote the class of a tuple and these are known as predicted attributes whereas the remaining attributes are called predicting attributes. A combination of values for the predicted attributes defines a class.
When learning classification rules the system has to find the rules that predict the class from the predicting attributes so firstly the user has to define conditions for each class, the data mine system then constructs descriptions for the classes. Basically the system should given a case or tuple with certain known attribute values be able to predict what class this case belongs to.
Once classes are defined the system should infer rules that govern the classification therefore the system should be able to find the description of each class. The descriptions should only refer to the predicting attributes of the training set so that the positive examples should satisfy the description and none of the negative. A rule said to be correct if its description covers all the positive examples and none of the negative examples of a class.
A rule is generally presented as, if the left hand side (LHS) then the right hand side (RHS), so that in all instances where LHS is true then RHS is also true, are very probable. The categories of rules are:
- exact rule - permits no exceptions so each object of LHS must be an element of RHS
- strong rule - allows some exceptions, but the exceptions have a given limit
- probabilistic rule - relates the conditional probability P(RHS|LHS) to the probability P(RHS)
Other types of rules are classification rules where LHS is a sufficient condition to classify objects as belonging to the concept referred to in the RHS.

In simplified form, Map data into one of several predefined classes. Define rule, boundary, function, etc. that separates classes. Use rule, boundary, function, etc. to classify new samples so as to predict to which class they belong
e.g. Classification of samples into "No-loan granted" & "Loan granted"-class. Boundary line is based on historical records of loan/debt mix vs. loan granted or not. Boundary line can be used to automatically decide whether future loan applicants will be given loan or not.
Regression Analysis
Map data to real-valued prediction variable e.g.
- Predict next day value of share X on stock exchange, given historical records of its evolution
- Estimate probability that drug X will cure patient
- Predict customer demand for new product X as a function of advertising expenditure
- Predict when customer X will change GSM operator (cellular phone) given its calling behavior over last 3 weeks debt.

Neural networks
Neural networks are an approach to computing that involves developing mathematical structures with the ability to learn. The methods are the result of academic investigations to model nervous system learning. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.

Neural networks have broad applicability to real world business problems and have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
- sales forecasting
- industrial process control
- customer research
- data validation
- risk management
- target marketing etc.
Neural networks use a set of processing elements (or nodes) analogous to neurons in the brain. These processing elements are interconnected in a network that can then identify patterns in data once it is exposed to the data, i.e. the network learns from experience just as people do. This distinguishes neural networks from traditional computing programs that simply follow instructions in a fixed sequential order.
The structure of a neural network looks something like the following:
The bottom layer represents the input layer, in this case with 5 inputs labels X1 through X5. In the middle is something called the hidden layer, with a variable number of nodes? It is the hidden layer that performs much of the work of the network. The output layer in this case has two nodes, Z1 and Z2 representing output values we are trying to determine from the inputs. For example, predict sales (output) based on past sales, price and season (input).
Each node in the hidden layer is fully connected to the inputs which means that what is learned in a hidden node is based on all the inputs taken together. Statisticians maintain that the network can pick up the interdependencies in the model. The following diagram provides some detail into what goes on inside a hidden node.
Simply speaking a weighted sum is performed: X1 times W1 plus X2 times W2 on through X5 and W5. This weighted sum is performed for each hidden node and each output node and is how interactions are represented in the network.
The issue of where the network get the weights from is important but suffice to say that the network learns to reduce error in its prediction of events already known (i.e., past history).
The problems of using neural networks have been summed by Arun Swami of Silicon Graphics Computer Systems. Neural networks have been used successfully for classification but suffer somewhat in that the resulting network is viewed as a black box and no explanation of the results is given. This lack of explanation inhibits confidence, acceptance and application of results. He also notes as a problem the fact that neural networks suffered from long learning times, which become worse as the volume of data grows.
|