Mining — A process, and the process
A nice article offering an abstract discussion of the practical principles of data mining. No gold data this time 🙂
I made a brief summary. I especially like the parts on data preparation, patterns, retrospection and change.
# 1st Law of Data Mining – “Business Goals Law”:
Business objectives are the origin of every data mining solution
Data mining is not primarily a technology; it is a process, which has one or more business objectives at its heart.
# 2nd Law of Data Mining – “Business Knowledge Law”:
Business knowledge is central to every step of the data mining process
In summary, without business knowledge, not a single step of the data mining process can be effective; there are no “purely technical” steps.
# 3rd Law of Data Mining – “Data Preparation Law”:
Data preparation is more than half of every data mining process
Every change to the data of any sort (including cleaning, large and small transformations, and augmentation) means a change to the problem space which the analysis must explore. The reason that data preparation is important, and forms such a large proportion of data mining effort, is that the data miner is deliberately manipulating the problem space to make it easier for their analytical techniques to find a solution.
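To make the idea of "manipulating the problem space" concrete, here is a minimal sketch in Python. The synthetic ring data, the helper names (`make_rings`, `best_threshold_accuracy`) and the choice of a single-threshold rule are all my own assumptions for illustration; they are not from the article. The same simple rule fails on the raw feature but separates the classes perfectly once a derived feature (the radius) is added.

```python
import math
import random

def make_rings(n=200, seed=0):
    """Synthetic data (assumption): class 0 lies near radius 1, class 1 near radius 3."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        radius = (1.0 if label == 0 else 3.0) + rng.uniform(-0.3, 0.3)
        angle = rng.uniform(0, 2 * math.pi)
        rows.append((radius * math.cos(angle), radius * math.sin(angle), label))
    return rows

def best_threshold_accuracy(values, labels):
    """Accuracy of the best single-threshold rule on one feature."""
    best = 0.0
    for t in values:
        for sign in (1, -1):
            correct = sum(
                (1 if sign * (v - t) > 0 else 0) == y
                for v, y in zip(values, labels)
            )
            best = max(best, correct / len(labels))
    return best

rows = make_rings()
labels = [y for _, _, y in rows]

# Raw problem space: only the x coordinate is available to the rule.
raw_x = [x for x, _, _ in rows]
# Prepared problem space: a derived feature, the distance from the origin.
radius = [math.hypot(x, y) for x, y, _ in rows]

acc_raw = best_threshold_accuracy(raw_x, labels)
acc_prepared = best_threshold_accuracy(radius, labels)
print(f"raw x: {acc_raw:.2f}, derived radius: {acc_prepared:.2f}")
```

The algorithm did not change at all; only the data did. That is the sense in which preparation reshapes the problem so that a simple technique can find the solution.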
# 4th Law of Data Mining – “NFL-DM”:
The right model for a given application can only be discovered by experiment
or “There is No Free Lunch for the Data Miner”
Wolpert’s “No Free Lunch” (NFL) theorem, as applied to machine learning, states that no one bias (as embodied in an algorithm) will be better than any other when averaged across all possible problems (datasets). This is because, if we consider all possible problems, their solutions are evenly distributed, so that an algorithm (or bias) which is advantageous for one subset will be disadvantageous for another.
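The averaging argument can be checked by brute force on a toy problem. The sketch below is my own illustration, not code from the article: the five-point input space, the train/test split and the two learners (`majority_learner`, `contrarian_learner`) are hypothetical. It enumerates every possible labelling of the input space and measures accuracy only on the points the learner has not seen.

```python
from itertools import product

POINTS = [0, 1, 2, 3, 4]  # the whole (tiny) input space
TRAIN = [0, 1, 2]         # inputs the learner gets to see
TEST = [3, 4]             # off-training-set inputs

def majority_learner(train_labels):
    """Bias A: predict the most common training label everywhere."""
    guess = 1 if sum(train_labels) * 2 > len(train_labels) else 0
    return lambda x: guess

def contrarian_learner(train_labels):
    """Bias B: predict the least common training label everywhere."""
    guess = 0 if sum(train_labels) * 2 > len(train_labels) else 1
    return lambda x: guess

def average_off_training_accuracy(learner, targets=None):
    """Mean accuracy on TEST, averaged over the given target labellings.

    With targets=None, every possible labelling of POINTS is used.
    """
    if targets is None:
        targets = list(product([0, 1], repeat=len(POINTS)))
    total = 0.0
    for target in targets:
        model = learner([target[i] for i in TRAIN])
        total += sum(model(x) == target[x] for x in TEST) / len(TEST)
    return total / len(targets)

# Averaged over all 32 possible problems, the two biases tie.
avg_majority = average_off_training_accuracy(majority_learner)
avg_contrarian = average_off_training_accuracy(contrarian_learner)

# On a structured subset (constant labellings), one bias clearly wins.
constant = [(0,) * 5, (1,) * 5]
sub_majority = average_off_training_accuracy(majority_learner, constant)
sub_contrarian = average_off_training_accuracy(contrarian_learner, constant)

print(avg_majority, avg_contrarian, sub_majority, sub_contrarian)
```

Real business problems are never "all possible problems", which is why one bias can beat another in practice, and why which one wins has to be found by experiment.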
# 5th Law of Data Mining – “Watkins’ Law”:
There are always patterns
The purpose of the data mining process is to reveal the domain rules by combining pattern-discovery technology (data mining algorithms) with the business knowledge required to interpret the results of the algorithms in terms of the domain.
# 6th Law of Data Mining – “Insight Law”:
Data mining amplifies perception in the business domain
Data mining algorithms provide a capability to detect patterns beyond normal human capabilities. The data mining process allows data miners and business experts to integrate this capability into their own problem solving and into business processes.
# 7th Law of Data Mining – “Prediction Law”:
Prediction increases information locally by generalisation
What, then, is “prediction” in this sense? What do classification, regression, clustering and association algorithms and their resultant models have in common? The answer lies in “scoring”, that is, the application of a predictive model to a new example. The model produces a prediction, or score, which is a new piece of information about the example. The available information about the example in question has been increased, locally, on the basis of the patterns found by the algorithm and embodied in the model, that is, on the basis of generalisation or induction. It is important to remember that this new information is not “data”, in the sense of a “given”; it is information only in the statistical sense.
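A small sketch of what scoring looks like in code may help. Everything here is hypothetical and chosen for illustration only: the customer segments, the response history and the helper names `fit_rate_by_segment` and `score`. The "model" is simply the response rate observed per segment.

```python
def fit_rate_by_segment(history):
    """Learn the observed response rate for each segment (the 'pattern')."""
    counts = {}
    for segment, responded in history:
        seen, hits = counts.get(segment, (0, 0))
        counts[segment] = (seen + 1, hits + int(responded))
    return {seg: hits / seen for seg, (seen, hits) in counts.items()}

def score(model, example, default=None):
    """Return a copy of the example with a new 'score' field added."""
    enriched = dict(example)
    enriched["score"] = model.get(example["segment"], default)
    return enriched

# Made-up history: (segment, did the customer respond?)
history = [
    ("student", True), ("student", False), ("student", True), ("student", True),
    ("retired", False), ("retired", False), ("retired", True), ("retired", False),
]
model = fit_rate_by_segment(history)

new_customer = {"id": 101, "segment": "student"}
scored = score(model, new_customer)
print(scored)  # the example now carries one extra piece of information
```

The score was never observed for customer 101. It is a generalisation from other customers, which is why it counts as information only in the statistical sense and not as "data".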
# 8th Law of Data Mining – “Value Law”:
The value of data mining results is not determined by the accuracy or stability of predictive models
In short, the value of a predictive model is not determined by any technical measure.
# 9th Law of Data Mining – “Law of Change”:
All patterns are subject to change
Patterns are not simply regularities which exist in the world and are reflected in the data; such regularities may indeed be static in some domains. Rather, the patterns discovered by data mining are part of a perceptual process, an active process in which data mining mediates between the world as described by the data and the understanding of the observer or business expert.
All patterns are subject to change because they reflect not only a changing world but also our changing understanding.