The potential business advantages of data mining are well documented in publications for executives and managers. However, developers implementing major data-mining systems need concrete information about the underlying technical principles, and their practical manifestations, in order to either integrate commercially available tools or write data-mining programs from scratch. This book is the first technical guide to provide a complete, generalized roadmap for developing data-mining applications, together with advice on performing these large-scale, open-ended analyses for real-world data warehouses.
Note: If you already own Predictive Data Mining: A Practical Guide, please see ISBN 1-55860-477-4 to order the accompanying software. To order the book/software package, please see ISBN 1-55860-478-2.
+ Focuses on the preparation and organization of data and the development of an overall strategy for data mining.
+ Reviews sophisticated prediction methods that search for patterns in big data.
+ Describes how to accurately estimate future performance of proposed solutions.
+ Illustrates the data-mining process and its potential pitfalls through real-life case studies.
Data mining is a hot technology, and this short, authoritative guide shows how it works and why it is gaining ground in the worlds of finance, manufacturing, marketing, and health care. The book begins by exploring the links between "big data"--the data warehouse built up of multiple databases--and traditional statistics. (The authors defend the methods of big data against traditional statistics, which has usually relied on smaller samples. However, they also look at the sources of error in both disciplines.)
The authors then look at the nuts and bolts of the data-mining process. They show how data must be prepared--sometimes reduced--in order to be manageable, and they define the important features. They show how the actual analysis of data mining can be as simple as adding up scores for selected features or how it can use statistical methods or even neural networks. (For some problems, the features themselves aren't known ahead of time; data mining can be used to discover these features automatically.) The authors then discuss how to interpret the results of analysis so that predictions can be made for new cases based on old ones.
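As a rough illustration of the simplest case described above, "adding up scores for selected features," the sketch below sums points for the features present in a case and compares the total with a cutoff. The feature names, point values, and threshold are hypothetical and chosen only for illustration; they are not taken from the book.

```python
# Minimal sketch of an additive scoring model.
# All feature names, point values, and the threshold are hypothetical.
FEATURE_SCORES = {
    "late_payments": 3,
    "high_balance": 2,
    "new_account": 1,
}
THRESHOLD = 4  # assumed cutoff for a positive prediction


def score_case(case):
    """Add up the points for every selected feature present in the case."""
    return sum(
        points
        for feature, points in FEATURE_SCORES.items()
        if case.get(feature)
    )


def predict(case, threshold=THRESHOLD):
    """Predict positive when the total score reaches the threshold."""
    return score_case(case) >= threshold


old_case = {"late_payments": True, "high_balance": True, "new_account": False}
new_case = {"late_payments": False, "high_balance": True, "new_account": True}

print(score_case(old_case), predict(old_case))
print(score_case(new_case), predict(new_case))
```

In practice the point values would be derived from historical cases rather than set by hand; the fixed table here only shows the shape of the calculation.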
The book concludes with short scenarios of how data mining can be applied, with examples drawn from manufacturing, health care, marketing, and publishing. The authors show the strengths--and limits--of data mining and argue that faster hardware and greater database storage capabilities will make this technology more widely used. Though it is written by two researchers in the field, Predictive Data Mining is suitable for general readers who are interested in the topic. --Richard V. Dragan