Synopsis
Most books on data mining focus on principles and furnish few instructions on how to carry out a data mining project. Data Mining Using SAS Applications not only introduces the key concepts but also enables readers to understand and successfully apply data mining methods using powerful yet user-friendly SAS macro-call files. These methods stress the use of visualization to thoroughly study the structure of data and check the validity of statistical models fitted to data.
Learn how to convert PC databases to SAS data
Discover sampling techniques to create training and validation samples
Understand frequency data analysis for categorical data
Explore supervised and unsupervised learning
Master exploratory graphical techniques
Acquire model validation techniques in regression and classification
The text furnishes 13 easy-to-use SAS data mining macros designed to work with the standard SAS modules. No additional modules or previous experience in SAS programming is required. The author shows how to perform complete predictive modeling, including data exploration, model fitting, assumption checks, validation, and scoring new data, on SAS datasets in less than ten minutes!
Review
The macros integrate nicely with SAS's output delivery system ... . [T]his is a book that could serve as an easy-to-read introduction to some classical statistical techniques that are used in data mining, and, with the associated macros, provide an opportunity to see those techniques in action.
- Journal of the American Statistical Association, June 2004, Vol. 99, No. 466
Read how Christopher Ross of the US Bureau of Land Management uses the SAS macros featured in this book:
Report: Use of SAS macros in the analysis of population dynamics and changes in Curlleaf Mountain Mahogany in adjacent Sierran and Great Basin mountain ranges in the western United States.
Mountain Mahogany is a very long-lived, broadleaf evergreen tree in the Rose family. Because of its importance to big game habitat, its disappearance from parts of its range over the past 50 years has been of great concern to land managers and sportsmen.
I converted very large data sets (over 1,000,000 observations) derived from Geographic Information System analyses to SAS data sets using the EXCELSAS macro. I used the UNIVAR SAS macro to explore the data and to identify problem observations and distributions for correction. Using the REGDIAG macro, I examined the relation between changes in mahogany distribution over time (the response) and topographic slope, aspect, and elevation, along with their cross products and quadratic terms (the predictors). The logistic model was refined by examining the variety of goodness-of-fit criteria and measures of association offered by the LOGISTIC SAS macro. The results showed strong correlations of tree distribution with geographic factors, and a trend in changes over time. Custom odds ratios allowed me to predict changes in the probability of finding trees at different combinations of variable values. I then appended a hypothetical data set with a missing response variable to obtain predicted probabilities for mahogany at all combinations of slope, elevation, and aspect. These results have been used to prioritize areas for habitat restoration.
I used the LOGISTIC macro (with field data) to demonstrate that bird damage by sapsuckers was strongly related to distance from the nearest riparian area, but not to distance to conifer food sources or nest habitat. Another logistic regression analysis confirmed that bird damage was confined to specific age classes in the population.
Using the FREQ SAS macro, I also compared population age-class parameters between the two mountain ranges and demonstrated that the desert range population has a significantly different (bimodal) age-class distribution from the normally distributed Sierra range population.
These data mining SAS macros facilitated reliable conversion, examination, and analysis of the data, and selection of the best statistical models, despite the great size of the data sets. The results of this research have been used extensively by land management agencies and private landowners to maximize the effectiveness of habitat restoration efforts in these important game areas.
-Christopher Ross, PhD.
Reclamation Scientist/Natural Resource Specialist
Bureau of Land Management, U.S. Department of the Interior
Reno, Nevada 89520-0006