DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO


References on data mining and analytics


To keep current with what is happening in the world of data mining, subscribe (free) to the KDnuggets newsletter.

For a business perspective on data mining and analytics, without technical detail, see Competing on Analytics: The New Science of Winning by Thomas H. Davenport and Jeanne G. Harris.  For the table of contents see http://www.amazon.com/gp/reader/1422103323/

Machine learning is the principal research area underlying data mining.  The best undergraduate-level textbook in this area is Introduction to Machine Learning by Ethem Alpaydin.  For the detailed table of contents, see here.

The best graduate-level textbook on machine learning is Pattern Recognition and Machine Learning by Christopher M. Bishop.  For the table of contents see http://www.amazon.com/gp/reader/0387310738

Web Analytics: An Hour a Day by Avinash Kaushik is a best-selling book on web analytics, the fastest-growing application area of data mining.  For the table of contents see http://www.amazon.com/gp/reader/0470130652

Understanding Complex Datasets: Data Mining with Matrix Decompositions by David Skillicorn is a good specialized book on a growing technical subfield, namely matrix methods applied to modeling two-dimensional data.  For the table of contents see http://www.amazon.com/gp/reader/1584888326/

Conventional wisdom holds that about 80% of the effort in a data mining project goes to data acquisition and cleaning.  A highly recommended book on this topic is Data Preparation for Data Mining by Dorian Pyle.  Here is the table of contents.

 

Most recently updated on March 17, 2009 by Charles Elkan, elkan@cs.ucsd.edu