DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO
References on data mining and analytics
To keep current with what is happening in the world of data mining,
subscribe (free) to the KDnuggets newsletter.
For a business perspective on data mining and analytics, without
technical detail, see Competing
on Analytics: The New Science of Winning by Thomas H.
Davenport and Jeanne G. Harris. For the table of contents see http://www.amazon.com/gp/reader/1422103323/
Machine learning is the name of the principal research area
underlying data mining. One of the best undergraduate-level
textbooks in this area is Introduction
to Machine Learning by Ethem Alpaydin. For the detailed table
of contents, see here.
This book is recommended for students who have no previous
experience with machine learning. Read the relevant sections as the
corresponding topics arise in CSE 255. Feel free to ask questions on
Piazza about which sections to read. Do also ask questions on Piazza
about anything that is not clear in the book.
The R system, with the RStudio frontend, is recommended for
assignments. There are dozens
of books available on R; choose one that you like. The
Art of R Programming: A Tour of Statistical Software Design is
by a good computer science author, Norman Matloff.
The interactive data mining environment Rattle is also recommended.
Its author has written a good hands-on guide, Data
Mining with Rattle and R. Because this book is published by
Springer and UCSD has a subscription, its full
text is available online from campus IP addresses. For access
from off campus, use a VPN.
The best-known graduate-level textbook on machine learning is Pattern
Recognition and Machine Learning by Christopher M.
Bishop. For the table of contents see http://www.amazon.com/gp/reader/0387310738.
The full texts of two good newer books are available free: Introduction to
Machine Learning by Alex Smola and S.V.N. Vishwanathan, and Bayesian
Reasoning and Machine Learning by David Barber.
Understanding Complex Datasets: Data Mining with Matrix Decompositions by David
Skillicorn is a good specialized book on a growing technical
subfield, namely matrix methods applied to modeling two-dimensional
data. For the table of contents see
It is conventional wisdom that 80% of the effort in a data mining
project is devoted to data acquisition and cleaning. A recommended
book on this topic is Data
Preparation for Data Mining by Dorian Pyle. Here is the table
of contents. Unfortunately, this book is out of print and
online sellers are now charging inflated prices.
Two good and up-to-date books in related areas are also available:
Analyzing Linguistic Data: A Practical Introduction to Statistics by R.
H. Baayen, which includes an introduction to R, and Introduction to
Information Retrieval by Christopher D. Manning, Prabhakar
Raghavan and Hinrich Schütze.
Most recently updated on April 2, 2013 by Charles Elkan, email@example.com