Brett Lantz has more than 10 years of experience applying innovative data methods to understanding human behavior. Originally a sociologist, he became fascinated with machine learning while studying a large database of teenagers' social networking profiles. Since then, he has worked on interdisciplinary research involving mobile phones, medical billing data, and philanthropic activity.
Preface
Chapter 1: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning successes
The limits of machine learning
Machine learning ethics
How machines learn
Data storage
Abstraction
Generalization
Evaluation
Machine learning in practice
Types of input data
Types of machine learning algorithms
Matching input data to algorithms
Machine learning with R
Installing R packages
Loading and unloading R packages
Summary
Chapter 2: Managing and Understanding Data
R data structures
Vectors
Factors
Lists
Data frames
Matrices and arrays
Managing data with R
Saving, loading, and removing R data structures
Importing and saving data from CSV files
Exploring and understanding data
Exploring the structure of data
Exploring numeric variables
Measuring the central tendency - mean and median
Measuring spread - quartiles and the five-number summary
Visualizing numeric variables - boxplots
Visualizing numeric variables - histograms
Understanding numeric data - uniform and normal distributions
Measuring spread - variance and standard deviation
Exploring categorical variables
Measuring the central tendency - the mode
Exploring relationships between variables
Visualizing relationships - scatterplots
Examining relationships - two-way cross-tabulations
Summary
Chapter 3: Lazy Learning - Classification Using Nearest Neighbors
Understanding nearest neighbor classification
The k-NN algorithm
Measuring similarity with distance
Choosing an appropriate k
Preparing data for use with k-NN
Why is the k-NN algorithm lazy?
Example - diagnosing breast cancer with the k-NN algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Transformation - normalizing numeric data
Data preparation - creating training and test datasets
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Transformation - z-score standardization
Testing alternative values of k
Summary
Chapter 4: Probabilistic Learning - Classification Using Naive Bayes
Understanding Naive Bayes
Basic concepts of Bayesian methods
Understanding probability
Understanding joint probability
Computing conditional probability with Bayes' theorem
The Naive Bayes algorithm
Classification with Naive Bayes
The Laplace estimator
Using numeric features with Naive Bayes
Example - filtering mobile phone spam with the Naive Bayes algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - cleaning and standardizing text data
Data preparation - splitting text documents into words
Data preparation - creating training and test datasets
Visualizing text data - word clouds
Data preparation - creating indicator features for frequent words
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Summary
Chapter 5: Divide and Conquer - Classification Using Decision Trees and Rules
Chapter 6: Forecasting Numeric Data - Regression Methods
Chapter 7: Black Box Methods - Neural Networks and Support Vector Machines
Chapter 8: Finding Patterns - Market Basket Analysis Using Association Rules
Chapter 9: Finding Groups of Data - Clustering with k-means
Chapter 10: Evaluating Model Performance
Chapter 11: Improving Model Performance
Chapter 12: Specialized Machine Learning Topics
Index