Brad Boehmke, Brandon M. Greenwell (University of Cincinnati, Cincinnati, USA)
Hands-On Machine Learning with R
- Hardcover
This book is designed to introduce advanced business analytic approaches and is the first to cover the full gamut of using the R programming language to apply descriptive, predictive, and prescriptive analytic methodologies for problem solving.
Other customers were also interested in
- Enrique Garcia Ceja, Behavior Analysis with Machine Learning Using R (103,99 €)
- Alice Y.C. Te (Hong Kong University of Wales Trinity Saint David), Data Science and Machine Learning (117,99 €)
- G. David Garson, Data Analytics for the Social Sciences (114,99 €)
- Benjamin S. Baumer (Smith College, Northampton, MA), Modern Data Science with R (114,99 €)
- Christopher K. Wikle, Spatio-Temporal Statistics with R (66,99 €)
- Paul Geertsema, Machine Learning for Managers (29,99 €)
- Data Science and Innovations for Intelligent Systems (201,99 €)
Note: This item can only be shipped to a German delivery address.
Product details
- Chapman & Hall/CRC The R Series
- Publisher: Taylor & Francis Ltd
- Number of pages: 484
- Publication date: November 11, 2019
- Language: English
- Dimensions: 241 mm x 157 mm x 29 mm
- Weight: 1056 g
- ISBN-13: 9781138495685
- ISBN-10: 1138495689
- Item no.: 58315356
- Manufacturer information: Libri GmbH, Europaallee 1, 36244 Bad Hersfeld, 06621 890
Brad Boehmke is a data scientist at 84.51° where he wears both software developer and machine learning engineer hats. He is an Adjunct Professor at the University of Cincinnati, author of Data Wrangling with R, and creator of multiple public and private enterprise R packages. Brandon Greenwell is a data scientist at 84.51° where he works on a diverse team to enable, empower, and encourage others to successfully apply machine learning to solve real business problems. He's part of the Adjunct Graduate Faculty at Wright State University, an Adjunct Instructor at the University of Cincinnati, and the author of several R packages available on CRAN.
I FUNDAMENTALS
1. Introduction to Machine Learning: 1.1 Supervised learning; 1.1.1 Regression problems; 1.1.2 Classification problems; 1.2 Unsupervised learning; 1.3 Roadmap; 1.4 The data sets
2. Modeling Process: 2.1 Prerequisites; 2.2 Data splitting; 2.2.1 Simple random sampling; 2.2.2 Stratified sampling; 2.2.3 Class imbalances; 2.3 Creating models in R; 2.3.1 Many formula interfaces; 2.3.2 Many engines; 2.4 Resampling methods; 2.4.1 k-fold cross validation; 2.4.2 Bootstrapping; 2.4.3 Alternatives; 2.5 Bias variance trade-off; 2.5.1 Bias; 2.5.2 Variance; 2.5.3 Hyperparameter tuning; 2.6 Model evaluation; 2.6.1 Regression models; 2.6.2 Classification models; 2.7 Putting the processes together
3. Feature & Target Engineering: 3.1 Prerequisites; 3.2 Target engineering; 3.3 Dealing with missingness; 3.3.1 Visualizing missing values; 3.3.2 Imputation; 3.4 Feature filtering; 3.5 Numeric feature engineering; 3.5.1 Skewness; 3.5.2 Standardization; 3.6 Categorical feature engineering; 3.6.1 Lumping; 3.6.2 One-hot & dummy encoding; 3.6.3 Label encoding; 3.6.4 Alternatives; 3.7 Dimension reduction; 3.8 Proper implementation; 3.8.1 Sequential steps; 3.8.2 Data leakage; 3.8.3 Putting the process together
II SUPERVISED LEARNING
4. Linear Regression: 4.1 Prerequisites; 4.2 Simple linear regression; 4.2.1 Estimation; 4.2.2 Inference; 4.3 Multiple linear regression; 4.4 Assessing model accuracy; 4.5 Model concerns; 4.6 Principal component regression; 4.7 Partial least squares; 4.8 Feature interpretation; 4.9 Final thoughts
5. Logistic Regression: 5.1 Prerequisites; 5.2 Why logistic regression; 5.3 Simple logistic regression; 5.4 Multiple logistic regression; 5.5 Assessing model accuracy; 5.6 Model concerns; 5.7 Feature interpretation; 5.8 Final thoughts
6. Regularized Regression: 6.1 Prerequisites; 6.2 Why regularize?; 6.2.1 Ridge penalty; 6.2.2 Lasso penalty; 6.2.3 Elastic nets; 6.3 Implementation; 6.4 Tuning; 6.5 Feature interpretation; 6.6 Attrition data; 6.7 Final thoughts
7. Multivariate Adaptive Regression Splines: 7.1 Prerequisites; 7.2 The basic idea; 7.2.1 Multivariate regression splines; 7.3 Fitting a basic MARS model; 7.4 Tuning; 7.5 Feature interpretation; 7.6 Attrition data; 7.7 Final thoughts
8. K-Nearest Neighbors: 8.1 Prerequisites; 8.2 Measuring similarity; 8.2.1 Distance measures; 8.2.2 Pre-processing; 8.3 Choosing k; 8.4 MNIST example; 8.5 Final thoughts
9. Decision Trees: 9.1 Prerequisites; 9.2 Structure; 9.3 Partitioning; 9.4 How deep?; 9.4.1 Early stopping; 9.4.2 Pruning; 9.5 Ames housing example; 9.6 Feature interpretation; 9.7 Final thoughts
10. Bagging: 10.1 Prerequisites; 10.2 Why and when bagging works; 10.3 Implementation; 10.4 Easily parallelize; 10.5 Feature interpretation; 10.6 Final thoughts
11. Random Forests: 11.1 Prerequisites; 11.2 Extending bagging; 11.3 Out-of-the-box performance; 11.4 Hyperparameters; 11.4.1 Number of trees; 11.4.2 mtry; 11.4.3 Tree complexity; 11.4.4 Sampling scheme; 11.4.5 Split rule; 11.5 Tuning strategies; 11.6 Feature interpretation; 11.7 Final thoughts
12. Gradient Boosting: 12.1 Prerequisites; 12.2 How boosting works; 12.2.1 A sequential ensemble approach; 12.2.2 Gradient descent; 12.3 Basic GBM; 12.3.1 Hyperparameters; 12.3.2 Implementation; 12.3.3 General tuning strategy; 12.4 Stochastic GBMs; 12.4.1 Stochastic hyperparameters; 12.4.2 Implementation; 12.5 XGBoost; 12.5.1 XGBoost hyperparameters; 12.5.2 Tuning strategy; 12.6 Feature interpretation; 12.7 Final thoughts
13. Deep Learning: 13.1 Prerequisites; 13.2 Why deep learning; 13.3 Feedforward DNNs; 13.4 Network architecture; 13.4.1 Layers and nodes; 13.4.2 Activation; 13.5 Backpropagation; 13.6 Model training; 13.7 Model tuning; 13.7.1 Model capacity; 13.7.2 Batch normalization; 13.7.3 Regularization; 13.7.4 Adjust learning rate; 13.8 Grid Search; 13.9 Final thoughts
14. Support Vector Machines: 14.1 Prerequisites; 14.2 Optimal separating hyperplanes; 14.2.1 The hard margin classifier; 14.2.2 The soft margin classifier; 14.3 The support vector machine; 14.3.1 More than two classes; 14.3.2 Support vector regression; 14.4 Job attrition example; 14.4.1 Class weights; 14.4.2 Class probabilities; 14.5 Feature interpretation; 14.6 Final thoughts
15. Stacked Models: 15.1 Prerequisites; 15.2 The Idea; 15.2.1 Common ensemble methods; 15.2.2 Super learner algorithm; 15.2.3 Available packages; 15.3 Stacking existing models; 15.4 Stacking a grid search; 15.5 Automated machine learning; 15.6 Final thoughts
16. Interpretable Machine Learning: 16.1 Prerequisites; 16.2 The idea; 16.2.1 Global interpretation; 16.2.2 Local interpretation; 16.2.3 Model-specific vs. model-agnostic; 16.3 Permutation-based feature importance; 16.3.1 Concept; 16.3.2 Implementation; 16.4 Partial dependence; 16.4.1 Concept; 16.4.2 Implementation; 16.4.3 Alternative uses; 16.5 Individual conditional expectation; 16.5.1 Concept; 16.5.2 Implementation; 16.6 Feature interactions; 16.6.1 Concept; 16.6.2 Implementation; 16.6.3 Alternatives; 16.7 Local interpretable model-agnostic explanations; 16.7.1 Concept; 16.7.2 Implementation; 16.7.3 Tuning; 16.7.4 Alternative uses; 16.8 Shapley values; 16.8.1 Concept; 16.8.2 Implementation; 16.8.3 XGBoost and built-in Shapley values; 16.9 Localized step-wise procedure; 16.9.1 Concept; 16.9.2 Implementation; 16.10 Final thoughts
III DIMENSION REDUCTION
17. Principal Components Analysis: 17.1 Prerequisites; 17.2 The idea; 17.3 Finding principal components; 17.4 Performing PCA in R; 17.5 Selecting the number of principal components; 17.5.1 Eigenvalue criterion; 17.5.2 Proportion of variance explained criterion; 17.5.3 Scree plot criterion; 17.6 Final thoughts
18. Generalized Low Rank Models: 18.1 Prerequisites; 18.2 The idea; 18.3 Finding the lower ranks; 18.3.1 Alternating minimization; 18.3.2 Loss functions; 18.3.3 Regularization; 18.3.4 Selecting k; 18.4 Fitting GLRMs in R; 18.4.1 Basic GLRM model; 18.4.2 Tuning to optimize for unseen data; 18.5 Final thoughts
19. Autoencoders: 19.1 Prerequisites; 19.2 Undercomplete autoencoders; 19.2.1 Comparing PCA to an autoencoder; 19.2.2 Stacked autoencoders; 19.2.3 Visualizing the reconstruction; 19.3 Sparse autoencoders; 19.4 Denoising autoencoders; 19.5 Anomaly detection; 19.6 Final thoughts
IV CLUSTERING
20. K-means Clustering: 20.1 Prerequisites; 20.2 Distance measures; 20.3 Defining clusters; 20.4 k-means algorithm; 20.5 Clustering digits; 20.6 How many clusters?; 20.7 Clustering with mixed data; 20.8 Alternative partitioning methods; 20.9 Final thoughts
21. Hierarchical Clustering: 21.1 Prerequisites; 21.2 Hierarchical clustering algorithms; 21.3 Hierarchical clustering in R; 21.3.1 Agglomerative hierarchical clustering; 21.3.2 Divisive hierarchical clustering; 21.4 Determining optimal clusters; 21.5 Working with dendrograms; 21.6 Final thoughts
22. Model-based Clustering: 22.1 Prerequisites; 22.2 Measuring probability and uncertainty; 22.3 Covariance types; 22.4 Model selection; 22.5 My basket example; 22.6 Final thoughts
Bibliography
Index
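The modeling-process chapter outlined above walks through data splitting, resampling, and model evaluation in R. As a rough illustration of that kind of workflow (not code taken from the book), the minimal sketch below uses the rsample and caret packages together with the Ames housing data that several chapters reference; the specific packages, column names, and settings are assumptions.

```r
# Minimal sketch of a Chapter 2-style workflow: stratified train/test split,
# then a 10-fold cross-validated k-nearest neighbors fit.
# Assumes the AmesHousing, rsample, and caret packages are installed;
# the book itself may use different packages or settings.
library(rsample)  # data splitting
library(caret)    # unified training/resampling interface

ames <- AmesHousing::make_ames()

# Stratified 70/30 split on the response (cf. Section 2.2.2)
set.seed(123)
split      <- initial_split(ames, prop = 0.7, strata = "Sale_Price")
ames_train <- training(split)
ames_test  <- testing(split)

# 10-fold cross-validation (cf. Section 2.4.1), evaluated by RMSE (cf. Section 2.6.1)
cv  <- trainControl(method = "cv", number = 10)
fit <- train(
  Sale_Price ~ Gr_Liv_Area + Year_Built,
  data      = ames_train,
  method    = "knn",
  trControl = cv
)

fit$results                                                   # resampled RMSE and R-squared by k
postResample(predict(fit, ames_test), ames_test$Sale_Price)   # held-out test-set accuracy
```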