Dean Abbott
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst
Unfortunately, this item is sold out. As soon as we know whether and when it will be available again, we will post an update here.
- Paperback
Written by one of the leading experts on predictive analytics, Applied Predictive Analytics shows tech-savvy business managers and data analysts how to use the sophisticated techniques of predictive analytics that mine Big Data to solve practical business problems. This cutting-edge text teaches readers the methods, principles, and techniques for conducting predictive analytics projects, from start to finish, with a focus on best practices, including tips and tricks, that are essential for successful predictive modeling.
Learn the art and science of predictive analytics -- techniques that get results
Predictive analytics is what translates big data into meaningful, usable business information. Written by a leading expert in the field, this guide examines the science of the underlying algorithms as well as the principles and best practices that govern the art of predictive analytics. It clearly explains the theory behind predictive analytics, teaches the methods, principles, and techniques for conducting predictive analytics projects, and offers tips and tricks that are essential for successful predictive modeling. Hands-on examples and case studies are included.
The ability to successfully apply predictive analytics enables businesses to effectively interpret big data, which is essential for competing today
This guide teaches not only the principles of predictive analytics, but also how to apply them to achieve real, pragmatic solutions
Explains methods, principles, and techniques for conducting predictive analytics projects from start to finish
Illustrates each technique with hands-on examples and includes a series of in-depth case studies that apply predictive analytics to common business scenarios
A companion website provides all the data sets used to generate the examples as well as a free trial version of software
Applied Predictive Analytics arms data and business analysts and business managers with the tools they need to interpret and capitalize on big data.
Product details
- Publisher: Wiley & Sons
- 1st edition
- Pages: 464
- Publication date: April 14, 2014
- Language: English
- Dimensions: 235 mm x 187 mm x 28 mm
- Weight: 778 g
- ISBN-13: 9781118727966
- ISBN-10: 1118727967
- Item no.: 39359786
DEAN ABBOTT is President of Abbott Analytics, Inc. (San Diego). He is an internationally recognized data mining and predictive analytics expert with over two decades of experience in fraud detection, risk modeling, text mining, personality assessment, planned giving, toxicology, and other applications. He is also Chief Scientist of SmarterRemarketer, a company focusing on behaviorally driven and data-driven marketing and web analytics.
Introduction

Chapter 1 Overview of Predictive Analytics
- What Is Analytics?
- What Is Predictive Analytics?
- Supervised vs. Unsupervised Learning
- Parametric vs. Non-Parametric Models
- Business Intelligence
- Predictive Analytics vs. Business Intelligence
- Do Predictive Models Just State the Obvious?
- Similarities between Business Intelligence and Predictive Analytics
- Predictive Analytics vs. Statistics
- Statistics and Analytics
- Predictive Analytics and Statistics Contrasted
- Predictive Analytics vs. Data Mining
- Who Uses Predictive Analytics?
- Challenges in Using Predictive Analytics
- Obstacles in Management
- Obstacles with Data
- Obstacles with Modeling
- Obstacles in Deployment
- What Educational Background Is Needed to Become a Predictive Modeler?

Chapter 2 Setting Up the Problem
- Predictive Analytics Processing Steps: CRISP-DM
- Business Understanding
- The Three-Legged Stool
- Business Objectives
- Defining Data for Predictive Modeling
- Defining the Columns as Measures
- Defining the Unit of Analysis
- Which Unit of Analysis?
- Defining the Target Variable
- Temporal Considerations for Target Variable
- Defining Measures of Success for Predictive Models
- Success Criteria for Classification
- Success Criteria for Estimation
- Other Customized Success Criteria
- Doing Predictive Modeling Out of Order
- Building Models First
- Early Model Deployment
- Case Study: Recovering Lapsed Donors
- Overview
- Business Objectives
- Data for the Competition
- The Target Variables
- Modeling Objectives
- Model Selection and Evaluation Criteria
- Model Deployment
- Case Study: Fraud Detection
- Overview
- Business Objectives
- Data for the Project
- The Target Variables
- Modeling Objectives
- Model Selection and Evaluation Criteria
- Model Deployment
- Summary

Chapter 3 Data Understanding
- What the Data Looks Like
- Single Variable Summaries
- Mean
- Standard Deviation
- The Normal Distribution
- Uniform Distribution
- Applying Simple Statistics in Data Understanding
- Skewness
- Kurtosis
- Rank-Ordered Statistics
- Categorical Variable Assessment
- Data Visualization in One Dimension
- Histograms
- Multiple Variable Summaries
- Hidden Value in Variable Interactions: Simpson's Paradox
- The Combinatorial Explosion of Interactions
- Correlations
- Spurious Correlations
- Back to Correlations
- Crosstabs
- Data Visualization, Two or Higher Dimensions
- Scatterplots
- Anscombe's Quartet
- Scatterplot Matrices
- Overlaying the Target Variable in Summary
- Scatterplots in More Than Two Dimensions
- The Value of Statistical Significance
- Pulling It All Together into a Data Audit
- Summary

Chapter 4 Data Preparation
- Variable Cleaning
- Incorrect Values
- Consistency in Data Formats
- Outliers
- Multidimensional Outliers
- Missing Values
- Fixing Missing Data
- Feature Creation
- Simple Variable Transformations
- Fixing Skew
- Binning Continuous Variables
- Numeric Variable Scaling
- Nominal Variable Transformation
- Ordinal Variable Transformations
- Date and Time Variable Features
- ZIP Code Features
- Which Version of a Variable Is Best?
- Multidimensional Features
- Variable Selection Prior to Modeling
- Sampling
- Example: Why Normalization Matters for K-Means Clustering
- Summary

Chapter 5 Itemsets and Association Rules
- Terminology
- Condition
- Left-Hand-Side, Antecedent(s)
- Right-Hand-Side, Consequent, Output, Conclusion
- Rule (Item Set)
- Support
- Antecedent Support
- Confidence, Accuracy
- Lift
- Parameter Settings
- How the Data Is Organized
- Standard Predictive Modeling Data Format
- Transactional Format
- Measures of Interesting Rules
- Deploying Association Rules
- Variable Selection
- Interaction Variable Creation
- Problems with Association Rules
- Redundant Rules
- Too Many Rules
- Too Few Rules
- Building Classification Rules from Association Rules
- Summary

Chapter 6 Descriptive Modeling
- Data Preparation Issues with Descriptive Modeling
- Principal Component Analysis
- The PCA Algorithm
- Applying PCA to New Data
- PCA for Data Interpretation
- Additional Considerations before Using PCA
- The Effect of Variable Magnitude on PCA Models
- Clustering Algorithms
- The K-Means Algorithm
- Data Preparation for K-Means
- Selecting the Number of Clusters
- The Kohonen SOM Algorithm
- Visualizing Kohonen Maps
- Similarities with K-Means
- Summary

Chapter 7 Interpreting Descriptive Models
- Standard Cluster Model Interpretation
- Problems with Interpretation Methods
- Identifying Key Variables in Forming Cluster Models
- Cluster Prototypes
- Cluster Outliers
- Summary

Chapter 8 Predictive Modeling
- Decision Trees
- The Decision Tree Landscape
- Building Decision Trees
- Decision Tree Splitting Metrics
- Decision Tree Knobs and Options
- Reweighting Records: Priors
- Reweighting Records: Misclassification Costs
- Other Practical Considerations for Decision Trees
- Logistic Regression
- Interpreting Logistic Regression Models
- Other Practical Considerations for Logistic Regression
- Neural Networks
- Building Blocks: The Neuron
- Neural Network Training
- The Flexibility of Neural Networks
- Neural Network Settings
- Neural Network Pruning
- Interpreting Neural Networks
- Neural Network Decision Boundaries
- Other Practical Considerations for Neural Networks
- K-Nearest Neighbor
- The k-NN Learning Algorithm
- Distance Metrics for k-NN
- Other Practical Considerations for k-NN
- Naïve Bayes
- Bayes' Theorem
- The Naïve Bayes Classifier
- Interpreting Naïve Bayes Classifiers
- Other Practical Considerations for Naïve Bayes
- Regression Models
- Linear Regression
- Linear Regression Assumptions
- Variable Selection in Linear Regression
- Interpreting Linear Regression Models
- Using Linear Regression for Classification
- Other Regression Algorithms
- Summary

Chapter 9 Assessing Predictive Models
- Batch Approach to Model Assessment
- Percent Correct Classification
- Rank-Ordered Approach to Model Assessment
- Assessing Regression Models
- Summary

Chapter 10 Model Ensembles
- Motivation for Ensembles
- The Wisdom of Crowds
- Bias Variance Tradeoff
- Bagging
- Boosting
- Improvements to Bagging and Boosting
- Random Forests
- Stochastic Gradient Boosting
- Heterogeneous Ensembles
- Model Ensembles and Occam's Razor
- Interpreting Model Ensembles
- Summary

Chapter 11 Text Mining
- Motivation for Text Mining
- A Predictive Modeling Approach to Text Mining
- Structured vs. Unstructured Data
- Why Text Mining Is Hard
- Text Mining Applications
- Data Sources for Text Mining
- Data Preparation Steps
- POS Tagging
- Tokens
- Stop Word and Punctuation Filters
- Character Length and Number Filters
- Stemming
- Dictionaries
- The Sentiment Polarity Movie Data Set
- Text Mining Features
- Term Frequency
- Inverse Document Frequency
- TF-IDF
- Cosine Similarity
- Multi-Word Features: N-Grams
- Reducing Keyword Features
- Grouping Terms
- Modeling with Text Mining Features
- Regular Expressions
- Uses of Regular Expressions in Text Mining
- Summary

Chapter 12 Model Deployment
- General Deployment Considerations
- Deployment Steps
- Summary

Chapter 13 Case Studies
- Survey Analysis Case Study: Overview
- Business Understanding: Defining the Problem
- Data Understanding
- Data Preparation
- Modeling
- Deployment: "What-If" Analysis
- Revisit Models
- Deployment
- Summary and Conclusions
- Help Desk Case Study
- Data Understanding: Defining the Data
- Data Preparation
- Modeling
- Revisit Business Understanding
- Deployment
- Summary and Conclusions

Index
Analytics? 3 What Is Predictive Analytics? 3 Supervised vs. Unsupervised
Learning 5 Parametric vs. Non-Parametric Models 6 Business Intelligence 6
Predictive Analytics vs. Business Intelligence 8 Do Predictive Models Just
State the Obvious? 9 Similarities between Business Intelligence and
Predictive Analytics 9 Predictive Analytics vs. Statistics 10 Statistics
and Analytics 11 Predictive Analytics and Statistics Contrasted 12
Predictive Analytics vs. Data Mining 13 Who Uses Predictive Analytics? 13
Challenges in Using Predictive Analytics 14 Obstacles in Management 14
Obstacles with Data 14 Obstacles with Modeling 15 Obstacles in Deployment
16 What Educational Background Is Needed to Become a Predictive Modeler? 16
Chapter 2 Setting Up the Problem 19 Predictive Analytics Processing Steps:
CRISP-DM 19 Business Understanding 21 The Three-Legged Stool 22 Business
Objectives 23 Defining Data for Predictive Modeling 25 Defining the Columns
as Measures 26 Defining the Unit of Analysis 27 Which Unit of Analysis? 28
Defining the Target Variable 29 Temporal Considerations for Target Variable
31 Defining Measures of Success for Predictive Models 32 Success Criteria
for Classifi cation 32 Success Criteria for Estimation 33 Other Customized
Success Criteria 33 Doing Predictive Modeling Out of Order 34 Building
Models First 34 Early Model Deployment 35 Case Study: Recovering Lapsed
Donors 35 Overview 36 Business Objectives 36 Data for the Competition 36
The Target Variables 36 Modeling Objectives 37 Model Selection and
Evaluation Criteria 38 Model Deployment 39 Case Study: Fraud Detection 39
Overview 39 Business Objectives 39 Data for the Project 40 The Target
Variables 40 Modeling Objectives 41 Model Selection and Evaluation Criteria
41 Model Deployment 41 Summary 42 Chapter 3 Data Understanding 43 What the
Data Looks Like 44 Single Variable Summaries 44 Mean 45 Standard Deviation
45 The Normal Distribution 45 Uniform Distribution 46 Applying Simple
Statistics in Data Understanding 47 Skewness 49 Kurtosis 51 Rank-Ordered
Statistics 52 Categorical Variable Assessment 55 Data Visualization in One
Dimension 58 Histograms 59 Multiple Variable Summaries 64 Hidden Value in
Variable Interactions: Simpson's Paradox 64 The Combinatorial Explosion of
Interactions 65 Correlations 66 Spurious Correlations 66 Back to
Correlations 67 Crosstabs 68 Data Visualization, Two or Higher Dimensions
69 Scatterplots 69 Anscombe's Quartet 71 Scatterplot Matrices 75 Overlaying
the Target Variable in Summary 76 Scatterplots in More Than Two Dimensions
78 The Value of Statistical Signifi cance 80 Pulling It All Together into a
Data Audit 81 Summary 82 Chapter 4 Data Preparation 83 Variable Cleaning 84
Incorrect Values 84 Consistency in Data Formats 85 Outliers 85
Multidimensional Outliers 89 Missing Values 90 Fixing Missing Data 91
Feature Creation 98 Simple Variable Transformations 98 Fixing Skew 99
Binning Continuous Variables 103 Numeric Variable Scaling 104 Nominal
Variable Transformation 107 Ordinal Variable Transformations 108 Date and
Time Variable Features 109 ZIP Code Features 110 Which Version of a
Variable Is Best? 110 Multidimensional Features 112 Variable Selection
Prior to Modeling 117 Sampling 123 Example: Why Normalization Matters for
K-Means Clustering 139 Summary 143 Chapter 5 Itemsets and Association Rules
145 Terminology 146 Condition 147 Left-Hand-Side, Antecedent(s) 148
Right-Hand-Side, Consequent, Output, Conclusion 148 Rule (Item Set) 148
Support 149 Antecedent Support 149 Confi dence, Accuracy 150 Lift 150
Parameter Settings 151 How the Data Is Organized 151 Standard Predictive
Modeling Data Format 151 Transactional Format 152 Measures of Interesting
Rules 154 Deploying Association Rules 156 Variable Selection 157
Interaction Variable Creation 157 Problems with Association Rules 158
Redundant Rules 158 Too Many Rules 158 Too Few Rules 159 Building
Classification Rules from Association Rules 159 Summary 161 Chapter 6
Descriptive Modeling 163 Data Preparation Issues with Descriptive Modeling
164 Principal Component Analysis 165 The PCA Algorithm 165 Applying PCA to
New Data 169 PCA for Data Interpretation 171 Additional Considerations
before Using PCA 172 The Effect of Variable Magnitude on PCA Models 174
Clustering Algorithms 177 The K-Means Algorithm 178 Data Preparation for
K-Means 183 Selecting the Number of Clusters 185 The Kohonen SOM Algorithm
192 Visualizing Kohonen Maps 194 Similarities with K-Means 196 Summary 197
Chapter 7 Interpreting Descriptive Models 199 Standard Cluster Model
Interpretation 199 Problems with Interpretation Methods 202 Identifying Key
Variables in Forming Cluster Models 203 Cluster Prototypes 209 Cluster
Outliers 210 Summary 212 Chapter 8 Predictive Modeling 213 Decision Trees
214 The Decision Tree Landscape 215 Building Decision Trees 218 Decision
Tree Splitting Metrics 221 Decision Tree Knobs and Options 222 Reweighting
Records: Priors 224 Reweighting Records: Misclassifi cation Costs 224 Other
Practical Considerations for Decision Trees 229 Logistic Regression 230
Interpreting Logistic Regression Models 233 Other Practical Considerations
for Logistic Regression 235 Neural Networks 240 Building Blocks: The Neuron
242 Neural Network Training 244 The Flexibility of Neural Networks 247
Neural Network Settings 249 Neural Network Pruning 251 Interpreting Neural
Networks 252 Neural Network Decision Boundaries 253 Other Practical
Considerations for Neural Networks 253 K-Nearest Neighbor 254 The k-NN
Learning Algorithm 254 Distance Metrics for k-NN 258 Other Practical
Considerations for k-NN 259 Naïve Bayes 264 Bayes' Theorem 264 The Naïve
Bayes Classifier 268 Interpreting Naïve Bayes Classifi ers 268 Other
Practical Considerations for Naïve Bayes 269 Regression Models 270 Linear
Regression 271 Linear Regression Assumptions 274 Variable Selection in
Linear Regression 276 Interpreting Linear Regression Models 278 Using
Linear Regression for Classification 279 Other Regression Algorithms 280
Summary 281 Chapter 9 Assessing Predictive Models 283 Batch Approach to
Model Assessment 284 Percent Correct Classifi cation 284 Rank-Ordered
Approach to Model Assessment 293 Assessing Regression Models 301 Summary
304 Chapter 10 Model Ensembles 307 Motivation for Ensembles 307 The Wisdom
of Crowds 308 Bias Variance Tradeoff 309 Bagging 311 Boosting 316
Improvements to Bagging and Boosting 320 Random Forests 320 Stochastic
Gradient Boosting 321 Heterogeneous Ensembles 321 Model Ensembles and
Occam's Razor 323 Interpreting Model Ensembles 323 Summary 326 Chapter 11
Text Mining 327 Motivation for Text Mining 328 A Predictive Modeling
Approach to Text Mining 329 Structured vs. Unstructured Data 329 Why Text
Mining Is Hard 330 Text Mining Applications 332 Data Sources for Text
Mining 333 Data Preparation Steps 333 POS Tagging 333 Tokens 336 Stop Word
and Punctuation Filters 336 Character Length and Number Filters 337
Stemming 337 Dictionaries 338 The Sentiment Polarity Movie Data Set 339
Text Mining Features 340 Term Frequency 341 Inverse Document Frequency 344
TF-IDF 344 Cosine Similarity 346 Multi-Word Features: N-Grams 346 Reducing
Keyword Features 347 Grouping Terms 347 Modeling with Text Mining Features
347 Regular Expressions 349 Uses of Regular Expressions in Text Mining 351
Summary 352 Chapter 12 Model Deployment 353 General Deployment
Considerations 354 Deployment Steps 355 Summary 375 Chapter 13 Case Studies
377 Survey Analysis Case Study: Overview 377 Business Understanding:
Defining the Problem 378 Data Understanding 380 Data Preparation 381
Modeling 385 Deployment: "What-If" Analysis 391 Revisit Models 392
Deployment 401 Summary and Conclusions 401 Help Desk Case Study 402 Data
Understanding: Defining the Data 403 Data Preparation 403 Modeling 405
Revisit Business Understanding 407 Deployment 409 Summary and Conclusions
411 Index 413
Introduction xxi Chapter 1 Overview of Predictive Analytics 1 What Is
Analytics? 3 What Is Predictive Analytics? 3 Supervised vs. Unsupervised
Learning 5 Parametric vs. Non-Parametric Models 6 Business Intelligence 6
Predictive Analytics vs. Business Intelligence 8 Do Predictive Models Just
State the Obvious? 9 Similarities between Business Intelligence and
Predictive Analytics 9 Predictive Analytics vs. Statistics 10 Statistics
and Analytics 11 Predictive Analytics and Statistics Contrasted 12
Predictive Analytics vs. Data Mining 13 Who Uses Predictive Analytics? 13
Challenges in Using Predictive Analytics 14 Obstacles in Management 14
Obstacles with Data 14 Obstacles with Modeling 15 Obstacles in Deployment
16 What Educational Background Is Needed to Become a Predictive Modeler? 16
Chapter 2 Setting Up the Problem 19 Predictive Analytics Processing Steps:
CRISP-DM 19 Business Understanding 21 The Three-Legged Stool 22 Business
Objectives 23 Defining Data for Predictive Modeling 25 Defining the Columns
as Measures 26 Defining the Unit of Analysis 27 Which Unit of Analysis? 28
Defining the Target Variable 29 Temporal Considerations for Target Variable
31 Defining Measures of Success for Predictive Models 32 Success Criteria
for Classifi cation 32 Success Criteria for Estimation 33 Other Customized
Success Criteria 33 Doing Predictive Modeling Out of Order 34 Building
Models First 34 Early Model Deployment 35 Case Study: Recovering Lapsed
Donors 35 Overview 36 Business Objectives 36 Data for the Competition 36
The Target Variables 36 Modeling Objectives 37 Model Selection and
Evaluation Criteria 38 Model Deployment 39 Case Study: Fraud Detection 39
Overview 39 Business Objectives 39 Data for the Project 40 The Target
Variables 40 Modeling Objectives 41 Model Selection and Evaluation Criteria
41 Model Deployment 41 Summary 42 Chapter 3 Data Understanding 43 What the
Data Looks Like 44 Single Variable Summaries 44 Mean 45 Standard Deviation
45 The Normal Distribution 45 Uniform Distribution 46 Applying Simple
Statistics in Data Understanding 47 Skewness 49 Kurtosis 51 Rank-Ordered
Statistics 52 Categorical Variable Assessment 55 Data Visualization in One
Dimension 58 Histograms 59 Multiple Variable Summaries 64 Hidden Value in
Variable Interactions: Simpson's Paradox 64 The Combinatorial Explosion of
Interactions 65 Correlations 66 Spurious Correlations 66 Back to
Correlations 67 Crosstabs 68 Data Visualization, Two or Higher Dimensions
69 Scatterplots 69 Anscombe's Quartet 71 Scatterplot Matrices 75 Overlaying
the Target Variable in Summary 76 Scatterplots in More Than Two Dimensions
78 The Value of Statistical Signifi cance 80 Pulling It All Together into a
Data Audit 81 Summary 82 Chapter 4 Data Preparation 83 Variable Cleaning 84
Incorrect Values 84 Consistency in Data Formats 85 Outliers 85
Multidimensional Outliers 89 Missing Values 90 Fixing Missing Data 91
Feature Creation 98 Simple Variable Transformations 98 Fixing Skew 99
Binning Continuous Variables 103 Numeric Variable Scaling 104 Nominal
Variable Transformation 107 Ordinal Variable Transformations 108 Date and
Time Variable Features 109 ZIP Code Features 110 Which Version of a
Variable Is Best? 110 Multidimensional Features 112 Variable Selection
Prior to Modeling 117 Sampling 123 Example: Why Normalization Matters for
K-Means Clustering 139 Summary 143 Chapter 5 Itemsets and Association Rules
145 Terminology 146 Condition 147 Left-Hand-Side, Antecedent(s) 148
Right-Hand-Side, Consequent, Output, Conclusion 148 Rule (Item Set) 148
Support 149 Antecedent Support 149 Confi dence, Accuracy 150 Lift 150
Parameter Settings 151 How the Data Is Organized 151 Standard Predictive
Modeling Data Format 151 Transactional Format 152 Measures of Interesting
Rules 154 Deploying Association Rules 156 Variable Selection 157
Interaction Variable Creation 157 Problems with Association Rules 158
Redundant Rules 158 Too Many Rules 158 Too Few Rules 159 Building
Classification Rules from Association Rules 159 Summary 161 Chapter 6
Descriptive Modeling 163 Data Preparation Issues with Descriptive Modeling
164 Principal Component Analysis 165 The PCA Algorithm 165 Applying PCA to
New Data 169 PCA for Data Interpretation 171 Additional Considerations
before Using PCA 172 The Effect of Variable Magnitude on PCA Models 174
Clustering Algorithms 177 The K-Means Algorithm 178 Data Preparation for
K-Means 183 Selecting the Number of Clusters 185 The Kohonen SOM Algorithm
192 Visualizing Kohonen Maps 194 Similarities with K-Means 196 Summary 197
Chapter 7 Interpreting Descriptive Models 199 Standard Cluster Model
Interpretation 199 Problems with Interpretation Methods 202 Identifying Key
Variables in Forming Cluster Models 203 Cluster Prototypes 209 Cluster
Outliers 210 Summary 212 Chapter 8 Predictive Modeling 213 Decision Trees
214 The Decision Tree Landscape 215 Building Decision Trees 218 Decision
Tree Splitting Metrics 221 Decision Tree Knobs and Options 222 Reweighting
Records: Priors 224 Reweighting Records: Misclassifi cation Costs 224 Other
Practical Considerations for Decision Trees 229 Logistic Regression 230
Interpreting Logistic Regression Models 233 Other Practical Considerations
for Logistic Regression 235 Neural Networks 240 Building Blocks: The Neuron
242 Neural Network Training 244 The Flexibility of Neural Networks 247
Neural Network Settings 249 Neural Network Pruning 251 Interpreting Neural
Networks 252 Neural Network Decision Boundaries 253 Other Practical
Considerations for Neural Networks 253 K-Nearest Neighbor 254 The k-NN
Learning Algorithm 254 Distance Metrics for k-NN 258 Other Practical
Considerations for k-NN 259 Naïve Bayes 264 Bayes' Theorem 264 The Naïve
Bayes Classifier 268 Interpreting Naïve Bayes Classifi ers 268 Other
Practical Considerations for Naïve Bayes 269 Regression Models 270 Linear
Regression 271 Linear Regression Assumptions 274 Variable Selection in
Linear Regression 276 Interpreting Linear Regression Models 278 Using
Linear Regression for Classification 279 Other Regression Algorithms 280
Summary 281 Chapter 9 Assessing Predictive Models 283 Batch Approach to
Model Assessment 284 Percent Correct Classifi cation 284 Rank-Ordered
Approach to Model Assessment 293 Assessing Regression Models 301 Summary
304 Chapter 10 Model Ensembles 307 Motivation for Ensembles 307 The Wisdom
of Crowds 308 Bias Variance Tradeoff 309 Bagging 311 Boosting 316
Improvements to Bagging and Boosting 320 Random Forests 320 Stochastic
Gradient Boosting 321 Heterogeneous Ensembles 321 Model Ensembles and
Occam's Razor 323 Interpreting Model Ensembles 323 Summary 326 Chapter 11
Text Mining 327 Motivation for Text Mining 328 A Predictive Modeling
Approach to Text Mining 329 Structured vs. Unstructured Data 329 Why Text
Mining Is Hard 330 Text Mining Applications 332 Data Sources for Text
Mining 333 Data Preparation Steps 333 POS Tagging 333 Tokens 336 Stop Word
and Punctuation Filters 336 Character Length and Number Filters 337
Stemming 337 Dictionaries 338 The Sentiment Polarity Movie Data Set 339
Text Mining Features 340 Term Frequency 341 Inverse Document Frequency 344
TF-IDF 344 Cosine Similarity 346 Multi-Word Features: N-Grams 346 Reducing
Keyword Features 347 Grouping Terms 347 Modeling with Text Mining Features
347 Regular Expressions 349 Uses of Regular Expressions in Text Mining 351
Summary 352 Chapter 12 Model Deployment 353 General Deployment
Considerations 354 Deployment Steps 355 Summary 375 Chapter 13 Case Studies
377 Survey Analysis Case Study: Overview 377 Business Understanding:
Defining the Problem 378 Data Understanding 380 Data Preparation 381
Modeling 385 Deployment: "What-If" Analysis 391 Revisit Models 392
Deployment 401 Summary and Conclusions 401 Help Desk Case Study 402 Data
Understanding: Defining the Data 403 Data Preparation 403 Modeling 405
Revisit Business Understanding 407 Deployment 409 Summary and Conclusions
411 Index 413
Analytics? 3 What Is Predictive Analytics? 3 Supervised vs. Unsupervised
Learning 5 Parametric vs. Non-Parametric Models 6 Business Intelligence 6
Predictive Analytics vs. Business Intelligence 8 Do Predictive Models Just
State the Obvious? 9 Similarities between Business Intelligence and
Predictive Analytics 9 Predictive Analytics vs. Statistics 10 Statistics
and Analytics 11 Predictive Analytics and Statistics Contrasted 12
Predictive Analytics vs. Data Mining 13 Who Uses Predictive Analytics? 13
Challenges in Using Predictive Analytics 14 Obstacles in Management 14
Obstacles with Data 14 Obstacles with Modeling 15 Obstacles in Deployment
16 What Educational Background Is Needed to Become a Predictive Modeler? 16

Chapter 2 Setting Up the Problem
Predictive Analytics Processing Steps: CRISP-DM; Business Understanding;
The Three-Legged Stool; Business Objectives;
Defining Data for Predictive Modeling; Defining the Columns as Measures;
Defining the Unit of Analysis; Which Unit of Analysis?;
Defining the Target Variable; Temporal Considerations for Target Variable;
Defining Measures of Success for Predictive Models;
Success Criteria for Classification; Success Criteria for Estimation;
Other Customized Success Criteria; Doing Predictive Modeling Out of Order;
Building Models First; Early Model Deployment;
Case Study: Recovering Lapsed Donors; Overview; Business Objectives;
Data for the Competition; The Target Variables; Modeling Objectives;
Model Selection and Evaluation Criteria; Model Deployment;
Case Study: Fraud Detection; Overview; Business Objectives;
Data for the Project; The Target Variables; Modeling Objectives;
Model Selection and Evaluation Criteria; Model Deployment; Summary

Chapter 3 Data Understanding
What the Data Looks Like; Single Variable Summaries; Mean;
Standard Deviation; The Normal Distribution; Uniform Distribution;
Applying Simple Statistics in Data Understanding; Skewness; Kurtosis;
Rank-Ordered Statistics; Categorical Variable Assessment;
Data Visualization in One Dimension; Histograms;
Multiple Variable Summaries;
Hidden Value in Variable Interactions: Simpson's Paradox;
The Combinatorial Explosion of Interactions; Correlations;
Spurious Correlations; Back to Correlations; Crosstabs;
Data Visualization, Two or Higher Dimensions; Scatterplots;
Anscombe's Quartet; Scatterplot Matrices;
Overlaying the Target Variable in Summary;
Scatterplots in More Than Two Dimensions;
The Value of Statistical Significance;
Pulling It All Together into a Data Audit; Summary

Chapter 4 Data Preparation
Variable Cleaning;
Incorrect Values; Consistency in Data Formats; Outliers;
Multidimensional Outliers; Missing Values; Fixing Missing Data;
Feature Creation; Simple Variable Transformations; Fixing Skew;
Binning Continuous Variables; Numeric Variable Scaling;
Nominal Variable Transformation; Ordinal Variable Transformations;
Date and Time Variable Features; ZIP Code Features;
Which Version of a Variable Is Best?; Multidimensional Features;
Variable Selection Prior to Modeling; Sampling;
Example: Why Normalization Matters for K-Means Clustering; Summary

Chapter 5 Itemsets and Association Rules
Terminology; Condition; Left-Hand-Side, Antecedent(s);
Right-Hand-Side, Consequent, Output, Conclusion; Rule (Item Set); Support;
Antecedent Support; Confidence, Accuracy; Lift; Parameter Settings;
How the Data Is Organized; Standard Predictive Modeling Data Format;
Transactional Format; Measures of Interesting Rules;
Deploying Association Rules; Variable Selection;
Interaction Variable Creation; Problems with Association Rules;
Redundant Rules; Too Many Rules; Too Few Rules;
Building Classification Rules from Association Rules; Summary

Chapter 6 Descriptive Modeling
Data Preparation Issues with Descriptive Modeling;
Principal Component Analysis; The PCA Algorithm; Applying PCA to New Data;
PCA for Data Interpretation; Additional Considerations before Using PCA;
The Effect of Variable Magnitude on PCA Models; Clustering Algorithms;
The K-Means Algorithm; Data Preparation for K-Means;
Selecting the Number of Clusters; The Kohonen SOM Algorithm;
Visualizing Kohonen Maps; Similarities with K-Means; Summary

Chapter 7 Interpreting Descriptive Models
Standard Cluster Model Interpretation; Problems with Interpretation Methods;
Identifying Key Variables in Forming Cluster Models; Cluster Prototypes;
Cluster Outliers; Summary

Chapter 8 Predictive Modeling
Decision Trees; The Decision Tree Landscape; Building Decision Trees;
Decision Tree Splitting Metrics; Decision Tree Knobs and Options;
Reweighting Records: Priors; Reweighting Records: Misclassification Costs;
Other Practical Considerations for Decision Trees; Logistic Regression;
Interpreting Logistic Regression Models;
Other Practical Considerations for Logistic Regression; Neural Networks;
Building Blocks: The Neuron; Neural Network Training;
The Flexibility of Neural Networks; Neural Network Settings;
Neural Network Pruning; Interpreting Neural Networks;
Neural Network Decision Boundaries;
Other Practical Considerations for Neural Networks; K-Nearest Neighbor;
The k-NN Learning Algorithm; Distance Metrics for k-NN;
Other Practical Considerations for k-NN; Naïve Bayes; Bayes' Theorem;
The Naïve Bayes Classifier; Interpreting Naïve Bayes Classifiers;
Other Practical Considerations for Naïve Bayes; Regression Models;
Linear Regression; Linear Regression Assumptions;
Variable Selection in Linear Regression;
Interpreting Linear Regression Models;
Using Linear Regression for Classification; Other Regression Algorithms;
Summary

Chapter 9 Assessing Predictive Models
Batch Approach to Model Assessment; Percent Correct Classification;
Rank-Ordered Approach to Model Assessment; Assessing Regression Models;
Summary

Chapter 10 Model Ensembles
Motivation for Ensembles; The Wisdom of Crowds; Bias Variance Tradeoff;
Bagging; Boosting; Improvements to Bagging and Boosting; Random Forests;
Stochastic Gradient Boosting; Heterogeneous Ensembles;
Model Ensembles and Occam's Razor; Interpreting Model Ensembles; Summary

Chapter 11 Text Mining
Motivation for Text Mining; A Predictive Modeling Approach to Text Mining;
Structured vs. Unstructured Data; Why Text Mining Is Hard;
Text Mining Applications; Data Sources for Text Mining;
Data Preparation Steps; POS Tagging; Tokens;
Stop Word and Punctuation Filters; Character Length and Number Filters;
Stemming; Dictionaries; The Sentiment Polarity Movie Data Set;
Text Mining Features; Term Frequency; Inverse Document Frequency; TF-IDF;
Cosine Similarity; Multi-Word Features: N-Grams;
Reducing Keyword Features; Grouping Terms;
Modeling with Text Mining Features; Regular Expressions;
Uses of Regular Expressions in Text Mining; Summary

Chapter 12 Model Deployment
General Deployment Considerations; Deployment Steps; Summary

Chapter 13 Case Studies
Survey Analysis Case Study: Overview;
Business Understanding: Defining the Problem; Data Understanding;
Data Preparation; Modeling; Deployment: "What-If" Analysis;
Revisit Models; Deployment; Summary and Conclusions;
Help Desk Case Study; Data Understanding: Defining the Data;
Data Preparation; Modeling; Revisit Business Understanding; Deployment;
Summary and Conclusions

Index