Antonios Chorianopoulos
Effective Crm Using Predictive Analytics
- Hardcover
Other customers were also interested in
- Konstantinos Tsiptsis: Data Mining Techniques in Crm, 117,99 €
- Walter W. Piegorsch: Statistical Data Analytics, 130,99 €
- Randall Matignon: Data Mining Using SAS Enterprise Miner, 160,99 €
- Andrea Ahlemeyer-Stubbe: A Practical Guide to Data Mining for Business and Industry, 100,99 €
- Marcello D'Orazio: Statistical Matching, 154,99 €
- Wolfgang Jank: Modeling Online Auctions, 163,99 €
- Ajit C. Tamhane: Predictive Analytics, 115,99 €
A step-by-step guide to data mining applications in CRM. Following a handbook approach, this book bridges the gap between analytics and their use in everyday marketing, providing guidance on solving real business problems using data mining techniques.
The book is organized into three parts. Part one provides a methodological roadmap, covering both the business and the technical aspects. The data mining process is presented in detail along with specific guidelines for the development of optimized acquisition, cross/deep/up selling and retention campaigns, as well as effective customer segmentation schemes. In part two, some of the most useful data mining algorithms are explained in a simple and comprehensive way for business users with no technical expertise. Part three is packed with real world case studies which employ the use of three leading data mining tools: IBM SPSS Modeler, RapidMiner and Data Mining for Excel. Case studies from industries including banking, retail and telecommunications are presented in detail so as to serve as templates for developing similar applications.
Key Features:
- Includes numerous real-world case studies which are presented step by step, demystifying the usage of data mining models and clarifying all the methodological issues.
- Topics are presented with the use of three leading data mining tools: IBM SPSS Modeler, RapidMiner and Data Mining for Excel.
- Accompanied by a website featuring material from each case study, including datasets and relevant code.
Combining data mining and business knowledge, this practical book provides all the necessary information for designing, setting up, executing and deploying data mining techniques in CRM. Effective CRM using Predictive Analytics will benefit data mining practitioners and consultants, data analysts, statisticians, and CRM officers. The book will also be useful to academics and students interested in applied data mining.
Note: This item can only be delivered to a German delivery address.
Product details
- Publisher: John Wiley & Sons / Wiley
- Number of pages: 400
- Publication date: 25 December 2015
- Language: English
- Dimensions: 250mm x 175mm x 25mm
- Weight: 849g
- ISBN-13: 9781119011552
- ISBN-10: 1119011558
- Item no.: 42774549
Antonios Chorianopoulos, Alpha Bank Greece.
Preface
Acknowledgments
1 An overview of data mining: The applications, the methodology, the algorithms, and the data
1.1 The applications
1.2 The methodology
1.3 The algorithms
1.3.1 Supervised models
1.3.1.1 Classification models
1.3.1.2 Estimation (regression) models
1.3.1.3 Feature selection (field screening)
1.3.2 Unsupervised models
1.3.2.1 Cluster models
1.3.2.2 Association (affinity) and sequence models
1.3.2.3 Dimensionality reduction models
1.3.2.4 Record screening models
1.4 The data
1.4.1 The mining datamart
1.4.2 The required data per industry
1.4.3 The customer "signature": from the mining datamart to the enriched, marketing reference table
1.5 Summary
Part I The Methodology
2 Classification modeling methodology
2.1 An overview of the methodology for classification modeling
2.2 Business understanding and design of the process
2.2.1 Definition of the business objective
2.2.2 Definition of the mining approach and of the data model
2.2.3 Design of the modeling process
2.2.3.1 Defining the modeling population
2.2.3.2 Determining the modeling (analysis) level
2.2.3.3 Definition of the target event and population
2.2.3.4 Deciding on time frames
2.3 Data understanding, preparation, and enrichment
2.3.1 Investigation of data sources
2.3.2 Selecting the data sources to be used
2.3.3 Data integration and aggregation
2.3.4 Data exploration, validation, and cleaning
2.3.5 Data transformations and enrichment
2.3.6 Applying a validation technique
2.3.6.1 Split or Holdout validation
2.3.6.2 Cross or n-fold validation
2.3.6.3 Bootstrap validation
2.3.7 Dealing with imbalanced and rare outcomes
2.3.7.1 Balancing
2.3.7.2 Applying class weights
2.4 Classification modeling
2.4.1 Trying different models and parameter settings
2.4.2 Combining models
2.4.2.1 Bagging
2.4.2.2 Boosting
2.4.2.3 Random Forests
2.5 Model evaluation
2.5.1 Thorough evaluation of the model accuracy
2.5.1.1 Accuracy measures and confusion matrices
2.5.1.2 Gains, Response, and Lift charts
2.5.1.3 ROC curve
2.5.1.4 Profit/ROI charts
2.5.2 Evaluating a deployed model with test-control groups
2.6 Model deployment
2.6.1 Scoring customers to roll the marketing campaign
2.6.1.1 Building propensity segments
2.6.2 Designing a deployment procedure and disseminating the results
2.7 Using classification models in direct marketing campaigns
2.8 Acquisition modeling
2.8.1.1 Pilot campaign
2.8.1.2 Profiling of high-value customers
2.9 Cross-selling modeling
2.9.1.1 Pilot campaign
2.9.1.2 Product uptake
2.9.1.3 Profiling of owners
2.10 Offer optimization with next best product campaigns
2.11 Deep-selling modeling
2.11.1.1 Pilot campaign
2.11.1.2 Usage increase
2.11.1.3 Profiling of customers with heavy product usage
2.12 Up-selling modeling
2.12.1.1 Pilot campaign
2.12.1.2 Product upgrade
2.12.1.3 Profiling of "premium" product owners
2.13 Voluntary churn modeling
2.14 Summary of what we've learned so far: it's not about the tool or the modeling algorithm. It's about the methodology and the design of the process
3 Behavioral segmentation methodology
3.1 An introduction to customer segmentation
3.2 An overview of the behavioral segmentation methodology
3.3 Business understanding and design of the segmentation process
3.3.1 Definition of the business objective
3.3.2 Design of the modeling process
3.3.2.1 Selecting the segmentation population
3.3.2.2 Selection of the appropriate segmentation criteria
3.3.2.3 Determining the segmentation level
3.3.2.4 Selecting the observation window
3.4 Data understanding, preparation, and enrichment
3.4.1 Investigation of data sources
3.4.2 Selecting the data to be used
3.4.3 Data integration and aggregation
3.4.4 Data exploration, validation, and cleaning
3.4.5 Data transformations and enrichment
3.4.6 Input set reduction
3.5 Identification of the segments with cluster modeling
3.6 Evaluation and profiling of the revealed segments
3.6.1 "Technical" evaluation of the clustering solution
3.6.2 Profiling of the revealed segments
3.6.3 Using marketing research information to evaluate the clusters and enrich their profiles
3.6.4 Selecting the optimal cluster solution and labeling the segments
3.7 Deployment of the segmentation solution, design and delivery of differentiated strategies
3.7.1 Building the customer scoring model for updating the segments
3.7.1.1 Building a Decision Tree for scoring: fine-tuning the segments
3.7.2 Distribution of the segmentation information
3.7.3 Design and delivery of differentiated strategies
3.8 Summary
Part II The Algorithms
4 Classification algorithms
4.1 Data mining algorithms for classification
4.2 An overview of Decision Trees
4.3 The main steps of Decision Tree algorithms
4.3.1 Handling of predictors by Decision Tree models
4.3.2 Using terminating criteria to prevent trivial tree growing
4.3.3 Tree pruning
4.4 CART, C5.0/C4.5, and CHAID and their attribute selection measures
4.4.1 The Gini index used by CART
4.4.2 The Information Gain Ratio index used by C5.0/C4.5
4.4.3 The chi-square test used by CHAID
4.5 Bayesian networks
4.6 Naive Bayesian networks
4.7 Bayesian belief networks
4.8 Support vector machines
4.8.1 Linearly separable data
4.8.2 Linearly inseparable data
4.9 Summary
5 Segmentation algorithms
5.1 Segmenting customers with data mining algorithms
5.2 Principal components analysis
5.2.1 How many components to extract?
5.2.1.1 The eigenvalue (or latent root) criterion
5.2.1.2 The percentage of variance criterion
5.2.1.3 The scree test criterion
5.2.1.4 The interpretability and business meaning of the components
5.2.2 What is the meaning of each component?
5.2.3 Moving along with the component scores
5.3 Clustering algorithms
5.3.1 Clustering with K-means
5.3.2 Clustering with TwoStep
5.4 Summary
Part III The Case Studies
6 A voluntary churn propensity model for credit card holders
6.1 The business objective
6.2 The mining approach
6.2.1 Designing the churn propensity model process
6.2.1.1 Selecting the data sources and the predictors
6.2.1.2 Modeling population and level of data
6.2.1.3 Target population and churn definition
6.2.1.4 Time periods and historical information required
6.3 The data dictionary
6.4 The data preparation procedure
6.4.1 From cards to customers: aggregating card-level data
6.4.2 Enriching customer data
6.4.3 Defining the modeling population and the target field
6.5 Derived fields: the final data dictionary
6.6 The modeling procedure
6.6.1 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes
6.6.2 Balancing the distribution of the target field
6.6.3 Setting the role of the fields in the model
6.6.4 Training the churn model
6.7 Understanding and evaluating the models
6.8 Model deployment: using churn propensities to target the retention campaign
6.9 The voluntary churn model revisited using RapidMiner
6.9.1 Loading the data and setting the roles of the attributes
6.9.2 Applying a Split (Holdout) validation and adjusting the imbalance of the target field's distribution
6.9.3 Developing a Naïve Bayes model for identifying potential churners
6.9.4 Evaluating the performance of the model and deploying it to calculate churn propensities
6.10 Developing the churn model with Data Mining for Excel
6.10.1 Building the model using the Classify Wizard
6.10.2 Selecting the classification algorithm and its parameters
6.10.3 Applying a Split (Holdout) validation
6.10.4 Browsing the Decision Tree model
6.10.5 Validation of the model performance
6.10.6 Model deployment
6.11 Summary
7 Value segmentation and cross-selling in retail
7.1 The business background and objective
7.2 An outline of the data preparation procedure
7.3 The data dictionary
7.4 The data preparation procedure
7.4.1 Pivoting and aggregating transactional data at a customer level
7.4.2 Enriching customer data and building the customer signature
7.5 The data dictionary of the modeling file
7.6 Value segmentation
7.6.1 Grouping customers according to their value
7.6.2 Value segments: exploration and marketing usage
7.7 The recency, frequency, and monetary (RFM) analysis
7.7.1 RFM basics
7.8 The RFM cell segmentation procedure
7.9 Setting up a cross-selling model
7.10 The mining approach
7.10.1 Designing the cross-selling model process
7.10.1.1 The data and the predictors
7.10.1.2 Modeling population and level of data
7.10.1.3 Target population and definition of target attribute
7.10.1.4 Time periods and historical information required
7.11 The modeling procedure
7.11.1 Preparing the test campaign and loading the campaign responses for modeling
7.11.2 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes
7.11.3 Setting the roles of the attributes
7.11.4 Training the cross-sell model
7.12 Browsing the model results and assessing the predictive accuracy of the classifiers
7.13 Deploying the model and preparing the cross-selling campaign list
7.14 The retail case study using RapidMiner
7.14.1 Value segmentation and RFM cells analysis
7.14.2 Developing the cross-selling model
7.14.3 Applying a Split (Holdout) validation
7.14.4 Developing a Decision Tree model with Bagging
7.14.5 Evaluating the performance of the model
7.14.6 Deploying the model and scoring customers
7.15 Building the cross-selling model with Data Mining for Excel
7.15.1 Using the Classify Wizard to develop the model
7.15.2 Selecting a classification algorithm and setting the parameters
7.15.3 Applying a Split (Holdout) validation
7.15.4 Browsing the Decision Tree model
7.15.5 Validation of the model performance
7.15.6 Model deployment
7.16 Summary
8 Segmentation application in telecommunications
8.1 Mobile telephony: the business background and objective
8.2 The segmentation procedure
8.2.1 Selecting the segmentation population: the mobile telephony core segments
8.2.2 Deciding the segmentation level
8.2.3 Selecting the segmentation dimensions
8.2.4 Time frames and historical information analyzed
8.3 The data preparation procedure
8.4 The data dictionary and the segmentation fields
8.5 The modeling procedure
8.5.1 Preparing data for clustering: combining fields into data components
8.5.2 Identifying the segments with a cluster model
8.5.3 Profiling and understanding the clusters
8.5.4 Segmentation deployment
8.6 Segmentation using RapidMiner and K-means cluster
8.6.1 Clustering with the K-means algorithm
8.7 Summary
Bibliography
Index
means cluster 354 8.6.1 Clustering with the K
means algorithm 354 8.7 Summary 359 Bibliography 360 Index 362