An essential guide to two burgeoning topics in machine learning: classification trees and ensemble learning.
Ensemble Classification Methods with Applications in R introduces the concepts and principles of ensemble classifier methods and reviews the most commonly used techniques. This important resource shows how ensemble classification has emerged as an extension of individual classifiers. The text puts the emphasis on two areas of machine learning: classification trees and ensemble learning. The authors explore the basic characteristics of ensemble classification methods and explain the types of problems that can emerge in their application. Written by a team of noted experts in the field, the text is divided into two main sections: the first outlines the theoretical underpinnings of the topic, and the second presents examples of practical applications. The book contains a wealth of illustrative cases from business failure prediction, zoology, ecology, and other fields. This vital guide:
* Offers a text that has been tested both in the classroom and in conference tutorials
* Contains authoritative information written by leading experts in the field
* Presents a comprehensive text that can be applied to courses in machine learning, data mining, and artificial intelligence
* Combines in one volume two of the most intriguing topics in machine learning: ensemble learning and classification trees
Written for researchers in fields such as biostatistics, economics, environmental science, and zoology, as well as for students of data mining and machine learning, Ensemble Classification Methods with Applications in R puts the focus on two topics in machine learning: classification trees and ensemble learning.
ESTEBAN ALFARO, MATÍAS GÁMEZ, AND NOELIA GARCÍA are Associate Professors in the Applied Economics Department (Statistics), Faculty of Economics and Business of Albacete, and researchers at the Regional Development Institute (IDR), University of Castilla-La Mancha. Together they have published several papers in prestigious journals on topics such as applications of ensemble trees to corporate bankruptcy, credit scoring, and statistical quality control, most notably in the Journal of Statistical Software (Vol. 54).
Table of Contents
List of Contributors ix
List of Tables xi
List of Figures xv
Preface xvii

1 Introduction 1
Esteban Alfaro, Matías Gámez, and Noelia García
1.1 Introduction 1
1.2 Definition 1
1.3 Taxonomy of Supervised Classification Methods 2
1.4 Estimation of the Accuracy of a Classification System 3
1.4.1 The Apparent Error Rate 4
1.4.2 Estimation of the True Error Rate 4
1.4.3 Error Rate Estimation Methods 4
1.4.4 The Standard Error 6
1.5 Classification Trees 7
1.5.1 Classification Tree Building 8
1.5.2 Splitting Rule 9
1.5.3 Splitting Criteria 10
1.5.4 Goodness of a Split 10
1.5.5 The Impurity of a Tree 11
1.5.6 Stopping Criteria 11
1.5.7 Overfitting in Classification Trees 12
1.5.8 Pruning Rules 14

2 Limitation of the Individual Classifiers 19
Esteban Alfaro, Matías Gámez, and Noelia García
2.1 Introduction 19
2.2 Error Decomposition: Bias and Variance 20
2.3 Study of Classifier Instability 23
2.4 Advantages of Ensemble Classifiers 26
2.5 Bayesian Perspective of Ensemble Classifiers 28

3 Ensemble Classifiers Methods 31
Esteban Alfaro, Matías Gámez, and Noelia García
3.1 Introduction 31
3.2 Taxonomy of Ensemble Methods 32
3.2.1 Non-Generative Methods 33
3.2.2 Generative Methods 33
3.3 Bagging 34
3.4 Boosting 36
3.4.1 AdaBoost Training Error 40
3.4.2 AdaBoost and the Margin Theory 41
3.4.3 Other Boosting Versions 43
3.4.4 Comparing Bagging and Boosting 46
3.5 Random Forests 46

4 Classification with Individual and Ensemble Trees in R 51
Esteban Alfaro, Matías Gámez, and Noelia García
4.1 Introduction 51
4.2 adabag: An R Package for Classification with Boosting and Bagging 52
4.2.1 The bagging, predict.bagging, and bagging.cv Functions 56
4.2.2 The boosting, predict.boosting, and boosting.cv Functions 65
4.2.3 The margins, plot.margins, errorevol, and plot.errorevol Functions 71
4.2.4 The MarginOrderedPruning.Bagging Function 75
4.3 The "German Credit" Example 79
4.3.1 Classification Tree 81
4.3.2 Combination using Bagging 85
4.3.3 Combination using Boosting 88
4.3.4 Combination using Random Forest 90
4.3.5 Cross-Validation Comparison 95

5 Bankruptcy Prediction Through Ensemble Trees 97
Esteban Alfaro, Matías Gámez, and Noelia García
5.1 Introduction 97
5.2 Problem Description 97
5.3 Applications 99
5.3.1 The Dichotomous Case 99
5.3.2 The Three-Class Case 111
5.4 Conclusions 117

6 Experiments with Adabag in Biology Classification Tasks 119
M. Fernández-Delgado, E. Cernadas, and M. Pérez-Ortiz
6.1 Classification of Color Texture Feature Patterns Extracted From Cells in Histological Images of Fish Ovary 119
6.2 Direct Kernel Perceptron: Ultra-Fast Kernel ELM-Based Classification with Non-Iterative Closed-Form Weight Calculation 122
6.3 Do We Need Hundreds of Classifiers to Solve Real-World Classification Problems? 125
6.4 On the Use of Nominal and Ordinal Classifiers for the Discrimination of Stages of Development in Fish Oocytes 129

7 Generalization Bounds for Ranking Algorithms 135
W. Rejchel
7.1 Introduction 135
7.2 Assumptions, Main Theorem, and Application 136
7.3 Experiments 138
7.4 Conclusions 139

8 Classification and Regression Trees for Analyzing Irrigation Decisions 141
S. Andriyas and M. McKee
8.1 Introduction 141
8.2 Theory 143
8.3 Case Study and Methods 144
8.3.1 Study Site and Data Available 144
8.3.2 Model, Specifications, and Performance Evaluation 146
8.4 Results and Discussion 147
8.5 Conclusions 153

9 Boosted Rule Learner and its Properties 155
M. Kubus
9.1 Introduction 155
9.2 Separate-and-Conquer 156
9.3 Boosting in Rule Induction 157
9.4 Experiments 158
9.5 Conclusions 161

10 Credit Scoring with Individuals and Ensemble Trees 163
M. Chrzanowska, E. Alfaro, and D. Witkowska
10.1 Introduction 163
10.2 Measures of Accuracy 164
10.3 Data Description 165
10.4 Classification of Borrowers Applying Ensemble Trees 168
10.5 Conclusions 173

11 An Overview of Multiple Classifier Systems Based on Generalized Additive Models 175
K.W. De Bock, K. Coussement, and D. Cielen
11.1 Introduction 175
11.2 Multiple Classifier Systems Based on GAMs 176
11.2.1 Generalized Additive Models 176
11.2.2 GAM-Based Multiple Classifier Systems 177
11.2.3 GAMensPlus: Extending GAMens for Advanced Interpretability 179
11.3 Experiments and Applications 180
11.3.1 A Multi-Domain Benchmark Study of GAM-Based Ensemble Classifiers 180
11.3.2 Benchmarking GAM-Based Ensemble Classifiers in Predictive Customer Analytics 181
11.3.3 A Case Study of GAMensPlus used for Customer Churn Prediction in Financial Services 183
11.4 Software Implementation in R: the GAMens Package 185
11.5 Conclusions 185

References 187
Index 197
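The contents above (Chapter 4) reference the adabag functions bagging, boosting, predict.boosting, boosting.cv, margins, and errorevol. As a minimal sketch of how such a workflow might look, the following R example fits a boosted and a bagged tree ensemble on the built-in iris data; it is an illustration based on the adabag package's documented interface, not an excerpt from the book, and the choices of mfinal = 50 and maxdepth = 3 are arbitrary assumptions made for the example.

# Minimal sketch (not from the book): boosting and bagging with adabag on iris.
library(adabag)   # bagging(), boosting(), margins(), errorevol(), ...
library(rpart)    # rpart.control() to set the size of the base trees

set.seed(1)
train <- sample(nrow(iris), 100)          # simple train/test split

# AdaBoost ensemble of 50 shallow classification trees
fit.boost <- boosting(Species ~ ., data = iris[train, ], mfinal = 50,
                      control = rpart.control(maxdepth = 3))

# Bagging ensemble with the same base learner
fit.bag <- bagging(Species ~ ., data = iris[train, ], mfinal = 50,
                   control = rpart.control(maxdepth = 3))

# Test-set predictions; predict() dispatches to predict.boosting / predict.bagging
pred.boost <- predict(fit.boost, newdata = iris[-train, ])
pred.bag   <- predict(fit.bag,   newdata = iris[-train, ])
pred.boost$error
pred.bag$error

# Margins of the boosted ensemble and evolution of the test error with ensemble size
marg <- margins(fit.boost, iris[train, ])
evol <- errorevol(fit.boost, newdata = iris[-train, ])
plot(evol)        # dispatches to plot.errorevol

Cross-validated error estimates can be obtained analogously with boosting.cv and bagging.cv, which the book covers in Sections 4.2.1 and 4.2.2.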