- Hardcover
Other customers were also interested in
- Wladyslaw Homenda, Pattern Recognition, 147,99 €
- Pradipta Maji, Rough-Fuzzy Pattern Recognition, 137,99 €
- Richard O. Duda, Pattern Classification, 204,99 €
- Jan Flusser, Moments and Moment Invariants in Pattern Recognition, 135,99 €
- Amit Konar, Emotion Recognition, 157,99 €
- Stephen Tou, Visualization of Fields and Applications in Engineering, 141,99 €
- Ulisses M. Braga Neto, Error Estimation for Pattern Recognition, 160,99 €
Combined classifiers, which are central to many pattern recognition and machine learning applications, are generally more accurate than single classifiers. In a didactic, detailed assessment, Combining Pattern Classifiers examines the basic theories and tactics of classifier combination while presenting the most recent research in the field. Among the pattern recognition tasks that this book explores are mail sorting, face recognition, signature verification, decoding brain fMRI images, identifying emotions, analyzing gene microarray data, and spotting patterns in consumer preference. This updated second edition equips academics, students, and practitioners in pattern recognition fields with the latest knowledge.
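To make the opening claim concrete (an illustrative sketch, not taken from the book, whose own code is in MATLAB): for L independent classifiers that are each correct with probability p > 0.5, the majority vote is correct whenever more than half of the members are, which beats any single member.

```python
from math import comb

def majority_vote_accuracy(L, p):
    """Probability that a majority of L independent classifiers,
    each correct with probability p, outputs the correct label."""
    return sum(comb(L, k) * p**k * (1 - p)**(L - k)
               for k in range(L // 2 + 1, L + 1))

# Five independent classifiers at 70% individual accuracy:
print(majority_vote_accuracy(5, 0.7))  # ≈ 0.8369, well above 0.7
```

The assumption of independence is idealized; much of the book is about what happens when it does not hold.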
A unified, coherent treatment of current classifier ensemble methods, from fundamentals of pattern recognition to ensemble feature selection, now in its second edition
The art and science of combining pattern classifiers has flourished into a prolific discipline since the first edition of Combining Pattern Classifiers was published in 2004. Dr. Kuncheva has plucked from the rich landscape of recent classifier ensemble literature the topics, methods, and algorithms that will guide the reader toward a deeper understanding of the fundamentals, design, and applications of classifier ensemble methods.
Thoroughly updated, with MATLAB® code and practice data sets throughout, Combining Pattern Classifiers includes:
- Coverage of Bayes decision theory and experimental comparison of classifiers
- Essential ensemble methods such as Bagging, Random forest, AdaBoost, Random subspace, Rotation forest, Random oracle, and Error Correcting Output Code, among others
- Chapters on classifier selection, diversity, and ensemble feature selection
With firm grounding in the fundamentals of pattern recognition, and featuring more than 140 illustrations, Combining Pattern Classifiers, Second Edition is a valuable reference for postgraduate students, researchers, and practitioners in computing and engineering.
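As a taste of one of the listed ensemble methods, here is a minimal, hypothetical sketch of Bagging with a plurality-vote combiner (the book's own examples are in MATLAB; the 1-nn base classifier and the toy two-cluster data below are assumptions chosen for brevity, not the book's):

```python
import random

def bootstrap(data):
    # One bagging replicate: sample len(data) points with replacement.
    return [random.choice(data) for _ in data]

def one_nn(train, x):
    # 1-nearest-neighbour base classifier: label of the closest training point.
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def bagging_predict(members, x):
    # Plurality vote over the labels proposed by the ensemble members.
    votes = [one_nn(member, x) for member in members]
    return max(set(votes), key=votes.count)

random.seed(0)
# Toy two-class data: points around (0, 0) labelled 0, around (3, 3) labelled 1.
data = ([((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(20)]
        + [((random.gauss(3, 1), random.gauss(3, 1)), 1) for _ in range(20)])
members = [bootstrap(data) for _ in range(11)]  # L = 11 bagged members
print(bagging_predict(members, (2.8, 3.1)))  # a point near the class-1 cluster
```

Bagging pays off most with unstable base classifiers such as decision trees, which the book discusses at length; 1-nn is used here only to keep the sketch self-contained.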
Product details
- Publisher: Wiley & Sons
- Publisher's item no.: 1W118315230
- 2nd edition
- Number of pages: 384
- Publication date: September 9, 2014
- Language: English
- Dimensions: 240 mm × 161 mm × 25 mm
- Weight: 736 g
- ISBN-13: 9781118315231
- ISBN-10: 1118315235
- Item no.: 41239195
Ludmila Kuncheva is a Professor of Computer Science at Bangor University, United Kingdom. She has received two IEEE Best Paper awards. In 2012, Dr. Kuncheva was elected a Fellow of the International Association for Pattern Recognition (IAPR) for her contributions to multiple classifier systems.
Preface xv Acknowledgements xxi 1 Fundamentals of Pattern Recognition 1 1.1
Basic Concepts: Class, Feature, Data Set 1 1.1.1 Classes and Class Labels 1
1.1.2 Features 2 1.1.3 Data Set 3 1.1.4 Generate Your Own Data 6 1.2
Classifier, Discriminant Functions, Classification Regions 9 1.3
Classification Error and Classification Accuracy 11 1.3.1 Where Does the
Error Come From? Bias and Variance 11 1.3.2 Estimation of the Error 13
1.3.3 Confusion Matrices and Loss Matrices 14 1.3.4 Training and Testing
Protocols 15 1.3.5 Overtraining and Peeking 17 1.4 Experimental Comparison
of Classifiers 19 1.4.1 Two Trained Classifiers and a Fixed Testing Set 20
1.4.2 Two Classifier Models and a Single Data Set 22 1.4.3 Two Classifier
Models and Multiple Data Sets 26 1.4.4 Multiple Classifier Models and
Multiple Data Sets 27 1.5 Bayes Decision Theory 30 1.5.1 Probabilistic
Framework 30 1.5.2 Discriminant Functions and Decision Boundaries 31 1.5.3
Bayes Error 33 1.6 Clustering and Feature Selection 35 1.6.1 Clustering 35
1.6.2 Feature Selection 37 1.7 Challenges of Real-Life Data 40 Appendix 41
1.A.1 Data Generation 41 1.A.2 Comparison of Classifiers 42 1.A.2.1 MATLAB
Functions for Comparing Classifiers 42 1.A.2.2 Critical Values for Wilcoxon
and Sign Test 45 1.A.3 Feature Selection 47 2 Base Classifiers 49 2.1
Linear and Quadratic Classifiers 49 2.1.1 Linear Discriminant Classifier 49
2.1.2 Nearest Mean Classifier 52 2.1.3 Quadratic Discriminant Classifier 52
2.1.4 Stability of LDC and QDC 53 2.2 Decision Tree Classifiers 55 2.2.1
Basics and Terminology 55 2.2.2 Training of Decision Tree Classifiers 57
2.2.3 Selection of the Feature for a Node 58 2.2.4 Stopping Criterion 60
2.2.5 Pruning of the Decision Tree 63 2.2.6 C4.5 and ID3 64 2.2.7
Instability of Decision Trees 64 2.2.8 Random Trees 65 2.3 The Naïve Bayes
Classifier 66 2.4 Neural Networks 68 2.4.1 Neurons 68 2.4.2 Rosenblatt's
Perceptron 70 2.4.3 Multi-Layer Perceptron 71 2.5 Support Vector Machines
73 2.5.1 Why Would It Work? 73 2.5.2 Classification Margins 74 2.5.3
Optimal Linear Boundary 76 2.5.4 Parameters and Classification Boundaries
of SVM 78 2.6 The k-Nearest Neighbor Classifier (k-nn) 80 2.7 Final Remarks
82 2.7.1 Simple or Complex Models? 82 2.7.2 The Triangle Diagram 83 2.7.3
Choosing a Base Classifier for Ensembles 85 Appendix 85 2.A.1 MATLAB Code
for the Fish Data 85 2.A.2 MATLAB Code for Individual Classifiers 86
2.A.2.1 Decision Tree 86 2.A.2.2 Naïve Bayes 89 2.A.2.3 Multi-Layer
Perceptron 90 2.A.2.4 1-nn Classifier 92 3 An Overview of the Field 94 3.1
Philosophy 94 3.2 Two Examples 98 3.2.1 The Wisdom of the "Classifier
Crowd" 98 3.2.2 The Power of Divide-and-Conquer 98 3.3 Structure of the
Area 100 3.3.1 Terminology 100 3.3.2 A Taxonomy of Classifier Ensemble
Methods 100 3.3.3 Classifier Fusion and Classifier Selection 104 3.4 Quo
Vadis? 105 3.4.1 Reinventing the Wheel? 105 3.4.2 The Illusion of Progress?
106 3.4.3 A Bibliometric Snapshot 107 4 Combining Label Outputs 111 4.1
Types of Classifier Outputs 111 4.2 A Probabilistic Framework for Combining
Label Outputs 112 4.3 Majority Vote 113 4.3.1 "Democracy" in Classifier
Combination 113 4.3.2 Accuracy of the Majority Vote 114 4.3.3 Limits on the
Majority Vote Accuracy: An Example 117 4.3.4 Patterns of Success and
Failure 119 4.3.5 Optimality of the Majority Vote Combiner 124 4.4 Weighted
Majority Vote 125 4.4.1 Two Examples 126 4.4.2 Optimality of the Weighted
Majority Vote Combiner 127 4.5 Naïve Bayes Combiner 128 4.5.1 Optimality of
the Naïve Bayes Combiner 128 4.5.2 Implementation of the NB Combiner 130
4.6 Multinomial Methods 132 4.7 Comparison of Combination Methods for Label
Outputs 135 Appendix 137 4.A.1 Matan's Proof for the Limits on the Majority
Vote Accuracy 137 4.A.2 Selected MATLAB Code 139 5 Combining
Continuous-Valued Outputs 143 5.1 Decision Profile 143 5.2 How Do We Get
Probability Outputs? 144 5.2.1 Probabilities Based on Discriminant Scores
144 5.2.2 Probabilities Based on Counts: Laplace Estimator 147 5.3
Nontrainable (Fixed) Combination Rules 150 5.3.1 A Generic Formulation 150
5.3.2 Equivalence of Simple Combination Rules 152 5.3.3 Generalized Mean
Combiner 153 5.3.4 A Theoretical Comparison of Simple Combiners 156 5.3.5
Where Do They Come From? 160 5.4 The Weighted Average (Linear Combiner) 166
5.4.1 Consensus Theory 166 5.4.2 Added Error for the Weighted Mean
Combination 167 5.4.3 Linear Regression 168 5.5 A Classifier as a Combiner
172 5.5.1 The Supra Bayesian Approach 172 5.5.2 Decision Templates 173
5.5.3 A Linear Classifier 175 5.6 An Example of Nine Combiners for
Continuous-Valued Outputs 175 5.7 To Train or Not to Train? 176 Appendix
178 5.A.1 Theoretical Classification Error for the Simple Combiners 178
5.A.1.1 Set-up and Assumptions 178 5.A.1.2 Individual Error 180 5.A.1.3
Minimum and Maximum 180 5.A.1.4 Average (Sum) 181 5.A.1.5 Median and
Majority Vote 182 5.A.1.6 Oracle 183 5.A.2 Selected MATLAB Code 183 6
Ensemble Methods 186 6.1 Bagging 186 6.1.1 The Origins: Bagging Predictors
186 6.1.2 Why Does Bagging Work? 187 6.1.3 Out-of-bag Estimates 189 6.1.4
Variants of Bagging 190 6.2 Random Forests 190 6.3 AdaBoost 192 6.3.1 The
AdaBoost Algorithm 192 6.3.2 The arc-x4 Algorithm 194 6.3.3 Why Does
AdaBoost Work? 195 6.3.4 Variants of Boosting 199 6.3.5 A Famous
Application: AdaBoost for Face Detection 199 6.4 Random Subspace Ensembles
203 6.5 Rotation Forest 204 6.6 Random Linear Oracle 208 6.7 Error
Correcting Output Codes (ECOC) 211 6.7.1 Code Designs 212 6.7.2 Decoding
214 6.7.3 Ensembles of Nested Dichotomies 216 Appendix 218 6.A.1 Bagging
218 6.A.2 AdaBoost 220 6.A.3 Random Subspace 223 6.A.4 Rotation Forest 225
6.A.5 Random Linear Oracle 228 6.A.6 ECOC 229 7 Classifier Selection 230
7.1 Preliminaries 230 7.2 Why Classifier Selection Works 231 7.3 Estimating
Local Competence Dynamically 233 7.3.1 Decision-Independent Estimates 233
7.3.2 Decision-Dependent Estimates 238 7.4 Pre-Estimation of the Competence
Regions 239 7.4.1 Bespoke Classifiers 240 7.4.2 Clustering and Selection
241 7.5 Simultaneous Training of Regions and Classifiers 242 7.6 Cascade
Classifiers 244 Appendix: Selected MATLAB Code 244 7.A.1 Banana Data 244
7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data
245 8 Diversity in Classifier Ensembles 247 8.1 What Is Diversity? 247
8.1.1 Diversity for a Point-Value Estimate 248 8.1.2 Diversity in Software
Engineering 248 8.1.3 Statistical Measures of Relationship 249 8.2
Measuring Diversity in Classifier Ensembles 250 8.2.1 Pairwise Measures 250
8.2.2 Nonpairwise Measures 251 8.3 Relationship Between Diversity and
Accuracy 256 8.3.1 An Example 256 8.3.2 Relationship Patterns 258 8.3.3 A
Caveat: Independent Outputs ≠ Independent Errors 262 8.3.4 Independence Is
Not the Best Scenario 265 8.3.5 Diversity and Ensemble Margins 267 8.4
Using Diversity 270 8.4.1 Diversity for Finding Bounds and Theoretical
Relationships 270 8.4.2 Kappa-error Diagrams and Ensemble Maps 271 8.4.3
Overproduce and Select 275 8.5 Conclusions: Diversity of Diversity 279
Appendix 280 8.A.1 Derivation of Diversity Measures for Oracle Outputs 280
8.A.1.1 Correlation 280 8.A.1.2 Interrater Agreement 281 8.A.2 Diversity
Measure Equivalence 282 8.A.3 Independent Outputs ≠ Independent Errors 284
8.A.4 A Bound on the Kappa-Error Diagram 286 8.A.5 Calculation of the
Pareto Frontier 287 9 Ensemble Feature Selection 290 9.1 Preliminaries 290
9.1.1 Right and Wrong Protocols 290 9.1.2 Ensemble Feature Selection
Approaches 294 9.1.3 Natural Grouping 294 9.2 Ranking by Decision Tree
Ensembles 295 9.2.1 Simple Count and Split Criterion 295 9.2.2 Permuted
Features or the "Noised-up" Method 297 9.3 Ensembles of Rankers 299 9.3.1
The Approach 299 9.3.2 Ranking Methods (Criteria) 300 9.4 Random Feature
Selection for the Ensemble 305 9.4.1 Random Subspace Revisited 305 9.4.2
Usability Coverage and Feature Diversity 306 9.4.3 Genetic Algorithms 312
9.5 Nonrandom Selection 315 9.5.1 The "Favorite Class" Model 315 9.5.2 The
Iterative Model 315 9.5.3 The Incremental Model 316 9.6 A Stability Index
317 9.6.1 Consistency Between a Pair of Subsets 317 9.6.2 A Stability Index
for K Sequences 319 9.6.3 An Example of Applying the Stability Index 320
Appendix 322 9.A.1 MATLAB Code for the Numerical Example of Ensemble
Ranking 322 9.A.2 MATLAB GA Nuggets 322 9.A.3 MATLAB Code for the Stability
Index 324 10 A Final Thought 326 References 327 Index 353
4.6 Multinomial Methods 132 4.7 Comparison of Combination Methods for Label
Outputs 135 Appendix 137 4.A.1 Matan's Proof for the Limits on the Majority
Vote Accuracy 137 4.A.2 Selected MATLAB Code 139 5 Combining
Continuous-Valued Outputs 143 5.1 Decision Profile 143 5.2 How Do We Get
Probability Outputs? 144 5.2.1 Probabilities Based on Discriminant Scores
144 5.2.2 Probabilities Based on Counts: Laplace Estimator 147 5.3
Nontrainable (Fixed) Combination Rules 150 5.3.1 A Generic Formulation 150
5.3.2 Equivalence of Simple Combination Rules 152 5.3.3 Generalized Mean
Combiner 153 5.3.4 A Theoretical Comparison of Simple Combiners 156 5.3.5
Where Do They Come From? 160 5.4 The Weighted Average (Linear Combiner) 166
5.4.1 Consensus Theory 166 5.4.2 Added Error for the Weighted Mean
Combination 167 5.4.3 Linear Regression 168 5.5 A Classifier as a Combiner
172 5.5.1 The Supra Bayesian Approach 172 5.5.2 Decision Templates 173
5.5.3 A Linear Classifier 175 5.6 An Example of Nine Combiners for
Continuous-Valued Outputs 175 5.7 To Train or Not to Train? 176 Appendix
178 5.A.1 Theoretical Classification Error for the Simple Combiners 178
5.A.1.1 Set-up and Assumptions 178 5.A.1.2 Individual Error 180 5.A.1.3
Minimum and Maximum 180 5.A.1.4 Average (Sum) 181 5.A.1.5 Median and
Majority Vote 182 5.A.1.6 Oracle 183 5.A.2 Selected MATLAB Code 183 6
Ensemble Methods 186 6.1 Bagging 186 6.1.1 The Origins: Bagging Predictors
186 6.1.2 Why Does Bagging Work? 187 6.1.3 Out-of-bag Estimates 189 6.1.4
Variants of Bagging 190 6.2 Random Forests 190 6.3 AdaBoost 192 6.3.1 The
AdaBoost Algorithm 192 6.3.2 The arc-x4 Algorithm 194 6.3.3 Why Does
AdaBoost Work? 195 6.3.4 Variants of Boosting 199 6.3.5 A Famous
Application: AdaBoost for Face Detection 199 6.4 Random Subspace Ensembles
203 6.5 Rotation Forest 204 6.6 Random Linear Oracle 208 6.7 Error
Correcting Output Codes (ECOC) 211 6.7.1 Code Designs 212 6.7.2 Decoding
214 6.7.3 Ensembles of Nested Dichotomies 216 Appendix 218 6.A.1 Bagging
218 6.A.2 AdaBoost 220 6.A.3 Random Subspace 223 6.A.4 Rotation Forest 225
6.A.5 Random Linear Oracle 228 6.A.6 ECOC 229 7 Classifier Selection 230
7.1 Preliminaries 230 7.2 Why Classifier Selection Works 231 7.3 Estimating
Local Competence Dynamically 233 7.3.1 Decision-Independent Estimates 233
7.3.2 Decision-Dependent Estimates 238 7.4 Pre-Estimation of the Competence
Regions 239 7.4.1 Bespoke Classifiers 240 7.4.2 Clustering and Selection
241 7.5 Simultaneous Training of Regions and Classifiers 242 7.6 Cascade
Classifiers 244 Appendix: Selected MATLAB Code 244 7.A.1 Banana Data 244
7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data
245 8 Diversity in Classifier Ensembles 247 8.1 What Is Diversity? 247
8.1.1 Diversity for a Point-Value Estimate 248 8.1.2 Diversity in Software
Engineering 248 8.1.3 Statistical Measures of Relationship 249 8.2
Measuring Diversity in Classifier Ensembles 250 8.2.1 Pairwise Measures 250
8.2.2 Nonpairwise Measures 251 8.3 Relationship Between Diversity and
Accuracy 256 8.3.1 An Example 256 8.3.2 Relationship Patterns 258 8.3.3 A
Caveat: Independent Outputs
Independent Errors 262 8.3.4 Independence Is
Not the Best Scenario 265 8.3.5 Diversity and Ensemble Margins 267 8.4
Using Diversity 270 8.4.1 Diversity for Finding Bounds and Theoretical
Relationships 270 8.4.2 Kappa-error Diagrams and Ensemble Maps 271 8.4.3
Overproduce and Select 275 8.5 Conclusions: Diversity of Diversity 279
Appendix 280 8.A.1 Derivation of Diversity Measures for Oracle Outputs 280
8.A.1.1 Correlation 280 8.A.1.2 Interrater Agreement 281 8.A.2 Diversity
Measure Equivalence 282 8.A.3 Independent Outputs
Independent Errors 284
8.A.4 A Bound on the Kappa-Error Diagram 286 8.A.5 Calculation of the
Pareto Frontier 287 9 Ensemble Feature Selection 290 9.1 Preliminaries 290
9.1.1 Right and Wrong Protocols 290 9.1.2 Ensemble Feature Selection
Approaches 294 9.1.3 Natural Grouping 294 9.2 Ranking by Decision Tree
Ensembles 295 9.2.1 Simple Count and Split Criterion 295 9.2.2 Permuted
Features or the "Noised-up" Method 297 9.3 Ensembles of Rankers 299 9.3.1
The Approach 299 9.3.2 Ranking Methods (Criteria) 300 9.4 Random Feature
Selection for the Ensemble 305 9.4.1 Random Subspace Revisited 305 9.4.2
Usability Coverage and Feature Diversity 306 9.4.3 Genetic Algorithms 312
9.5 Nonrandom Selection 315 9.5.1 The "Favorite Class" Model 315 9.5.2 The
Iterative Model 315 9.5.3 The Incremental Model 316 9.6 A Stability Index
317 9.6.1 Consistency Between a Pair of Subsets 317 9.6.2 A Stability Index
for K Sequences 319 9.6.3 An Example of Applying the Stability Index 320
Appendix 322 9.A.1 MATLAB Code for the Numerical Example of Ensemble
Ranking 322 9.A.2 MATLAB GA Nuggets 322 9.A.3 MATLAB Code for the Stability
Index 324 10 A Final Thought 326 References 327 Index 353