- Hardcover
Other customers were also interested in
- Joy Mundy: The Microsoft Data Warehouse Toolkit, 55,99 €
- Pierre Bonnet: Enterprise Data Governance, 186,99 €
- Thomas C. Hammergren: Data Warehousing for Dummies, 33,99 €
- Adam Jorgensen: Microsoft SQL Server 2012 Bible, 50,99 €
- Sheeri K. Cabral: MySQL Administrator's Bible, 63,99 €
- Khalid Mohammad Jaber: Fast Decision Tree To Index Large DNA-Protein Sequence Datasets, 51,99 €
- Renu Rawat: Genome annotation and finding repetitive DNA elements, 17,95 €
The rapid, uncontrolled growth of classification methods in DNA microarray studies has resulted in a body of information scattered throughout the literature, conference proceedings, and elsewhere. This book brings together many of the unsupervised and supervised classification methods now dispersed in the literature. It breaks away from traditional statistical methods by providing chapters on newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary-based genetic algorithms, support vector machines, and swarm intelligence involving particle swarm optimization and ant colony optimization.
Wide coverage of traditional unsupervised and supervised methods and newer contemporary approaches that help researchers handle the rapid growth of classification methods in DNA microarray studies
Proliferating classification methods in DNA microarray studies have resulted in a body of information scattered throughout literature, conference proceedings, and elsewhere. This book unites many of these classification methods in a single volume. In addition to traditional statistical methods, it covers newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary-based genetic algorithms, support vector machines, swarm intelligence involving particle swarm optimization, and more.
Classification Analysis of DNA Microarrays provides highly detailed pseudo-code, rich graphical programming features, and ready-to-run source code. Along with primary methods covering traditional and contemporary classification, it offers supplementary tools and data preparation routines for standardization and fuzzification; dimensional reduction via crisp and fuzzy c-means, PCA, and nonlinear manifold learning; computational linguistics via text analytics and n-gram analysis; recursive feature elimination during ANN training; kernel-based methods; and ensemble classifier fusion.
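To give a flavor of the kind of routine the book develops in pseudo-code (crisp K-means cluster analysis is the subject of Chapter 2), here is a minimal sketch of crisp K-means on a toy expression matrix. The synthetic data, the choice of K = 2, and the fixed iteration count are assumptions made for illustration only and are not taken from the book's examples.

```python
# Minimal crisp K-means sketch on a toy "microarray" matrix.
# Illustrative only: the data, K, and iteration count are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 12 arrays (rows) x 5 genes (columns),
# drawn from two well-separated synthetic groups.
X = np.vstack([
    rng.normal(0.0, 0.3, size=(6, 5)),
    rng.normal(2.0, 0.3, size=(6, 5)),
])

def crisp_kmeans(X, K=2, n_iter=50):
    # Initialize prototypes with K randomly selected arrays.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each array to its nearest prototype (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each prototype to the mean of the arrays assigned to it.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

labels, centers = crisp_kmeans(X)
print("cluster memberships:", labels)
```

The book's treatment layers cluster-validity indices, V-fold cross-validation, and alternative initialization schemes on top of this basic assign-and-update loop, as the table of contents below indicates.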
This powerful new resource:
Provides information on the use of classification analysis for DNA microarrays in large-scale, high-throughput transcriptional studies
Serves as a historical repository of general-use supervised classification methods as well as newer contemporary methods
Brings the reader quickly up to speed on the various classification methods by implementing the pseudo-code and source code provided in the book
Describes implementation methods that help shorten discovery times
Classification Analysis of DNA Microarrays is useful for professionals and graduate students in computer science, bioinformatics, biostatistics, systems biology, and many related fields.
Note: This item can only be shipped to a German delivery address.
Product details
- Wiley Series in Bioinformatics
- Publisher: IEEE Press / Wiley & Sons
- 1st edition
- Number of pages: 736
- Publication date: May 2013
- Language: English
- Dimensions: 240mm x 161mm x 45mm
- Weight: 1285g
- ISBN-13: 9780470170816
- ISBN-10: 0470170816
- Item no.: 26172588
LEIF E. PETERSON, PhD, is Associate Professor of Public Health at Weill Cornell Medical College, Cornell University, and is with the Center for Biostatistics, The Methodist Hospital Research Institute (Houston). He is a member of the IEEE Computational Intelligence Society and Editor-in-Chief of the BioMed Central journal Source Code for Biology and Medicine.
Preface xix
Abbreviations xxiii
1 Introduction 1
1.1 Class Discovery 2
1.2 Dimensional Reduction 4
1.3 Class Prediction 4
1.4 Classification Rules of Thumb 5
1.5 DNA Microarray Datasets Used 9
References 11
Part I Class Discovery 13
2 Crisp K-Means Cluster Analysis 15
2.1 Introduction 15
2.2 Algorithm 16
2.3 Implementation 18
2.4 Distance Metrics 20
2.5 Cluster Validity 24
2.5.1 Davies-Bouldin Index 25
2.5.2 Dunn's Index 25
2.5.3 Intracluster Distance 26
2.5.4 Intercluster Distance 27
2.5.5 Silhouette Index 30
2.5.6 Hubert's Statistic 31
2.5.7 Randomization Tests for Optimal Value of K 31
2.6 V-Fold Cross-Validation 35
2.7 Cluster Initialization 37
2.7.1 K Randomly Selected Microarrays 37
2.7.2 K Random Partitions 40
2.7.3 Prototype Splitting 41
2.8 Cluster Outliers 44
2.9 Summary 44
References 45
3 Fuzzy K-Means Cluster Analysis 47
3.1 Introduction 47
3.2 Fuzzy K-Means Algorithm 47
3.3 Implementation 49
3.4 Summary 54
References 54
4 Self-Organizing Maps 57
4.1 Introduction 57
4.2 Algorithm 57
4.2.1 Feature Transformation and Reference Vector Initialization 59
4.2.2 Learning 60
4.2.3 Conscience 61
4.3 Implementation 63
4.3.1 Feature Transformation and Reference Vector Initialization 63
4.3.2 Reference Vector Weight Learning 66
4.4 Cluster Visualization 67
4.4.1 Crisp K-Means Cluster Analysis 67
4.4.2 Adjacency Matrix Method 68
4.4.3 Cluster Connectivity Method 69
4.4.4 Hue-Saturation-Value (HSV) Color Normalization 69
4.5 Unified Distance Matrix (U Matrix) 71
4.6 Component Map 71
4.7 Map Quality 73
4.8 Nonlinear Dimension Reduction 75
References 79
5 Unsupervised Neural Gas 81
5.1 Introduction 81
5.2 Algorithm 82
5.3 Implementation 82
5.3.1 Feature Transformation and Prototype Initialization 82
5.3.2 Prototype Learning 83
5.4 Nonlinear Dimension Reduction 85
5.5 Summary 87
References 88
6 Hierarchical Cluster Analysis 91
6.1 Introduction 91
6.2 Methods 91
6.2.1 General Programming Methods 91
6.2.2 Step 1: Cluster-Analyzing Arrays as Objects with Genes as Attributes 92
6.2.3 Step 2: Cluster-Analyzing Genes as Objects with Arrays as Attributes 94
6.3 Algorithm 96
6.4 Implementation 96
6.4.1 Heatmap Color Control 96
6.4.2 User Choices for Clustering Arrays and Genes 97
6.4.3 Distance Matrices and Agglomeration Sequences 98
6.4.4 Drawing Dendrograms and Heatmaps 104
References 105
7 Model-Based Clustering 107
7.1 Introduction 107
7.2 Algorithm 110
7.3 Implementation 111
7.4 Summary 116
References 117
8 Text Mining: Document Clustering 119
8.1 Introduction 119
8.2 Duo-Mining 119
8.3 Streams and Documents 120
8.4 Lexical Analysis 120
8.4.1 Automatic Indexing 120
8.4.2 Removing Stopwords 121
8.5 Stemming 121
8.6 Term Weighting 121
8.7 Concept Vectors 124
8.8 Main Terms Representing Concept Vectors 124
8.9 Algorithm 125
8.10 Preprocessing 127
8.11 Summary 137
References 137
9 Text Mining: N-Gram Analysis 139
9.1 Introduction 139
9.2 Algorithm 140
9.3 Implementation 141
9.4 Summary 154
References 156
Part II Dimension Reduction 159
10 Principal Components Analysis 161
10.1 Introduction 161
10.2 Multivariate Statistical Theory 161
10.2.1 Matrix Definitions 162
10.2.2 Principal Component Solution of R 163
10.2.3 Extraction of Principal Components 164
10.2.4 Varimax Orthogonal Rotation of Components 166
10.2.5 Principal Component Score Coefficients 168
10.2.6 Principal Component Scores 169
10.3 Algorithm 170
10.4 When to Use Loadings and PC Scores 170
10.5 Implementation 171
10.5.1 Correlation Matrix R 171
10.5.2 Eigenanalysis of Correlation Matrix R 172
10.5.3 Determination of Loadings and Varimax Rotation 174
10.5.4 Calculating Principal Component (PC) Scores 176
10.6 Rules of Thumb For PCA 182
10.7 Summary 186
References 187
11 Nonlinear Manifold Learning 189
11.1 Introduction 189
11.2 Correlation-Based PCA 190
11.3 Kernel PCA 191
11.4 Diffusion Maps 192
11.5 Laplacian Eigenmaps 192
11.6 Local Linear Embedding 193
11.7 Locality Preserving Projections 194
11.8 Sammon Mapping 195
11.9 NLML Prior to Classification Analysis 195
11.10 Classification Results 197
11.11 Summary 200
References 203
Part III Class Prediction 205
12 Feature Selection 207
12.1 Introduction 207
12.2 Filtering versus Wrapping 208
12.3 Data 209
12.3.1 Numbers 209
12.3.2 Responses 209
12.3.3 Measurement Scales 210
12.3.4 Variables 211
12.4 Data Arrangement 211
12.5 Filtering 213
12.5.1 Continuous Features 213
12.5.2 Best Rank Filters 219
12.5.3 Randomization Tests 236
12.5.4 Multitesting Problem 237
12.5.5 Filtering Qualitative Features 242
12.5.6 Multiclass Gini Diversity Index 246
12.5.7 Class Comparison Techniques 247
12.5.8 Generation of Nonredundant Gene List 250
12.6 Selection Methods 254
12.6.1 Greedy Plus Takeaway (Greedy PTA) 254
12.6.2 Best Ranked Genes 258
12.7 Multicollinearity 259
12.8 Summary 270
References 270
13 Classifier Performance 273
13.1 Introduction 273
13.2 Input-Output, Speed, and Efficiency 273
13.3 Training, Testing, and Validation 277
13.4 Ensemble Classifier Fusion 280
13.5 Sensitivity and Specificity 283
13.6 Bias 284
13.7 Variance 285
13.8 Receiver-Operator Characteristic (ROC) Curves 286
References 295
14 Linear Regression 297
14.1 Introduction 297
14.2 Algorithm 299
14.3 Implementation 299
14.4 Cross-Validation Results 300
14.5 Bootstrap Bias 303
14.6 Multiclass ROC Curves 306
14.7 Decision Boundaries 308
14.8 Summary 310
References 310
15 Decision Tree Classification 311
15.1 Introduction 311
15.2 Features Used 314
15.3 Terminal Nodes and Stopping Criteria 315
15.4 Algorithm 315
15.5 Implementation 315
15.6 Cross-Validation Results 318
15.7 Decision Boundaries 326
15.8 Summary 327
References 329
16 Random Forests 331
16.1 Introduction 331
16.2 Algorithm 333
16.3 Importance Scores 334
16.4 Strength and Correlation 338
16.5 Proximity and Supervised Clustering 342
16.6 Unsupervised Clustering 345
16.7 Class Outlier Detection 348
16.8 Implementation 350
16.9 Parameter Effects 350
16.10 Summary 357
References 358
17 K Nearest Neighbor 361
17.1 Introduction 361
17.2 Algorithm 362
17.3 Implementation 363
17.4 Cross-Validation Results 364
17.5 Bootstrap Bias 369
17.6 Multiclass ROC Curves 373
17.7 Decision Boundaries 374
17.8 Summary 377
References 378
18 Naïve Bayes Classifier 379
18.1 Introduction 379
18.2 Algorithm 380
18.3 Cross-Validation Results 380
18.4 Bootstrap Bias 384
18.5 Multiclass ROC Curves 386
18.6 Decision Boundaries 386
18.7 Summary 389
References 391
19 Linear Discriminant Analysis 393
19.1 Introduction 393
19.2 Multivariate Matrix Definitions 394
19.3 Linear Discriminant Analysis 396
19.3.1 Algorithm 397
19.3.2 Cross-Validation Results 397
19.3.3 Bootstrap Bias 401
19.3.4 Multiclass ROC Curves 402
19.3.5 Decision Boundaries 403
19.4 Quadratic Discriminant Analysis 403
19.5 Fisher's Discriminant Analysis 406
19.6 Summary 411
References 412
20 Learning Vector Quantization 415
20.1 Introduction 415
20.2 Cross-Validation Results 417
20.3 Bootstrap Bias 417
20.4 Multiclass ROC Curves 426
20.5 Decision Boundaries 428
20.6 Summary 428
References 430
21 Logistic Regression 433
21.1 Introduction 433
21.2 Binary Logistic Regression 434
21.3 Polytomous Logistic Regression 439
21.4 Cross-Validation Results 443
21.5 Decision Boundaries 444
21.6 Summary 444
References 447
22 Support Vector Machines 449
22.1 Introduction 449
22.2 Hard-Margin SVM for Linearly Separable Classes 449
22.3 Kernel Mapping into Nonlinear Feature Space 452
22.4 Soft-Margin SVM for Nonlinearly Separable Classes 452
22.5 Gradient Ascent Soft-Margin SVM 454
22.5.1 Cross-Validation Results 455
22.5.2 Bootstrap Bias 457
22.5.3 Multiclass ROC Curves 465
22.5.4 Decision Boundaries 465
22.6 Least-Squares Soft-Margin SVM 465
22.6.1 Cross-Validation Results 470
22.6.2 Bootstrap Bias 477
22.6.3 Multiclass ROC Curves 477
22.6.4 Decision Boundaries 477
22.7 Summary 481
References 483
23 Artificial Neural Networks 487
23.1 Introduction 487
23.2 ANN Architecture 488
23.3 Basics of ANN Training 488
23.3.1 Backpropagation Learning 493
23.3.2 Resilient Backpropagation (RPROP) Learning 496
23.3.3 Cycles and Epochs 496
23.4 ANN Training Methods 497
23.4.1 Method 1: Gene Dimensional Reduction and Recursive Feature Elimination for Large Gene Lists 497
23.4.2 Method 2: Gene Filtering and Selection 502
23.5 Algorithm 502
23.6 Batch versus Online Training 504
23.7 ANN Testing 504
23.8 Cross-Validation Results 504
23.9 Bootstrap Bias 506
23.10 Multiclass ROC Curves 506
23.11 Decision Boundaries 513
23.12 RPROP versus Backpropagation 513
23.13 Summary 522
References 522
24 Kernel Regression 525
24.1 Introduction 525
24.2 Algorithm 527
24.3 Cross-Validation Results 527
24.4 Bootstrap Bias 528
24.5 Multiclass ROC Curves 536
24.6 Decision Boundaries 537
24.7 Summary 540
References 542
25 Neural Adaptive Learning with Metaheuristics 543
25.1 Multilayer Perceptrons 544
25.2 Genetic Algorithms 544
25.3 Covariance Matrix Self-Adaptation-Evolution Strategies 549
25.4 Particle Swarm Optimization 556
25.5 Ant Colony Optimization 560
25.5.1 Classification 560
25.5.2 Continuous-Function Approximation 562
25.6 Summary 567
References 567
26 Supervised Neural Gas 573
26.1 Introduction 573
26.2 Algorithm 574
26.3 Cross-Validation Results 574
26.4 Bootstrap Bias 582
26.5 Multiclass ROC Curves 582
26.6 Class Decision Boundaries 584
26.7 Summary 586
References 588
27 Mixture of Experts 591
27.1 Introduction 591
27.2 Algorithm 595
27.3 Cross-Validation Results 596
27.4 Decision Boundaries 597
27.5 Summary 597
References 599
28 Covariance Matrix Filtering 601
28.1 Introduction 601
28.2 Covariance and Correlation Matrices 601
28.3 Random Matrices 602
28.4 Component Subtraction 608
28.5 Covariance Matrix Shrinkage 610
28.6 Covariance Matrix Filtering 613
28.7 Summary 621
References 622
Appendixes 625
A Probability Primer 627
A.1 Choices 627
A.2 Permutations 628
A.3 Combinations 630
A.4 Probability 632
A.4.1 Addition Rule 633
A.4.2 Multiplication Rule and Conditional Probabilities 634
A.4.3 Multiplication Rule for Independent Events 635
A.4.4 Elimination Rule (Disease Prevalence) 636
A.4.5 Bayes' Rule (Pathway Probabilities) 637
B Matrix Algebra 639
B.1 Vectors 639
B.2 Matrices 642
B.3 Sample Mean, Covariance, and Correlation 647
B.4 Diagonal Matrices 648
B.5 Identity Matrices 649
B.6 Trace of a Matrix 650
B.7 Eigenanalysis 650
B.8 Symmetric Eigenvalue Problem 650
B.9 Generalized Eigenvalue Problem 651
B.10 Matrix Properties 652
C Mathematical Functions 655
C.1 Inequalities 655
C.2 Laws of Exponents 655
C.3 Laws of Radicals 656
C.4 Absolute Value 656
C.5 Logarithms 656
C.6 Product and Summation Operators 657
C.7 Partial Derivatives 657
C.8 Likelihood Functions 658
D Statistical Primitives 665
D.1 Rules of Thumb 665
D.2 Primitives 668
References 678
E Probability Distributions 679
E.1 Basics of Hypothesis Testing 679
E.2 Probability Functions: Source of p Values 682
E.3 Normal Distribution 682
E.4 Gamma Function 686
E.5 Beta Function 689
E.6 Pseudo-Random-Number Generation 692
E.6.1 Standard Uniform Distribution 692
E.6.2 Normal Distribution 693
E.6.3 Lognormal Distribution 694
E.6.4 Binomial Distribution 695
E.6.5 Poisson Distribution 696
E.6.6 Triangle Distribution 697
E.6.7 Log-Triangle Distribution 698
References 698
F Symbols and Notation 699
Index 703