Mourad Elloumi, Costas Iliopoulos, Jason T. L. Wang, Albert Y. Zomaya
Pattern Recognition in Computational Molecular Biology (eBook, ePUB)
Techniques and Approaches
123,99 €
123,99 €
inkl. MwSt.
Sofort per Download lieferbar
0 °P sammeln
123,99 €
Als Download kaufen
123,99 €
inkl. MwSt.
Sofort per Download lieferbar
0 °P sammeln
Jetzt verschenken
Alle Infos zum eBook verschenken
123,99 €
inkl. MwSt.
Sofort per Download lieferbar
Alle Infos zum eBook verschenken
0 °P sammeln
Mourad Elloumi, Costas Iliopoulos, Jason T. L. Wang, Albert Y. Zomaya
Pattern Recognition in Computational Molecular Biology (eBook, ePUB)
Techniques and Approaches
- Format: ePub
- Merkliste
- Auf die Merkliste
- Bewerten Bewerten
- Teilen
- Produkt teilen
- Produkterinnerung
- Produkterinnerung
Bitte loggen Sie sich zunächst in Ihr Kundenkonto ein oder registrieren Sie sich bei
bücher.de, um das eBook-Abo tolino select nutzen zu können.
Hier können Sie sich einloggen
Hier können Sie sich einloggen
Sie sind bereits eingeloggt. Klicken Sie auf 2. tolino select Abo, um fortzufahren.
Bitte loggen Sie sich zunächst in Ihr Kundenkonto ein oder registrieren Sie sich bei bücher.de, um das eBook-Abo tolino select nutzen zu können.
A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology
This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences , Pattern Recognition in Secondary Structures , Pattern…mehr
- Geräte: eReader
- ohne Kopierschutz
- eBook Hilfe
Andere Kunden interessierten sich auch für
- Ulisses M. Braga NetoError Estimation for Pattern Recognition (eBook, ePUB)118,99 €
- Wladyslaw HomendaPattern Recognition (eBook, ePUB)109,99 €
- Pradipta MajiRough-Fuzzy Pattern Recognition (eBook, ePUB)101,99 €
- Amit KonarEmotion Recognition (eBook, ePUB)118,99 €
- Jürgen BeyererPattern Recognition (eBook, ePUB)59,95 €
- Streamlining Digital Signal Processing (eBook, ePUB)80,99 €
- Basel Abu-JamousIntegrative Cluster Analysis in Bioinformatics (eBook, ePUB)101,99 €
-
-
-
A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology
This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.
This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.
- Surveys the development of techniques and approaches on pattern recognition in biomolecular data
- Discusses pattern recognition in primary, secondary, tertiary and quaternary structures, as well as microarrays, phylogenetic trees and biological networks
- Includes case studies and examples to further illustrate the concepts discussed in the book
Dieser Download kann aus rechtlichen Gründen nur mit Rechnungsadresse in D ausgeliefert werden.
Produktdetails
- Produktdetails
- Verlag: John Wiley & Sons
- Erscheinungstermin: 24. Dezember 2015
- Englisch
- ISBN-13: 9781119078869
- Artikelnr.: 44447401
- Verlag: John Wiley & Sons
- Erscheinungstermin: 24. Dezember 2015
- Englisch
- ISBN-13: 9781119078869
- Artikelnr.: 44447401
- Herstellerkennzeichnung Die Herstellerinformationen sind derzeit nicht verfügbar.
Mourad Elloumi, PhD, is Professor in Computer Science at the University of Tunis-El Manar, Tunisia. Dr. Elloumi is the author/co-author of more than 50 publications in international journals and conference proceedings related to Algorithmics, Computational Molecular Biology, and Knowledge Discovery and Data Mining.
Costas S. Iliopoulos, PhD, is Professor of AlgorithmDesign at King's College London, UK. Dr. Iliopoulos co-authored over 300 peer-reviewed articles in pattern matching and combinatorics of strings. He serves on the editorial board of the Journal of Discrete Algorithms, Computer Mathematics & Combinatorial Computing, and System Biology & Biomedical Technologies.
Jason T. L. Wang, PhD, is Professor of Computer Science at the New Jersey Institute of Technology, USA. Dr. Wang has published extensively on Data Mining and Computational Molecular Biology, and has been a member of program committees for over 200 conferences and workshops in these and related areas.
Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, Australia. Dr. Zomaya published more than 500 scientific papers and articles and is author, co-author or editor of more than 20 books. Dr. Zomaya is Fellow of AAAS, IEEE, and IET.
Costas S. Iliopoulos, PhD, is Professor of AlgorithmDesign at King's College London, UK. Dr. Iliopoulos co-authored over 300 peer-reviewed articles in pattern matching and combinatorics of strings. He serves on the editorial board of the Journal of Discrete Algorithms, Computer Mathematics & Combinatorial Computing, and System Biology & Biomedical Technologies.
Jason T. L. Wang, PhD, is Professor of Computer Science at the New Jersey Institute of Technology, USA. Dr. Wang has published extensively on Data Mining and Computational Molecular Biology, and has been a member of program committees for over 200 conferences and workshops in these and related areas.
Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, Australia. Dr. Zomaya published more than 500 scientific papers and articles and is author, co-author or editor of more than 20 books. Dr. Zomaya is Fellow of AAAS, IEEE, and IET.
LIST OF CONTRIBUTORS xxi
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.2.1 The Minimum Error Correction Model / 8
1.2.2 Probabilistic Approaches and Alternative Models / 10
1.3 Population Haplotyping / 12
1.3.1 Clark's Rule / 14
1.3.2 Pure Parsimony / 15
1.3.3 Perfect Phylogeny / 19
1.3.4 Disease Association / 21
1.3.5 Other Models / 22
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.2.1 Average Length of Optimal Barcodes / 33
2.3 Entropy-Based Information Content Technique for Designing
Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding
Problems / 36
2.4.1 Reductions from Set Covering Problem / 36
2.4.2 Reduction from Graph-Coloring Problem / 38
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.5.1 Entropy-Based Method with a Different Measure for Information Content
/ 39
2.5.2 Balanced Partitioning Approach / 40
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.2.1 Background on Whole-Genome Comparison / 44
3.2.2 Alignment-Free Methods / 45
3.2.3 Average Common Subword / 46
3.2.4 Kullback-Leibler Information Divergence / 47
3.3 Underlying Approach / 47
3.3.1 Irredundant Common Subwords / 48
3.3.2 Underlying Subwords / 49
3.3.3 Efficient Computation of Underlying Subwords / 50
3.3.4 Extension to Inversions and Complements / 53
3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53
3.4 Experimental Results / 54
3.4.1 Genome Data sets and Reference Taxonomies / 54
3.4.2 Whole-Genome Phylogeny Reconstruction / 56
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.2.1 Overall Objective Function / 67
4.2.2 Maximum Likelihood Model / 68
4.3 Motif Finding Algorithms / 70
4.3.1 DEM Motif Algorithm / 70
4.3.2 WEM Motif Finding Algorithm / 70
4.3.3 Metropolis Motif Finding Algorithm / 72
4.3.4 Gibbs Motif Finding Algorithm / 73
4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.5.1 Performance Evaluation / 76
4.5.2 CRP Binding Sites / 76
4.5.3 Multiple Motifs in Helix-Turn-Helix Protein Structure / 78
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomá Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.2.1 Chou-Fasman / 103
6.2.2 GOR / 104
6.2.3 PHD / 104
6.2.4 PSIPRED / 104
6.2.5 SPINE-X / 105
6.2.6 PSSpred / 105
6.2.7 Meta Methods / 105
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.3.1 Measures / 106
6.3.2 Benchmark / 106
6.3.3 Performances / 107
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS:
APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
Yann Guermeur and Fabien Lauer
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.3.1 Base Classifiers / 117
7.3.2 Ensemble Methods / 118
7.3.3 Convex Combination / 119
7.4 Postprocessing with A Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.5.1 Biological Problem / 121
7.5.2 MSVMpred2 / 121
7.5.3 Hidden Semi-Markov Model / 122
7.5.4 Experimental Results / 122
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.2.1 Hierarchy of Protein Structures / 130
8.2.2 Secondary Structure Elements / 131
8.2.3 Structural Motifs / 132
8.2.4 Available Sources for Protein Data / 134
8.3 State of the Art / 135
8.3.1 Protein Structure Motif Search / 135
8.3.2 Promotif / 136
8.3.3 Secondary-Structure Matching / 137
8.3.4 Multiple Structural Alignment by Secondary Structures / 138
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.4.1 Secondary Structures Cooccurrences / 138
8.4.2 Cross Motif Search / 143
8.4.3 Complete Cross Motif Search / 146
8.5 Implementation Notes / 149
8.5.1 Optimizations / 149
8.5.2 Parallel Approaches / 150
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY
155
Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang
9.1 Introduction / 155
9.2 Background / 156
9.2.1 Noncoding RNAs and Their Secondary Structures / 156
9.2.2 Pseudoknotted ncRNA Search Tools / 157
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES
167
Wajdi Dhifli and Engelbert Mephu Nguifo
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.2.1 Parsing Protein 3D-Structures into Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.5.1 Problem Definition / 174
10.5.2 Candidates Generation / 176
10.5.3 Frequent Subgraph Discovery Approaches / 177
10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs /
178
10.6 Feature Selection / 179
10.6.1 Relevance of a Feature / 179
10.7 Feature Selection for Subgraphs / 180
10.7.1 Problem Statement / 180
10.7.2 Mining Top-k Subgraphs / 180
10.7.3 Clustering-Based Subgraph Selection / 181
10.7.4 Sampling-Based Approaches / 181
10.7.5 Approximate Subgraph Mining / 181
10.7.6 Discriminative Subgraph Selection / 182
10.7.7 Other Significant Subgraph Selection Approaches / 182
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION
OF PROTEIN TERTIARY STRUCTURES 190
Chinua Umoja, Xiaxia Yu, and Robert Harrison
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.2.1 GA Model Selection in Protein Structure Prediction / 196
11.2.2 Common Methodology / 198
11.3 Supervised Machine Learning Algorithm / 201
11.3.1 Artificial Neural Networks / 201
11.3.2 ANNs in Protein Structure Prediction / 202
11.3.3 Support Vector Machines / 203
11.4 Fuzzy Application / 204
11.4.1 Fuzzy Logic / 204
11.4.2 Fuzzy SVMs / 204
11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205
11.4.4 Fuzzy Decision Trees / 206
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.5.1 DomCut / 216
12.5.2 Scooby-Domain / 217
12.5.3 FIEFDom / 218
12.5.4 Chatterjee et al. (2009) / 219
12.5.5 Drop / 219
12.6 Domain Boundary Prediction Using Enhanced General Regression Network /
220
12.6.1 Multi-Domain Benchmark Data Set / 220
12.6.2 Compact Domain Profile / 221
12.6.3 The Enhanced Semi-Parametric Model / 222
12.6.4 Training, Testing, and Validation / 225
12.6.5 Experimental Results / 226
12.7 Inter-Domain Linkers Prediction Using Compositional Index and
Simulated Annealing / 227
12.7.1 Compositional Index / 228
12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated
Annealing / 229
12.7.3 Experimental Results / 230
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION 236
Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya
13.1 Introduction / 236
13.2 Methods / 238
13.2.1 Evolutionary Data Set Construction / 238
13.2.2 Protein Secondary Structure Information / 239
13.2.3 Method I: Intelligent Voting / 239
13.2.4 Method II: Randomized Meta-Learning / 241
13.2.5 Model Validation and Testing / 242
13.2.6 Parameter Tuning / 242
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad
Jahandideh, and Kuo-Chen Chou
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.2.1 Secondary Structure Prediction / 255
14.2.2 Modeling of Tertiary Structure / 256
14.3 Template-Based Predictions / 257
14.3.1 Homology Modeling / 257
14.3.2 Threading Methods / 257
14.3.3 Ab initio Modeling / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
Sheng-Lung Peng and Yu-Wei Tsay
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.2.1 Graph Model for Proteins / 270
15.3 Measuring Structural Similarity VIA MCES / 272
15.3.1 Problem Formulation / 273
15.3.2 Constructing P-Graphs / 274
15.3.3 Constructing Line Graphs / 276
15.3.4 Constructing Modular Graphs / 276
15.3.5 Maximum Clique Detection / 277
15.3.6 Experimental Results / 277
15.4 Protein Comparison VIA Graph Spectra / 279
15.4.1 Graph Spectra / 279
15.4.2 Matrix Selection / 281
15.4.3 Graph Cospectrality and Similarity / 283
15.4.4 Cospectral Comparison / 283
15.4.5 Experimental Results / 284
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN
INTERACTIONS 291
Mina Maleki, Michael Hall, and Luis Rueda
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.4.1 Physicochemical Properties / 296
16.4.2 Domain-Based Properties / 298
16.5 Feature Selection / 299
16.5.1 Filter Methods / 299
16.5.2 Wrapper Methods / 301
16.6 Classification / 301
16.6.1 Linear Dimensionality Reduction / 301
16.6.2 Support Vector Machines / 303
16.6.3 k-Nearest Neighbor / 303
16.6.4 Naive Bayes / 304
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.8.1 Analysis of the Prediction Properties / 304
16.8.2 Analysis of Structural DDIs / 307
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
Hasan O¢gul
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.2.1 Signal from Oligo DNA Array Image / 336
18.2.2 Signal from Two-Color cDNA Array / 337
18.3 Microarray Signal Analysis / 337
18.3.1 Absolute Analysis and Replicates in Microarrays / 338
18.3.2 Microarray Normalization / 339
18.4 Algorithms for De Gene Selection / 339
18.4.1 Within-Between DE Gene (WB-DEG) Selection Algorithm / 340
18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection
Methods on Latin Square Data / 341
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE
PATTERN ANALYSIS 347
Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and
Paola Bertolazzi
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.3.1 Applications / 349
19.3.2 Microarray Technology / 350
19.3.3 Microarray Workflow / 350
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.6.1 Data Definition / 356
19.6.2 Data Analysis / 357
19.6.3 Normalization and Background Correction / 357
19.6.4 Genes Clustering / 359
19.6.5 Experiment Classification / 361
19.6.6 Software Tools for Gene Expression Profile Analysis / 362
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
Li Teng
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.2.1 Similarity Measurement / 374
20.2.2 Clustering / 376
20.2.3 Biclustering / 379
20.2.4 Types of Biclusters / 380
20.2.5 Measurement of the Homogeneity / 383
20.2.6 Biclustering Algorithms with Different Searching Schemes / 387
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS
OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa
21.1 Overview / 394
21.1.1 Background Correction Methods / 395
21.1.2 Normalization Methods / 396
21.1.3 Literature Review / 397
21.2 Arrow Plot / 399
21.2.1 DE Genes Versus Special Genes / 399
21.2.2 Definition and Properties of the ROC Curve / 400
21.2.3 AUC and Degenerate ROC Curves / 401
21.2.4 Overlapping Coefficient / 402
21.2.5 Arrow Plot Construction / 403
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.4.1 Basic Principles / 405
21.4.2 Interpretation of CA Maps / 406
21.5 Impact of the Preprocessing Methods / 407
21.5.1 Class Prediction Context / 408
21.5.2 Class Comparison Context / 408
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
David A. Morrison
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION:
IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN
RECOGNITION 439
Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree
Reconstruction / 440
23.2.1 Inferring Gene Trees / 441
23.2.2 Inferring Species Trees / 442
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree
Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree
Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals
of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
David Dao, Tomá Flouri, and Alexandros Stamatakis
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naïve Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.5.1 Preprocessing / 467
24.5.2 Computing Lowest Common Ancestors / 468
24.5.3 Constructing the Induced Tree / 468
24.5.4 Final Remarks / 471
24.6 Implementation / 473
24.6.1 Preprocessing / 473
24.6.2 Reconstruction / 473
24.6.3 Extracting Bipartitions / 474
24.7 Evaluation / 474
24.7.1 Test Data Sets / 474
24.7.2 Experimental Results / 475
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER
EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov
25.1 Introduction / 483
25.2 Methods / 485
25.2.1 Clustering Using Variability Functions / 485
25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm
/ 487
25.2.3 Description of the New Algorithm / 488
25.2.4 Time Complexity / 491
25.3 Experimental Study / 491
25.3.1 Implementation / 491
25.3.2 Synthetic Data / 491
25.3.3 Real Prokaryotic (Genomic) Data / 495
25.4 Results and Discussion / 501
25.4.1 Analysis of Synthetic Data / 501
25.4.2 Analysis of Prokaryotic Data / 502
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
Christos Makris and Evangelos Theodoridis
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks-Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO
SPECIES INTERACTIONS 525
Andrej Aderhold, V Anne Smith, and Dirk Husmeier
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.4.1 Notations / 529
27.4.2 Sparse Regression and the LASSO / 530
27.4.3 Bayesian Regression / 530
27.4.4 Evaluation Metric / 531
27.5 Learning Gene Regulation Networks / 532
27.5.1 Nonhomogeneous Bayesian Regression / 533
27.5.2 Gradient Estimation / 534
27.5.3 Simulated Bio-PEPA Data / 534
27.5.4 Real mRNA Expression Profile Data / 535
27.5.5 Method Evaluation and Learned Networks / 536
27.6 Learning Species Interaction Networks / 540
27.6.1 Regression Model of Species interactions / 540
27.6.2 Multiple Global Change-Points / 541
27.6.3 Mondrian Process Change-Points / 542
27.6.4 Synthetic Data / 544
27.6.5 Simulated Population Dynamics / 544
27.6.6 Real World Plant Data / 546
27.6.7 Method Evaluation and Learned Networks / 546
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING:
APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC
LEUKEMIA 555
Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas
Stamatopoulos, and Lefteris Angelis
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.2.1 Basics / 557
28.2.2 Structure and Signaling of TLRs / 558
28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559
28.3 Structural Equation Modeling / 560
28.3.1 Methodology of SEM Modeling / 560
28.3.2 Assumptions / 561
28.3.3 Estimation Methods / 562
28.3.4 Missing Data / 562
28.3.5 Goodness-of-Fit Indices / 563
28.3.6 Other Indications of a Misspecified Model / 565
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.3.1 The Algorithm / 591
29.4 Experimental Setup / 592
29.4.1 Data sets / 592
29.4.2 Comparative Methods / 593
29.4.3 Experimental Protocol / 594
29.4.4 Evaluation Criteria / 594
29.5 Experimental Analysis / 596
29.5.1 Replenishing Missing Functions / 596
29.5.2 Predicting Unlabeled Proteins / 600
29.5.3 Component Analysis / 604
29.5.4 Run Time Analysis / 604
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX 609
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.2.1 The Minimum Error Correction Model / 8
1.2.2 Probabilistic Approaches and Alternative Models / 10
1.3 Population Haplotyping / 12
1.3.1 Clark's Rule / 14
1.3.2 Pure Parsimony / 15
1.3.3 Perfect Phylogeny / 19
1.3.4 Disease Association / 21
1.3.5 Other Models / 22
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.2.1 Average Length of Optimal Barcodes / 33
2.3 Entropy-Based Information Content Technique for Designing
Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding
Problems / 36
2.4.1 Reductions from Set Covering Problem / 36
2.4.2 Reduction from Graph-Coloring Problem / 38
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.5.1 Entropy-Based Method with a Different Measure for Information Content
/ 39
2.5.2 Balanced Partitioning Approach / 40
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.2.1 Background on Whole-Genome Comparison / 44
3.2.2 Alignment-Free Methods / 45
3.2.3 Average Common Subword / 46
3.2.4 Kullback-Leibler Information Divergence / 47
3.3 Underlying Approach / 47
3.3.1 Irredundant Common Subwords / 48
3.3.2 Underlying Subwords / 49
3.3.3 Efficient Computation of Underlying Subwords / 50
3.3.4 Extension to Inversions and Complements / 53
3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53
3.4 Experimental Results / 54
3.4.1 Genome Data sets and Reference Taxonomies / 54
3.4.2 Whole-Genome Phylogeny Reconstruction / 56
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.2.1 Overall Objective Function / 67
4.2.2 Maximum Likelihood Model / 68
4.3 Motif Finding Algorithms / 70
4.3.1 DEM Motif Algorithm / 70
4.3.2 WEM Motif Finding Algorithm / 70
4.3.3 Metropolis Motif Finding Algorithm / 72
4.3.4 Gibbs Motif Finding Algorithm / 73
4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.5.1 Performance Evaluation / 76
4.5.2 CRP Binding Sites / 76
4.5.3 Multiple Motifs in Helix-Turn-Helix Protein Structure / 78
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomá Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.2.1 Chou-Fasman / 103
6.2.2 GOR / 104
6.2.3 PHD / 104
6.2.4 PSIPRED / 104
6.2.5 SPINE-X / 105
6.2.6 PSSpred / 105
6.2.7 Meta Methods / 105
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.3.1 Measures / 106
6.3.2 Benchmark / 106
6.3.3 Performances / 107
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS:
APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
Yann Guermeur and Fabien Lauer
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.3.1 Base Classifiers / 117
7.3.2 Ensemble Methods / 118
7.3.3 Convex Combination / 119
7.4 Postprocessing with A Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.5.1 Biological Problem / 121
7.5.2 MSVMpred2 / 121
7.5.3 Hidden Semi-Markov Model / 122
7.5.4 Experimental Results / 122
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.2.1 Hierarchy of Protein Structures / 130
8.2.2 Secondary Structure Elements / 131
8.2.3 Structural Motifs / 132
8.2.4 Available Sources for Protein Data / 134
8.3 State of the Art / 135
8.3.1 Protein Structure Motif Search / 135
8.3.2 Promotif / 136
8.3.3 Secondary-Structure Matching / 137
8.3.4 Multiple Structural Alignment by Secondary Structures / 138
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.4.1 Secondary Structures Cooccurrences / 138
8.4.2 Cross Motif Search / 143
8.4.3 Complete Cross Motif Search / 146
8.5 Implementation Notes / 149
8.5.1 Optimizations / 149
8.5.2 Parallel Approaches / 150
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY
155
Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang
9.1 Introduction / 155
9.2 Background / 156
9.2.1 Noncoding RNAs and Their Secondary Structures / 156
9.2.2 Pseudoknotted ncRNA Search Tools / 157
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES
167
Wajdi Dhifli and Engelbert Mephu Nguifo
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.2.1 Parsing Protein 3D-Structures into Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.5.1 Problem Definition / 174
10.5.2 Candidates Generation / 176
10.5.3 Frequent Subgraph Discovery Approaches / 177
10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs /
178
10.6 Feature Selection / 179
10.6.1 Relevance of a Feature / 179
10.7 Feature Selection for Subgraphs / 180
10.7.1 Problem Statement / 180
10.7.2 Mining Top-k Subgraphs / 180
10.7.3 Clustering-Based Subgraph Selection / 181
10.7.4 Sampling-Based Approaches / 181
10.7.5 Approximate Subgraph Mining / 181
10.7.6 Discriminative Subgraph Selection / 182
10.7.7 Other Significant Subgraph Selection Approaches / 182
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION
OF PROTEIN TERTIARY STRUCTURES 190
Chinua Umoja, Xiaxia Yu, and Robert Harrison
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.2.1 GA Model Selection in Protein Structure Prediction / 196
11.2.2 Common Methodology / 198
11.3 Supervised Machine Learning Algorithm / 201
11.3.1 Artificial Neural Networks / 201
11.3.2 ANNs in Protein Structure Prediction / 202
11.3.3 Support Vector Machines / 203
11.4 Fuzzy Application / 204
11.4.1 Fuzzy Logic / 204
11.4.2 Fuzzy SVMs / 204
11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205
11.4.4 Fuzzy Decision Trees / 206
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.5.1 DomCut / 216
12.5.2 Scooby-Domain / 217
12.5.3 FIEFDom / 218
12.5.4 Chatterjee et al. (2009) / 219
12.5.5 Drop / 219
12.6 Domain Boundary Prediction Using Enhanced General Regression Network /
220
12.6.1 Multi-Domain Benchmark Data Set / 220
12.6.2 Compact Domain Profile / 221
12.6.3 The Enhanced Semi-Parametric Model / 222
12.6.4 Training, Testing, and Validation / 225
12.6.5 Experimental Results / 226
12.7 Inter-Domain Linkers Prediction Using Compositional Index and
Simulated Annealing / 227
12.7.1 Compositional Index / 228
12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated
Annealing / 229
12.7.3 Experimental Results / 230
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION 236
Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya
13.1 Introduction / 236
13.2 Methods / 238
13.2.1 Evolutionary Data Set Construction / 238
13.2.2 Protein Secondary Structure Information / 239
13.2.3 Method I: Intelligent Voting / 239
13.2.4 Method II: Randomized Meta-Learning / 241
13.2.5 Model Validation and Testing / 242
13.2.6 Parameter Tuning / 242
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad
Jahandideh, and Kuo-Chen Chou
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.2.1 Secondary Structure Prediction / 255
14.2.2 Modeling of Tertiary Structure / 256
14.3 Template-Based Predictions / 257
14.3.1 Homology Modeling / 257
14.3.2 Threading Methods / 257
14.3.3 Ab initio Modeling / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
Sheng-Lung Peng and Yu-Wei Tsay
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.2.1 Graph Model for Proteins / 270
15.3 Measuring Structural Similarity VIA MCES / 272
15.3.1 Problem Formulation / 273
15.3.2 Constructing P-Graphs / 274
15.3.3 Constructing Line Graphs / 276
15.3.4 Constructing Modular Graphs / 276
15.3.5 Maximum Clique Detection / 277
15.3.6 Experimental Results / 277
15.4 Protein Comparison VIA Graph Spectra / 279
15.4.1 Graph Spectra / 279
15.4.2 Matrix Selection / 281
15.4.3 Graph Cospectrality and Similarity / 283
15.4.4 Cospectral Comparison / 283
15.4.5 Experimental Results / 284
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN
INTERACTIONS 291
Mina Maleki, Michael Hall, and Luis Rueda
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.4.1 Physicochemical Properties / 296
16.4.2 Domain-Based Properties / 298
16.5 Feature Selection / 299
16.5.1 Filter Methods / 299
16.5.2 Wrapper Methods / 301
16.6 Classification / 301
16.6.1 Linear Dimensionality Reduction / 301
16.6.2 Support Vector Machines / 303
16.6.3 k-Nearest Neighbor / 303
16.6.4 Naive Bayes / 304
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.8.1 Analysis of the Prediction Properties / 304
16.8.2 Analysis of Structural DDIs / 307
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
Hasan O¢gul
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.2.1 Signal from Oligo DNA Array Image / 336
18.2.2 Signal from Two-Color cDNA Array / 337
18.3 Microarray Signal Analysis / 337
18.3.1 Absolute Analysis and Replicates in Microarrays / 338
18.3.2 Microarray Normalization / 339
18.4 Algorithms for De Gene Selection / 339
18.4.1 Within-Between DE Gene (WB-DEG) Selection Algorithm / 340
18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection
Methods on Latin Square Data / 341
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE
PATTERN ANALYSIS 347
Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and
Paola Bertolazzi
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.3.1 Applications / 349
19.3.2 Microarray Technology / 350
19.3.3 Microarray Workflow / 350
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.6.1 Data Definition / 356
19.6.2 Data Analysis / 357
19.6.3 Normalization and Background Correction / 357
19.6.4 Genes Clustering / 359
19.6.5 Experiment Classification / 361
19.6.6 Software Tools for Gene Expression Profile Analysis / 362
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
Li Teng
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.2.1 Similarity Measurement / 374
20.2.2 Clustering / 376
20.2.3 Biclustering / 379
20.2.4 Types of Biclusters / 380
20.2.5 Measurement of the Homogeneity / 383
20.2.6 Biclustering Algorithms with Different Searching Schemes / 387
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS
OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa
21.1 Overview / 394
21.1.1 Background Correction Methods / 395
21.1.2 Normalization Methods / 396
21.1.3 Literature Review / 397
21.2 Arrow Plot / 399
21.2.1 DE Genes Versus Special Genes / 399
21.2.2 Definition and Properties of the ROC Curve / 400
21.2.3 AUC and Degenerate ROC Curves / 401
21.2.4 Overlapping Coefficient / 402
21.2.5 Arrow Plot Construction / 403
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.4.1 Basic Principles / 405
21.4.2 Interpretation of CA Maps / 406
21.5 Impact of the Preprocessing Methods / 407
21.5.1 Class Prediction Context / 408
21.5.2 Class Comparison Context / 408
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
David A. Morrison
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION:
IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN
RECOGNITION 439
Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree
Reconstruction / 440
23.2.1 Inferring Gene Trees / 441
23.2.2 Inferring Species Trees / 442
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree
Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree
Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals
of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
David Dao, Tomá Flouri, and Alexandros Stamatakis
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naïve Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.5.1 Preprocessing / 467
24.5.2 Computing Lowest Common Ancestors / 468
24.5.3 Constructing the Induced Tree / 468
24.5.4 Final Remarks / 471
24.6 Implementation / 473
24.6.1 Preprocessing / 473
24.6.2 Reconstruction / 473
24.6.3 Extracting Bipartitions / 474
24.7 Evaluation / 474
24.7.1 Test Data Sets / 474
24.7.2 Experimental Results / 475
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER
EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov
25.1 Introduction / 483
25.2 Methods / 485
25.2.1 Clustering Using Variability Functions / 485
25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm
/ 487
25.2.3 Description of the New Algorithm / 488
25.2.4 Time Complexity / 491
25.3 Experimental Study / 491
25.3.1 Implementation / 491
25.3.2 Synthetic Data / 491
25.3.3 Real Prokaryotic (Genomic) Data / 495
25.4 Results and Discussion / 501
25.4.1 Analysis of Synthetic Data / 501
25.4.2 Analysis of Prokaryotic Data / 502
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
Christos Makris and Evangelos Theodoridis
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks-Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO
SPECIES INTERACTIONS 525
Andrej Aderhold, V Anne Smith, and Dirk Husmeier
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.4.1 Notations / 529
27.4.2 Sparse Regression and the LASSO / 530
27.4.3 Bayesian Regression / 530
27.4.4 Evaluation Metric / 531
27.5 Learning Gene Regulation Networks / 532
27.5.1 Nonhomogeneous Bayesian Regression / 533
27.5.2 Gradient Estimation / 534
27.5.3 Simulated Bio-PEPA Data / 534
27.5.4 Real mRNA Expression Profile Data / 535
27.5.5 Method Evaluation and Learned Networks / 536
27.6 Learning Species Interaction Networks / 540
27.6.1 Regression Model of Species interactions / 540
27.6.2 Multiple Global Change-Points / 541
27.6.3 Mondrian Process Change-Points / 542
27.6.4 Synthetic Data / 544
27.6.5 Simulated Population Dynamics / 544
27.6.6 Real World Plant Data / 546
27.6.7 Method Evaluation and Learned Networks / 546
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING:
APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC
LEUKEMIA 555
Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas
Stamatopoulos, and Lefteris Angelis
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.2.1 Basics / 557
28.2.2 Structure and Signaling of TLRs / 558
28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559
28.3 Structural Equation Modeling / 560
28.3.1 Methodology of SEM Modeling / 560
28.3.2 Assumptions / 561
28.3.3 Estimation Methods / 562
28.3.4 Missing Data / 562
28.3.5 Goodness-of-Fit Indices / 563
28.3.6 Other Indications of a Misspecified Model / 565
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.3.1 The Algorithm / 591
29.4 Experimental Setup / 592
29.4.1 Data sets / 592
29.4.2 Comparative Methods / 593
29.4.3 Experimental Protocol / 594
29.4.4 Evaluation Criteria / 594
29.5 Experimental Analysis / 596
29.5.1 Replenishing Missing Functions / 596
29.5.2 Predicting Unlabeled Proteins / 600
29.5.3 Component Analysis / 604
29.5.4 Run Time Analysis / 604
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX 609
LIST OF CONTRIBUTORS xxi
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.2.1 The Minimum Error Correction Model / 8
1.2.2 Probabilistic Approaches and Alternative Models / 10
1.3 Population Haplotyping / 12
1.3.1 Clark's Rule / 14
1.3.2 Pure Parsimony / 15
1.3.3 Perfect Phylogeny / 19
1.3.4 Disease Association / 21
1.3.5 Other Models / 22
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.2.1 Average Length of Optimal Barcodes / 33
2.3 Entropy-Based Information Content Technique for Designing
Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding
Problems / 36
2.4.1 Reductions from Set Covering Problem / 36
2.4.2 Reduction from Graph-Coloring Problem / 38
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.5.1 Entropy-Based Method with a Different Measure for Information Content
/ 39
2.5.2 Balanced Partitioning Approach / 40
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.2.1 Background on Whole-Genome Comparison / 44
3.2.2 Alignment-Free Methods / 45
3.2.3 Average Common Subword / 46
3.2.4 Kullback-Leibler Information Divergence / 47
3.3 Underlying Approach / 47
3.3.1 Irredundant Common Subwords / 48
3.3.2 Underlying Subwords / 49
3.3.3 Efficient Computation of Underlying Subwords / 50
3.3.4 Extension to Inversions and Complements / 53
3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53
3.4 Experimental Results / 54
3.4.1 Genome Data sets and Reference Taxonomies / 54
3.4.2 Whole-Genome Phylogeny Reconstruction / 56
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.2.1 Overall Objective Function / 67
4.2.2 Maximum Likelihood Model / 68
4.3 Motif Finding Algorithms / 70
4.3.1 DEM Motif Algorithm / 70
4.3.2 WEM Motif Finding Algorithm / 70
4.3.3 Metropolis Motif Finding Algorithm / 72
4.3.4 Gibbs Motif Finding Algorithm / 73
4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.5.1 Performance Evaluation / 76
4.5.2 CRP Binding Sites / 76
4.5.3 Multiple Motifs in Helix-Turn-Helix Protein Structure / 78
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomá Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.2.1 Chou-Fasman / 103
6.2.2 GOR / 104
6.2.3 PHD / 104
6.2.4 PSIPRED / 104
6.2.5 SPINE-X / 105
6.2.6 PSSpred / 105
6.2.7 Meta Methods / 105
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.3.1 Measures / 106
6.3.2 Benchmark / 106
6.3.3 Performances / 107
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS:
APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
Yann Guermeur and Fabien Lauer
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.3.1 Base Classifiers / 117
7.3.2 Ensemble Methods / 118
7.3.3 Convex Combination / 119
7.4 Postprocessing with A Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.5.1 Biological Problem / 121
7.5.2 MSVMpred2 / 121
7.5.3 Hidden Semi-Markov Model / 122
7.5.4 Experimental Results / 122
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.2.1 Hierarchy of Protein Structures / 130
8.2.2 Secondary Structure Elements / 131
8.2.3 Structural Motifs / 132
8.2.4 Available Sources for Protein Data / 134
8.3 State of the Art / 135
8.3.1 Protein Structure Motif Search / 135
8.3.2 Promotif / 136
8.3.3 Secondary-Structure Matching / 137
8.3.4 Multiple Structural Alignment by Secondary Structures / 138
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.4.1 Secondary Structures Cooccurrences / 138
8.4.2 Cross Motif Search / 143
8.4.3 Complete Cross Motif Search / 146
8.5 Implementation Notes / 149
8.5.1 Optimizations / 149
8.5.2 Parallel Approaches / 150
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY
155
Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang
9.1 Introduction / 155
9.2 Background / 156
9.2.1 Noncoding RNAs and Their Secondary Structures / 156
9.2.2 Pseudoknotted ncRNA Search Tools / 157
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES
167
Wajdi Dhifli and Engelbert Mephu Nguifo
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.2.1 Parsing Protein 3D-Structures into Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.5.1 Problem Definition / 174
10.5.2 Candidates Generation / 176
10.5.3 Frequent Subgraph Discovery Approaches / 177
10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs /
178
10.6 Feature Selection / 179
10.6.1 Relevance of a Feature / 179
10.7 Feature Selection for Subgraphs / 180
10.7.1 Problem Statement / 180
10.7.2 Mining Top-k Subgraphs / 180
10.7.3 Clustering-Based Subgraph Selection / 181
10.7.4 Sampling-Based Approaches / 181
10.7.5 Approximate Subgraph Mining / 181
10.7.6 Discriminative Subgraph Selection / 182
10.7.7 Other Significant Subgraph Selection Approaches / 182
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION
OF PROTEIN TERTIARY STRUCTURES 190
Chinua Umoja, Xiaxia Yu, and Robert Harrison
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.2.1 GA Model Selection in Protein Structure Prediction / 196
11.2.2 Common Methodology / 198
11.3 Supervised Machine Learning Algorithm / 201
11.3.1 Artificial Neural Networks / 201
11.3.2 ANNs in Protein Structure Prediction / 202
11.3.3 Support Vector Machines / 203
11.4 Fuzzy Application / 204
11.4.1 Fuzzy Logic / 204
11.4.2 Fuzzy SVMs / 204
11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205
11.4.4 Fuzzy Decision Trees / 206
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.5.1 DomCut / 216
12.5.2 Scooby-Domain / 217
12.5.3 FIEFDom / 218
12.5.4 Chatterjee et al. (2009) / 219
12.5.5 Drop / 219
12.6 Domain Boundary Prediction Using Enhanced General Regression Network /
220
12.6.1 Multi-Domain Benchmark Data Set / 220
12.6.2 Compact Domain Profile / 221
12.6.3 The Enhanced Semi-Parametric Model / 222
12.6.4 Training, Testing, and Validation / 225
12.6.5 Experimental Results / 226
12.7 Inter-Domain Linkers Prediction Using Compositional Index and
Simulated Annealing / 227
12.7.1 Compositional Index / 228
12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated
Annealing / 229
12.7.3 Experimental Results / 230
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION 236
Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya
13.1 Introduction / 236
13.2 Methods / 238
13.2.1 Evolutionary Data Set Construction / 238
13.2.2 Protein Secondary Structure Information / 239
13.2.3 Method I: Intelligent Voting / 239
13.2.4 Method II: Randomized Meta-Learning / 241
13.2.5 Model Validation and Testing / 242
13.2.6 Parameter Tuning / 242
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad
Jahandideh, and Kuo-Chen Chou
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.2.1 Secondary Structure Prediction / 255
14.2.2 Modeling of Tertiary Structure / 256
14.3 Template-Based Predictions / 257
14.3.1 Homology Modeling / 257
14.3.2 Threading Methods / 257
14.3.3 Ab initio Modeling / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
Sheng-Lung Peng and Yu-Wei Tsay
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.2.1 Graph Model for Proteins / 270
15.3 Measuring Structural Similarity VIA MCES / 272
15.3.1 Problem Formulation / 273
15.3.2 Constructing P-Graphs / 274
15.3.3 Constructing Line Graphs / 276
15.3.4 Constructing Modular Graphs / 276
15.3.5 Maximum Clique Detection / 277
15.3.6 Experimental Results / 277
15.4 Protein Comparison VIA Graph Spectra / 279
15.4.1 Graph Spectra / 279
15.4.2 Matrix Selection / 281
15.4.3 Graph Cospectrality and Similarity / 283
15.4.4 Cospectral Comparison / 283
15.4.5 Experimental Results / 284
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN
INTERACTIONS 291
Mina Maleki, Michael Hall, and Luis Rueda
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.4.1 Physicochemical Properties / 296
16.4.2 Domain-Based Properties / 298
16.5 Feature Selection / 299
16.5.1 Filter Methods / 299
16.5.2 Wrapper Methods / 301
16.6 Classification / 301
16.6.1 Linear Dimensionality Reduction / 301
16.6.2 Support Vector Machines / 303
16.6.3 k-Nearest Neighbor / 303
16.6.4 Naive Bayes / 304
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.8.1 Analysis of the Prediction Properties / 304
16.8.2 Analysis of Structural DDIs / 307
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
Hasan O¢gul
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.2.1 Signal from Oligo DNA Array Image / 336
18.2.2 Signal from Two-Color cDNA Array / 337
18.3 Microarray Signal Analysis / 337
18.3.1 Absolute Analysis and Replicates in Microarrays / 338
18.3.2 Microarray Normalization / 339
18.4 Algorithms for De Gene Selection / 339
18.4.1 Within-Between DE Gene (WB-DEG) Selection Algorithm / 340
18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection
Methods on Latin Square Data / 341
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE
PATTERN ANALYSIS 347
Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and
Paola Bertolazzi
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.3.1 Applications / 349
19.3.2 Microarray Technology / 350
19.3.3 Microarray Workflow / 350
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.6.1 Data Definition / 356
19.6.2 Data Analysis / 357
19.6.3 Normalization and Background Correction / 357
19.6.4 Genes Clustering / 359
19.6.5 Experiment Classification / 361
19.6.6 Software Tools for Gene Expression Profile Analysis / 362
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
Li Teng
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.2.1 Similarity Measurement / 374
20.2.2 Clustering / 376
20.2.3 Biclustering / 379
20.2.4 Types of Biclusters / 380
20.2.5 Measurement of the Homogeneity / 383
20.2.6 Biclustering Algorithms with Different Searching Schemes / 387
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS
OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa
21.1 Overview / 394
21.1.1 Background Correction Methods / 395
21.1.2 Normalization Methods / 396
21.1.3 Literature Review / 397
21.2 Arrow Plot / 399
21.2.1 DE Genes Versus Special Genes / 399
21.2.2 Definition and Properties of the ROC Curve / 400
21.2.3 AUC and Degenerate ROC Curves / 401
21.2.4 Overlapping Coefficient / 402
21.2.5 Arrow Plot Construction / 403
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.4.1 Basic Principles / 405
21.4.2 Interpretation of CA Maps / 406
21.5 Impact of the Preprocessing Methods / 407
21.5.1 Class Prediction Context / 408
21.5.2 Class Comparison Context / 408
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
David A. Morrison
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION:
IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN
RECOGNITION 439
Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree
Reconstruction / 440
23.2.1 Inferring Gene Trees / 441
23.2.2 Inferring Species Trees / 442
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree
Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree
Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals
of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
David Dao, Tomá Flouri, and Alexandros Stamatakis
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naïve Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.5.1 Preprocessing / 467
24.5.2 Computing Lowest Common Ancestors / 468
24.5.3 Constructing the Induced Tree / 468
24.5.4 Final Remarks / 471
24.6 Implementation / 473
24.6.1 Preprocessing / 473
24.6.2 Reconstruction / 473
24.6.3 Extracting Bipartitions / 474
24.7 Evaluation / 474
24.7.1 Test Data Sets / 474
24.7.2 Experimental Results / 475
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER
EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov
25.1 Introduction / 483
25.2 Methods / 485
25.2.1 Clustering Using Variability Functions / 485
25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm
/ 487
25.2.3 Description of the New Algorithm / 488
25.2.4 Time Complexity / 491
25.3 Experimental Study / 491
25.3.1 Implementation / 491
25.3.2 Synthetic Data / 491
25.3.3 Real Prokaryotic (Genomic) Data / 495
25.4 Results and Discussion / 501
25.4.1 Analysis of Synthetic Data / 501
25.4.2 Analysis of Prokaryotic Data / 502
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
Christos Makris and Evangelos Theodoridis
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks-Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO
SPECIES INTERACTIONS 525
Andrej Aderhold, V Anne Smith, and Dirk Husmeier
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.4.1 Notations / 529
27.4.2 Sparse Regression and the LASSO / 530
27.4.3 Bayesian Regression / 530
27.4.4 Evaluation Metric / 531
27.5 Learning Gene Regulation Networks / 532
27.5.1 Nonhomogeneous Bayesian Regression / 533
27.5.2 Gradient Estimation / 534
27.5.3 Simulated Bio-PEPA Data / 534
27.5.4 Real mRNA Expression Profile Data / 535
27.5.5 Method Evaluation and Learned Networks / 536
27.6 Learning Species Interaction Networks / 540
27.6.1 Regression Model of Species interactions / 540
27.6.2 Multiple Global Change-Points / 541
27.6.3 Mondrian Process Change-Points / 542
27.6.4 Synthetic Data / 544
27.6.5 Simulated Population Dynamics / 544
27.6.6 Real World Plant Data / 546
27.6.7 Method Evaluation and Learned Networks / 546
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING:
APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC
LEUKEMIA 555
Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas
Stamatopoulos, and Lefteris Angelis
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.2.1 Basics / 557
28.2.2 Structure and Signaling of TLRs / 558
28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559
28.3 Structural Equation Modeling / 560
28.3.1 Methodology of SEM Modeling / 560
28.3.2 Assumptions / 561
28.3.3 Estimation Methods / 562
28.3.4 Missing Data / 562
28.3.5 Goodness-of-Fit Indices / 563
28.3.6 Other Indications of a Misspecified Model / 565
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.3.1 The Algorithm / 591
29.4 Experimental Setup / 592
29.4.1 Data sets / 592
29.4.2 Comparative Methods / 593
29.4.3 Experimental Protocol / 594
29.4.4 Evaluation Criteria / 594
29.5 Experimental Analysis / 596
29.5.1 Replenishing Missing Functions / 596
29.5.2 Predicting Unlabeled Proteins / 600
29.5.3 Component Analysis / 604
29.5.4 Run Time Analysis / 604
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX 609
PREFACE xxvii
I PATTERN RECOGNITION IN SEQUENCES 1
1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
Giuseppe Lancia
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.2.1 The Minimum Error Correction Model / 8
1.2.2 Probabilistic Approaches and Alternative Models / 10
1.3 Population Haplotyping / 12
1.3.1 Clark's Rule / 14
1.3.2 Pure Parsimony / 15
1.3.3 Perfect Phylogeny / 19
1.3.4 Disease Association / 21
1.3.5 Other Models / 22
References / 23
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
Sima Behpour and Bhaskar DasGupta
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.2.1 Average Length of Optimal Barcodes / 33
2.3 Entropy-Based Information Content Technique for Designing
Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding
Problems / 36
2.4.1 Reductions from Set Covering Problem / 36
2.4.2 Reduction from Graph-Coloring Problem / 38
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.5.1 Entropy-Based Method with a Different Measure for Information Content
/ 39
2.5.2 Balanced Partitioning Approach / 40
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
Matteo Comin and Davide Verzotto
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.2.1 Background on Whole-Genome Comparison / 44
3.2.2 Alignment-Free Methods / 45
3.2.3 Average Common Subword / 46
3.2.4 Kullback-Leibler Information Divergence / 47
3.3 Underlying Approach / 47
3.3.1 Irredundant Common Subwords / 48
3.3.2 Underlying Subwords / 49
3.3.3 Efficient Computation of Underlying Subwords / 50
3.3.4 Extension to Inversions and Complements / 53
3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53
3.4 Experimental Results / 54
3.4.1 Genome Data sets and Reference Taxonomies / 54
3.4.2 Whole-Genome Phylogeny Reconstruction / 56
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
Chengpeng Bi
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.2.1 Overall Objective Function / 67
4.2.2 Maximum Likelihood Model / 68
4.3 Motif Finding Algorithms / 70
4.3.1 DEM Motif Algorithm / 70
4.3.2 WEM Motif Finding Algorithm / 70
4.3.3 Metropolis Motif Finding Algorithm / 72
4.3.4 Gibbs Motif Finding Algorithm / 73
4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.5.1 Performance Evaluation / 76
4.5.2 CRP Binding Sites / 76
4.5.3 Multiple Motifs in Helix-Turn-Helix Protein Structure / 78
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
Carl Barton, Tomá Flouri, Costas S. Iliopoulos, and Solon P. Pissis
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.2.1 Chou-Fasman / 103
6.2.2 GOR / 104
6.2.3 PHD / 104
6.2.4 PSIPRED / 104
6.2.5 SPINE-X / 105
6.2.6 PSSpred / 105
6.2.7 Meta Methods / 105
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.3.1 Measures / 106
6.3.2 Benchmark / 106
6.3.3 Performances / 107
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS:
APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
Yann Guermeur and Fabien Lauer
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.3.1 Base Classifiers / 117
7.3.2 Ensemble Methods / 118
7.3.3 Convex Combination / 119
7.4 Postprocessing with A Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.5.1 Biological Problem / 121
7.5.2 MSVMpred2 / 121
7.5.3 Hidden Semi-Markov Model / 122
7.5.4 Experimental Results / 122
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.2.1 Hierarchy of Protein Structures / 130
8.2.2 Secondary Structure Elements / 131
8.2.3 Structural Motifs / 132
8.2.4 Available Sources for Protein Data / 134
8.3 State of the Art / 135
8.3.1 Protein Structure Motif Search / 135
8.3.2 Promotif / 136
8.3.3 Secondary-Structure Matching / 137
8.3.4 Multiple Structural Alignment by Secondary Structures / 138
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.4.1 Secondary Structures Cooccurrences / 138
8.4.2 Cross Motif Search / 143
8.4.3 Complete Cross Motif Search / 146
8.5 Implementation Notes / 149
8.5.1 Optimizations / 149
8.5.2 Parallel Approaches / 150
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY
155
Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang
9.1 Introduction / 155
9.2 Background / 156
9.2.1 Noncoding RNAs and Their Secondary Structures / 156
9.2.2 Pseudoknotted ncRNA Search Tools / 157
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES
167
Wajdi Dhifli and Engelbert Mephu Nguifo
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.2.1 Parsing Protein 3D-Structures into Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.5.1 Problem Definition / 174
10.5.2 Candidates Generation / 176
10.5.3 Frequent Subgraph Discovery Approaches / 177
10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs /
178
10.6 Feature Selection / 179
10.6.1 Relevance of a Feature / 179
10.7 Feature Selection for Subgraphs / 180
10.7.1 Problem Statement / 180
10.7.2 Mining Top-k Subgraphs / 180
10.7.3 Clustering-Based Subgraph Selection / 181
10.7.4 Sampling-Based Approaches / 181
10.7.5 Approximate Subgraph Mining / 181
10.7.6 Discriminative Subgraph Selection / 182
10.7.7 Other Significant Subgraph Selection Approaches / 182
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION
OF PROTEIN TERTIARY STRUCTURES 190
Chinua Umoja, Xiaxia Yu, and Robert Harrison
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.2.1 GA Model Selection in Protein Structure Prediction / 196
11.2.2 Common Methodology / 198
11.3 Supervised Machine Learning Algorithm / 201
11.3.1 Artificial Neural Networks / 201
11.3.2 ANNs in Protein Structure Prediction / 202
11.3.3 Support Vector Machines / 203
11.4 Fuzzy Application / 204
11.4.1 Fuzzy Logic / 204
11.4.2 Fuzzy SVMs / 204
11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205
11.4.4 Fuzzy Decision Trees / 206
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.5.1 DomCut / 216
12.5.2 Scooby-Domain / 217
12.5.3 FIEFDom / 218
12.5.4 Chatterjee et al. (2009) / 219
12.5.5 Drop / 219
12.6 Domain Boundary Prediction Using Enhanced General Regression Network /
220
12.6.1 Multi-Domain Benchmark Data Set / 220
12.6.2 Compact Domain Profile / 221
12.6.3 The Enhanced Semi-Parametric Model / 222
12.6.4 Training, Testing, and Validation / 225
12.6.5 Experimental Results / 226
12.7 Inter-Domain Linkers Prediction Using Compositional Index and
Simulated Annealing / 227
12.7.1 Compositional Index / 228
12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated
Annealing / 229
12.7.3 Experimental Results / 230
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION 236
Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya
13.1 Introduction / 236
13.2 Methods / 238
13.2.1 Evolutionary Data Set Construction / 238
13.2.2 Protein Secondary Structure Information / 239
13.2.3 Method I: Intelligent Voting / 239
13.2.4 Method II: Randomized Meta-Learning / 241
13.2.5 Model Validation and Testing / 242
13.2.6 Parameter Tuning / 242
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad
Jahandideh, and Kuo-Chen Chou
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.2.1 Secondary Structure Prediction / 255
14.2.2 Modeling of Tertiary Structure / 256
14.3 Template-Based Predictions / 257
14.3.1 Homology Modeling / 257
14.3.2 Threading Methods / 257
14.3.3 Ab initio Modeling / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
Sheng-Lung Peng and Yu-Wei Tsay
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.2.1 Graph Model for Proteins / 270
15.3 Measuring Structural Similarity VIA MCES / 272
15.3.1 Problem Formulation / 273
15.3.2 Constructing P-Graphs / 274
15.3.3 Constructing Line Graphs / 276
15.3.4 Constructing Modular Graphs / 276
15.3.5 Maximum Clique Detection / 277
15.3.6 Experimental Results / 277
15.4 Protein Comparison VIA Graph Spectra / 279
15.4.1 Graph Spectra / 279
15.4.2 Matrix Selection / 281
15.4.3 Graph Cospectrality and Similarity / 283
15.4.4 Cospectral Comparison / 283
15.4.5 Experimental Results / 284
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN
INTERACTIONS 291
Mina Maleki, Michael Hall, and Luis Rueda
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.4.1 Physicochemical Properties / 296
16.4.2 Domain-Based Properties / 298
16.5 Feature Selection / 299
16.5.1 Filter Methods / 299
16.5.2 Wrapper Methods / 301
16.6 Classification / 301
16.6.1 Linear Dimensionality Reduction / 301
16.6.2 Support Vector Machines / 303
16.6.3 k-Nearest Neighbor / 303
16.6.4 Naive Bayes / 304
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.8.1 Analysis of the Prediction Properties / 304
16.8.2 Analysis of Structural DDIs / 307
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
Hasan O¢gul
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.2.1 Signal from Oligo DNA Array Image / 336
18.2.2 Signal from Two-Color cDNA Array / 337
18.3 Microarray Signal Analysis / 337
18.3.1 Absolute Analysis and Replicates in Microarrays / 338
18.3.2 Microarray Normalization / 339
18.4 Algorithms for De Gene Selection / 339
18.4.1 Within-Between DE Gene (WB-DEG) Selection Algorithm / 340
18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection
Methods on Latin Square Data / 341
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE
PATTERN ANALYSIS 347
Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and
Paola Bertolazzi
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.3.1 Applications / 349
19.3.2 Microarray Technology / 350
19.3.3 Microarray Workflow / 350
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.6.1 Data Definition / 356
19.6.2 Data Analysis / 357
19.6.3 Normalization and Background Correction / 357
19.6.4 Genes Clustering / 359
19.6.5 Experiment Classification / 361
19.6.6 Software Tools for Gene Expression Profile Analysis / 362
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
Li Teng
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.2.1 Similarity Measurement / 374
20.2.2 Clustering / 376
20.2.3 Biclustering / 379
20.2.4 Types of Biclusters / 380
20.2.5 Measurement of the Homogeneity / 383
20.2.6 Biclustering Algorithms with Different Searching Schemes / 387
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS
OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa
21.1 Overview / 394
21.1.1 Background Correction Methods / 395
21.1.2 Normalization Methods / 396
21.1.3 Literature Review / 397
21.2 Arrow Plot / 399
21.2.1 DE Genes Versus Special Genes / 399
21.2.2 Definition and Properties of the ROC Curve / 400
21.2.3 AUC and Degenerate ROC Curves / 401
21.2.4 Overlapping Coefficient / 402
21.2.5 Arrow Plot Construction / 403
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.4.1 Basic Principles / 405
21.4.2 Interpretation of CA Maps / 406
21.5 Impact of the Preprocessing Methods / 407
21.5.1 Class Prediction Context / 408
21.5.2 Class Comparison Context / 408
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
David A. Morrison
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION:
IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN
RECOGNITION 439
Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree
Reconstruction / 440
23.2.1 Inferring Gene Trees / 441
23.2.2 Inferring Species Trees / 442
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree
Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree
Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals
of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
David Dao, Tomá Flouri, and Alexandros Stamatakis
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naïve Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.5.1 Preprocessing / 467
24.5.2 Computing Lowest Common Ancestors / 468
24.5.3 Constructing the Induced Tree / 468
24.5.4 Final Remarks / 471
24.6 Implementation / 473
24.6.1 Preprocessing / 473
24.6.2 Reconstruction / 473
24.6.3 Extracting Bipartitions / 474
24.7 Evaluation / 474
24.7.1 Test Data Sets / 474
24.7.2 Experimental Results / 475
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER
EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov
25.1 Introduction / 483
25.2 Methods / 485
25.2.1 Clustering Using Variability Functions / 485
25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm
/ 487
25.2.3 Description of the New Algorithm / 488
25.2.4 Time Complexity / 491
25.3 Experimental Study / 491
25.3.1 Implementation / 491
25.3.2 Synthetic Data / 491
25.3.3 Real Prokaryotic (Genomic) Data / 495
25.4 Results and Discussion / 501
25.4.1 Analysis of Synthetic Data / 501
25.4.2 Analysis of Prokaryotic Data / 502
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
Christos Makris and Evangelos Theodoridis
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks-Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO
SPECIES INTERACTIONS 525
Andrej Aderhold, V Anne Smith, and Dirk Husmeier
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.4.1 Notations / 529
27.4.2 Sparse Regression and the LASSO / 530
27.4.3 Bayesian Regression / 530
27.4.4 Evaluation Metric / 531
27.5 Learning Gene Regulation Networks / 532
27.5.1 Nonhomogeneous Bayesian Regression / 533
27.5.2 Gradient Estimation / 534
27.5.3 Simulated Bio-PEPA Data / 534
27.5.4 Real mRNA Expression Profile Data / 535
27.5.5 Method Evaluation and Learned Networks / 536
27.6 Learning Species Interaction Networks / 540
27.6.1 Regression Model of Species interactions / 540
27.6.2 Multiple Global Change-Points / 541
27.6.3 Mondrian Process Change-Points / 542
27.6.4 Synthetic Data / 544
27.6.5 Simulated Population Dynamics / 544
27.6.6 Real World Plant Data / 546
27.6.7 Method Evaluation and Learned Networks / 546
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING:
APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC
LEUKEMIA 555
Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas
Stamatopoulos, and Lefteris Angelis
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.2.1 Basics / 557
28.2.2 Structure and Signaling of TLRs / 558
28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559
28.3 Structural Equation Modeling / 560
28.3.1 Methodology of SEM Modeling / 560
28.3.2 Assumptions / 561
28.3.3 Estimation Methods / 562
28.3.4 Missing Data / 562
28.3.5 Goodness-of-Fit Indices / 563
28.3.6 Other Indications of a Misspecified Model / 565
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.3.1 The Algorithm / 591
29.4 Experimental Setup / 592
29.4.1 Data sets / 592
29.4.2 Comparative Methods / 593
29.4.3 Experimental Protocol / 594
29.4.4 Evaluation Criteria / 594
29.5 Experimental Analysis / 596
29.5.1 Replenishing Missing Functions / 596
29.5.2 Predicting Unlabeled Proteins / 600
29.5.3 Component Analysis / 604
29.5.4 Run Time Analysis / 604
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX 609