- Hardcover
Other customers were also interested in
- Olaf Wolkenhauer: Data Engineering, 203,99 €
- William Siler: Fuzzy Expert Systems and Fuzzy Reasoning, 203,99 €
- Lefteri H. Tsoukalas: Fuzzy and Neural Approaches in Engineering, 252,99 €
- Michael Winikoff: Developing Intelligent Agent Systems, 97,99 €
- Rafael H. Bordini: Programming Multi-Agent Systems in Agentspeak Using Jason, 133,99 €
- Alex L. G. Hayzelden / Rachel A. Bourne (eds.): Agent Technology for Communication Infrastructures, 169,99 €
- Aapo Hyvärinen: Independent Component Analysis, 225,99 €
An interdisciplinary framework for learning methodologies, now revised and updated
Learning from Data provides a unified treatment of the principles and methods for learning dependencies from data. It establishes a general conceptual framework in which various learning methods from statistics, neural networks, and pattern recognition can be applied, showing that a few fundamental principles underlie most new methods being proposed today in statistics, engineering, and computer science.
Since the first edition was published, the field of data-driven learning has experienced rapid growth. This Second Edition covers these developments with a completely revised chapter on support vector machines, a new chapter on noninductive inference and alternative learning formulations, and an in-depth discussion of the VC theoretical approach as it relates to other paradigms.
Complete with over one hundred illustrations, case studies, examples, and chapter summaries, Learning from Data accommodates both beginning and advanced graduate students in engineering, computer science, and statistics. It is also indispensable for researchers and practitioners in these areas who must understand the principles and methods for learning dependencies from data.
Note: This item can only be shipped to a delivery address in Germany.
Product details
- Publisher: Wiley & Sons
- Publisher's item no.: 14668182000
- 2nd edition
- Pages: 560
- Publication date: 24 August 2007
- Language: English
- Dimensions: 240mm x 161mm x 34mm
- Weight: 920g
- ISBN-13: 9780471681823
- ISBN-10: 0471681822
- Item no.: 22726800
- Manufacturer information
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- 06621 890
Vladimir Cherkassky, PhD, is Professor of Electrical and Computer Engineering at the University of Minnesota. He is internationally known for his research on neural networks and statistical learning. Filip Mulier, PhD, has worked in the software field for the past twelve years, part of which has been spent researching, developing, and applying advanced statistical and machine learning methods. He currently holds a project management position.
PREFACE.
NOTATION.
1 Introduction.
1.1 Learning and Statistical Estimation.
1.2 Statistical Dependency and Causality.
1.3 Characterization of Variables.
1.4 Characterization of Uncertainty.
1.5 Predictive Learning versus Other Data Analytical Methodologies.
2 Problem Statement, Classical Approaches, and Adaptive Learning.
2.1 Formulation of the Learning Problem.
2.1.1 Objective of Learning.
2.1.2 Common Learning Tasks.
2.1.3 Scope of the Learning Problem Formulation.
2.2 Classical Approaches.
2.2.1 Density Estimation.
2.2.2 Classification.
2.2.3 Regression.
2.2.4 Solving Problems with Finite Data.
2.2.5 Nonparametric Methods.
2.2.6 Stochastic Approximation.
2.3 Adaptive Learning: Concepts and Inductive Principles.
2.3.1 Philosophy, Major Concepts, and Issues.
2.3.2 A Priori Knowledge and Model Complexity.
2.3.3 Inductive Principles.
2.3.4 Alternative Learning Formulations.
2.4 Summary.
3 Regularization Framework.
3.1 Curse and Complexity of Dimensionality.
3.2 Function Approximation and Characterization of Complexity.
3.3 Penalization.
3.3.1 Parametric Penalties.
3.3.2 Nonparametric Penalties.
3.4 Model Selection (Complexity Control).
3.4.1 Analytical Model Selection Criteria.
3.4.2 Model Selection via Resampling.
3.4.3 Bias-Variance Tradeoff.
3.4.4 Example of Model Selection.
3.4.5 Function Approximation versus Predictive Learning.
3.5 Summary.
4 Statistical Learning Theory.
4.1 Conditions for Consistency and Convergence of ERM.
4.2 Growth Function and VC Dimension.
4.2.1 VC Dimension for Classification and Regression Problems.
4.2.2 Examples of Calculating VC Dimension.
4.3 Bounds on the Generalization.
4.3.1 Classification.
4.3.2 Regression.
4.3.3 Generalization Bounds and Sampling Theorem.
4.4 Structural Risk Minimization.
4.4.1 Dictionary Representation.
4.4.2 Feature Selection.
4.4.3 Penalization Formulation.
4.4.4 Input Preprocessing.
4.4.5 Initial Conditions for Training Algorithm.
4.5 Comparisons of Model Selection for Regression.
4.5.1 Model Selection for Linear Estimators.
4.5.2 Model Selection for k-Nearest-Neighbor Regression.
4.5.3 Model Selection for Linear Subset Regression.
4.5.4 Discussion.
4.6 Measuring the VC Dimension.
4.7 VC Dimension, Occam's Razor, and Popper's Falsifiability.
4.8 Summary and Discussion.
5 Nonlinear Optimization Strategies.
5.1 Stochastic Approximation Methods.
5.1.1 Linear Parameter Estimation.
5.1.2 Backpropagation Training of MLP Networks.
5.2 Iterative Methods.
5.2.1 EM Methods for Density Estimation.
5.2.2 Generalized Inverse Training of MLP Networks.
5.3 Greedy Optimization.
5.3.1 Neural Network Construction Algorithms.
5.3.2 Classification and Regression Trees.
5.4 Feature Selection, Optimization, and Statistical Learning Theory.
5.5 Summary.
6 Methods for Data Reduction and Dimensionality Reduction.
6.1 Vector Quantization and Clustering.
6.1.1 Optimal Source Coding in Vector Quantization.
6.1.2 Generalized Lloyd Algorithm.
6.1.3 Clustering.
6.1.4 EM Algorithm for VQ and Clustering.
6.1.5 Fuzzy Clustering.
6.2 Dimensionality Reduction: Statistical Methods.
6.2.1 Linear Principal Components.
6.2.2 Principal Curves and Surfaces.
6.2.3 Multidimensional Scaling.
6.3 Dimensionality Reduction: Neural Network Methods.
6.3.1 Discrete Principal Curves and Self-Organizing Map Algorithm.
6.3.2 Statistical Interpretation of the SOM Method.
6.3.3 Flow-Through Version of the SOM and Learning Rate Schedules.
6.3.4 SOM Applications and Modifications.
6.3.5 Self-Supervised MLP.
6.4 Methods for Multivariate Data Analysis.
6.4.1 Factor Analysis.
6.4.2 Independent Component Analysis.
6.5 Summary.
7 Methods for Regression.
7.1 Taxonomy: Dictionary versus Kernel Representation.
7.2 Linear Estimators.
7.2.1 Estimation of Linear Models and Equivalence of Representations.
7.2.2 Analytic Form of Cross-Validation.
7.2.3 Estimating Complexity of Penalized Linear Models.
7.2.4 Nonadaptive Methods.
7.3 Adaptive Dictionary Methods.
7.3.1 Additive Methods and Projection Pursuit Regression.
7.3.2 Multilayer Perceptrons and Backpropagation.
7.3.3 Multivariate Adaptive Regression Splines.
7.3.4 Orthogonal Basis Functions and Wavelet Signal Denoising.
7.4 Adaptive Kernel Methods and Local Risk Minimization.
7.4.1 Generalized Memory-Based Learning.
7.4.2 Constrained Topological Mapping.
7.5 Empirical Studies.
7.5.1 Predicting Net Asset Value (NAV) of Mutual Funds.
7.5.2 Comparison of Adaptive Methods for Regression.
7.6 Combining Predictive Models.
7.7 Summary.
8 Classification.
8.1 Statistical Learning Theory Formulation.
8.2 Classical Formulation.
8.2.1 Statistical Decision Theory.
8.2.2 Fisher's Linear Discriminant Analysis.
8.3 Methods for Classification.
8.3.1 Regression-Based Methods.
8.3.2 Tree-Based Methods.
8.3.3 Nearest-Neighbor and Prototype Methods.
8.3.4 Empirical Comparisons.
8.4 Combining Methods and Boosting.
8.4.1 Boosting as an Additive Model.
8.4.2 Boosting for Regression Problems.
8.5 Summary.
9 Support Vector Machines.
9.1 Motivation for Margin-Based Loss.
9.2 Margin-Based Loss, Robustness, and Complexity Control.
9.3 Optimal Separating Hyperplane.
9.4 High-Dimensional Mapping and Inner Product Kernels.
9.5 Support Vector Machine for Classification.
9.6 Support Vector Implementations.
9.7 Support Vector Regression.
9.8 SVM Model Selection.
9.9 Support Vector Machines and Regularization.
9.10 Single-Class SVM and Novelty Detection.
9.11 Summary and Discussion.
10 Noninductive Inference and Alternative Learning Formulations.
10.1 Sparse High-Dimensional Data.
10.2 Transduction.
10.3 Inference Through Contradictions.
10.4 Multiple-Model Estimation.
10.5 Summary.
11 Concluding Remarks.
Appendix A: Review of Nonlinear Optimization.
Appendix B: Eigenvalues and Singular Value Decomposition.
References.
Index.