Computational Statistics in Data Science
Editors: Piegorsch, Walter W; Lee, Thomas C M; Zhang, Hao Helen; Levine, Richard A
- Hardcover
An essential guide to applying computational statistics in modern data science.

In Computational Statistics in Data Science, a team of distinguished mathematicians and statisticians presents a rigorous compilation of concepts, theories, techniques, and practices in computational statistics for readers seeking a single, comprehensive reference for statistics in modern data science. The book contains multiple chapters on the core areas of computational statistics, presenting state-of-the-art techniques in a timely and accessible way. In addition, Computational Statistics in Data Science provides free access to the finished entries in the online reference work Wiley StatsRef: Statistics Reference Online.

Readers also receive:
- A thorough introduction to computational statistics, with relevant and accessible information for practitioners and researchers in a range of data-intensive fields
- Comprehensive explanations of current topics in statistics, including big data, data stream processing, quantitative visualization, and deep learning

The work is ideal for researchers and scholars in all disciplines who need to apply computational statistics techniques at an upper or advanced level. Computational Statistics in Data Science also belongs on the bookshelves of scholars who research and develop techniques in computational statistics and statistical graphics.
Note: This item can only be shipped to a delivery address in Germany.
Product details
- Publisher: Wiley
- Number of pages: 672
- Publication date: 18 April 2022
- Language: English
- Dimensions: 244 mm x 170 mm x 37 mm
- Weight: 1252 g
- ISBN-13: 9781119561071
- ISBN-10: 1119561078
- Item no.: 61329949
- Manufacturer information: Libri GmbH, Europaallee 1, 36244 Bad Hersfeld, gpsr@libri.de
WALTER W. PIEGORSCH is Professor of Mathematics at the University of Arizona and Director of Statistical Research & Education at the University's BIO5 Institute. He is also a former Chair of the UArizona Interdisciplinary Program in Statistics and a past editor of the Journal of the American Statistical Association (Theory & Methods Section). He is a fellow of the American Statistical Association and an elected member of the International Statistical Institute.

RICHARD A. LEVINE is Professor of Statistics at San Diego State University and Faculty Advisor overseeing the Statistical Modeling Group in SDSU Analytic Studies and Institutional Research. He is a former Chair of the SDSU Department of Mathematics and Statistics and a past Editor of the Journal of Computational and Graphical Statistics. He is Associate Editor for Statistics of the Notices of the American Mathematical Society and a fellow of the American Statistical Association.

HAO HELEN ZHANG is Professor of Mathematics at the University of Arizona and Chair of the UArizona Interdisciplinary Program in Statistics. She is Editor-in-Chief of STAT (the ISI journal) and Associate Editor of the Journal of the American Statistical Association and the Journal of the Royal Statistical Society. She is a fellow of the American Statistical Association and the Institute of Mathematical Statistics, and an elected member of the International Statistical Institute.

THOMAS C. M. LEE is Professor of Statistics and Associate Dean of the Faculty in Mathematical and Physical Sciences at the University of California, Davis. He is a former Chair of the Department of Statistics at the same institution and a past editor of the Journal of Computational and Graphical Statistics. He is an elected fellow of the American Association for the Advancement of Science, the American Statistical Association, and the Institute of Mathematical Statistics.
List of Contributors
Preface

Part I: Computational Statistics and Data Science
- 1 Computational Statistics and Data Science in the Twenty-first Century (Andrew J. Holbrook, Akihiko Nishimura, Xiang Ji, and Marc A. Suchard). Sections: 1. Introduction; 2. Core Challenges 1-3; 3. Model-Specific Advances; 4. Core Challenges 4 and 5; 5. Rise of Data Science
- 2 Statistical Software (Alfred G. Schissler and Alexander D. Knudson). Sections: 1. User Development Environments; 2. Popular Statistical Software; 3. Noteworthy Statistical Software and Related Tools; 4. Promising and Emerging Statistical Software; 5. The Future of Statistical Computing; 6. Concluding Remarks
- 3 An Introduction to Deep Learning Methods (Yao Li, Justin Wang, and Thomas C.M. Lee). Sections: 1. Introduction; 2. Machine Learning: An Overview; 3. Feedforward Neural Networks; 4. Convolutional Neural Networks; 5. Autoencoders; 6. Recurrent Neural Networks; 7. Conclusion
- 4 Streaming Data and Data Streams (Taiwo Kolajo, Olawande Daramola, and Ayodele Adebiyi). Sections: 1. Introduction; 2. Data Stream Computing; 3. Issues in Data Stream Mining; 4. Streaming Data Tools and Technologies; 5. Streaming Data Pre-Processing: Concept and Implementation; 6. Streaming Data Algorithms; 7. Strategies for Processing Data Streams; 8. Best Practices for Managing Data Streams; 9. Conclusion and the Way Forward

Part II: Simulation-Based Methods
- 5 Monte Carlo Simulation: Are We There Yet? (Dootika Vats, James M. Flegal, and Galin L. Jones). Sections: 1. Introduction; 2. Estimation; 3. Sampling Distribution; 4. Estimating …; 5. Stopping Rules; 6. Workflow; 7. Examples
- 6 Sequential Monte Carlo: Particle Filters and Beyond (Adam M. Johansen). Sections: 1. Introduction; 2. Sequential Importance Sampling and Resampling; 3. SMC in Statistical Contexts; 4. Selected Recent Developments
- 7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings (Christian P. Robert and Wu Changye). Sections: 1. Introduction; 2. Monte Carlo Methods; 3. Markov Chain Monte Carlo Methods; 4. Approximate Bayesian Computation; 5. Further Reading
- 8 Bayesian Inference with Adaptive Markov Chain Monte Carlo (Matti Vihola). Sections: 1. Introduction; 2. Random-Walk Metropolis Algorithm; 3. Adaptation of Random-Walk Metropolis; 4. Multimodal Targets with Parallel Tempering; 5. Dynamic Models with Particle Filters; 6. Discussion
- 9 Advances in Importance Sampling (Víctor Elvira and Luca Martino). Sections: 1. Introduction and Problem Statement; 2. Importance Sampling; 3. Multiple Importance Sampling (MIS); 4. Adaptive Importance Sampling (AIS)

Part III: Statistical Learning
- 10 Supervised Learning (Weibin Mo and Yufeng Liu). Sections: 1. Introduction; 2. Penalized Empirical Risk Minimization; 3. Linear Regression; 4. Classification; 5. Extensions for Complex Data; 6. Discussion
- 11 Unsupervised and Semisupervised Learning (Jia Li and Vincent A. Pisztora). Sections: 1. Introduction; 2. Unsupervised Learning; 3. Semisupervised Learning; 4. Conclusions
- 12 Random Forest (Peter Calhoun, Xiaogang Su, Kelly M. Spoon, Richard A. Levine, and Juanjuan Fan). Sections: 1. Introduction; 2. Random Forest (RF); 3. Random Forest Extensions; 4. Random Forests of Interaction Trees (RFIT); 5. Random Forest of Interaction Trees for Observational Studies; 6. Discussion
- 13 Network Analysis (Rong Ma and Hongzhe Li). Sections: 1. Introduction; 2. Gaussian Graphical Models for Mixed Partial Compositional Data; 3. Theoretical Properties; 4. Graphical Model Selection; 5. Analysis of a Microbiome-Metabolomics Data; 6. Discussion
- 14 Tensors in Modern Statistical Learning (Will Wei Sun, Botao Hao, and Lexin Li). Sections: 1. Introduction; 2. Background; 3. Tensor Supervised Learning; 4. Tensor Unsupervised Learning; 5. Tensor Reinforcement Learning; 6. Tensor Deep Learning
- 15 Computational Approaches to Bayesian Additive Regression Trees (Hugh Chipman, Edward George, Richard Hahn, Robert McCulloch, Matthew Pratola, and Rodney Sparapani). Sections: 1. Introduction; 2. Bayesian CART; 3. Tree MCMC; 4. The BART Model; 5. BART Example: Boston Housing Values and Air Pollution; 6. BART MCMC; 7. BART Extensions; 8. Conclusion

Part IV: High-Dimensional Data Analysis
- 16 Penalized Regression (Seung Jun Shin and Yichao Wu). Sections: 1. Introduction; 2. Penalization for Smoothness; 3. Penalization for Sparsity; 4. Tuning Parameter Selection
- 17 Model Selection in High-Dimensional Regression (Hao H. Zhang). Sections: 1. Model Selection Problem; 2. Model Selection in High-Dimensional Linear Regression; 3. Interaction-Effect Selection for High-Dimensional Data; 4. Model Selection in High-Dimensional Nonparametric Models; 5. Concluding Remarks
- 18 Sampling Local Scale Parameters in High-Dimensional Regression Models (Anirban Bhattacharya and James E. Johndrow). Sections: 1. Introduction; 2. A Blocked Gibbs Sampler for the Horseshoe; 3. Sampling (…, …2, …); 4. Sampling …; 5. Appendix: A. Newton-Raphson Steps for the Inverse-cdf Sampler for …
- 19 Factor Modeling for High-Dimensional Time Series (Chun Yip Yau). Sections: 1. Introduction; 2. Identifiability; 3. Estimation of High-Dimensional Factor Model; 4. Determining the Number of Factors

Part V: Quantitative Visualization
- 20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception (Edward Mulrow and Nola du Toit). Sections: 1. Introduction; 2. Case Studies Part 1; 3. Let StAR Be Your Guide; 4. Case Studies Part 2: Using StAR Principles to Develop Better Graphics; 5. Ask Colleagues Their Opinion; 6. Case Studies: Part 3; 7. Iterate; 8. Final Thoughts
- 21 Uncertainty Visualization (Lace Padilla, Matthew Kay, and Jessica Hullman). Sections: 1. Introduction; 2. Uncertainty Visualization Theories; 3. General Discussion
- 22 Big Data Visualization (Leland Wilkinson). Sections: 1. Introduction; 2. Architecture for Big Data Analytics; 3. Filtering; 4. Aggregating; 5. Analyzing; 6. Big Data Graphics; 7. Conclusion
- 23 Visualization-Assisted Statistical Learning (Catherine B. Hurley and Katarina Domijan). Sections: 1. Introduction; 2. Better Visualizations with Seriation; 3. Visualizing Machine Learning Fits; 4. Condvis2 Case Studies; 5. Discussion
- 24 Functional Data Visualization (Marc G. Genton and Ying Sun). Sections: 1. Introduction; 2. Univariate Functional Data Visualization; 3. Multivariate Functional Data Visualization; 4. Conclusions

Part VI: Numerical Approximation and Optimization
- 25 Gradient-Based Optimizers for Statistics and Machine Learning (Cho-Jui Hsieh). Sections: 1. Introduction; 2. Convex Versus Nonconvex Optimization; 3. Gradient Descent; 4. Proximal Gradient Descent: Handling Nondifferentiable Regularization; 5. Stochastic Gradient Descent
- 26 Alternating Minimization Algorithms (David R. Hunter). Sections: 1. Introduction; 2. Coordinate Descent; 3. EM as Alternating Minimization; 3.1 Finite Mixture Models; 4. Matrix Approximation Algorithms; 5. Conclusion
- 27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems (Shiqian Ma and Mingyi Hong). Sections: 1. Introduction; 2. Two Perfect Examples of ADMM; 3. Variable Splitting and Linearized ADMM; 4. Multiblock ADMM; 5. Nonconvex Problems; 6. Stopping Criteria; 7. Convergence Results of ADMM
- 28 Nonconvex Optimization via MM Algorithms: Convergence Theory (Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou). Sections: 1. Background; 2. Convergence Theorems; 3. Paracontraction; 4. Bregman Majorization

Part VII: High-Performance Computing
- 29 Massive Parallelization (Robert B. Gramacy). Sections: 1. Introduction; 2. Gaussian Process Regression and Surrogate Modeling; 3. Divide-and-Conquer GP Regression; 4. Empirical Results; 5. Conclusion
- 30 Divide-and-Conquer Methods for Big Data Analysis (Xueying Chen, Jerry Q. Cheng, and Min-ge Xie). Sections: 1. Introduction; 2. Linear Regression Model; 3. Parametric Models; 4. Nonparametric and Semiparametric Models; 5. Online Sequential Updating; 6. Splitting the Number of Covariates; 7. Bayesian Divide-and-Conquer and Median-Based Combining; 8. Real-World Applications; 9. Discussion
- 31 Bayesian Aggregation (Yuling Yao). Sections: 1. From Model Selection to Model Combination; 2. From Bayesian Model Averaging to Bayesian Stacking; 3. Asymptotic Theories of Stacking; 4. Stacking in Practice; 5. Discussion
- 32 Asynchronous Parallel Computing (Ming Yan). Sections: 1. Introduction; 2. Asynchronous Parallel Coordinate Update; 3. Asynchronous Parallel Stochastic Approaches; 4. Doubly Stochastic Coordinate Optimization with Variance Reduction; 5. Concluding Remarks
87 5 Stopping Rules 88 6 Workflow 89 7 Examples 90 6 Sequential Monte Carlo: Particle Filters and Beyond 99 Adam M. Johansen 1 Introduction 99 2 Sequential Importance Sampling and Resampling 99 3 SMC in Statistical Contexts 106 4 Selected Recent Developments 112 7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings 119 Christian P. Robert and Wu Changye 1 Introduction 119 2 Monte Carlo Methods 121 3 Markov Chain Monte Carlo Methods 128 4 Approximate Bayesian Computation 141 5 Further Reading 145 8 Bayesian Inference with Adaptive Markov Chain Monte Carlo 151 Matti Vihola 1 Introduction 151 2 Random-Walk Metropolis Algorithm 151 3 Adaptation of Random-Walk Metropolis 152 4 Multimodal Targets with Parallel Tempering 156 5 Dynamic Models with Particle Filters 157 6 Discussion 159 9 Advances in Importance Sampling 165 Víctor Elvira and Luca Martino 1 Introduction and Problem Statement 165 2 Importance Sampling 167 3 Multiple Importance Sampling (MIS) 171 4 Adaptive Importance Sampling (AIS) 174 Part III Statistical Learning 183 10 Supervised Learning 185 Weibin Mo and Yufeng Liu 1 Introduction 185 2 Penalized Empirical Risk Minimization 186 3 Linear Regression 190 4 Classification 193 5 Extensions for Complex Data 200 6 Discussion 203 11 Unsupervised and Semisupervised Learning 209 Jia Li and Vincent A. Pisztora 1 Introduction 209 2 Unsupervised Learning 210 3 Semisupervised Learning 219 4 Conclusions 224 12 Random Forest 231 Peter Calhoun, Xiaogang Su, Kelly M. Spoon, Richard A. 
Levine, and Juanjuan Fan 1 Introduction 231 2 Random Forest (RF) 232 3 Random Forest Extensions 235 4 Random Forests of Interaction Trees (RFIT) 239 5 Random Forest of Interaction Trees for Observational Studies 243 6 Discussion 249 13 Network Analysis 253 Rong Ma and Hongzhe Li 1 Introduction 253 2 Gaussian Graphical Models for Mixed Partial Compositional Data 255 3 Theoretical Properties 257 4 Graphical Model Selection 260 5 Analysis of a Microbiome-Metabolomics Data 260 6 Discussion 261 14 Tensors in Modern Statistical Learning 269 Will Wei Sun, Botao Hao, and Lexin Li 1 Introduction 269 2 Background270 3 Tensor Supervised Learning 272 4 Tensor Unsupervised Learning 276 5 Tensor Reinforcement Learning 282 6 Tensor Deep Learning 286 15 Computational Approaches to Bayesian Additive Regression Trees 297 Hugh Chipman, Edward George, Richard Hahn, Robert McCulloch, Matthew Pratola, and Rodney Sparapani 1 Introduction 297 2 Bayesian CART 298 3 TreeMCMC302 4 The BART Model 308 5 BART Example: Boston Housing Values and Air Pollution 310 6 BARTMCMC311 7 BART Extentions 313 8 Conclusion 320 Part IV High-Dimensional Data Analysis 323 16 Penalized Regression 325 Seung Jun Shin and Yichao Wu 1 Introduction 325 2 Penalization for Smoothness 326 3 Penalization for Sparsity 328 4 Tuning Parameter Selection 330 17 Model Selection in High-Dimensional Regression 333 Hao H. Zhang 1 Model Selection Problem 333 2 Model Selection in High-Dimensional Linear Regression 335 3 Interaction-Effect Selection for High-Dimensional Data 339 4 Model Selection in High-Dimensional Nonparametric Models 342 5 Concluding Remarks 349 18 Sampling Local Scale Parameters in High-Dimensional Regression Models 355 Anirban Bhattacharya and James E. Johndrow 1 Introduction 355 2 A Blocked Gibbs Sampler for the Horseshoe 356 3 Sampling (
,
2,
) 359 4 Sampling
360 5 Appendix: A. Newton-Raphson Steps for the Inverse-cdf Sampler for
367 19 Factor Modeling for High-Dimensional Time Series 371 Chun Yip Yau 1 Introduction 371 2 Identifiability 372 3 Estimation of High-Dimensional Factor Model 373 4 Determining the Number of Factors 383 Part V Quantitative Visualization 387 20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception 389 Edward Mulrow and Nola du Toit 1 Introduction 389 2 Case Studies Part 1 391 3 Let StAR Be Your Guide 393 4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics 394 5 Ask Colleagues Their Opinion 397 6 Case Studies: Part 3 398 7 Iterate 401 8 Final Thoughts 402 21 Uncertainty Visualization 405 Lace Padilla, Matthew Kay, and Jessica Hullman 1 Introduction 405 2 Uncertainty Visualization Theories 408 3 General Discussion 420 22 Big Data Visualization 427 Leland Wilkinson 1 Introduction 427 2 Architecture for Big Data Analytics 428 3 Filtering430 4 Aggregating 430 5 Analyzing 436 6 Big Data Graphics 436 7 Conclusion 440 23 Visualization-Assisted Statistical Learning 443 Catherine B. Hurley and Katarina Domijan 1 Introduction 443 2 Better Visualizations with Seriation 444 3 Visualizing Machine Learning Fits 445 4 Condvis2 Case Studies 447 5 Discussion 453 24 Functional Data Visualization 457 Marc G. Genton and Ying Sun 1 Introduction 457 2 Univariate Functional Data Visualization 458 3 Multivariate Functional Data Visualization 461 4 Conclusions 465 Part VI Numerical Approximation and Optimization 469 25 Gradient-Based Optimizers for Statistics and Machine Learning 471 Cho-Jui Hsieh 1 Introduction 471 2 Convex Versus Nonconvex Optimization 472 3 Gradient Descent 473 4 Proximal Gradient Descent: Handling Nondifferentiable Regularization 475 5 Stochastic Gradient Descent 476 26 Alternating Minimization Algorithms 481 David R. 
Hunter 1 Introduction 481 2 Coordinate Descent 482 3 EM as Alternating Minimization 484 3.1 Finite Mixture Models 485 4 Matrix Approximation Algorithms 486 5 Conclusion 489 27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems 493 Shiqian Ma and Mingyi Hong 1 Introduction 493 2 Two Perfect Examples of ADMM 494 3 Variable Splitting and Linearized ADMM 496 4 Multiblock ADMM 499 5 Nonconvex Problems 501 6 Stopping Criteria 502 7 Convergence Results of ADMM 502 28 Nonconvex Optimization via MM Algorithms: Convergence Theory 509 Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou 1 Background509 2 Convergence Theorems 510 3 Paracontraction 521 4 Bregman Majorization 523 Part VII High-Performance Computing 535 29 Massive Parallelization 537 Robert B. Gramacy 1 Introduction 537 2 Gaussian Process Regression and Surrogate Modeling 539 3 Divide-and-Conquer GP Regression 542 4 Empirical Results 548 5 Conclusion 552 30 Divide-and-Conquer Methods for Big Data Analysis 559 Xueying Chen, Jerry Q. Cheng, and Min-ge Xie 1 Introduction 559 2 Linear Regression Model 560 3 Parametric Models 561 4 Nonparametric and Semiparametric Models 567 5 Online Sequential Updating 568 6 Splitting the Number of Covariates 569 7 Bayesian Divide-and-Conquer and Median-Based Combining 570 8 Real-World Applications 571 9 Discussion 572 31 Bayesian Aggregation 577 Yuling Yao 1 From Model Selection to Model Combination 577 2 From Bayesian Model Averaging to Bayesian Stacking 580 3 Asymptotic Theories of Stacking 584 4 Stacking in Practice 586 5 Discussion 588 32 Asynchronous Parallel Computing 593 Ming Yan 1 Introduction 593 2 Asynchronous Parallel Coordinate Update 597 3 Asynchronous Parallel Stochastic Approaches 602 4 Doubly Stochastic Coordinate Optimization with Variance Reduction 604 5 Concluding Remarks 605
List of Contributors xxiii Preface xxix Part I Computational Statistics and Data Science 1 1 Computational Statistics and Data Science in the Twenty-first Century 3 Andrew J. Holbrook, Akihiko Nishimura, Xiang Ji, and Marc A. Suchard 1 Introduction 3 2 Core Challenges 1-3 5 3 Model-Specific Advances 8 4 Core Challenges 4 and 5 12 5 Rise of Data Science 16 2 Statistical Software 23 Alfred G. Schissler and Alexander D. Knudson 1 User Development Environments 23 2 Popular Statistical Software 26 3 Noteworthy Statistical Software and Related Tools 30 4 Promising and Emerging Statistical Software 36 5 The Future of Statistical Computing 38 6 Concluding Remarks 39 3 An Introduction to Deep Learning Methods 43 Yao Li, Justin Wang and Thomas C.M. Lee 1 Introduction 43 2 Machine Learning: An Overview 43 3 Feedforward Neural Networks 45 4 Convolutional Neural Networks 48 5 Autoencoders 52 6 Recurrent Neural Networks 54 7 Conclusion 57 4 Streaming Data and Data Streams 59 Taiwo Kolajo, Olawande Daramola, and Ayodele Adebiyi 1 Introduction 59 2 Data Stream Computing 61 3 Issues in Data Stream Mining 61 4 Streaming Data Tools and Technologies 64 5 Streaming Data Pre-Processing: Concept and Implementation 65 6 Streaming Data Algorithms 65 7 Strategies for Processing Data Streams 68 8 Best Practices for Managing Data Streams 69 9 Conclusion and theWay Forward 70 Part II Simulation-Based Methods 79 5 Monte Carlo Simulation: Are We There Yet? 81 Dootika Vats, James M. Flegal, and Galin L. Jones 1 Introduction 81 2 Estimation 83 3 Sampling Distribution 84 4 Estimating
87 5 Stopping Rules 88 6 Workflow 89 7 Examples 90 6 Sequential Monte Carlo: Particle Filters and Beyond 99 Adam M. Johansen 1 Introduction 99 2 Sequential Importance Sampling and Resampling 99 3 SMC in Statistical Contexts 106 4 Selected Recent Developments 112 7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings 119 Christian P. Robert and Wu Changye 1 Introduction 119 2 Monte Carlo Methods 121 3 Markov Chain Monte Carlo Methods 128 4 Approximate Bayesian Computation 141 5 Further Reading 145 8 Bayesian Inference with Adaptive Markov Chain Monte Carlo 151 Matti Vihola 1 Introduction 151 2 Random-Walk Metropolis Algorithm 151 3 Adaptation of Random-Walk Metropolis 152 4 Multimodal Targets with Parallel Tempering 156 5 Dynamic Models with Particle Filters 157 6 Discussion 159 9 Advances in Importance Sampling 165 Víctor Elvira and Luca Martino 1 Introduction and Problem Statement 165 2 Importance Sampling 167 3 Multiple Importance Sampling (MIS) 171 4 Adaptive Importance Sampling (AIS) 174 Part III Statistical Learning 183 10 Supervised Learning 185 Weibin Mo and Yufeng Liu 1 Introduction 185 2 Penalized Empirical Risk Minimization 186 3 Linear Regression 190 4 Classification 193 5 Extensions for Complex Data 200 6 Discussion 203 11 Unsupervised and Semisupervised Learning 209 Jia Li and Vincent A. Pisztora 1 Introduction 209 2 Unsupervised Learning 210 3 Semisupervised Learning 219 4 Conclusions 224 12 Random Forest 231 Peter Calhoun, Xiaogang Su, Kelly M. Spoon, Richard A. 
Levine, and Juanjuan Fan 1 Introduction 231 2 Random Forest (RF) 232 3 Random Forest Extensions 235 4 Random Forests of Interaction Trees (RFIT) 239 5 Random Forest of Interaction Trees for Observational Studies 243 6 Discussion 249 13 Network Analysis 253 Rong Ma and Hongzhe Li 1 Introduction 253 2 Gaussian Graphical Models for Mixed Partial Compositional Data 255 3 Theoretical Properties 257 4 Graphical Model Selection 260 5 Analysis of a Microbiome-Metabolomics Data 260 6 Discussion 261 14 Tensors in Modern Statistical Learning 269 Will Wei Sun, Botao Hao, and Lexin Li 1 Introduction 269 2 Background270 3 Tensor Supervised Learning 272 4 Tensor Unsupervised Learning 276 5 Tensor Reinforcement Learning 282 6 Tensor Deep Learning 286 15 Computational Approaches to Bayesian Additive Regression Trees 297 Hugh Chipman, Edward George, Richard Hahn, Robert McCulloch, Matthew Pratola, and Rodney Sparapani 1 Introduction 297 2 Bayesian CART 298 3 TreeMCMC302 4 The BART Model 308 5 BART Example: Boston Housing Values and Air Pollution 310 6 BARTMCMC311 7 BART Extentions 313 8 Conclusion 320 Part IV High-Dimensional Data Analysis 323 16 Penalized Regression 325 Seung Jun Shin and Yichao Wu 1 Introduction 325 2 Penalization for Smoothness 326 3 Penalization for Sparsity 328 4 Tuning Parameter Selection 330 17 Model Selection in High-Dimensional Regression 333 Hao H. Zhang 1 Model Selection Problem 333 2 Model Selection in High-Dimensional Linear Regression 335 3 Interaction-Effect Selection for High-Dimensional Data 339 4 Model Selection in High-Dimensional Nonparametric Models 342 5 Concluding Remarks 349 18 Sampling Local Scale Parameters in High-Dimensional Regression Models 355 Anirban Bhattacharya and James E. Johndrow 1 Introduction 355 2 A Blocked Gibbs Sampler for the Horseshoe 356 3 Sampling (
,
2,
) 359 4 Sampling
360 5 Appendix: A. Newton-Raphson Steps for the Inverse-cdf Sampler for
367 19 Factor Modeling for High-Dimensional Time Series 371 Chun Yip Yau 1 Introduction 371 2 Identifiability 372 3 Estimation of High-Dimensional Factor Model 373 4 Determining the Number of Factors 383 Part V Quantitative Visualization 387 20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception 389 Edward Mulrow and Nola du Toit 1 Introduction 389 2 Case Studies Part 1 391 3 Let StAR Be Your Guide 393 4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics 394 5 Ask Colleagues Their Opinion 397 6 Case Studies: Part 3 398 7 Iterate 401 8 Final Thoughts 402 21 Uncertainty Visualization 405 Lace Padilla, Matthew Kay, and Jessica Hullman 1 Introduction 405 2 Uncertainty Visualization Theories 408 3 General Discussion 420 22 Big Data Visualization 427 Leland Wilkinson 1 Introduction 427 2 Architecture for Big Data Analytics 428 3 Filtering430 4 Aggregating 430 5 Analyzing 436 6 Big Data Graphics 436 7 Conclusion 440 23 Visualization-Assisted Statistical Learning 443 Catherine B. Hurley and Katarina Domijan 1 Introduction 443 2 Better Visualizations with Seriation 444 3 Visualizing Machine Learning Fits 445 4 Condvis2 Case Studies 447 5 Discussion 453 24 Functional Data Visualization 457 Marc G. Genton and Ying Sun 1 Introduction 457 2 Univariate Functional Data Visualization 458 3 Multivariate Functional Data Visualization 461 4 Conclusions 465 Part VI Numerical Approximation and Optimization 469 25 Gradient-Based Optimizers for Statistics and Machine Learning 471 Cho-Jui Hsieh 1 Introduction 471 2 Convex Versus Nonconvex Optimization 472 3 Gradient Descent 473 4 Proximal Gradient Descent: Handling Nondifferentiable Regularization 475 5 Stochastic Gradient Descent 476 26 Alternating Minimization Algorithms 481 David R. 
Hunter 1 Introduction 481 2 Coordinate Descent 482 3 EM as Alternating Minimization 484 3.1 Finite Mixture Models 485 4 Matrix Approximation Algorithms 486 5 Conclusion 489 27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems 493 Shiqian Ma and Mingyi Hong 1 Introduction 493 2 Two Perfect Examples of ADMM 494 3 Variable Splitting and Linearized ADMM 496 4 Multiblock ADMM 499 5 Nonconvex Problems 501 6 Stopping Criteria 502 7 Convergence Results of ADMM 502 28 Nonconvex Optimization via MM Algorithms: Convergence Theory 509 Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou 1 Background509 2 Convergence Theorems 510 3 Paracontraction 521 4 Bregman Majorization 523 Part VII High-Performance Computing 535 29 Massive Parallelization 537 Robert B. Gramacy 1 Introduction 537 2 Gaussian Process Regression and Surrogate Modeling 539 3 Divide-and-Conquer GP Regression 542 4 Empirical Results 548 5 Conclusion 552 30 Divide-and-Conquer Methods for Big Data Analysis 559 Xueying Chen, Jerry Q. Cheng, and Min-ge Xie 1 Introduction 559 2 Linear Regression Model 560 3 Parametric Models 561 4 Nonparametric and Semiparametric Models 567 5 Online Sequential Updating 568 6 Splitting the Number of Covariates 569 7 Bayesian Divide-and-Conquer and Median-Based Combining 570 8 Real-World Applications 571 9 Discussion 572 31 Bayesian Aggregation 577 Yuling Yao 1 From Model Selection to Model Combination 577 2 From Bayesian Model Averaging to Bayesian Stacking 580 3 Asymptotic Theories of Stacking 584 4 Stacking in Practice 586 5 Discussion 588 32 Asynchronous Parallel Computing 593 Ming Yan 1 Introduction 593 2 Asynchronous Parallel Coordinate Update 597 3 Asynchronous Parallel Stochastic Approaches 602 4 Doubly Stochastic Coordinate Optimization with Variance Reduction 604 5 Concluding Remarks 605
87 5 Stopping Rules 88 6 Workflow 89 7 Examples 90 6 Sequential Monte Carlo: Particle Filters and Beyond 99 Adam M. Johansen 1 Introduction 99 2 Sequential Importance Sampling and Resampling 99 3 SMC in Statistical Contexts 106 4 Selected Recent Developments 112 7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings 119 Christian P. Robert and Wu Changye 1 Introduction 119 2 Monte Carlo Methods 121 3 Markov Chain Monte Carlo Methods 128 4 Approximate Bayesian Computation 141 5 Further Reading 145 8 Bayesian Inference with Adaptive Markov Chain Monte Carlo 151 Matti Vihola 1 Introduction 151 2 Random-Walk Metropolis Algorithm 151 3 Adaptation of Random-Walk Metropolis 152 4 Multimodal Targets with Parallel Tempering 156 5 Dynamic Models with Particle Filters 157 6 Discussion 159 9 Advances in Importance Sampling 165 Víctor Elvira and Luca Martino 1 Introduction and Problem Statement 165 2 Importance Sampling 167 3 Multiple Importance Sampling (MIS) 171 4 Adaptive Importance Sampling (AIS) 174 Part III Statistical Learning 183 10 Supervised Learning 185 Weibin Mo and Yufeng Liu 1 Introduction 185 2 Penalized Empirical Risk Minimization 186 3 Linear Regression 190 4 Classification 193 5 Extensions for Complex Data 200 6 Discussion 203 11 Unsupervised and Semisupervised Learning 209 Jia Li and Vincent A. Pisztora 1 Introduction 209 2 Unsupervised Learning 210 3 Semisupervised Learning 219 4 Conclusions 224 12 Random Forest 231 Peter Calhoun, Xiaogang Su, Kelly M. Spoon, Richard A. 
Levine, and Juanjuan Fan 1 Introduction 231 2 Random Forest (RF) 232 3 Random Forest Extensions 235 4 Random Forests of Interaction Trees (RFIT) 239 5 Random Forest of Interaction Trees for Observational Studies 243 6 Discussion 249 13 Network Analysis 253 Rong Ma and Hongzhe Li 1 Introduction 253 2 Gaussian Graphical Models for Mixed Partial Compositional Data 255 3 Theoretical Properties 257 4 Graphical Model Selection 260 5 Analysis of a Microbiome-Metabolomics Data 260 6 Discussion 261 14 Tensors in Modern Statistical Learning 269 Will Wei Sun, Botao Hao, and Lexin Li 1 Introduction 269 2 Background 270 3 Tensor Supervised Learning 272 4 Tensor Unsupervised Learning 276 5 Tensor Reinforcement Learning 282 6 Tensor Deep Learning 286 15 Computational Approaches to Bayesian Additive Regression Trees 297 Hugh Chipman, Edward George, Richard Hahn, Robert McCulloch, Matthew Pratola, and Rodney Sparapani 1 Introduction 297 2 Bayesian CART 298 3 Tree MCMC 302 4 The BART Model 308 5 BART Example: Boston Housing Values and Air Pollution 310 6 BART MCMC 311 7 BART Extensions 313 8 Conclusion 320 Part IV High-Dimensional Data Analysis 323 16 Penalized Regression 325 Seung Jun Shin and Yichao Wu 1 Introduction 325 2 Penalization for Smoothness 326 3 Penalization for Sparsity 328 4 Tuning Parameter Selection 330 17 Model Selection in High-Dimensional Regression 333 Hao H. Zhang 1 Model Selection Problem 333 2 Model Selection in High-Dimensional Linear Regression 335 3 Interaction-Effect Selection for High-Dimensional Data 339 4 Model Selection in High-Dimensional Nonparametric Models 342 5 Concluding Remarks 349 18 Sampling Local Scale Parameters in High-Dimensional Regression Models 355 Anirban Bhattacharya and James E. Johndrow 1 Introduction 355 2 A Blocked Gibbs Sampler for the Horseshoe 356 3 Sampling (ξ, σ², β) 359 4 Sampling η 360 5 Appendix: A. Newton-Raphson Steps for the Inverse-cdf Sampler for η 367
19 Factor Modeling for High-Dimensional Time Series 371 Chun Yip Yau 1 Introduction 371 2 Identifiability 372 3 Estimation of High-Dimensional Factor Model 373 4 Determining the Number of Factors 383 Part V Quantitative Visualization 387 20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception 389 Edward Mulrow and Nola du Toit 1 Introduction 389 2 Case Studies Part 1 391 3 Let StAR Be Your Guide 393 4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics 394 5 Ask Colleagues Their Opinion 397 6 Case Studies: Part 3 398 7 Iterate 401 8 Final Thoughts 402 21 Uncertainty Visualization 405 Lace Padilla, Matthew Kay, and Jessica Hullman 1 Introduction 405 2 Uncertainty Visualization Theories 408 3 General Discussion 420 22 Big Data Visualization 427 Leland Wilkinson 1 Introduction 427 2 Architecture for Big Data Analytics 428 3 Filtering 430 4 Aggregating 430 5 Analyzing 436 6 Big Data Graphics 436 7 Conclusion 440 23 Visualization-Assisted Statistical Learning 443 Catherine B. Hurley and Katarina Domijan 1 Introduction 443 2 Better Visualizations with Seriation 444 3 Visualizing Machine Learning Fits 445 4 Condvis2 Case Studies 447 5 Discussion 453 24 Functional Data Visualization 457 Marc G. Genton and Ying Sun 1 Introduction 457 2 Univariate Functional Data Visualization 458 3 Multivariate Functional Data Visualization 461 4 Conclusions 465 Part VI Numerical Approximation and Optimization 469 25 Gradient-Based Optimizers for Statistics and Machine Learning 471 Cho-Jui Hsieh 1 Introduction 471 2 Convex Versus Nonconvex Optimization 472 3 Gradient Descent 473 4 Proximal Gradient Descent: Handling Nondifferentiable Regularization 475 5 Stochastic Gradient Descent 476 26 Alternating Minimization Algorithms 481 David R.
Hunter 1 Introduction 481 2 Coordinate Descent 482 3 EM as Alternating Minimization 484 3.1 Finite Mixture Models 485 4 Matrix Approximation Algorithms 486 5 Conclusion 489 27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems 493 Shiqian Ma and Mingyi Hong 1 Introduction 493 2 Two Perfect Examples of ADMM 494 3 Variable Splitting and Linearized ADMM 496 4 Multiblock ADMM 499 5 Nonconvex Problems 501 6 Stopping Criteria 502 7 Convergence Results of ADMM 502 28 Nonconvex Optimization via MM Algorithms: Convergence Theory 509 Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou 1 Background 509 2 Convergence Theorems 510 3 Paracontraction 521 4 Bregman Majorization 523 Part VII High-Performance Computing 535 29 Massive Parallelization 537 Robert B. Gramacy 1 Introduction 537 2 Gaussian Process Regression and Surrogate Modeling 539 3 Divide-and-Conquer GP Regression 542 4 Empirical Results 548 5 Conclusion 552 30 Divide-and-Conquer Methods for Big Data Analysis 559 Xueying Chen, Jerry Q. Cheng, and Min-ge Xie 1 Introduction 559 2 Linear Regression Model 560 3 Parametric Models 561 4 Nonparametric and Semiparametric Models 567 5 Online Sequential Updating 568 6 Splitting the Number of Covariates 569 7 Bayesian Divide-and-Conquer and Median-Based Combining 570 8 Real-World Applications 571 9 Discussion 572 31 Bayesian Aggregation 577 Yuling Yao 1 From Model Selection to Model Combination 577 2 From Bayesian Model Averaging to Bayesian Stacking 580 3 Asymptotic Theories of Stacking 584 4 Stacking in Practice 586 5 Discussion 588 32 Asynchronous Parallel Computing 593 Ming Yan 1 Introduction 593 2 Asynchronous Parallel Coordinate Update 597 3 Asynchronous Parallel Stochastic Approaches 602 4 Doubly Stochastic Coordinate Optimization with Variance Reduction 604 5 Concluding Remarks 605