- Format: PDF
A guide to the principles and methods of data analysis that does not require knowledge of statistics or programming.
A General Introduction to Data Analytics is an essential guide to understanding and using data analytics. The book is written in easy-to-understand terms and does not require familiarity with statistics or programming. The authors, noted experts in the field, explain the intuition behind the basic data analytics techniques. The text also contains exercises and illustrative examples. Designed to be easily accessible to non-experts, the book provides motivation to…
- Devices: PC
- With copy protection
- Size: 8.75 MB
For legal reasons, this download can only be delivered to customers with a billing address in A, B, BG, CY, CZ, D, DK, EW, E, FIN, F, GR, HR, H, IRL, I, LT, L, LR, M, NL, PL, P, R, S, SLO, SK.
- Product details
- Publisher: John Wiley & Sons
- Number of pages: 352
- Publication date: 2 July 2018
- Language: English
- ISBN-13: 9781119296256
- Item no.: 53433315
- … Different Scale Type
- 4.2.1 Converting Nominal to Relative
- 4.2.2 Converting Ordinal to Relative or Absolute
- 4.2.3 Converting Relative or Absolute to Ordinal or Nominal
- 4.3 Converting to a Different Scale
- 4.4 Data Transformation
- 4.5 Dimensionality Reduction
- 4.5.1 Attribute Aggregation
- 4.5.1.1 Principal Component Analysis
- 4.5.1.2 Independent Component Analysis
- 4.5.1.3 Multidimensional Scaling
- 4.5.2 Attribute Selection
- 4.5.2.1 Filters
- 4.5.2.2 Wrappers
- 4.5.2.3 Embedded
- 4.5.2.4 Search Strategies
- 4.6 Final Remarks
- 4.7 Exercises
- 5 Clustering
- 5.1 Distance Measures
- 5.1.1 Differences between Values of Common Attribute Types
- 5.1.2 Distance Measures for Objects with Quantitative Attributes
- 5.1.3 Distance Measures for Non-conventional Attributes
- 5.2 Clustering Validation
- 5.3 Clustering Techniques
- 5.3.1 K-means
- 5.3.1.1 Centroids and Distance Measures
- 5.3.1.2 How K-means Works
- 5.3.2 DBSCAN
- 5.3.3 Agglomerative Hierarchical Clustering Technique
- 5.3.3.1 Linkage Criterion
- 5.3.3.2 Dendrograms
- 5.4 Final Remarks
- 5.5 Exercises
- 6 Frequent Pattern Mining
- 6.1 Frequent Itemsets
- 6.1.1 Setting the min_sup Threshold
- 6.1.2 Apriori - a Join-based Method
- 6.1.3 Eclat
- 6.1.4 FP-Growth
- 6.1.5 Maximal and Closed Frequent Itemsets
- 6.2 Association Rules
- 6.3 Behind Support and Confidence
- 6.3.1 Cross-support Patterns
- 6.3.2 Lift
- 6.3.3 Simpson's Paradox
- 6.4 Other Types of Pattern
- 6.4.1 Sequential Patterns
- 6.4.2 Frequent Sequence Mining
- 6.4.3 Closed and Maximal Sequences
- 6.5 Final Remarks
- 6.6 Exercises
- 7 Cheat Sheet and Project on Descriptive Analytics
- 7.1 Cheat Sheet of Descriptive Analytics
- 7.1.1 On Data Summarization
- 7.1.2 On Clustering
- 7.1.3 On Frequent Pattern Mining
- 7.2 Project on Descriptive Analytics
- 7.2.1 Business Understanding
- 7.2.2 Data Understanding
- 7.2.3 Data Preparation
- 7.2.4 Modeling
- 7.2.5 Evaluation
- 7.2.6 Deployment
- Part III Predicting the Unknown
- 8 Regression
- 8.1 Predictive Performance Estimation
- 8.1.1 Generalization
- 8.1.2 Model Validation
- 8.1.3 Predictive Performance Measures for Regression
- 8.2 Finding the Parameters of the Model
- 8.2.1 Linear Regression
- 8.2.1.1 Empirical Error
- 8.2.2 The Bias-variance Trade-off
- 8.2.3 Shrinkage Methods
- 8.2.3.1 Ridge Regression
- 8.2.3.2 Lasso Regression
- 8.2.4 Methods that use Linear Combinations of Attributes
- 8.2.4.1 Principal Components Regression
- 8.2.4.2 Partial Least Squares Regression
- 8.3 Technique and Model Selection
- 8.4 Final Remarks
- 8.5 Exercises
- 9 Classification
- 9.1 Binary Classification
- 9.2 Predictive Performance Measures for Classification
- 9.3 Distance-based Learning Algorithms
- 9.3.1 K-nearest Neighbor Algorithms
- 9.3.2 Case-based Reasoning
- 9.4 Probabilistic Classification Algorithms
- 9.4.1 Logistic Regression Algorithm
- 9.4.2 Naive Bayes Algorithm
- 9.5 Final Remarks
- 9.6 Exercises
- 10 Additional Predictive Methods
- 10.1 Search-based Algorithms
- 10.1.1 Decision Tree Induction Algorithms
- 10.1.2 Decision Trees for Regression
- 10.1.2.1 Model Trees
- 10.1.2.2 Multivariate Adaptive Regression Splines
- 10.2 Optimization-based Algorithms
- 10.2.1 Artificial Neural Networks
- 10.2.1.1 Backpropagation
- 10.2.1.2 Deep Networks and Deep Learning Algorithms
- 10.2.2 Support Vector Machines
- 10.2.2.1 SVM for Regression
- 10.3 Final Remarks
- 10.4 Exercises
- 11 Advanced Predictive Topics
- 11.1 Ensemble Learning
- 11.1.1 Bagging
- 11.1.2 Random Forests
- 11.1.3 AdaBoost
- 11.2 Algorithm Bias
- 11.3 Non-binary Classification Tasks
- 11.3.1 One-class Classification
- 11.3.2 Multi-class Classification
- 11.3.3 Ranking Classification
- 11.3.4 Multi-label Classification
- 11.3.5 Hierarchical Classification
- 11.4 Advanced Data Preparation Techniques for Prediction
- 11.4.1 Imbalanced Data Classification
- 11.4.2 For Incomplete Target Labeling
- 11.4.2.1 Semi-supervised Learning
- 11.4.2.2 Active Learning
- 11.5 Description and Prediction with Supervised Interpretable Techniques
- 11.6 Exercises
- 12 Cheat Sheet and Project on Predictive Analytics
- 12.1 Cheat Sheet on Predictive Analytics
- 12.2 Project on Predictive Analytics
- 12.2.1 Business Understanding
- 12.2.2 Data Understanding
- 12.2.3 Data Preparation
- 12.2.4 Modeling
- 12.2.5 Evaluation
- 12.2.6 Deployment
- Part IV Popular Data Analytics Applications
- 13 Applications for Text, Web and Social Media
- 13.1 Working with Texts
- 13.1.1 Data Acquisition
- 13.1.2 Feature Extraction
- 13.1.2.1 Tokenization
- 13.1.2.2 Stemming
- 13.1.2.3 Conversion to Structured Data
- 13.1.2.4 Is the Bag of Words Enough?
- 13.1.3 Remaining Phases
- 13.1.4 Trends
- 13.1.4.1 Sentiment Analysis
- 13.1.4.2 Web Mining
- 13.2 Recommender Systems
- 13.2.1 Feedback
- 13.2.2 Recommendation Tasks
- 13.2.3 Recommendation Techniques
- 13.2.3.1 Knowledge-based Techniques
- 13.2.3.2 Content-based Techniques
- 13.2.3.3 Collaborative Filtering Techniques
- 13.2.4 Final Remarks
- 13.3 Social Network Analysis
- 13.3.1 Representing Social Networks
- 13.3.2 Basic Properties of Nodes
- 13.3.2.1 Degree
- 13.3.2.2 Distance
- 13.3.2.3 Closeness
- 13.3.2.4 Betweenness
- 13.3.2.5 Clustering Coefficient
- 13.3.3 Basic and Structural Properties of Networks
- 13.3.3.1 Diameter
- 13.3.3.2 Centralization
- 13.3.3.3 Cliques
- 13.3.3.4 Clustering Coefficient
- 13.3.3.5 Modularity
- 13.3.4 Trends and Final Remarks
- 13.4 Exercises
- Appendix A: Comprehensive Description of the CRISP-DM Methodology
- References
- Index