Jeffrey Aven
Data Analytics with Spark Using Python
- Hardcover
Other customers were also interested in
- Alejandro Vaisman, Data Warehouse Systems, 60,99 €
- Rudolf Plettenberg-Lenhausen, Frameworks for Distributed Big Data Processing, 44,99 €
- Sanjay, Taming the Heterogeneity: Building Reliable Link Repositories for the Web of Data, 26,59 €
- Zubair Nabi, Pro Spark Streaming, 26,99 €
- Steven Gates, Tor Anonymity Network 101, 19,99 €
- Nomenclatura - Encyclopedia of modern Cryptography and Internet Security, 12,49 €
- Jonny, Privacy Paradox in the New IoT, 26,99 €
Angelika, 33, leads a completely normal life with her husband, two adolescent children, and a secure job. From the outside she appears happy. But inside, behind the intact facade she shows everyone, there is emptiness. Only when psychosomatic abdominal pain sets in does she accept the diagnosis of "depression" and have herself admitted to a psychotherapeutic clinic. Burdened with the strongest prejudices against the "clinic for crazies", she initially finds it hard to engage with the treatment. But after repressed feelings from the past resurface and she suffers a complete breakdown, she gradually comes to understand how the therapies work and begins to participate actively. She finds her true self and takes the hard road to self-knowledge, toward new courage to face life and new self-confidence.
Note: This item can only be shipped to a delivery address in Germany.
Product details
- Publisher: Addison Wesley / Addison-Wesley
- Pages: 320
- Publication date: 6 June 2018
- Language: English
- Dimensions: 234mm x 179mm x 19mm
- Weight: 100g
- ISBN-13: 9780134846019
- ISBN-10: 013484601X
- Item no.: 48432933
- Manufacturer information
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- 06621 890
Jeffrey Aven is an independent Big Data, open source software, and cloud computing professional based in Melbourne, Australia. He is a highly regarded consultant and instructor and has authored several other books, including Teach Yourself Apache Spark in 24 Hours and Teach Yourself Hadoop in 24 Hours.
Preface xi
Introduction 1
PART I: SPARK FOUNDATIONS
Chapter 1 Introducing Big Data, Hadoop, and Spark 5
Introduction to Big Data, Distributed Computing, and Hadoop 5
A Brief History of Big Data and Hadoop 6
Hadoop Explained 7
Introduction to Apache Spark 13
Apache Spark Background 13
Uses for Spark 14
Programming Interfaces to Spark 14
Submission Types for Spark Programs 14
Input/Output Types for Spark Applications 16
The Spark RDD 16
Spark and Hadoop 16
Functional Programming Using Python 17
Data Structures Used in Functional Python Programming 17
Python Object Serialization 20
Python Functional Programming Basics 23
Summary 25
Chapter 2 Deploying Spark 27
Spark Deployment Modes 27
Local Mode 28
Spark Standalone 28
Spark on YARN 29
Spark on Mesos 30
Preparing to Install Spark 30
Getting Spark 31
Installing Spark on Linux or Mac OS X 32
Installing Spark on Windows 34
Exploring the Spark Installation 36
Deploying a Multi-Node Spark Standalone Cluster 37
Deploying Spark in the Cloud 39
Amazon Web Services (AWS) 39
Google Cloud Platform (GCP) 41
Databricks 42
Summary 43
Chapter 3 Understanding the Spark Cluster Architecture 45
Anatomy of a Spark Application 45
Spark Driver 46
Spark Workers and Executors 49
The Spark Master and Cluster Manager 51
Spark Applications Using the Standalone Scheduler 53
Spark Applications Running on YARN 53
Deployment Modes for Spark Applications Running on YARN 53
Client Mode 54
Cluster Mode 55
Local Mode Revisited 56
Summary 57
Chapter 4 Learning Spark Programming Basics 59
Introduction to RDDs 59
Loading Data into RDDs 61
Creating an RDD from a File or Files 61
Methods for Creating RDDs from a Text File or Files 63
Creating an RDD from an Object File 66
Creating an RDD from a Data Source 66
Creating RDDs from JSON Files 69
Creating an RDD Programmatically 71
Operations on RDDs 72
Key RDD Concepts 72
Basic RDD Transformations 77
Basic RDD Actions 81
Transformations on PairRDDs 85
MapReduce and Word Count Exercise 92
Join Transformations 95
Joining Datasets in Spark 100
Transformations on Sets 103
Transformations on Numeric RDDs 105
Summary 108
PART II: BEYOND THE BASICS
Chapter 5 Advanced Programming Using the Spark Core API 111
Shared Variables in Spark 111
Broadcast Variables 112
Accumulators 116
Exercise: Using Broadcast Variables and Accumulators 119
Partitioning Data in Spark 120
Partitioning Overview 120
Controlling Partitions 121
Repartitioning Functions 123
Partition-Specific or Partition-Aware API Methods 125
RDD Storage Options 127
RDD Lineage Revisited 127
RDD Storage Options 128
RDD Caching 131
Persisting RDDs 131
Choosing When to Persist or Cache RDDs 134
Checkpointing RDDs 134
Exercise: Checkpointing RDDs 136
Processing RDDs with External Programs 138
Data Sampling with Spark 139
Understanding Spark Application and Cluster Configuration 141
Spark Environment Variables 141
Spark Configuration Properties 145
Optimizing Spark 148
Filter Early, Filter Often 149
Optimizing Associative Operations 149
Understanding the Impact of Functions and Closures 151
Considerations for Collecting Data 152
Configuration Parameters for Tuning and Optimizing Applications 152
Avoiding Inefficient Partitioning 153
Diagnosing Application Performance Issues 155
Summary 159
Chapter 6 SQL and NoSQL Programming with Spark 161
Introduction to Spark SQL 161
Introduction to Hive 162
Spark SQL Architecture 166
Getting Started with DataFrames 168
Using DataFrames 179
Caching, Persisting, and Repartitioning DataFrames 187
Saving DataFrame Output 188
Accessing Spark SQL 191
Exercise: Using Spark SQL 194
Using Spark with NoSQL Systems 195
Introduction to NoSQL 196
Using Spark with HBase 197
Exercise: Using Spark with HBase 200
Using Spark with Cassandra 202
Using Spark with DynamoDB 204
Other NoSQL Platforms 206
Summary 206
Chapter 7 Stream Processing and Messaging Using Spark 209
Introducing Spark Streaming 209
Spark Streaming Architecture 210
Introduction to DStreams 211
Exercise: Getting Started with Spark Streaming 218
State Operations 219
Sliding Window Operations 221
Structured Streaming 223
Structured Streaming Data Sources 224
Structured Streaming Data Sinks 225
Output Modes 226
Structured Streaming Operations 227
Using Spark with Messaging Platforms 228
Apache Kafka 229
Exercise: Using Spark with Kafka 234
Amazon Kinesis 237
Summary 240
Chapter 8 Introduction to Data Science and Machine Learning Using Spark 243
Spark and R 243
Introduction to R 244
Using Spark with R 250
Exercise: Using RStudio with SparkR 257
Machine Learning with Spark 259
Machine Learning Primer 259
Machine Learning Using Spark MLlib 262
Exercise: Implementing a Recommender Using Spark MLlib 267
Machine Learning Using Spark ML 271
Using Notebooks with Spark 275
Using Jupyter (IPython) Notebooks with Spark 275
Using Apache Zeppelin Notebooks with Spark 278
Summary 279
Index 281