Data Analytics with Spark Using Python

€38.99

incl. statutory VAT, free shipping

Home delivery

Product details

Binding: Paperback
Publication date: 06.06.2018
Publisher: Addison Wesley Longman
Pages: 320
Dimensions (L/W/H): 23.4/17.9/1.9 cm
Weight: 521 g
Edition: 1st edition
Language: English
ISBN: 978-0-13-484601-9

Manufacturer address

Libri GmbH
Europaallee 1
36244 Bad Hersfeld
DE

Email: gpsr@libri.de

    Preface
    Introduction

    PART I: SPARK FOUNDATIONS
    Chapter 1  Introducing Big Data, Hadoop, and Spark
    Introduction to Big Data, Distributed Computing, and Hadoop
         A Brief History of Big Data and Hadoop
         Hadoop Explained
    Introduction to Apache Spark
         Apache Spark Background
         Uses for Spark
         Programming Interfaces to Spark
         Submission Types for Spark Programs
         Input/Output Types for Spark Applications
         The Spark RDD
         Spark and Hadoop
    Functional Programming Using Python
         Data Structures Used in Functional Python Programming
         Python Object Serialization
         Python Functional Programming Basics
    Summary
    Chapter 2  Deploying Spark
    Spark Deployment Modes
         Local Mode
         Spark Standalone
         Spark on YARN
         Spark on Mesos
    Preparing to Install Spark
    Getting Spark
    Installing Spark on Linux or Mac OS X
    Installing Spark on Windows
    Exploring the Spark Installation
    Deploying a Multi-Node Spark Standalone Cluster
    Deploying Spark in the Cloud
         Amazon Web Services (AWS)
         Google Cloud Platform (GCP)
         Databricks
    Summary
    Chapter 3  Understanding the Spark Cluster Architecture
    Anatomy of a Spark Application
         Spark Driver
         Spark Workers and Executors
         The Spark Master and Cluster Manager
    Spark Applications Using the Standalone Scheduler
         Spark Applications Running on YARN
    Deployment Modes for Spark Applications Running on YARN
         Client Mode
         Cluster Mode
         Local Mode Revisited
    Summary
    Chapter 4  Learning Spark Programming Basics
    Introduction to RDDs
    Loading Data into RDDs
         Creating an RDD from a File or Files
         Methods for Creating RDDs from a Text File or Files
         Creating an RDD from an Object File
         Creating an RDD from a Data Source
         Creating RDDs from JSON Files
         Creating an RDD Programmatically
    Operations on RDDs
         Key RDD Concepts
         Basic RDD Transformations
         Basic RDD Actions
         Transformations on PairRDDs
         MapReduce and Word Count Exercise
         Join Transformations
         Joining Datasets in Spark
         Transformations on Sets
         Transformations on Numeric RDDs
    Summary

    PART II: BEYOND THE BASICS
    Chapter 5  Advanced Programming Using the Spark Core API
    Shared Variables in Spark
         Broadcast Variables
         Accumulators
         Exercise: Using Broadcast Variables and Accumulators
    Partitioning Data in Spark
         Partitioning Overview
         Controlling Partitions
         Repartitioning Functions
         Partition-Specific or Partition-Aware API Methods
    RDD Storage Options
         RDD Lineage Revisited
         RDD Storage Options
         RDD Caching
         Persisting RDDs
         Choosing When to Persist or Cache RDDs
         Checkpointing RDDs
         Exercise: Checkpointing RDDs
    Processing RDDs with External Programs
    Data Sampling with Spark
    Understanding Spark Application and Cluster Configuration
         Spark Environment Variables
         Spark Configuration Properties
    Optimizing Spark
         Filter Early, Filter Often
         Optimizing Associative Operations
         Understanding the Impact of Functions and Closures
         Considerations for Collecting Data
         Configuration Parameters for Tuning and Optimizing Applications
         Avoiding Inefficient Partitioning
         Diagnosing Application Performance Issues
    Summary
    Chapter 6  SQL and NoSQL Programming with Spark
    Introduction to Spark SQL
         Introduction to Hive
         Spark SQL Architecture
         Getting Started with DataFrames
         Using DataFrames
         Caching, Persisting, and Repartitioning DataFrames
         Saving DataFrame Output
         Accessing Spark SQL
         Exercise: Using Spark SQL
    Using Spark with NoSQL Systems
         Introduction to NoSQL
         Using Spark with HBase
         Exercise: Using Spark with HBase
         Using Spark with Cassandra
         Using Spark with DynamoDB
         Other NoSQL Platforms
    Summary
    Chapter 7  Stream Processing and Messaging Using Spark
    Introducing Spark Streaming
         Spark Streaming Architecture
         Introduction to DStreams
         Exercise: Getting Started with Spark Streaming
         State Operations
         Sliding Window Operations
    Structured Streaming
         Structured Streaming Data Sources
         Structured Streaming Data Sinks
         Output Modes
         Structured Streaming Operations
    Using Spark with Messaging Platforms
         Apache Kafka
         Exercise: Using Spark with Kafka
         Amazon Kinesis
    Summary
    Chapter 8  Introduction to Data Science and Machine Learning Using Spark
    Spark and R
         Introduction to R
         Using Spark with R
         Exercise: Using RStudio with SparkR
    Machine Learning with Spark
         Machine Learning Primer
         Machine Learning Using Spark MLlib
         Exercise: Implementing a Recommender Using Spark MLlib
         Machine Learning Using Spark ML
    Using Notebooks with Spark
         Using Jupyter (IPython) Notebooks with Spark
         Using Apache Zeppelin Notebooks with Spark
    Summary
    Index