Tom White
Hadoop: The Definitive Guide
- Book
Product details
- Publisher: O'Reilly Vlg. GmbH & Co.
- ISBN-13: 9781449389734
- ISBN-10: 1449389732
- Item no.: 30842931
Chapter 1 Meet Hadoop
- Data!
- Data Storage and Analysis
- Comparison with Other Systems
- A Brief History of Hadoop
- The Apache Hadoop Project
Chapter 2 MapReduce
- A Weather Dataset
- Analyzing the Data with Unix Tools
- Analyzing the Data with Hadoop
- Scaling Out
- Hadoop Streaming
- Hadoop Pipes
Chapter 3 The Hadoop Distributed Filesystem
- The Design of HDFS
- HDFS Concepts
- The Command-Line Interface
- Hadoop Filesystems
- The Java Interface
- Data Flow
- Parallel Copying with distcp
- Hadoop Archives
Chapter 4 Hadoop I/O
- Data Integrity
- Compression
- Serialization
- File-Based Data Structures
Chapter 5 Developing a MapReduce Application
- The Configuration API
- Configuring the Development Environment
- Writing a Unit Test
- Running Locally on Test Data
- Running on a Cluster
- Tuning a Job
- MapReduce Workflows
Chapter 6 How MapReduce Works
- Anatomy of a MapReduce Job Run
- Failures
- Job Scheduling
- Shuffle and Sort
- Task Execution
Chapter 7 MapReduce Types and Formats
- MapReduce Types
- Input Formats
- Output Formats
Chapter 8 MapReduce Features
- Counters
- Sorting
- Joins
- Side Data Distribution
- MapReduce Library Classes
Chapter 9 Setting Up a Hadoop Cluster
- Cluster Specification
- Cluster Setup and Installation
- SSH Configuration
- Hadoop Configuration
- Post Install
- Benchmarking a Hadoop Cluster
- Hadoop in the Cloud
Chapter 10 Administering Hadoop
- HDFS
- Monitoring
- Maintenance
Chapter 11 Pig
- Installing and Running Pig
- An Example
- Comparison with Databases
- Pig Latin
- User-Defined Functions
- Data Processing Operators
- Pig in Practice
Chapter 12 HBase
- HBasics
- Concepts
- Installation
- Clients
- Example
- HBase Versus RDBMS
- Praxis
Chapter 13 ZooKeeper
- Installing and Running ZooKeeper
- An Example
- The ZooKeeper Service
- Building Applications with ZooKeeper
- ZooKeeper in Production
Chapter 14 Case Studies
- Hadoop Usage at Last.fm
- Hadoop and Hive at Facebook
- Nutch Search Engine
- Log Processing at Rackspace
- Cascading
- TeraByte Sort on Apache Hadoop
Appendix Installing Apache Hadoop
- Prerequisites
- Installation
- Configuration
Appendix Cloudera's Distribution for Hadoop
- Prerequisites
- Standalone Mode
- Pseudo-Distributed Mode
- Fully Distributed Mode
- Hadoop-Related Packages
Appendix Preparing the NCDC Weather Data
Colophon