- Paperback
Although you don't need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS).
Note: This item can only be shipped to a delivery address in Germany.
Product details
- Publisher: O'Reilly Media, Inc, USA
- Pages: 171
- Publication date: January 28, 2014
- Language: English
- Dimensions: 233mm x 177mm x 12mm
- Weight: 306g
- ISBN-13: 9781449363628
- ISBN-10: 1449363628
- Item no.: 37260211
Kevin J. Schmidt is a senior manager at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of a major part of the company's SIEM platform, including data acquisition, correlation, and analysis of log data. Prior to SecureWorks, Kevin worked for Reflex Security, where he worked on an IPS engine and anti-virus software. Before that, he was a lead developer and architect at GuardedNet, Inc., which built one of the industry's first SIEM platforms. He is also a commissioned officer in the United States Navy Reserve (USNR). He has over 19 years of experience in software development and design, 11 of which have been in the network security space. He holds a Bachelor of Science in Computer Science. Kevin has spent time designing cloud services components at Dell, including virtualized components to run in Dell's own vCloud. These components are used to protect customers who use Dell's cloud infrastructure. Additionally, he has been working with Hadoop, machine learning, and other technology in the cloud. Kevin is co-author of Essential SNMP, second edition (O'Reilly and Associates, ISBN: 978-0-596-00840-6) and of Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).
Christopher Phillips is a manager and senior software developer at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of the company's Threat Intelligence service platform. He also leads a team that integrates log and event information from many third-party providers, so that customers can have all of their core security information delivered to and analyzed by the Dell SecureWorks systems and security professionals. Prior to Dell SecureWorks, Chris worked for McKesson and Allscripts, where he worked with clients on HIPAA compliance, security, and healthcare systems integration. He has over 18 years of experience in software development and design. He holds a Bachelor of Science in Computer Science and an MBA. Chris has spent time designing and developing virtualization and cloud Infrastructure as a Service strategies at Dell to help its security services scale globally. Additionally, he has been working with Hadoop, Pig scripting languages, and Amazon Elastic MapReduce to develop strategies to gain insights and analyze Big Data issues in the cloud. Chris is co-author of Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).
Preface
What Is AWS?
What's in This Book?
Sign Up for AWS
Code Samples in This Book
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
Chapter 1: Introduction to Amazon Elastic MapReduce
1.1 Amazon Web Services Used in This Book
1.2 Amazon Elastic MapReduce
1.3 Amazon EMR and the Hadoop Ecosystem
1.4 Amazon Elastic MapReduce Versus Traditional Hadoop Installs
1.5 Application Building Blocks
Chapter 2: Data Collection and Data Analysis with AWS
2.1 Log Analysis Application
2.2 Log Messages as a Data Set for Analytics
2.3 Understanding MapReduce
2.4 Collection Stage
2.5 Simulating Syslog Data
2.6 Developing a MapReduce Application
2.7 Custom JAR MapReduce Job
2.8 Running an Amazon EMR Cluster
2.9 Viewing Our Results
2.10 Debugging a Job Flow
2.11 Our Application and Real-World Uses
Chapter 3: Data Filtering Design Patterns and Scheduling Work
3.1 Extending the Application Example
3.2 Understanding Web Server Logs
3.3 Finding Errors in the Web Logs Using Data Filtering
3.4 Building Summary Counts in Data Sets
3.5 Job Flow Scheduling
3.6 Scheduling with AWS Data Pipeline
3.7 Real-World Uses
Chapter 4: Data Analysis with Hive and Pig in Amazon EMR
4.1 Amazon Job Flow Technologies
4.2 What Is Pig?
4.3 Utilizing Pig in Amazon EMR
4.4 What Is Hive?
4.5 Utilizing Hive in Amazon EMR
4.6 Our Application with Hive and Pig
Chapter 5: Machine Learning Using EMR
5.1 A Quick Tour of Machine Learning
5.2 Python and EMR
5.3 What's Next?
Chapter 6: Planning AWS Projects and Managing Costs
6.1 Developing a Project Cost Model
6.2 Optimizing AWS Resources to Reduce Project Costs
6.3 Amazon Tools for Estimating Your Project Costs
Amazon Web Services Resources and Tools
Amazon AWS Online Resources
Amazon AWS Cost Estimation Tools
AWS Best Practices and Architecture
Amazon EMR Distributions
Cloud Computing, Amazon Web Services, and Their Impacts
AWS Service Delivery Models
Performance
Elasticity and Growth
Security
Uptime and Availability
Installation and Setup
Prerequisites
Installing Hadoop
Building MapReduce Applications
Running MapReduce Applications Locally
Installing Pig
Installing Hive
Index
Colophon