Programming Elastic MapReduce: Using AWS Services to Build an End-to-End Application

Schmidt, Kevin; Phillips, Christopher

  • 3.29 out of 5 stars
    7 ratings by Goodreads
ISBN 10: 1449363628 ISBN 13: 9781449363628
Published by O'Reilly Media, 2014
Used paperback

From HPB Inc., Dallas, TX, U.S.A. Seller rating: 5 out of 5 stars

AbeBooks Seller since September 15, 2017

This specific item is no longer available.

About this Item

Description:

Connecting readers with great books since 1972! Used books may not include companion materials, and may have some shelf wear or limited writing. We ship orders daily and Customer Service is our top priority! Seller Inventory # S_413315000

Synopsis:

Although you don’t need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS).

Authors Kevin Schmidt and Christopher Phillips demonstrate best practices for using EMR and various AWS and Apache technologies by walking you through the construction...

From the Author: Q&A with Chris Phillips, co-author of "Programming Elastic MapReduce"

Q. What makes “Programming Elastic MapReduce” important right now?
A. Big Data and Hadoop are hot technologies right now, with many companies exploring how they can use them to benefit their business and their customers. However, the upfront investment in a large Hadoop cluster, plus the need to allocate space for racks of servers in a traditional data center, can be a great barrier to entry for organizations that want to explore the technology and learn how it can benefit their business. Amazon Elastic MapReduce eliminates this barrier: organizations can explore the technology without the upfront costs and pay only for the resources they use.
Netflix and Airbnb are among the well-known organizations that use Amazon Elastic MapReduce heavily.
 
Q. What do you hope that readers of your book will walk away with?
A. The hope in writing Programming Elastic MapReduce is to show readers how easy it is to build an application in Amazon EMR, and that they can start building their application today without building clusters of servers or finding space and resources to manage a Hadoop cluster. The reader will learn the multitude of language and technology options available to build an Amazon EMR application and can go from a development laptop to a running cloud-based cluster in minutes.
 
Q. What's the most exciting and important thing happening in this space currently?
A. Data Science is a rapidly growing field, with business intelligence, statistics, and computer science coming together to help businesses solve new problems. According to Gartner, the market will require 100,000+ data scientists by 2020. Companies like Kaggle.com now run data science competitions to source some of the best and brightest data scientists to help companies solve their data analysis problems. We are just starting to see businesses leverage this technology in examples like Netflix's recommendation engine. The power of this technology is only now starting to be realized, with tremendous growth expected in the future. Our book gives developers and programmers interested in this field a way to learn the technology and a platform to start projects with low upfront costs.
 
Q. Can you provide a few tips on how to get started with Elastic MapReduce?
A. 1. Move your data to AWS: Before you can start processing data with Amazon Elastic MapReduce, you will need to move your data to Amazon S3. s3cmd and the AWS Command Line Interface (AWS CLI) are two easy-to-use command-line utilities that can be run inside AWS or on individual servers in your data center to transfer data to S3, where Amazon EMR can process it. For very large data sets, organizations should explore the AWS Import/Export service to send their data to Amazon on physical storage.
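
As a rough illustration of the transfer step, the sketch below builds the argument list for `aws s3 sync`, the AWS command-line tool's command for copying a local directory to an S3 prefix. The helper names `build_sync_command` and `upload`, the bucket name, the prefix, and the local path are all hypothetical placeholders, not taken from the book. Actually running the command assumes the AWS command-line tool is installed and credentials are configured, so the example defaults to a dry run.

```python
import subprocess


def build_sync_command(local_dir, bucket, prefix):
    """Return the argv list for `aws s3 sync <local_dir> s3://<bucket>/<prefix>`."""
    # Strip stray slashes so the S3 URI never contains a double slash.
    destination = "s3://{}/{}".format(bucket.strip("/"), prefix.strip("/"))
    return ["aws", "s3", "sync", local_dir, destination]


def upload(local_dir, bucket, prefix, dry_run=True):
    """Build the sync command and, unless dry_run is set, execute it."""
    command = build_sync_command(local_dir, bucket, prefix)
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical bucket and prefix; dry run only prints the command.
    print(" ".join(upload("./logs", "example-emr-input", "raw/2014/")))
```

Keeping command construction separate from execution makes it easy to review what would be transferred before any data leaves the data center.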
 
2. Pick the right problem to solve: When people first learn about Hadoop or Elastic MapReduce, they think of the technology as similar to database technology. Elastic MapReduce is more like a batch processing system. Elastic MapReduce can ingest a large amount of data and process it faster and more efficiently than a traditional database. However, the way EMR processes this data is similar to a table scan, where all of the data is processed and analyzed. EMR cannot perform as efficiently as a traditional database in retrieving a small number of rows from a large dataset. Additional technologies like Amazon Redshift and HBase can be used with Amazon EMR to get the benefits of both a traditional database and Hadoop.
 
3. Save money using spot instances: Amazon EMR's latest console, released in November 2013, allows a user to resize a cluster quickly. A cost-effective way of processing data in EMR is to start a cluster with, or grow a running cluster by, a number of task nodes that use spot instances. Spot instances let you name the price you are willing to pay for additional capacity, and prices are typically far below Amazon's on-demand prices.
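
To make the potential saving concrete, here is a minimal sketch of the arithmetic. All prices and node counts are hypothetical, the function name `hourly_cluster_cost` is illustrative, and the calculation covers only instance charges: it ignores the separate Amazon EMR service fee, storage, and data transfer. It also assumes the core nodes stay on-demand and only the task nodes use spot instances.

```python
def hourly_cluster_cost(core_nodes, task_nodes, core_price, task_price):
    """Hourly instance cost in USD for a cluster of core and task nodes."""
    return core_nodes * core_price + task_nodes * task_price


ON_DEMAND = 0.24  # hypothetical on-demand price, USD per instance-hour
SPOT = 0.08       # hypothetical spot price paid, USD per instance-hour

# Two core nodes plus eight task nodes, priced two different ways.
all_on_demand = hourly_cluster_cost(2, 8, ON_DEMAND, ON_DEMAND)
with_spot_tasks = hourly_cluster_cost(2, 8, ON_DEMAND, SPOT)

# Fraction of the hourly instance cost saved by moving task nodes to spot.
savings = 1 - with_spot_tasks / all_on_demand

if __name__ == "__main__":
    print("all on-demand:   ${:.2f}/hour".format(all_on_demand))
    print("spot task nodes: ${:.2f}/hour".format(with_spot_tasks))
    print("savings:         {:.0%}".format(savings))
```

Because spot capacity can be reclaimed when the market price rises above your bid, the sketch applies spot pricing only to task nodes, whose loss slows a job rather than ending it.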
 
4. Set up persistent and transient Amazon EMR clusters: An Amazon EMR cluster can be set up to terminate once the cluster completes all the steps in the job flow. This type of Amazon EMR cluster is considered a transient cluster, since it only lives for the life of the job flow it needs to complete. Alternatively, an Amazon EMR cluster can be set up to continue running and wait for additional steps; this is a persistent cluster. There are pros and cons to both of these cluster types, and which one you use will depend on your application. However, a few rules of thumb may help in the selection that’s right for you.
 
Transient clusters can be used to save money on Amazon EMR costs. If your data flow is sporadic, it may make sense to queue up a batch of data in S3 and only start an EMR cluster once a week, day, or hour, depending on your need. This lets you avoid paying for the time your cluster sits idle waiting for work to arrive. You can use Amazon CloudWatch to monitor your cluster to see if your data and workloads would benefit from using transient EMR clusters. AWS Data Pipeline can help you build workflows that trigger EMR cluster creation when the right conditions exist to process data.
 
A persistent EMR cluster can be the right choice for your organization if the results of your data analysis are time critical or the data flow is consistent enough to necessitate constant data analysis processing. Your application and data processing will have lower processing overhead without the need to regularly build up and tear down EMR clusters.
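
The trade-off between the two cluster types can be sketched as a simple weekly cost comparison. The node count, price, job length, and startup overhead below are hypothetical, and the function names are illustrative. The model assumes billing proportional to running time and ignores hourly rounding, the EMR service fee, and storage.

```python
HOURS_PER_WEEK = 7 * 24


def persistent_weekly_cost(nodes, price_per_hour):
    """Cost in USD of a cluster that runs all week, busy or idle."""
    return nodes * price_per_hour * HOURS_PER_WEEK


def transient_weekly_cost(nodes, price_per_hour, runs_per_week,
                          job_hours, startup_hours):
    """Cost in USD of a cluster launched per run and terminated afterwards."""
    # Each run pays for the job itself plus the cluster startup overhead.
    billed_hours = runs_per_week * (job_hours + startup_hours)
    return nodes * price_per_hour * billed_hours


# Hypothetical workload: one 2-hour job per day on 5 nodes at $0.24/hour,
# with 15 minutes of startup overhead per launch.
persistent = persistent_weekly_cost(5, 0.24)
transient = transient_weekly_cost(5, 0.24, runs_per_week=7,
                                  job_hours=2.0, startup_hours=0.25)

if __name__ == "__main__":
    print("persistent: ${:.2f}/week".format(persistent))
    print("transient:  ${:.2f}/week".format(transient))
```

As the number of runs per week grows, or as jobs lengthen, the startup overhead accumulates and the two costs converge. That is the point at which a persistent cluster starts to make sense.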
 
5. Experiment with EMR cluster node types: Throughout the book, we typically use the smallest instance types and the fewest instances in an Amazon EMR cluster. This helps reduce the costs associated with learning Amazon EMR. However, your application will need much more than this when running in a production setting with real-world demands. Some applications will be more memory intensive, CPU intensive, or even disk read and write intensive. To find out what is right for your application, experiment with different instance types and numbers of instances on a small subset of your data to learn what size EMR cluster meets your data processing time and AWS cost requirements.
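
One way to organize such an experiment is to record each trial run and then pick the cheapest configuration that finishes within your time budget. The sketch below assumes you have already measured runtimes on a sample of your data. The `Trial` class, the function name, the prices, and the runtimes are invented for illustration; the instance type names serve only as labels.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    instance_type: str      # label for the configuration tested
    instance_count: int
    price_per_hour: float   # hypothetical USD per instance-hour
    runtime_hours: float    # runtime measured on the sample data set

    @property
    def cost(self):
        """Total instance cost of this trial run in USD."""
        return self.instance_count * self.price_per_hour * self.runtime_hours


def cheapest_within_deadline(trials, max_hours):
    """Return the lowest-cost trial finishing within max_hours, or None."""
    eligible = [t for t in trials if t.runtime_hours <= max_hours]
    return min(eligible, key=lambda t: t.cost) if eligible else None


# Hypothetical measurements from three sample runs.
TRIALS = [
    Trial("m1.small", 4, 0.05, 6.0),
    Trial("m1.large", 4, 0.20, 2.0),
    Trial("c1.xlarge", 2, 0.60, 1.5),
]

if __name__ == "__main__":
    best = cheapest_within_deadline(TRIALS, max_hours=3.0)
    print(best.instance_type if best else "no configuration meets the deadline")
```

Results from a small sample may not scale linearly to the full data set, so treat the selection as a starting point for a larger test rather than a final sizing decision.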

"About this title" may belong to another edition of this title.

Bibliographic Details

Title: Programming Elastic MapReduce: Using AWS ...
Publisher: O'Reilly Media
Publication Date: 2014
Binding: paperback
Condition: Very Good

Top Search Results from the AbeBooks Marketplace

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used paperback

Seller: Jenson Books Inc, Logan, UT, U.S.A.

Seller rating: 5 out of 5 stars

paperback. Condition: Very Good. A well-cared-for item that has seen limited use but remains in great condition. The item is complete, unmarked, and undamaged, but may show some limited signs of wear. Item works perfectly. Pages are intact and not marred by notes or highlighting. The spine is undamaged. Seller Inventory # 4BQMP30035W6_ns

Buy Used

US$ 7.29
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Softcover

Seller: GreatBookPrices, Columbia, MD, U.S.A.

Seller rating: 5 out of 5 stars

Condition: New. Seller Inventory # 19351790-n

Buy New

US$ 26.04
Shipping: US$ 2.64
Within U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used Softcover

Seller: GreatBookPrices, Columbia, MD, U.S.A.

Seller rating: 5 out of 5 stars

Condition: As New. Unread book in perfect condition. Seller Inventory # 19351790

Buy Used

US$ 28.02
Shipping: US$ 2.64
Within U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
New paperback

Seller: Orion Tech, Kingwood, TX, U.S.A.

Seller rating: 5 out of 5 stars

paperback. Condition: New. Seller Inventory # 1449363628-11-30275710

Buy New

US$ 28.69
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Kevin Schmidt
Published by O'Reilly Media, 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
New PAP

Seller: PBShop.store US, Wood Dale, IL, U.S.A.

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller Inventory # WO-9781449363628

Buy New

US$ 30.63
Shipping: FREE
Within U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Softcover

Seller: GreatBookPricesUK, Woodford Green, United Kingdom

Seller rating: 5 out of 5 stars

Condition: New. Seller Inventory # 19351790-n

Buy New

US$ 32.46
Shipping: US$ 19.96
From United Kingdom to U.S.A.

Quantity: 2 available

Kevin Schmidt
Published by O'Reilly Media, Inc, USA, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Paperback / softback

Seller: THE SAINT BOOKSTORE, Southport, United Kingdom

Seller rating: 5 out of 5 stars

Paperback / softback. Condition: New. New copy - Usually dispatched within 4 working days. 334. Seller Inventory # B9781449363628

Buy New

US$ 32.47
Shipping: US$ 13.42
From United Kingdom to U.S.A.

Quantity: 2 available

Kevin Schmidt
Published by O'Reilly Media, 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
New PAP

Seller: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 4 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller Inventory # WO-9781449363628

Buy New

US$ 32.85
Shipping: US$ 5.53
From United Kingdom to U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used Softcover

Seller: GreatBookPricesUK, Woodford Green, United Kingdom

Seller rating: 5 out of 5 stars

Condition: As New. Unread book in perfect condition. Seller Inventory # 19351790

Buy Used

US$ 32.95
Shipping: US$ 19.96
From United Kingdom to U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, Inc., 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Softcover

Seller: moluna, Greven, Germany

Seller rating: 4 out of 5 stars

Condition: New. Although you don't need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazo. Seller Inventory # 4186530

Buy New

US$ 34.76
Shipping: US$ 54.83
From Germany to U.S.A.

Quantity: 2 available

There are 6 more copies of this book
