Programming Elastic MapReduce: Using AWS Services to Build an End-to-End Application - Softcover

  • 3.29 out of 5 stars
    7 ratings by Goodreads
 
Synopsis

Although you don’t need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS).

Authors Kevin Schmidt and Christopher Phillips demonstrate best practices for using EMR and various AWS and Apache technologies by walking you through the construction of a sample MapReduce log analysis application. Using code samples and example configurations, you’ll learn how to assemble the building blocks necessary to solve your biggest data analysis problems.

  • Get an overview of the AWS and Apache software tools used in large-scale data analysis
  • Go through the process of executing a Job Flow with a simple log analyzer
  • Discover useful MapReduce patterns for filtering and analyzing data sets
  • Use Apache Hive and Pig instead of Java to build a MapReduce Job Flow
  • Learn the basics for using Amazon EMR to run machine learning algorithms
  • Develop a project cost model for using Amazon EMR and other AWS tools

"synopsis" may belong to another edition of this title.

From the Author

Q&A with Chris Phillips, co-author of "Programming Elastic MapReduce"

Q. What makes “Programming Elastic MapReduce” important right now?
A. Big Data and Hadoop are hot technologies right now, and many companies are exploring how they can use them to benefit their business and their customers. However, the upfront investment in a large Hadoop cluster, along with the space needed for racks of servers in a traditional data center, can be a major barrier to entry for organizations that want to explore the technology and learn how it can benefit their business. Amazon Elastic MapReduce eliminates this barrier: organizations can explore the technology without the upfront costs and pay only for the resources they use.
Netflix and Airbnb are among the well-known organizations that use Amazon Elastic MapReduce heavily.
 
Q. What do you hope that readers of your book will walk away with?
A. Our hope in writing Programming Elastic MapReduce is to show readers how easy it is to build an application in Amazon EMR, and that they can start building it today without building clusters of servers or finding the space and resources to manage a Hadoop cluster. Readers will learn the many language and technology options available for building an Amazon EMR application, and how to go from a development laptop to a running cloud-based cluster in minutes.
 
Q. What's the most exciting and important thing happening in this space currently?
A. Data science is a rapidly growing field in which business intelligence, statistics, and computer science are coming together to help businesses solve new problems. According to Gartner, the market will require 100,000+ data scientists by 2020. Companies like Kaggle.com now run data science competitions to source some of the best and brightest data scientists to help companies solve their data analysis problems. We are just starting to see businesses leverage this technology in examples like Netflix's recommendation engine. The power of this technology is only now starting to be realized, with tremendous growth expected in the future. Our book gives developers and programmers interested in this field a way to learn the technology and a platform for starting projects with low upfront costs.
 
Q. Can you provide a few tips on how to get started with Elastic MapReduce?
A. 1. Move your data to AWS: Before you can start processing data with Amazon Elastic MapReduce, you will need to move your data to Amazon S3. s3cmd and the AWS Command Line Interface are two easy-to-use command-line utilities that can be run inside AWS or on individual servers in your data center to transfer data to S3, where Amazon EMR can process it. For very large data sets, organizations should explore the AWS Import/Export service to send their data to Amazon on physical storage.
 
2. Pick the right problem to solve: When people first learn about Hadoop or Elastic MapReduce, they often think of it as similar to database technology. Elastic MapReduce is more like a batch processing system. It can ingest a large amount of data and process it faster and more efficiently than a traditional database. However, the way EMR processes this data is similar to a table scan, where all of the data is processed and analyzed. EMR cannot perform as efficiently as a traditional database when retrieving a small number of rows from a large dataset. Additional technologies like Amazon Redshift and HBase can be used with Amazon EMR to get the benefits of both a traditional database and Hadoop.
 
3. Save money using spot instances: Amazon EMR's latest console, released in November 2013, allows a user to resize a cluster quickly. A cost-effective way of processing data in EMR is to start a cluster with, or grow a running cluster by, a number of task nodes that use spot instances. Spot instances let you name the price you are willing to pay for additional capacity, and prices are typically far below Amazon's on-demand prices.
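
The saving from spot task nodes can be estimated with simple arithmetic. The sketch below is illustrative only: the hourly rates, node counts, and the `cluster_cost` helper are assumptions made up for this example, not actual AWS prices. It also ignores the separate EMR service fee and the chance that spot capacity is reclaimed while a job is running.

```python
def cluster_cost(core_nodes, task_nodes, hours, core_rate, task_rate):
    """Estimate the instance cost in dollars of one cluster run.

    Rates are dollars per instance-hour. Core nodes and task nodes
    are priced separately so task nodes can use a cheaper spot rate.
    """
    core_cost = core_nodes * hours * core_rate
    task_cost = task_nodes * hours * task_rate
    return core_cost + task_cost


# Hypothetical prices, chosen only for illustration.
ON_DEMAND_RATE = 0.25  # dollars per instance-hour
SPOT_RATE = 0.05       # dollars per instance-hour

# Two core nodes plus eight task nodes running for four hours.
all_on_demand = cluster_cost(2, 8, 4, ON_DEMAND_RATE, ON_DEMAND_RATE)
with_spot_tasks = cluster_cost(2, 8, 4, ON_DEMAND_RATE, SPOT_RATE)

print(f"All on-demand:   ${all_on_demand:.2f}")
print(f"Spot task nodes: ${with_spot_tasks:.2f}")
print(f"Saving:          ${all_on_demand - with_spot_tasks:.2f}")
```

Keeping the core nodes at the on-demand rate in both cases reflects the tip above, which applies spot pricing to the additional task nodes only.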
 
4. Set up persistent and transient Amazon EMR clusters: An Amazon EMR cluster can be set up to terminate once it completes all the steps in the Job Flow. This type of cluster is considered transient, since it lives only as long as the Job Flow it needs to complete. Alternatively, a cluster can be set up to keep running and wait for additional steps; this is a persistent cluster. There are pros and cons to both cluster types, and which one to use will depend on your application. However, a few rules of thumb may help you select the one that’s right for you.
 
Transient clusters can be used to save money on Amazon EMR costs. If your data flow is sporadic, it may make sense to queue up a batch of data in S3 and start an EMR cluster only once a week, day, or hour, depending on your need. This avoids paying for time your cluster sits idle waiting for work to arrive. You can use Amazon CloudWatch to monitor your cluster and see whether your data and workloads would benefit from transient EMR clusters. AWS Data Pipeline can help you build workflows that trigger EMR cluster creation when the right conditions exist to process data.
 
A persistent EMR cluster can be the right choice for your organization if the results of your data analysis are time critical or the data flow is consistent enough to necessitate constant data analysis processing. Your application and data processing will have lower processing overhead without the need to regularly build up and tear down EMR clusters.
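
The trade-off between the two cluster types can be roughed out numerically. This is a minimal sketch under stated assumptions: the node count, hourly rate, job durations, and startup overhead are all hypothetical, the function names are invented for this example, and cost is assumed to be proportional to running time, which simplifies how AWS actually bills.

```python
HOURS_PER_DAY = 24


def persistent_daily_cost(nodes, hourly_rate):
    """Cost of a cluster that stays up all day, busy or idle."""
    return nodes * HOURS_PER_DAY * hourly_rate


def transient_daily_cost(nodes, hourly_rate, runs_per_day,
                         hours_per_run, startup_hours):
    """Cost of launching a cluster for each run and terminating it after.

    Every run pays the startup overhead again, which is the price of
    tearing the cluster down between jobs.
    """
    billed_hours = runs_per_day * (hours_per_run + startup_hours)
    return nodes * billed_hours * hourly_rate


# Hypothetical workload: 10 nodes at $0.25 per instance-hour.
NODES = 10
RATE = 0.25

always_on = persistent_daily_cost(NODES, RATE)
# Sporadic data: two 1-hour jobs a day, 15 minutes of startup each.
sporadic = transient_daily_cost(NODES, RATE, 2, 1.0, 0.25)
# Steady data: a 1-hour job every hour, 15 minutes of startup each.
steady = transient_daily_cost(NODES, RATE, 24, 1.0, 0.25)

print(f"Persistent cluster:       ${always_on:.2f} per day")
print(f"Transient, sporadic data: ${sporadic:.2f} per day")
print(f"Transient, steady data:   ${steady:.2f} per day")
```

With these made-up numbers the transient cluster is far cheaper for sporadic work, while for a steady flow the repeated startup overhead makes it cost more than leaving a cluster running.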
 
5. Experiment with EMR cluster node types: Throughout the book, we typically use the smallest instance types and the fewest instances in an Amazon EMR cluster. This helps reduce the costs associated with learning Amazon EMR. However, your application will need much more than this when running in a production setting with real-world demands. Some applications will be more memory intensive, CPU intensive, or even disk read and write intensive. To find out what is right for your application, experiment with different instance types and instance counts on a small subset of your data to learn what size EMR cluster meets your processing-time and AWS cost requirements.
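
One way to compare the results of such experiments is to record each trial and pick the cheapest configuration that still meets the time requirement. The sketch below assumes that approach; the `Trial` record, the instance labels, and every measurement in it are placeholders invented for illustration, not real instance types or benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One experiment run against a small sample of the data."""
    instance_type: str    # placeholder label, not a real AWS type
    instance_count: int
    runtime_hours: float  # measured wall-clock time for the sample
    hourly_rate: float    # dollars per instance-hour (hypothetical)

    @property
    def cost(self):
        return self.instance_count * self.runtime_hours * self.hourly_rate


def cheapest_within_deadline(trials, max_hours):
    """Return the lowest-cost trial that finishes in time, or None."""
    on_time = [t for t in trials if t.runtime_hours <= max_hours]
    return min(on_time, key=lambda t: t.cost, default=None)


# Made-up measurements for four candidate cluster shapes.
trials = [
    Trial("small", 4, 3.0, 0.10),
    Trial("small", 8, 1.6, 0.10),
    Trial("large", 4, 1.0, 0.40),
    Trial("highmem", 4, 0.9, 0.50),
]

best = cheapest_within_deadline(trials, max_hours=2.0)
print(best)
```

The four-node "small" cluster is the cheapest overall but misses the two-hour limit, so the selection falls to the next cheapest shape that finishes in time. Results from a sample will not scale perfectly to the full data set, so treat the winner as a starting point for a larger test.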

About the Author

Kevin J. Schmidt is a senior manager at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of a major part of the company’s SIEM platform. This includes data acquisition, correlation, and analysis of log data. Prior to SecureWorks, Kevin worked for Reflex Security, where he worked on an IPS engine and anti-virus software. Before that, he was a lead developer and architect at GuardedNet, Inc., which built one of the industry’s first SIEM platforms.

He is also a commissioned officer in the United States Navy Reserve (USNR). He has over 19 years of experience in software development and design, 11 of which have been in the network security space. He holds a Bachelor of Science in Computer Science.

Kevin has spent time designing cloud services components at Dell, including virtualized components to run in Dell’s own vCloud. These components are used to protect customers who use Dell’s cloud infrastructure. Additionally, he has been working with Hadoop, machine learning, and other technology in the cloud.

Kevin is co-author of Essential SNMP, second edition (O’Reilly and Associates, ISBN: 978-0-596-00840-6) and also Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).

Christopher Phillips is a manager and senior software developer at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of the company’s Threat Intelligence service platform. He is also responsible for a team that integrates log and event information from many third-party providers, so that customers can have all of their core security information delivered to and analyzed by the Dell SecureWorks systems and security professionals.

Prior to Dell SecureWorks, Chris worked for McKesson and Allscripts, where he worked with clients on HIPAA compliance, security, and healthcare systems integration. He has over 18 years of experience in software development and design. He holds a Bachelor of Science in Computer Science and an MBA.

Chris has spent time designing and developing virtualization and cloud Infrastructure as a Service strategies at Dell to help the company’s security services scale globally. Additionally, he has been working with Hadoop, Pig scripting languages, and Amazon Elastic MapReduce to develop strategies to gain insights and analyze Big Data issues in the cloud.

Chris is co-author of Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).

"About this title" may belong to another edition of this title.

  • Publisher: O'Reilly Media
  • Publication date: 2013
  • ISBN 10: 1449363628
  • ISBN 13: 9781449363628
  • Binding: Paperback
  • Language: English
  • Edition number: 1
  • Number of pages: 174

Other Popular Editions of the Same Title

Featured Edition

ISBN 10:  935110432X ISBN 13:  9789351104322
Publisher: Shroff Publishers & Distribu...
Softcover

Search results for Programming Elastic Mapreduce: Using Aws Services to...

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used paperback

Seller: Jenson Books Inc, Logan, UT, U.S.A.

Seller rating: 4 out of 5 stars

paperback. Condition: Very Good. A well-cared-for item that has seen limited use but remains in great condition. The item is complete, unmarked, and undamaged, but may show some limited signs of wear. Item works perfectly. Pages are intact and not marred by notes or highlighting. The spine is undamaged. Seller Inventory # 4BQMP30035W6_ns

Buy Used

US$ 7.29
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used paperback

Seller: HPB Inc., Dallas, TX, U.S.A.

Seller rating: 5 out of 5 stars

paperback. Condition: Very Good. Connecting readers with great books since 1972! Used books may not include companion materials, and may have some shelf wear or limited writing. We ship orders daily and Customer Service is our top priority! Seller Inventory # S_413315000

Buy Used

US$ 13.50
Shipping: US$ 3.75
Within U.S.A.

Quantity: 1 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2014
ISBN 10: 1449363628 ISBN 13: 9781449363628
New paperback

Seller: Orion Tech, Kingwood, TX, U.S.A.

Seller rating: 5 out of 5 stars

paperback. Condition: New. Seller Inventory # 1449363628-11-30275710

Buy New

US$ 29.25
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Softcover

Seller: GreatBookPrices, Columbia, MD, U.S.A.

Seller rating: 5 out of 5 stars

Condition: New. Seller Inventory # 19351790-n

Buy New

US$ 26.62
Shipping: US$ 2.64
Within U.S.A.

Quantity: 2 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
Used Softcover

Seller: GreatBookPrices, Columbia, MD, U.S.A.

Seller rating: 5 out of 5 stars

Condition: As New. Unread book in perfect condition. Seller Inventory # 19351790

Buy Used

US$ 28.06
Shipping: US$ 2.64
Within U.S.A.

Quantity: 2 available

Schmidt, Kevin
Published by O'Reilly Media 12/29/2013, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Paperback or Softback

Seller: BargainBookStores, Grand Rapids, MI, U.S.A.

Seller rating: 5 out of 5 stars

Paperback or Softback. Condition: New. Programming Elastic Mapreduce: Using Aws Services to Build an End-To-End Application 0.67. Book. Seller Inventory # BBS-9781449363628

Buy New

US$ 39.33
Shipping: FREE
Within U.S.A.

Quantity: 5 available

Schmidt, Kevin; Phillips, Christopher
Published by O'Reilly Media, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Softcover

Seller: Lucky's Textbooks, Dallas, TX, U.S.A.

Seller rating: 5 out of 5 stars

Condition: New. Seller Inventory # ABLIING23Mar2411530330439

Buy New

US$ 36.46
Shipping: US$ 3.99
Within U.S.A.

Quantity: Over 20 available

Kevin Schmidt
Published by O'Reilly Media, Inc, USA, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Paperback / softback

Seller: THE SAINT BOOKSTORE, Southport, United Kingdom

Seller rating: 5 out of 5 stars

Paperback / softback. Condition: New. New copy - Usually dispatched within 4 working days. 334. Seller Inventory # B9781449363628

Buy New

US$ 32.50
Shipping: US$ 13.43
From United Kingdom to U.S.A.

Quantity: 2 available

Schmidt, Kevin/ Phillips, Christopher
Published by Oreilly & Associates Inc, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Paperback
Print on Demand

Seller: Revaluation Books, Exeter, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: Brand New. 155 pages. 9.25x7.00x0.50 inches. In Stock. This item is printed on demand. Seller Inventory # __1449363628

Buy New

US$ 44.40
Shipping: US$ 13.31
From United Kingdom to U.S.A.

Quantity: 2 available

Schmidt, Kevin/ Phillips, Christopher
Published by Oreilly & Associates Inc, 2013
ISBN 10: 1449363628 ISBN 13: 9781449363628
New Paperback

Seller: Revaluation Books, Exeter, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: Brand New. 155 pages. 9.25x7.00x0.50 inches. In Stock. Seller Inventory # x-1449363628

Buy New

US$ 55.54
Shipping: US$ 13.31
From United Kingdom to U.S.A.

Quantity: 2 available

There are 6 more copies of this book
