The New Predictive Analytics: Data Science with H2O and Apache Spark

9781484212851: The New Predictive Analytics: Data Science with H2O and Apache Spark

Two open source platforms for predictive analytics offer data scientists the power to work with virtually unlimited data: Apache Spark and H2O.

Apache Spark is a general-purpose in-memory cluster computing system with built-in libraries for SQL, machine learning, graph analytics and streaming analytics. First released in 2012, Spark graduated to top-level Apache project status in 2014, and is now included in every major Hadoop distribution. Interest in Apache Spark has exploded, and Spark has effectively dethroned the MapReduce paradigm.

H2O is less widely known outside of those data scientists who work on the cutting edge. H2O is an open source project dedicated to machine learning, adopted by more than two thousand users worldwide, including companies such as Cisco, eBay, Nielsen and PayPal. H2O's rapidly growing user base speaks to the strengths and capabilities of the platform.

Individually, each of these platforms provides data scientists with powerful capabilities; working together, they provide "best-in-breed" tooling across a wide range of analytic use cases and applications. In a combined solution, users can leverage Spark SQL and Spark Streaming for data ingestion together with H2O for the most advanced ensemble modeling and model deployment tools.

Thomas W. Dinsmore, an analytics expert at The Boston Consulting Group, reviews each of these platforms in depth from a practical, hands-on perspective. You will learn:

  • How to choose among deployment platforms: freestanding, Hadoop or in the cloud
  • Strengths of each platform: what Spark does well and what H2O does well
  • Details about the most widely used techniques for predictive analytics
  • How to build predictive models with Spark and H2O
  • How to deploy your predictive models into production applications
Throughout this book, Dinsmore offers practical – not theoretical – guidance to the data scientist. With examples of code provided by developers, contributors and lead users, this book provides you with the tools to reproduce the business benefits realized by leaders in predictive analytics – the people who are putting these tools to work today.

What you'll learn

  • Why scalable machine learning is necessary
  • History and background of Apache Spark and H2O
  • How to deploy Spark and H2O for maximum effectiveness
  • Detailed steps through the predictive modeling process in each tool
  • Valuable insight into the most important data science techniques

Who this book is for

This book is for data scientists seeking to leverage the most advanced machine learning platforms available today.


About the Author:

Thomas W. Dinsmore currently serves as a Knowledge Expert in Customer Analytics at The Boston Consulting Group. Previously, Thomas served as Director of Product Management for Revolution Analytics; as an Analytics Solution Architect for IBM Big Data Solutions; and as a Principal Consultant for SAS Professional Services.

Thomas brings to his current role more than twenty-five years of experience in predictive analytics. He has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C. Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS and Vodafone. His international experience includes work for clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia and Singapore.

Although his roots are in hands-on customer analytics, in the past fifteen years Thomas has expanded the scope of his experience to include analytic software applications and broader solutions including database integration and web applications. As a project lead, he has worked with DB2, Oracle, Netezza, SQL Server and Teradata.

Thomas is certified in SAS and has working experience with the leading analytic tools available in the market today, including SAS, R, SPSS and Oracle Data Mining.
