Apache Flume: Distributed Log Collection for Hadoop (What You Need to Know)

3 avg rating (5 ratings by Goodreads)
 
9781782167921: Apache Flume: Distributed Log Collection for Hadoop (What You Need to Know)

In Detail

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with many failover and recovery mechanisms.

Apache Flume: Distributed Log Collection for Hadoop covers the problems that arise when writing streaming data and logs to HDFS, and how Flume can resolve them. This book explains the generalized architecture of Flume, which includes moving data to and from databases and NoSQL-style data stores, as well as optimizing performance. It also includes real-world scenarios of Flume implementation.

Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete process of compiling and installing Flume.

It introduces channels and channel selectors and shows how to use them. For each architectural component (Sources, Channels, Sinks, Channel Processors, Sink Groups, and so on), the various implementations are covered in detail along with their configuration options, so you can customize Flume to your specific needs. The book also gives pointers on writing your own custom implementations.

By the end, you should be able to construct a series of Flume agents to transport your streaming data and logs from your systems into Hadoop in near real time.

Approach

A starter guide that covers Apache Flume in detail.

Who this book is for

Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.

"synopsis" may belong to another edition of this title.

From the Author:

There is an updated and expanded second edition, so please be sure to purchase that one instead. Search for ISBN 978-1784392178 until I can get this one marked as old. Thanks!

About the Author:

Steve Hoffman

Steve Hoffman has 30 years of software development experience and holds a B.S. in computer engineering from the University of Illinois at Urbana-Champaign and an M.S. in computer science from DePaul University. He is currently a Principal Engineer at Orbitz Worldwide.

More information on Steve can be found at http://bit.ly/bacoboy or on Twitter @bacoboy.

This is Steve's first book.

"About this title" may belong to another edition of this title.


Other Popular Editions of the Same Title

9781782167914: Apache Flume: Distributed Log Collection for Hadoop (What You Need to Know)

Featured Edition

ISBN 10: 1782167919, ISBN 13: 9781782167914
Publisher: Packt Publishing, 2013
Softcover

9789351102519: Apache Flume: Distributed Log Collection for Hadoop

Shroff..., 2014
Softcover