A hands-on guide to automating data and modeling pipelines for faster machine learning applications
Key Features
- Build automated modules for different machine learning components
- Develop an in-depth understanding of each component of a machine learning pipeline
- Learn to use different open-source AutoML and feature engineering platforms
Book Description
AutoML is designed to automate parts of machine learning. Readily available AutoML tools are easing the work of data science practitioners and are being well received in the advanced analytics community. This book covers the foundations needed to create automated machine learning modules, and shows how you can get up to speed with them in the most practical way possible.
You will learn to automate different tasks in the machine learning pipeline, such as data pre-processing, feature selection, model training, model optimization, and much more. The book also shows you how to use existing automation libraries such as auto-sklearn and auto-weka, or how to create and extend your own custom AutoML components for machine learning.
By the end of this book, you will have a clearer understanding of the different aspects of automated machine learning, and you will be able to apply automation tasks to practical datasets. What you learn from this book can be used to implement machine learning in your projects or to get a step closer to winning machine learning competitions.
What you will learn
- Understand the fundamentals of Automated Machine Learning systems
- Explore auto-sklearn and auto-weka for AutoML tasks
- Automate your pre-processing methods along with feature transformation
- Enhance feature selection and generation using the Python stack
- Join all of the individual components into a complete AutoML framework
- Demystify hyperparameter tuning and use it to optimize your ML models
- Dive into concepts such as neural networks and autoencoders
- Understand the information costs and trade-offs associated with AutoML
Who This Book Is For
This book is ideal for budding data scientists, data analysts, and machine learning enthusiasts who are new to the concept of automated machine learning. ML engineers and data professionals who are interested in developing quick machine learning pipelines for their projects will also find this book useful. Prior exposure to Python programming is required to get the best out of this book.
Sibanjan Das is a Business Analytics and Data Science consultant. He has extensive experience in the IT industry working on ERP systems and implementing predictive analytics solutions in business systems and the Internet of Things. An enthusiastic professional who is passionate about technology and innovation, he has enjoyed wrangling data since the early days of his career. His writings have appeared in various analytics magazines, and he previously authored the book "Data Science using Oracle Data Miner and Oracle R Enterprise."
Sibanjan holds a Master of IT degree with a major in Business Analytics from Singapore Management University, Singapore, and is a Computer Science Engineering graduate of the Institute of Technical Education and Research, India. He is a Six Sigma Green Belt from the Institute of Industrial Engineers and also holds several industry certifications such as OCA, OCP, CSCMS, and ITIL V3.
Umit Cakmak is a Data Scientist at IBM, focusing extensively on IBM Data Science Experience and IBM Watson Machine Learning to solve complex business problems. His research spans many areas, from statistical modeling of financial asset prices to using evolutionary algorithms to improve the performance of machine learning models. Before joining IBM, he worked in various domains such as high-frequency trading, supply chain management, and consulting. He likes to learn from others and to share his insights at universities, conferences, and local meet-ups.