Title: Hands-On LLM Serving and Optimization: Hosting LLMs at Scale
Publisher: O'Reilly Media
Publication Date: 2026
Language: English
ISBN 13: 9798341621497
Binding: Paperback
Condition: New
Dimensions: 17.78 centimeters width by 5.08 centimeters height by 23.34 centimeters depth
Weight: 594 grams

About this title

Synopsis

Large language models (LLMs) are the reasoning engines of modern AI. Today, a major inflection point has arrived: as the world races to deploy AI at scale, model inference has moved to the center of the stack. Welcome to the inference era.

Without proper optimization, however, LLMs can be expensive and slow to serve. Hands-On LLM Serving and Optimization is a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.

In this hands-on, engineering-focused book, authors Chi Wang and Peiheng Hu combine practical examples, code, and strategies for building robust, performant, and cost-efficient AI token factories. Whether you’re building the LLM inference infrastructure or the applications that consume it, a deep understanding of LLM serving will make you a more effective, future-ready engineer as AI transforms how we work and build.

Learn the foundations of model serving with core concepts, design paradigms, and industry best practices
Understand the common challenges of hosting LLMs at scale
Balance latency and throughput to meet the demands of AI applications and business requirements
Host LLMs cost-effectively with practical, code-backed techniques

About the Authors

Chi Wang is a director of engineering at Salesforce's Einstein AI group, with over 18 years of experience in artificial intelligence and distributed systems. He leads the development of large-scale AI platforms that enable model training, inference, and optimization for hundreds of internal teams and power AI capabilities used by millions of Salesforce customers. At Salesforce, Chi oversees multiple engineering teams focused on model inference and optimization and on data science platforms. His work spans building multi-tenant AI infrastructure, scaling distributed compute systems, and improving the performance and cost-efficiency of large language model workloads in production. Chi is the lead inventor on 12 patents across areas including model serving and optimization, data access control, and large-scale system design. He is also a passionate technical writer, focused on making complex AI systems practical and accessible for engineers.

Peiheng Hu is an accomplished machine learning engineer with over 10 years of industry experience and expertise in building large-scale AI systems. He currently works at NVIDIA, where he focuses on cutting-edge distributed LLM inference, pushing the boundaries of high-performance inference engines on the latest NVIDIA GPUs. He holds a master of science in computational science and engineering from Harvard University and a bachelor of science in industrial engineering operations research from Georgia Institute of Technology. Previously, Peiheng served as a principal member of technical staff at Salesforce, where he led the development of the company's only unified serving platform, handling thousands of per-tenant models and LLM optimizations for Agentforce that saved millions in AI infrastructure expenses. Prior to that, he was a senior ML engineer at Microsoft Azure, where he architected distributed ML processing solutions for cloud security detection and analytics, handling billions of transactions per hour.

"About this title" may belong to another edition of this title.

Store Description


Seller's business information

RAREWAVES.COM LIMITED
Elsley Court, 20-22 Great Titchfield Street, London, W1W 8BE, United Kingdom

Sale & Shipping Terms

Shipping Terms

Please note that we do not offer Priority shipping to any country.

We currently do not ship to the following countries:

Russia
Belarus
Ukraine
Israel

Please do not place orders with a ship-to address in any of these countries; such orders will be cancelled.

Shipping rates from United Kingdom to U.S.A.
Order quantity	11 to 16 business days	11 to 16 business days
First item	US$ 0.00	US$ 0.00

Delivery times are set by sellers and vary by carrier and location. Orders passing through Customs may face delays and buyers are responsible for any associated duties or fees. Sellers may contact you regarding additional charges to cover any increased costs to ship your items.