Machine Learning Systems for Large Language Models: Transformer Architecture, Training Infrastructure, and Inference Engineering
Master the Engineering Behind the AI Revolution

Move beyond simple API calls and learn how to engineer the massive systems that power modern AI. We are witnessing a paradigm shift in computing: the rise of Large Language Models (LLMs) such as GPT-4, Claude, and Llama has transformed artificial intelligence from a niche research field into a global utility. For the engineers and architects tasked with building these tools, however, the challenge has shifted. It is no longer just about designing a neural network; it is about designing the system. How do you train a model with a trillion parameters when it doesn't fit on a single GPU? How do you serve millions of users with millisecond latency? How do you ensure safety, compliance, and cost-efficiency at global scale?

Machine Learning Systems for Large Language Models is the definitive guide to answering these questions. Written by Henry Keel, this comprehensive resource bridges the critical gap between academic research papers and the harsh realities of production engineering. It dissects the full lifecycle of an LLM: from the raw mathematics of the Transformer architecture, to the distributed infrastructure required to train it, and finally to the complex inference engines that bring it to life.

Inside this book, you will discover:

- Transformer Architecture Fundamentals: Go deep into the mechanics of Self-Attention, Multi-Head Attention, and the emerging dominance of Mixture-of-Experts (MoE) architectures.
- Training Infrastructure at Scale: Learn the secrets of distributed training, including Data Parallelism, Tensor Parallelism, and Pipeline Parallelism, and understand how to orchestrate thousands of NVIDIA H100s or Google TPUs in perfect synchronization.
- Advanced Inference Engineering: Master the art of latency reduction with techniques such as PagedAttention, KV Cache management, Continuous Batching, and Speculative Decoding.
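To give a flavor of the architecture material, here is a minimal NumPy sketch of the scaled dot-product attention operation at the heart of every Transformer, softmax(QK^T / sqrt(d_k)) V. This is an illustrative example under simple assumptions (single head, no masking or batching), not code from the book:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

# Toy example: 3 tokens, head dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Multi-Head Attention runs several such operations in parallel on learned projections of Q, K, and V, and the KV Cache discussed under inference engineering is simply the stored K and V matrices for already-generated tokens.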
- Optimization & Quantization: Discover how to shrink massive models using INT8 and 4-bit quantization (QLoRA) without sacrificing intelligence, allowing you to run state-of-the-art models on consumer hardware.
- The Modern MLOps Stack: Build robust pipelines for Retrieval-Augmented Generation (RAG), Reinforcement Learning from Human Feedback (RLHF), and automated safety guardrails.
- Future-Proofing Your Career: Explore the frontiers of Multimodal Systems (Vision-Language Models), multilingual training, and the next generation of autonomous AI agents.

Who is this book for?

This book is written for Machine Learning Engineers, Data Scientists, System Architects, and Technical Leads who are ready to move beyond "using" AI and start "building" it. Whether you are fine-tuning an open-source model for a specific enterprise task or architecting a ground-up pre-training cluster, this book provides the blueprints, code concepts, and strategic insights you need to succeed.

Don't just watch the AI revolution happen. Engineer it. Scroll up and grab your copy today to master the systems that are defining the future of technology.
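As a taste of the quantization topic mentioned above, here is a minimal sketch of symmetric per-tensor INT8 weight quantization: map each float weight to an 8-bit integer via a single scale factor, then reconstruct an approximation on the way back. This is an illustrative example, not code from the book; production schemes such as QLoRA use finer-grained (per-block, 4-bit) variants of the same idea:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~ q * scale."""
    scale = np.abs(w).max() / 127.0            # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(np.max(np.abs(w - w_hat)))  # reconstruction error, bounded by scale / 2
```

Storing `q` (1 byte per weight) plus one scale instead of 4-byte floats is what shrinks a model roughly 4x; the rounding error per weight is at most half the scale.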
Author:
Edition year: 2026
Publisher:
Format:
Language: English
Format:
eBooks sold by Feltrinelli.it are in ePub format and may be protected by Adobe DRM. When downloading a DRM-protected file, you will receive a file in .acsm format (Adobe Content Server Message), which must be opened with Adobe Digital Editions and authorized with an Adobe account before it can be read on a PC or transferred to compatible devices.
Cloud:
eBooks sold by Feltrinelli.it are automatically synchronized across all Kobo reading clients after purchase. Thanks to the Kobo Cloud, reading progress, notes, and highlights are saved and synchronized automatically across all Kobo devices and Kobo reading apps used for reading.