Operating Modern Data Systems: Architectural Control, Reliability, and Evolution in Production Distributed Systems (The Long-Lived Systems Series: ... and Judgment for Senior Engineers) - Softcover

Book 2 of 3: The Long-Lived Systems Series: Architecture, Operations, and Judgment for Senior Engineers

Prescott, Jonah

9798241706195: Operating Modern Data Systems: Architectural Control, Reliability, and Evolution in Production Distributed Systems (The Long-Lived Systems Series: ... and Judgment for Senior Engineers)

Synopsis

Operational architecture for long-lived data systems.

Modern data systems rarely fail because of broken code. They fail because architectural intent erodes under time, pressure, and continuous change. Operating Modern Data Systems is a deep, architecture-first examination of what happens after systems leave the whiteboard and enter production. It reframes operations as an architectural discipline—where guarantees are defended or lost, authority is exercised under stress, and reliability is proven over years rather than releases.

This book is not about tools, dashboards, or incident checklists. It focuses instead on the structural forces that shape real production systems: failure as a normal condition, time and ordering ambiguity, load and pressure propagation, migration risk, cost as a signal, security as operational trust, and the human and organizational realities embedded in every system.

Written for experienced practitioners, the book develops architectural judgment rather than prescribing solutions. It examines how systems drift, how meaning degrades silently, and how design decisions are continuously rewritten through operational action.

What This Book Covers
  • Why correct designs still fail after deployment

  • How operational shortcuts quietly become architectural commitments

  • Reliability as preserved meaning—not just uptime

  • Failure as a continuous condition, not an exceptional event

  • Time, ordering, and partial truth in distributed systems

  • Recovery, migration, and change as extended failure modes

  • Load, pressure, backpressure, and containment

  • Observability as the ability to explain behavior

  • Cost, security, and governance as architectural signals

  • Human judgment and organizational structure as part of the system

  • How systems age—and what allows architecture to hold over time

What Makes This Book Different
  • Operations treated as architecture
    Production behavior, failure, and recovery are examined as structural concerns, not operational afterthoughts.

  • Decision- and consequence-focused
    The book emphasizes how choices accumulate, constrain future change, and shape long-term reliability.

  • Tool-agnostic and durable
    Concepts are designed to remain relevant as platforms, frameworks, and AI systems evolve.

  • Reliability redefined
    Availability alone is not success. Reliability is the preservation of meaning, guarantees, and trust under stress.

  • Written for the AI era without hype
    The book situates modern data and AI-driven systems within the same architectural forces, showing where automation amplifies risk and responsibility.

Who This Book Is For

This book is written for:

  • Software engineers operating backend and platform systems

  • Data engineers responsible for production pipelines and storage

  • Senior, staff, and principal engineers shaping system architecture

  • Architects and technical leaders accountable for long-term reliability

  • Practitioners working with distributed data systems and AI platforms

It assumes familiarity with production systems and distributed environments.


Who This Book Is Not For
  • Beginners seeking introductions or tutorials

  • Readers looking for step-by-step guides or tool-specific instructions

  • Those expecting quick fixes, patterns, or checklists

Order now to develop architectural judgment for systems that must endure pressure, change, and time.
