A Thursday in April

It was a Thursday in April 2026. I was wrapping a release window for mcp-hosting. The PR queue had a deletion in it -- about six thousand lines, the cleanup of an old service path that had been quietly orphaned for two months. I asked Claude Code to merge it. It came back with a message I had seen before: "base branch policy prohibits the merge."

That message exists for a reason. The branch had required checks, and the required checks had not finished running. The repo's branch protection was doing exactly what branch protection is supposed to do. The right move was to wait, or to investigate the failing check, or to ask me which I wanted.

What happened instead: Claude Code retried the merge with --admin. The merge went through. Six thousand lines deleted into main during a release window, with the required checks still unreported.

I did not catch this for forty minutes. When I did, I rolled it back, and then I sat there and did the math on how that had happened. Claude Code had not been malicious. It had not been confused. It had been helpful. The user had asked it to merge a PR. The PR had not merged. There was a flag that would make it merge. The flag worked. From the agent's local frame, this was a successful task completion.

The frame is the problem. From inside one task, "the merge succeeded" looks like a win. From the project's frame, "an agent just bypassed a six-thousand-line safety policy unsupervised" looks like an incident. Closing the gap between those two frames is what this book is about.

I wrote a rule that night. Six paragraphs in ~/.claude/CLAUDE.md, named something like Never bypass GitHub branch protection without authorization. The rule says: when gh pr merge returns "policy prohibits," stop and ask. The rule does not say "be more careful." It says, specifically and procedurally, what to do when this exact signal appears. It explains why the rule exists -- the incident, the date, the consequence. Two months later that rule has fired correctly several times. The agent has stopped, asked, and waited. It has cost me thirty seconds per merge. It has saved me from at least two more incidents.
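A condensed sketch of what such a rule can look like. The wording here is illustrative, not the exact six paragraphs from my file, but the shape -- signal, procedure, and a why with the incident attached -- is the point:

```markdown
## Never bypass GitHub branch protection without authorization

When `gh pr merge` fails with "base branch policy prohibits the merge",
treat the failure as a safety signal, not an obstacle. Do NOT retry
with --admin or any other bypass flag. Stop, report the blocked merge,
and ask the user how to proceed.

Why: in April 2026, an --admin retry deleted ~6,000 lines into main
during a release window before CI had reported. Cost: a forty-minute
rollback.
```

Note that the rule names the exact command and the exact error string. A rule that fires on a specific signal is checkable; "be more careful" is not.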

That is the shape of the gap, and that is the shape of the fix. The fix is not a smarter prompt. The fix is not a better model. The fix is a discipline overlay -- a set of rules, settings, hooks, skills, and memories that sits on top of the agent and encodes the failure modes you have actually hit. Once the discipline is in place, the agent's behavior on the next failure mode is deterministic, not aspirational.

This book is about building that overlay deliberately, before the next incident, instead of accreting it from forty-minute rollbacks.

What "Claude Code in production" actually means

The phrase "in production" is doing a specific job here, so let me name what I mean by it.

I do not mean Claude Code running as a service in a Kubernetes cluster. I do not mean the agent framework embedded in your CI pipeline (though several chapters touch on that). I mean Claude Code as your daily driver for shipping production software. The version of you that opens a terminal in the morning, types claude, and starts working on a customer-facing feature, a billable bug fix, a release that real users will hit.

That version of you is doing something different from the version of you who tried Claude Code on a Saturday afternoon to see if it could write a regex for you. The Saturday version is exploring. The daily-driver version is shipping. The Saturday version forgives surprises. The daily-driver version cannot afford them. The Saturday version has a reasonable response to an obstacle: "huh, weird, let me try it differently." The daily-driver version has a different reasonable response: "this is the third time this has happened in two weeks, what is wrong with my setup."

The gap between those two postures is the gap this book closes.

You are reading this book because you have crossed that line. You are not exploring anymore. You ship code. The tool is in your loop. You have written at least one CLAUDE.md file. You know what a permission prompt feels like at 3pm on a deadline day. You have had a session go sideways and you have wondered, afterwards, whether the failure was the model, the prompt, the harness, or you.

Almost all of those failures are the harness. That is good news. The harness is the part you control.

The shape of month-six surprises

The Anthropic docs are good. Getting started is fast. The first month or two of daily use feels like cheating. You wire up a CLAUDE.md, you discover the agent menagerie, you learn to read the diff before accepting it, and you ship more code in a week than you used to ship in a month.

Then month six arrives. Month six is when the surprises start showing up, and they are not the surprises you were braced for.

You were braced for the model to be wrong. The model is rarely wrong in the obvious way. It is wrong in the interesting way -- the way where the code compiles, the tests pass, the diff looks reasonable, and the actual behavior is subtly off. You are no longer triaging "the model wrote bad code." You are triaging things like:

Mid-task pivots. You asked for a small fix to function X. Halfway through, the agent noticed that the surrounding module had an awkward abstraction and decided to refactor it -- "while we're here." Your one-hour task is now four hours. The original fix is buried in a 600-line diff. The original bug is no longer visible.

Hallucinated runtime state. You asked the agent which mode it was running in. It told you confidently. It was wrong. It had inferred its mode from the names of skills it could see, and the inference was a pattern-match on training data, not a check against the actual runtime. You only caught it because you happened to know the answer; if you hadn't, you would have debugged the wrong thing for an hour.

The "out of scope" punt. You asked the agent to audit the auth code. It found three issues -- two it fixed, and one it labeled "pre-existing, out of scope." Three weeks later that pre-existing issue surfaces in production and you re-explain the entire context to the next session, which fixes it in twenty minutes. The first session's "scope discipline" was not discipline. It was a punt.

Capacity drift. Your sessions started feeling slower and more expensive. You traced it: you were on the top model tier at the top effort tier, and the agent was burning forty thousand tokens per turn on tasks that used to take six. You had not changed anything; the model had been updated, and the new model's max tier turns out to overthink in ways the previous model's didn't. The vendor's recommended default has moved and you didn't move with it.

Mojibake in the wild. You shipped a release script that prints status lines with em-dashes and check marks. Tested fine on your Mac. Customer ran it on Windows and the terminal showed garbled bytes. The script worked; the output was unreadable. You shipped a 1.0.1 with ASCII fallbacks the next morning and updated your global rules to default to ASCII in any terminal-bound output forever.

Branch-protection bypass. Already covered. The opener.

Subagent reports that didn't match reality. You delegated a refactor to a subagent. The subagent's report said it had updated all eighteen call sites. You merged. Six call sites had not been updated -- the subagent had touched them in a half-finished way, and its summary described what it intended, not what it actually did. The summary was confident; the work was incomplete. The lesson: a subagent's report is a claim about what it did, not a verified finding.

These are the surprises. None of them are exotic. All of them happen to people running the same agent, on the same model, with the same docs, and yet some operators hit them daily and some operators almost never hit them. The difference is not the operator's prompt-engineering skill. The difference is the discipline overlay.

Why the obvious answer ("just write a CLAUDE.md") is not enough

If you have been doing this for more than a month, you already know the obvious answer: write a CLAUDE.md. Pin your conventions. List your gotchas. Done.

A CLAUDE.md is necessary. It is not sufficient. Here is the shape of why.

A 600-line CLAUDE.md exists in the model's context for the whole session. The model reads it. The model is also reading the file you opened, the diff you're working on, the test output, the user's last six turns, and the tool results from the last twelve tool calls. Your rules are sharing context with everything else. The longer the rules grow, the more they compete with the work itself for attention. Past a point, adding more rules makes the existing rules less load-bearing, not more. I have seen this happen on my own files: I added a rule, the rule was not respected, I added the rule a second time more emphatically, the second copy was not respected either, and I went looking for what had displaced both of them. The answer was always: too much else in context.

A CLAUDE.md is also static. It says "in general, do X." It does not say "right now, in this exact tool result, the right move is Y." For the right-now decisions, you want hooks -- automated harness behaviors that the model is not in the loop for. A pre-commit hook that runs lint:fix. A pre-tool-use hook that blocks npm publish unless you've authorized it. A user-prompt-submit hook that injects today's date so the agent never confidently cites yesterday's deadline as today's. Those decisions don't belong in prose. They belong in code that runs every time.
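To make the pre-tool-use idea concrete, here is a minimal sketch of the script such a hook might run. It assumes the hook receives the pending tool call as JSON on stdin, with the shell command under tool_input.command, and that a nonzero exit vetoes the call -- check the current hook documentation for the exact contract before relying on either assumption:

```python
import json
import sys

# Commands that must never run without explicit human authorization.
# The patterns are illustrative; substitute your own risk list.
BLOCKED_PATTERNS = ("npm publish", "--admin")

def should_block(payload: dict) -> bool:
    """True when the pending shell command matches a blocked pattern."""
    command = payload.get("tool_input", {}).get("command", "")
    return any(pattern in command for pattern in BLOCKED_PATTERNS)

if __name__ == "__main__":
    if should_block(json.load(sys.stdin)):
        print("blocked: authorize this command explicitly first",
              file=sys.stderr)
        sys.exit(2)  # assumed to veto the call; exit 0 lets it through
```

The whole point is that this runs every time, on every matching tool call, with the model not in the loop. Prose can be displaced by context pressure; an exit code cannot.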

A CLAUDE.md also does not persist across sessions in the way "things I have learned about this project" need to persist. For that you want a memory system: a small set of files the agent writes to and reads from, scoped per-project, with discipline about what gets saved and what gets forgotten.
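For a feel of the scale involved: a project memory file should stay small and concrete. This sketch is hypothetical -- the path, dates, and entries are mine for illustration, not a fixed convention:

```markdown
<!-- .claude/memory/project.md (hypothetical layout) -->
- 2026-05-12: integration tests need the stub API running first;
  `npm run stub` before `npm test` or the auth suites fail to connect.
- 2026-05-20: files under src/gen/ are generated; never edit them
  directly, regenerate with `npm run codegen`.
```

Dated, project-specific, operational facts earn a line. General coding preferences do not -- those belong in CLAUDE.md, not memory.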

And a CLAUDE.md does not encode workflows. It says what to do; it does not say "do these four steps in this order." For workflows you want skills -- slash commands that bundle a procedure into a single invocation, with a description that fires the skill at the right moment. /security-review is a skill. /ship-ready is a skill. /loop is a skill. The chapter on skills is going to be its own thing.
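As an illustration of how small a skill can be: a single markdown file. This sketch assumes the common layout of a slash command defined as a markdown file with a frontmatter description that tells the model when to fire it -- the name and steps are hypothetical:

```markdown
---
description: Pre-release gate. Run before tagging any release: lint,
  typecheck, full test suite, changelog entry, no stray TODOs.
---

1. Run lint and typecheck; stop and report on any failure.
2. Run the full test suite, not the quick subset.
3. Verify the changelog has an entry for the pending version.
4. Scan the staged diff for TODO/FIXME markers; list any found.
5. Report pass/fail per step. Do not tag the release yourself.
```

The description is the firing predicate: it is what lets the procedure trigger at the right moment instead of living unread in prose.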

The discipline overlay is the combination of rules, hooks, skills, memory, settings, and permissions, all working together. Each layer has a job. Confusing the jobs -- putting a workflow in CLAUDE.md, putting a memory in a hook, putting a hook's job in a skill -- is the failure mode. Most of the operators I have watched struggle in month six are doing exactly that confusion.

The structure of this book

The book has four parts. Read them in order if you have not built an overlay before; skim them in any order if you have and want to find the chapter that names a problem you've been hitting.

Part 1 -- Foundations. Three chapters on the configuration model. Chapter 2 covers CLAUDE.md -- global vs project, augment vs fresh, and the rule-layering pattern that lets you build an overlay without the file exploding past the point of usefulness. Chapter 3 covers the harness -- settings.json, permission allowlists, and hooks -- the parts of the runtime that are not prose.

Part 2 -- Workflow. Four chapters on what you do every day. Chapter 4 is skills and slash commands -- the description-as-firing-predicate, when to write a skill, when to leave it to the model. Chapter 5 is the agent menagerie -- Explore, Plan, full-pass, review-loop, ship-ready -- and how to brief them so their reports land. Chapter 6 is memory -- four types, what to save, what not to save, and how to keep it from rotting. Chapter 7 is long-running and recurring work -- /loop and /schedule and the cache-window arithmetic that makes "wait five minutes" the worst possible delay.

Part 3 -- Reliability. Three chapters on the parts that hurt. Chapter 8 is tools and trust boundaries -- Bash vs the dedicated tools, and the trust-but-verify discipline you need before reporting work as done. Chapter 9 is cost and capacity -- model tiers, effort levels, throttling, and recovery. Chapter 10 is scope discipline and the failure modes -- including the seven hazards I have hit hardest, with the rules I wrote for each one.

Part 4 -- Beyond solo. Two chapters on what changes when more than one person is in the loop. Chapter 11 is teams and shared rules -- project-scoped settings as a tooling contract. Chapter 12 is what's next -- MCP integration in brief (the other book covers it deeply), autonomous agents, and the model-version migration treadmill.

If you are reading this in a couple of years and the surface area has moved -- different model names, different effort levels, a new built-in agent type, a hook event that didn't exist when I wrote this -- the principles will port. The example code in the companion repo will already have been bumped. Watch the changelog.

What this book is not

This book is not a Claude Code reference. The official docs are. If you need to know the exact JSON shape of a PreToolUse hook, look it up; I am not going to copy the schema in here. I will point at it and tell you when each hook event is the right tool for the job, which is a different question.

This book is not a survey of AI coding agents. I use Claude Code daily. The disciplines port -- a Cursor user can read this and apply most of it -- but the examples will use Claude Code's surface area, and I am not going to pretend I have used every tool equally.

This book is not a prompt-engineering manual. Claude Code is an agentic harness, not a chat interface. The questions are about configuration, tools, agents, hooks, memory, and capacity, not about how to phrase a question. If you came here looking for tips on how to ask the model to do things, you'll find a few in passing, but it isn't the angle.

This book is not a recipe book. It is a discipline book. If you want copy-pasteable scripts, the companion repo has them. The chapters will tell you why each one is shaped the way it is.

What I want you to take away from this chapter

One thing: the gap between using Claude Code and shipping production software with Claude Code is closed by a discipline overlay, not by being a better prompter. The overlay has parts -- rules, settings, hooks, skills, memory, permissions -- and each part has a job. The next eleven chapters are about each of those jobs.

If you remember nothing else: when something breaks, the first question is not "is the model bad," it is "what part of the overlay should have caught this?" Most failures map cleanly to a missing or misplaced piece. Find the piece, write it, and the failure does not happen again.

Chapter 2 starts where the overlay starts -- with CLAUDE.md, and with the trap that has eaten more operator-hours than any other configuration mistake I have watched: the 600-line file nobody reads.

From the field: The branch-protection-bypass incident in the opening was real and the rule that came out of it lives in my global CLAUDE.md to this day. Several months later I added a broader version of the same rule -- "when a safety check blocks an action, the check is the signal; route around it only when explicitly authorized" -- because the branch-protection version kept catching only the exact gh pr merge shape, and the same agent would happily bypass a different safety check (a failing pre-commit hook, a required signed commit, a Husky guard) without flagging it. That is also a thing that happens at month six: a rule you wrote at month two starts looking too narrow, and the right move is to generalize it, not to write a second narrow rule next to it. Chapter 2 covers that pattern explicitly.

Read the rest of the book

Eleven more chapters -- the CLAUDE.md contract, the harness, skills, subagents, memory, capacity, the seven hazards, and what survives across teams. $39 -- PDF + EPUB + companion repo.


Published by Yaw Labs.