I run a small fleet of automated AI agents. Content pipelines, analytics jobs, scheduled workflows. Recently two of them started failing the same way, and the root cause was something I do not see discussed enough: context-window exhaustion as a production failure mode.

If you only ever build with AI in a chat window, this failure is invisible to you. It only shows up once a model is doing real, repeated work on a schedule, with files and tool outputs piling into its working memory. That is exactly where I found it.

What was happening

The first job force-loaded around 141KB of reference files before it did any actual work. A style guide, a performance log that kept growing, a few content banks. Then the tool outputs landed on top of that, and the agent's working memory overflowed mid-run. It compacted, re-read the same files, overflowed again. Three loops, then it gave up.

The second job loaded a 25KB CSV, rewrote the whole file back at the end, and crashed in the gap between posting and recording that it had posted. That is the dangerous one. It left the system a single rerun away from publishing the same thing twice.

Neither of these is an AI problem in the way people mean when they say "the model got confused." The model was fine. The system around the model was wasteful, and the waste had nowhere to go but the context budget.

The fixes were classic ops, not AI magic

  1. Read slim. The 63KB style guide became a 3KB cheatsheet. The full document still exists, but it is reference-only now. The agent reads the short version to do its job.
  2. Logs that grow must rotate. Anything over roughly 400 lines archives itself, and the jobs read the tail instead of the whole history.
  3. Data files get edited by scripts, not by the model. One cell changes through code. The file never enters the agent's memory at all.
  4. Idempotency first. Post, then immediately record that you posted, with nothing in between, plus a duplicate guard that checks before it acts.

Why this felt familiar

Ten years in payment operations taught me these exact patterns under different names. Batch windows. Replay protection. Reconciliation. A failed batch is one problem shared by thousands of payments; an overflowing agent is one constraint shared by every step it tries to take. The discipline transfers cleanly.

Running AI agents in production turns out to be operations work with a new constraint budget. The context window is just another finite resource you have to respect, the way you respect a settlement cutoff or a memory limit. Treat it as free and the whole thing falls over. It did. Then it stopped.

The model was never the bottleneck. The system around the model was.

If you are running agents on real workflows, the failure mode that surprised you is probably worth writing down. That is the artifact almost nobody has, and it is the one I would want to read.

Putting AI into production against a real problem?

That is the work I do. Book thirty minutes and we will scope the one build worth doing.

Book an AI consultation