Act 1

Foundations


Context Window

Act 1 · ~5 min

Theory

The context window is the maximum number of tokens the model can process in a single forward pass: your input plus the answer it generates.
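Because input and output share one budget, a prompt only "fits" if it leaves room for the reply. A minimal sketch of that check, assuming a rough 4-characters-per-token heuristic and an illustrative 8,192-token window (both are stand-ins, not real model parameters):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_window(prompt: str, max_reply_tokens: int, window: int = 8192) -> bool:
    """Input and the space reserved for the answer share one window."""
    return estimate_tokens(prompt) + max_reply_tokens <= window

print(fits_window("Summarize this paragraph in two sentences.", max_reply_tokens=500))
```

Real tokenizers differ per model, so treat the heuristic as a back-of-the-envelope check, not an exact count.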

Small, focused context

Goal, the one relevant document section, audience, format. The model attends to what matters. Answers tend to be specific and grounded.

Huge, unfiltered context

Whole archives pasted in. The model spreads attention thin, may grab the wrong paragraph, and can quietly drop the middle.

A few useful facts beneath the surface:

  • The window is shared. System prompt + history + your turn + the answer all live in the same budget.
  • Caching helps cost, not attention. Tools cache repeated prefixes to save money. The model still has to attend to everything in the window.
  • Long-context evaluations matter. "Needle in a haystack" tests measure whether a model can find a single planted fact inside a long input. Most models still miss facts buried in the middle.
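The shared-budget point above can be made concrete with a small accounting sketch. The segment names and the 4-characters-per-token heuristic are illustrative assumptions:

```python
def context_budget(window: int, system: str, history: str, turn: str,
                   max_reply_tokens: int) -> int:
    """Return the tokens left over after every segment claims its share.

    System prompt, conversation history, the current turn, and the
    reserved reply all draw from the same window; a negative result
    means the request overflows it. Uses a rough ~4 chars/token estimate.
    """
    used = sum(len(part) // 4 for part in (system, history, turn))
    return window - used - max_reply_tokens
```

Watching this number shrink as history grows makes it obvious why long threads eventually crowd out the answer.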

Two habits make the window work for you. First, put the most important text at the start or the end of the prompt. Second, summarize before you stuff. A clean three-line brief of last week's thread beats pasting the whole inbox.
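Both habits can be combined in one trimming step: when text will not fit, keep the start and the end and elide the middle, since that is the region models drop first. A sketch, using word count as a crude stand-in for tokens and a hypothetical `[...snip...]` marker:

```python
def trim_middle(text: str, budget_words: int) -> str:
    """Keep the head and tail of the text, elide the middle.

    Word count is a rough proxy for tokens here; the marker string
    is an arbitrary placeholder for the elided span.
    """
    words = text.split()
    if len(words) <= budget_words:
        return text
    half = budget_words // 2
    return " ".join(words[:half] + ["[...snip...]"] + words[-half:])
```

A real pipeline would summarize the elided middle instead of dropping it outright, but the shape of the fix is the same: important text at the edges, bulk in the middle goes first.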