Act 1

Foundations


Context Window


Theory

The context window is the amount of text an LLM can "see" at once. It includes everything: system prompt, conversation history, and the response being generated. If the total exceeds the limit, the earliest content is dropped.
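A minimal sketch of that drop-oldest behavior, assuming a toy tokenizer that counts whitespace-separated words as tokens (real tokenizers differ, and real systems usually pin the system prompt rather than dropping it):

```python
# Toy truncation: drop the earliest messages until the total fits the window.

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())

def fit_to_window(messages: list[str], limit: int) -> list[str]:
    """Return the suffix of `messages` whose total token count fits `limit`."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # earliest content is dropped first
    return kept

history = ["hello there model",
           "tell me about context windows",
           "why do old turns disappear"]
print(fit_to_window(history, limit=9))  # only the most recent turn survives
```

The point of the sketch: nothing marks the dropped turns as missing; they simply are not part of the request.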

[ system prompt | history | user message | ... | response ]
Context window (e.g. 128K tokens): when full, the oldest tokens silently fall off.
Not storage
Persistent memory across calls, where the model remembers yesterday. This is the wrong mental model.
Working memory
A fixed-size buffer per request. Whatever falls outside the window is invisible to the model.
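The "working memory" framing is easiest to see in code: a chat client creates the illusion of memory by resending the entire history on every request. A minimal sketch, where `call_model` is a hypothetical stand-in for a real API call:

```python
# The model "remembers" only because the client resends everything each turn.

def call_model(messages: list[dict]) -> str:
    # Placeholder: a real implementation would send `messages` to an LLM endpoint.
    return f"(reply to {len(messages)} messages)"

history: list[dict] = []

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the full history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Ada.")
print(chat("What is my name?"))  # answerable only because turn 1 was resent
```

Delete the first turn from `history` and the model has no way to recover the name: each request is stateless, and the window is all it ever sees.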

Larger windows hold more history but cost more per request, and attention cost grows quadratically with sequence length. Context budgeting is a practical constraint, not a theoretical one.
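The quadratic term is worth making concrete. Since attention compares every token with every other token, growing the window by a factor of k multiplies that pairwise work by k squared:

```python
# Back-of-the-envelope: attention does O(n^2) pairwise work over n tokens,
# so scaling the window scales that term by the square of the ratio.

def attention_cost_ratio(small: int, large: int) -> float:
    return (large / small) ** 2

print(attention_cost_ratio(8_000, 128_000))  # 256.0
```

Going from an 8K to a 128K window is a 16x jump in length but a 256x jump in the attention term, which is why "just use a bigger window" is rarely a free answer.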