Act 1

Foundations


Context Window


Theory

The context window is the amount of text an LLM can "see" at once. It includes everything: system prompt, conversation history, and the response being generated. If the total exceeds the limit, the earliest content is dropped.
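A minimal sketch of that drop-oldest behavior, assuming a toy tokenizer that counts whitespace-separated words as tokens (real tokenizers differ, and real systems usually pin the system prompt rather than dropping it):

```python
# Toy truncation: drop the earliest messages until the total fits the window.

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())

def fit_to_window(messages: list[str], limit: int) -> list[str]:
    """Return the suffix of `messages` whose total token count fits `limit`."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # earliest content is dropped first
    return kept

history = ["hello there model",
           "tell me about context windows",
           "why do old turns disappear"]
print(fit_to_window(history, limit=9))  # only the most recent turn survives
```

The point of the sketch: nothing marks the dropped turns as missing; they simply are not part of the request.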

[ system prompt | history | user message | ... | response ]
Context window (e.g. 128K tokens): when full, the oldest tokens silently fall off.
Not storage
Persistent memory across calls, where the model remembers yesterday. This is the wrong mental model.
Working memory
A fixed-size buffer per request. Whatever falls outside the window is invisible to the model.
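The "working memory" framing is easiest to see in code: a chat client creates the illusion of memory by resending the entire history on every request. A minimal sketch, where `call_model` is a hypothetical stand-in for a real API call:

```python
# The model "remembers" only because the client resends everything each turn.

def call_model(messages: list[dict]) -> str:
    # Placeholder: a real implementation would send `messages` to an LLM endpoint.
    return f"(reply to {len(messages)} messages)"

history: list[dict] = []

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the full history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Ada.")
print(chat("What is my name?"))  # answerable only because turn 1 was resent
```

Delete the first turn from `history` and the model has no way to recover the name: each request is stateless, and the window is all it ever sees.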

Larger windows hold more history but cost more per request, and attention cost grows quadratically with sequence length. Context budgeting is a practical constraint, not a theoretical one.
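The quadratic term is worth making concrete. Since attention compares every token with every other token, growing the window by a factor of k multiplies that pairwise work by k squared:

```python
# Back-of-the-envelope: attention does O(n^2) pairwise work over n tokens,
# so scaling the window scales that term by the square of the ratio.

def attention_cost_ratio(small: int, large: int) -> float:
    return (large / small) ** 2

print(attention_cost_ratio(8_000, 128_000))  # 256.0
```

Going from an 8K to a 128K window is a 16x jump in length but a 256x jump in the attention term, which is why "just use a bigger window" is rarely a free answer.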