Temperature
Theory
Temperature rescales the logits before softmax, controlling how peaked or flat the resulting probability distribution becomes. The shape of that distribution determines how predictable each next token is.
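A minimal sketch of that rescaling in NumPy (softmax_with_temperature is an illustrative helper, not any library's API): dividing the logits by T before the softmax sharpens the distribution when T < 1 and flattens it when T > 1.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Rescale logits by 1/T, then apply softmax.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()        # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # peaked: almost all mass on the top token
print(softmax_with_temperature(logits, 1.3))  # flatter: mass spread across tokens
```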
T=0.2 — precise
Prompt: "List Python comparison operators."
Output: ==, !=, <, >, <=, >=
Same answer every run. Boring is the point.
T=1.3 — creative
Prompt: "Name a quirky coffee shop."
Run 1: "The Drowsy Otter"
Run 2: "Bean There, Done That"
Run 3: "Kaffee Klatsch & Co."
Most APIs default to T=1.0. Lower it for structured output (code, JSON); raise it for ideation. Note: T=0 reduces variance but does not guarantee bit-identical output, since batching and nondeterministic GPU kernels still introduce drift.
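In practice this is a single request parameter. A sketch assuming the OpenAI Python SDK, with an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature for structured output.
structured = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": "List Python comparison operators."}],
    temperature=0.2,
)

# High temperature for ideation.
ideas = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name a quirky coffee shop."}],
    temperature=1.3,
)
```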