Top-P & Top-K Sampling

Theory

Two standard ways to clip the low-probability tail of the next-token distribution: top-k keeps a fixed-size shortlist; top-p (nucleus) adapts the shortlist to each step.

Property | Top-k | Top-p (nucleus)
Candidate count | Fixed: always K tokens | Variable: shape-adaptive
When model is confident | May keep weak tokens | Shrinks to strong ones
When model is uncertain | May cut valid options | Expands to cover the spread
Typical default | 40–50 | 0.9–0.95

Step 1 — Temperature first

Temperature rescales raw logits.

logits / T → softmax → distribution

Lower T sharpens the peak; higher T flattens it.
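
The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function name softmax_with_temperature and the three example logits are assumptions made up for the demonstration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by T, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three tokens (not taken from a real model).
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # T < 1
base = softmax_with_temperature(logits, temperature=1.0)   # T = 1
flat = softmax_with_temperature(logits, temperature=2.0)   # T > 1
```

Comparing sharp[0], base[0], and flat[0] shows the top token's probability falling as T rises, which is the sharpen/flatten effect described above.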

Step 2 — Top-p second

Nucleus sampling keeps the smallest set of top-ranked tokens whose cumulative probability reaches p.

sort → cumsum until ≥ p → renormalize → sample

Top-p sees the distribution after temperature has reshaped it, so the order matters: temperature first, then top-p.
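
As a rough sketch of the sort, cumsum, renormalize, sample steps, the snippet below filters a small probability table. The helper name top_p_filter is an assumption, and the "<tail>" entry is a hypothetical stand-in that lumps all remaining tokens into one bucket so the probabilities sum to 1.

```python
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest top-ranked set whose cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # nucleus is complete once the threshold is reached
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: prob / cumulative for token, prob in kept}

# Already temperature-adjusted probabilities; "<tail>" is a hypothetical bucket.
probs = {"Paris": 0.82, "Lyon": 0.09, "Europe": 0.06, "<tail>": 0.03}
nucleus = top_p_filter(probs, p=0.90)

# Sample one token from the renormalized nucleus.
token = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
```

With p=0.90 the loop stops after the second token, so only Paris and Lyon can be sampled.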

Paris 0.82 · cum 0.82
Lyon 0.09 · cum 0.91
Europe 0.06 · excluded
tail · excluded
Nucleus at p=0.90: 2 tokens qualify; top-k=50 would keep 48 more, nearly all of them near-zero.
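
To make the contrast concrete, here is a small top-k sketch on the same example. The helper name top_k_filter and the vocabulary are assumptions: the 97 tail tokens sharing the remaining 0.03 of probability are invented purely so there are enough candidates for k=50.

```python
def top_k_filter(probs, k=50):
    """Keep the k highest-probability tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

# Hypothetical vocabulary: three named tokens plus 97 tail tokens sharing 0.03.
probs = {"Paris": 0.82, "Lyon": 0.09, "Europe": 0.06}
probs.update({f"tail_{i}": 0.03 / 97 for i in range(97)})

shortlist = top_k_filter(probs, k=50)

# Tokens that top-k keeps but the p=0.90 nucleus (Paris, Lyon) would drop.
extra = [t for t in shortlist if t not in ("Paris", "Lyon")]
```

Here len(shortlist) is always 50 regardless of how peaked the distribution is, and extra holds the 48 candidates that the nucleus would have excluded.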