Top-P & Top-K Sampling

Act 2 · ~4 min

Theory

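Two standard ways to clip the low-probability tail of the next-token distribution: top-k keeps a fixed shortlist of the k most likely tokens; top-p (nucleus) keeps the smallest set whose cumulative probability reaches p, so its size adapts at each step.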
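A minimal sketch of top-k in plain Python, assuming the distribution is already a list of probabilities indexed by token id. `top_k_filter` is a hypothetical helper name, not a library API. Note that it keeps exactly k entries whatever the shape of the distribution:

```python
def top_k_filter(probs: list[float], k: int) -> list[float]:
    """Top-k: keep the k most probable tokens, zero the rest, renormalize."""
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


# The shortlist size is always k, whether the distribution is peaked or flat.
peaked = top_k_filter([0.90, 0.05, 0.03, 0.02], k=3)
flat = top_k_filter([0.26, 0.25, 0.25, 0.24], k=3)
print(sum(q > 0 for q in peaked), sum(q > 0 for q in flat))
```

This fixed size is exactly what top-p avoids: a peaked step and a flat step get the same shortlist length under top-k.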

Step 1 — Temperature first

Temperature rescales raw logits.

logits / T → softmax → distribution

Lower T sharpens the peak; higher T flattens it.
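A minimal sketch of the temperature step, assuming plain Python lists of logits. `softmax_with_temperature` is a hypothetical helper and the logits are made up for illustration. Subtracting the max before `exp` is a standard numerical-stability trick and does not change the result:

```python
import math


def softmax_with_temperature(logits: list[float], T: float) -> list[float]:
    """Divide logits by T, then apply a numerically stable softmax."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values, not from the text
for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(T, [round(q, 3) for q in probs])
```

With these logits the top token's probability is about 0.87 at T = 0.5, 0.67 at T = 1.0 and 0.51 at T = 2.0: lower T sharpens, higher T flattens.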

Step 2 — Top-p second

Nucleus sampling then picks the truncated candidate set from the temperature-scaled distribution.

sort descending → cumsum until ≥ p → renormalize → sample
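The four steps map directly onto a short Python sketch, assuming probabilities arrive as a plain list. `top_p_sample` is a hypothetical name; the final draw uses the standard library's `random.choices`:

```python
import random


def top_p_sample(probs: list[float], p: float, rng: random.Random) -> int:
    """Nucleus sampling: sort, accumulate until cumsum >= p, renormalize, sample."""
    # 1. Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # 2. Walk down the sorted list until the running sum reaches p.
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= p:
            break
    # 3. Renormalize within the nucleus so the kept weights sum to 1.
    weights = [probs[i] / cum for i in nucleus]
    # 4. Draw one token index from the renormalized nucleus.
    return rng.choices(nucleus, weights=weights, k=1)[0]


rng = random.Random(0)
print([top_p_sample([0.82, 0.09, 0.06, 0.03], 0.90, rng) for _ in range(10)])
```

Passing a seeded `random.Random` keeps draws reproducible. The explicit renormalization mirrors the step above; `random.choices` would also accept unnormalized weights.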

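Top-p sees the distribution as already reshaped by temperature, so the order of the two steps matters: T is applied first, and changing it changes how many tokens fall inside the nucleus.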
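To see why the order matters, a small sketch applies temperature first and then counts how many tokens the p = 0.90 nucleus keeps. The helpers `softmax_with_temperature` and `nucleus_size` are hypothetical, and the logits are made up:

```python
import math


def softmax_with_temperature(logits: list[float], T: float) -> list[float]:
    """Divide logits by T, then apply a numerically stable softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_size(probs: list[float], p: float) -> int:
    """Tokens top-p keeps: shortest prefix of sorted probs whose sum >= p."""
    cum = 0.0
    for n, q in enumerate(sorted(probs, reverse=True), start=1):
        cum += q
        if cum >= p:
            return n
    return len(probs)


logits = [3.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical logits for illustration
for T in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, T)  # temperature first...
    print(f"T={T}: nucleus size at p=0.90 -> {nucleus_size(probs, 0.90)}")  # ...then top-p
```

With these logits the nucleus grows from 1 token at T = 0.5 to 3 at T = 1.0 and 4 at T = 1.5: a flatter distribution needs more tokens to reach p.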

Worked example (p = 0.90, tokens sorted by probability):

Paris 0.82 · cum 0.82 · kept
Lyon 0.09 · cum 0.91 · kept (cum ≥ 0.90, stop)
Europe 0.06 · excluded
…tail · excluded

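At p = 0.90 only 2 tokens qualify (Paris and Lyon, cumulative 0.91), renormalized to about 0.90 and 0.10. Top-k with k = 50 would keep a fixed 50 tokens, dragging in Europe plus 47 near-zero tail tokens.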
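The example can be checked with a short sketch. Only the Paris, Lyon and Europe probabilities come from the text; the tail is elided there, so this code assumes, purely for illustration, that the remaining 0.03 is spread evenly over 97 hypothetical tokens. `nucleus` and `top_k` are hypothetical helper names:

```python
def nucleus(tokens: dict[str, float], p: float) -> dict[str, float]:
    """Top-p: keep the most probable tokens until their sum reaches p, renormalize."""
    ranked = sorted(tokens.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    return {tok: prob / cum for tok, prob in kept}


def top_k(tokens: dict[str, float], k: int) -> dict[str, float]:
    """Top-k: keep the k most probable tokens, renormalize."""
    ranked = sorted(tokens.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {tok: prob / total for tok, prob in ranked}


# Head probabilities from the example; the tail is a made-up assumption:
# the remaining 0.03 spread evenly over 97 hypothetical tokens.
dist = {"Paris": 0.82, "Lyon": 0.09, "Europe": 0.06}
dist.update({f"tail_{i}": 0.03 / 97 for i in range(97)})

top_p_set = nucleus(dist, 0.90)
top_k_set = top_k(dist, 50)
print(len(top_p_set), {t: round(q, 3) for t, q in top_p_set.items()})
print(len(top_k_set))
```

Top-p keeps 2 tokens (Paris ≈ 0.901, Lyon ≈ 0.099 after renormalization), while top-k keeps 50 regardless of how peaked the distribution is.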