Token Cost & Budgeting

Act 1 · ~4 min

Theory

Providers charge separately for input (your prompt) and output (the reply). Output is typically 2–4× pricier because generation is more compute-intensive than reading.

cost = (input_tokens  / 1_000_000) × input_rate
     + (output_tokens / 1_000_000) × output_rate

Scenario	Tokens/call	Calls/day	Daily cost @ $3/M in
Short Q&A	500	10,000	$15
Doc summary	4,000	1,000	$12
Code review	8,000	500	$12

Naive

Full system prompt + raw history + entire document attached every call.

Optimized

Trim system prompt, summarize history, retrieve only relevant chunks (RAG), batch when offline.

Foundations

Token Cost & Budgeting

Theory