Act 1

Foundations

Token Cost & Budgeting

Act 1 · ~4 min

Theory

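Providers charge separately for input tokens (your prompt) and output tokens (the reply). Output is typically 2–4× pricier per token because generation is more compute-intensive than reading: output tokens are produced one at a time, while the prompt is processed in a single pass.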

cost = (input_tokens  / 1_000_000) × input_rate
     + (output_tokens / 1_000_000) × output_rate
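
As a minimal sketch, the formula above translates directly into a small Python function. The function name `call_cost` and the example rates ($3/M input, $9/M output, a 3× ratio) are illustrative assumptions, not any specific provider's pricing.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost of one call in dollars; rates are dollars per million tokens."""
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)


# Hypothetical rates: $3 per million input tokens, $9 per million output tokens.
cost = call_cost(input_tokens=500, output_tokens=200,
                 input_rate=3.0, output_rate=9.0)
print(f"${cost:.4f} per call")
```

With these assumed rates, the 200 output tokens cost slightly more than the 500 input tokens, which is why capping reply length can matter as much as trimming the prompt.
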

Scenario    | Tokens/call | Calls/day | Daily cost @ $3/M input
Short Q&A   |         500 |    10,000 | $15
Doc summary |       4,000 |     1,000 | $12
Code review |       8,000 |       500 | $12

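
The daily figures in the table can be reproduced with a few lines of Python. This sketch counts input tokens only, as the table does; the helper name `daily_input_cost` is hypothetical.

```python
def daily_input_cost(tokens_per_call: int, calls_per_day: int,
                     input_rate: float = 3.0) -> float:
    """Daily input-token cost in dollars; input_rate is dollars per million tokens."""
    return (tokens_per_call * calls_per_day / 1_000_000) * input_rate


# (scenario, tokens per call, calls per day), taken from the table above.
scenarios = [
    ("Short Q&A", 500, 10_000),
    ("Doc summary", 4_000, 1_000),
    ("Code review", 8_000, 500),
]

for name, tokens, calls in scenarios:
    print(f"{name}: ${daily_input_cost(tokens, calls):.2f}/day")
```

Short Q&A has the smallest prompts but the highest daily bill: total volume (tokens per call × calls per day) drives cost, not prompt size alone.
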
Naive
Full system prompt + raw history + entire document attached every call.
Optimized
Trim system prompt, summarize history, retrieve only relevant chunks (RAG), batch when offline.
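
One way to keep history within budget is sketched below. It is a simpler stand-in for summarization: it drops the oldest turns instead of summarizing them. The names `estimate_tokens` and `trim_history` are hypothetical, and the 4-characters-per-token estimate is a rough assumption; a real system would use the provider's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate assuming ~4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that would overflow.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 200, "c" * 100]  # ~100, ~50, ~25 estimated tokens
recent = trim_history(history, budget_tokens=80)
print(len(recent))  # the two newest messages fit; the oldest is dropped
```

Dropping turns is cheap but loses context. Summarizing old turns, as the optimized approach suggests, keeps more information at the price of an extra model call.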