Token Cost & Budgeting
Theory
Providers charge separately for input (your prompt) and output (the reply). Output is typically 2–4× pricier because generation is more compute-intensive than reading.
cost = (input_tokens / 1_000_000) × input_rate
+ (output_tokens / 1_000_000) × output_rate
| Scenario | Tokens/call | Calls/day | Daily cost @ $3/M in |
|---|---|---|---|
| Short Q&A | 500 | 10,000 | $15 |
| Doc summary | 4,000 | 1,000 | $12 |
| Code review | 8,000 | 500 | $12 |
Naive
Full system prompt + raw history + entire document attached every call.
Optimized
Trim system prompt, summarize history, retrieve only relevant chunks (RAG), batch when offline.