Tokens, Context, and Cost
LLM pricing is almost always quoted per million tokens, with separate rates for input and output. A token is roughly 0.75 English words or 3-4 characters of code. What looks like a small per-token price multiplies fast in agentic workflows.
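The rules of thumb above can be turned into a rough estimator. This is a minimal sketch: the function names are hypothetical, the 3.5 characters-per-token default is just the midpoint of the 3-4 range quoted above, and a real tokenizer will give different counts.

```python
import math


def estimate_tokens_prose(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate for English prose (about 0.75 words per token)."""
    return math.ceil(len(text.split()) / words_per_token)


def estimate_tokens_code(source: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate for source code (about 3-4 characters per token)."""
    return math.ceil(len(source) / chars_per_token)


prose_estimate = estimate_tokens_prose("tokens are not words")
code_estimate = estimate_tokens_code("def add(a, b):\n    return a + b\n")
```

Estimates like these are good enough for budgeting; for billing-grade numbers, use the token counts the provider reports.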
The shape of a bill
For a typical coding agent turn:
- Input tokens — system prompt (5k) + tool schemas (3k) + conversation history (20k) + retrieved code (30k) = ~58k.
- Output tokens — a few thousand for an edit plan plus patches.
- Cached input tokens — ideally 90%+ of the stable prefix, billed at 10-50% of full rate.
Input tokens dominate. Output is usually only 5-10% of the tokens in a turn, although its higher per-token rate makes its share of the bill somewhat larger. That surprises people used to chat apps, where output dominates because history is short.
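The arithmetic behind that breakdown can be sketched as follows. This is an illustration under stated assumptions, not a billing tool. Rates and turn_cost are hypothetical names, and the rates are the Claude Sonnet 4.6 figures quoted in this article. The 3k output figure stands in for "a few thousand". Treating the system prompt plus tool schemas (8k) as the stable prefix, with 90% of it served from cache, is also an assumption.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Prices in USD per million tokens."""
    input: float
    output: float
    cached: float


def turn_cost(input_tokens: int, output_tokens: int, cached_tokens: int, rates: Rates):
    """Return (fresh_input_usd, cached_input_usd, output_usd) for one turn.

    cached_tokens is the part of input_tokens that is served from cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * rates.input / PER_MILLION,
        cached_tokens * rates.cached / PER_MILLION,
        output_tokens * rates.output / PER_MILLION,
    )


# Rates as quoted in this article; re-quote before relying on them.
sonnet = Rates(input=3.00, output=15.00, cached=0.30)

# system prompt + tool schemas + conversation history + retrieved code
input_tokens = 5_000 + 3_000 + 20_000 + 30_000
output_tokens = 3_000  # assumed value for "a few thousand"

uncached = turn_cost(input_tokens, output_tokens, 0, sonnet)

# Assumption: 90% of the 8k stable prefix (system prompt + tool schemas) hits the cache.
cached_prefix = 8_000 * 9 // 10
with_cache = turn_cost(input_tokens, output_tokens, cached_prefix, sonnet)

print(f"uncached turn:   ${sum(uncached):.4f}")
print(f"with cache hits: ${sum(with_cache):.4f}")
```

Multiplying the per-turn figure by turns per session and sessions per day gives a first-order estimate of monthly spend.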
Current rates (per million tokens, April 2026)
- Claude Opus 4.7 — input $15 / output $75 / cached $1.50.
- Claude Sonnet 4.6 — input $3 / output $15 / cached $0.30.
- GPT-5 — input $10 / output $30, automatic cache at $1.
- DeepSeek V3 — input $0.27 / output $1.10. An order of magnitude cheaper; quality trails Sonnet on agentic tasks.
Prices move every quarter. Always re-quote from the provider before a capacity decision.
How to cut costs without losing quality
- Cache the system prompt. See the prompt-caching article. Usually the single biggest win.
- Prune tool schemas. Tools you added in month one that nobody calls still bill on every turn.
- Down-route easy queries. Route classification and summarization to Haiku / DeepSeek; keep Opus for hard reasoning.
- Batch async work. Batch API endpoints offer 50% off for 24-hour-turnaround jobs.
- Stop early. If the agent has answered, end the turn. Many loops keep generating empty or filler text after the task is done, and that text still bills as output tokens.
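As one illustration of the list above, tool-schema pruning can be driven by call logs. This is a minimal sketch with hypothetical names throughout: the schema dictionaries, their token counts, the tool names, and the call log are all made up for the example, and a real system would read them from its own telemetry.

```python
from collections import Counter


def prune_tool_schemas(schemas, call_log, min_calls=1):
    """Keep only schemas whose tool was called at least min_calls times."""
    counts = Counter(call_log)
    return [s for s in schemas if counts[s["name"]] >= min_calls]


def schema_tokens(schemas):
    """Total tokens the given schemas add to every turn's input."""
    return sum(s["tokens"] for s in schemas)


# Hypothetical tool registry; "tokens" is the size of each serialized schema.
schemas = [
    {"name": "read_file", "tokens": 400},
    {"name": "run_tests", "tokens": 600},
    {"name": "legacy_deploy", "tokens": 900},  # added early on, never called
]

# Hypothetical log of tool names called over the observation window.
call_log = ["read_file", "run_tests", "read_file"]

kept = prune_tool_schemas(schemas, call_log)
saved_per_turn = schema_tokens(schemas) - schema_tokens(kept)
print(f"kept {len(kept)} tools, saving {saved_per_turn} input tokens per turn")
```

The saving looks small per turn, but it applies to every turn of every session, which is why unused tools are worth removing.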
Failure modes
- Unbounded output. max_tokens not set; a runaway response burns $5 of Opus on a single turn.
- Context reinjection. Accidentally including the entire conversation plus full tool outputs each turn; cost grows quadratically.
- Debug loops. A retry-on-error path retries tool-call errors forever while you're asleep.
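Two of these failure modes, unbounded output and endless retries, can be contained with a small guard around each agent step. The sketch below is one possible shape, not a definitive implementation. run_with_guards, the exception classes, and the step callable (which takes a max_tokens cap and returns a success flag plus the cost of the attempt) are all hypothetical, and the default limits are arbitrary.

```python
class RetriesExhausted(RuntimeError):
    """Raised when every allowed attempt has failed."""


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend reaches the budget."""


def run_with_guards(step, max_attempts=3, budget_usd=1.00, max_output_tokens=4096):
    """Run step until it succeeds, attempts run out, or the budget is spent.

    step is a hypothetical callable: step(max_tokens) -> (ok, cost_usd).
    Returns (attempts_used, total_spent_usd) on success.
    """
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        # Always pass an explicit output cap so no single call can run away.
        ok, cost = step(max_output_tokens)
        spent += cost
        if ok:
            return attempt, spent
        if spent >= budget_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {attempt} attempts")
    raise RetriesExhausted(f"gave up after {max_attempts} attempts, ${spent:.2f} spent")


# Usage with a stand-in step that fails once, then succeeds.
seen_caps = []


def flaky_step(max_tokens):
    seen_caps.append(max_tokens)
    return len(seen_caps) >= 2, 0.05


attempts, spent = run_with_guards(flaky_step)
print(f"succeeded after {attempts} attempts, ${spent:.2f} spent")
```

Raising instead of looping means a persistent tool error surfaces as an alert rather than as a surprise on the invoice.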
When NOT to optimize
Pre-PMF, do not spend a week on prompt caching to save $3 a day. Build the product. Once you have monthly spend greater than rent, start optimizing — the numbers compound fast.