# Token Optimization
**Token optimization** refers to strategies and techniques for reducing the number of tokens consumed when interacting with large language models (LLMs), directly impacting both cost and performance.
## Why It Matters
LLM APIs charge per token (both input and output). A single Claude Opus 4 request with 100k context can cost several dollars. Optimizing token usage can reduce costs by 10-100x.
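As a rough illustration, here is the arithmetic behind that claim. The per-token prices below are assumptions based on published Opus-class list rates (about $15 per million input tokens and $75 per million output tokens); check your provider's current price sheet before relying on them.

```python
# Rough per-request cost estimate. Prices are assumptions (Opus-class list
# rates at the time of writing) and vary by provider and over time.
INPUT_PRICE_PER_MTOK = 15.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 75.00   # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# 100k tokens of context plus a long, reasoning-heavy 20k-token answer:
print(f"${request_cost(100_000, 20_000):.2f}")   # $3.00
# The same question with context trimmed to 5k tokens and a 1k-token answer:
print(f"${request_cost(5_000, 1_000):.2f}")      # $0.15
```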
## Common Strategies

### 1. Targeted Retrieval (vs. Context Stuffing)
Instead of loading entire documents into context, use semantic search to retrieve only the relevant snippets; a retrieval sketch follows the tool list below.
- Tools: QMD, RAG pipelines, vector databases
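A minimal sketch of the idea, assuming the `sentence-transformers` package for embeddings and a brute-force cosine-similarity search standing in for a real vector database:

```python
from sentence_transformers import SentenceTransformer  # assumed installed
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

def top_k_snippets(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return only the k chunks most similar to the query, instead of
    stuffing every chunk into the prompt."""
    vectors = model.encode([query] + chunks)          # embed query and chunks
    q, docs = vectors[0], vectors[1:]
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:k]                 # indices of the top-k chunks
    return [chunks[i] for i in best]

# The prompt now carries ~3 short snippets instead of the whole document set.
```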
### 2. Prompt Compression
Remove unnecessary words, whitespace, and formatting from prompts without losing meaning.
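A minimal sketch of the light-touch end of this technique: collapsing redundant whitespace and stripping decoration that carries no meaning before the text is sent as context. More aggressive approaches rewrite or summarize the text itself.

```python
import re

def compress_prompt(text: str) -> str:
    """Collapse whitespace and drop formatting that adds tokens but no meaning."""
    text = re.sub(r"^[-=*_]{3,}$", "", text, flags=re.MULTILINE)  # drop horizontal rules
    text = re.sub(r"[ \t]+", " ", text)                           # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)                        # collapse runs of blank lines
    return text.strip()

raw = "Please   note:\n\n\n\n----\nThe   meeting  is at 3 PM."
print(compress_prompt(raw))   # "Please note:\n\nThe meeting is at 3 PM."
```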
### 3. Caching
Cache LLM responses for repeated queries. Many providers offer prompt caching (reduced cost for repeated context).
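A minimal client-side cache keyed on a hash of the model name and prompt; `call_llm` here stands in for whatever function wraps your provider's API. Provider-side prompt caching is configured separately through the API and discounts repeated context rather than whole responses.

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(model: str, prompt: str, call_llm) -> str:
    """Return a stored response when the exact same request was seen before;
    otherwise call the model once and remember the answer."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(model=model, prompt=prompt)  # pay for tokens only once
    return _cache[key]

# Repeated identical queries now cost zero tokens.
```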
### 4. Model Selection
Use smaller, cheaper models for routine tasks and reserve expensive models for complex reasoning; a routing sketch follows the list below.
- Routine: Sonnet, Kimi K2, local models
- Complex: Opus, o1-preview
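A minimal routing sketch. The model names and the difficulty heuristic are illustrative assumptions; in practice the routing signal might be task type, prompt length, or a cheap classifier model.

```python
CHEAP_MODEL = "claude-sonnet-4"      # illustrative names: substitute your
EXPENSIVE_MODEL = "claude-opus-4"    # provider's actual model identifiers

def pick_model(task: str) -> str:
    """Route routine work to the cheap model; reserve the expensive model
    for prompts that look like multi-step reasoning."""
    hard_markers = ("prove", "design", "debug", "multi-step", "plan")
    is_hard = len(task) > 2_000 or any(m in task.lower() for m in hard_markers)
    return EXPENSIVE_MODEL if is_hard else CHEAP_MODEL

print(pick_model("Summarize this email."))                                   # claude-sonnet-4
print(pick_model("Design a sharding plan and prove it is deadlock-free."))   # claude-opus-4
```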
### 5. Output Length Control
Explicitly request concise responses when brevity suffices.
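A prompt-level instruction can be paired with a hard `max_tokens` cap so output spend stays bounded either way. The sketch below uses the official Anthropic Python SDK; the model id and limit are illustrative.

```python
import anthropic  # assumes the SDK is installed and ANTHROPIC_API_KEY is set

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model id
    max_tokens=200,                     # hard cap on billed output tokens
    messages=[{
        "role": "user",
        # Asking for brevity in the prompt keeps answers short even below the cap.
        "content": "In three bullet points, summarize the attached meeting notes.",
    }],
)
print(response.content[0].text)
```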