Memorize this layered list as your default cost-optimization stack:
1. Semantic caching (deflect calls entirely)
2. Model routing (cheap model for easy queries)
3. Context reduction
   a. System prompt audit (often bloated, often easy wins)
   b. Reranking + fewer chunks (top-3 with reranker > top-10 raw)
   c. Conversation history summarization
4. Prompt compression (advanced; LLMLingua etc.)
5. Output length control (max_tokens, structured output)
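
To make the ordering concrete, here is a minimal sketch of how layers 1, 2, 3b, and 5 might be chained in front of a model call. Everything in it is an illustrative assumption, not a reference implementation: the bag-of-words `embed` function stands in for a real embedding model, the keyword heuristic in `route` stands in for a real router or classifier, the names `large-model` and `small-model` are placeholders, and `call_model` is a hypothetical callable you would supply yourself.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model (assumption): bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Layer 1: return a stored answer when a similar query was seen before."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def route(query):
    # Layer 2: hypothetical heuristic. Long queries or queries containing a
    # "reasoning" keyword (simple substring match) go to the larger model.
    lowered = query.lower()
    hard = len(query.split()) > 30 or any(
        word in lowered for word in ("why", "compare", "analyze")
    )
    return "large-model" if hard else "small-model"


def reduce_context(scored_chunks, k=3):
    # Layer 3b: keep only the k highest-scoring chunks.
    # Scores are assumed to come from a reranker that ran earlier.
    ranked = sorted(scored_chunks, key=lambda c: c[1], reverse=True)
    return [text for text, _score in ranked[:k]]


def answer(query, scored_chunks, cache, call_model, max_tokens=256):
    # Walk the layers in order: cache, then route, then trim context,
    # then cap output length (layer 5) on the actual call.
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"
    model = route(query)
    context = reduce_context(scored_chunks)
    prompt = "\n".join(context + [query])
    result = call_model(model=model, prompt=prompt, max_tokens=max_tokens)
    cache.put(query, result)
    return result, model
```

The cache check comes first because a hit avoids the model call entirely, while every later layer only makes an unavoidable call cheaper. In practice the similarity threshold and the routing rule would both need tuning against real traffic.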
When the customer's pain is cost, structure your answer around the five layers above.
This is the cost-optimization version of your RRK Universal Framework.