Memorize this layered list as your default cost-optimization stack:

1. Semantic caching                    (deflect calls entirely)
2. Model routing                       (cheap model for easy queries)
3. Context reduction
   a. System prompt audit              (often bloated, often easy wins)
   b. Reranking + fewer chunks         (top-3 with reranker > top-10 raw)
   c. Conversation history summarization
4. Prompt compression                  (advanced; LLMLingua etc.)
5. Output length control               (max_tokens, structured output)

The pattern to internalize

When the customer's pain is cost, your structured answer has five parts:

  1. Decompose where the cost lives (input/output ratio, tokens per call, query distribution)
  2. Name the levers by canonical category (caching, routing, context reduction, history summarization, output control)
  3. For each lever: impact × risk × test plan
  4. Stage the rollout (week 1, month 1, quarter)
  5. Name what you won't do and why (push back on the wrong fixes)

This is the cost-optimization version of your RRK Universal Framework.