For any multi-agent design discussion in the RRK, you should touch on:
Reliability
- Timeout per agent call (don't let one stuck agent hang the whole system)
- Retry with exponential backoff on transient errors; circuit breaker on persistent failures
- Max-step cap on agent loops to prevent runaway loops
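A minimal sketch of the timeout, retry, and step-cap bullets above, assuming agents are async callables. The names `TransientError`, `call_with_retry`, and `run_agent_loop` are hypothetical, and the default limits are placeholders rather than recommendations. A circuit breaker is left out to keep the sketch short.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limits)."""


async def call_with_retry(agent_fn, payload, *, timeout_s=10.0,
                          max_attempts=3, base_delay_s=0.5):
    """Run one agent call with a per-attempt timeout and exponential backoff."""
    for attempt in range(max_attempts):
        try:
            # The timeout keeps one stuck agent from hanging the whole request.
            return await asyncio.wait_for(agent_fn(payload), timeout=timeout_s)
        except (asyncio.TimeoutError, TransientError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay_s * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Small random jitter so parallel callers do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def run_agent_loop(step_fn, state, *, max_steps=8):
    """Iterate an agent until it reports done, or stop at the max-step cap."""
    for _ in range(max_steps):
        state, done = await step_fn(state)
        if done:
            return state
    raise RuntimeError(f"agent loop exceeded {max_steps} steps")
```

Only errors marked transient are retried here; anything else propagates immediately, which is where a circuit breaker would normally take over.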
Latency
- Parallelize independent agent calls (fan-out/gather)
- Stream supervisor output even while workers are still running where possible
- TTFT (time to first token) optimization: smaller model for the streaming "thinking out loud" layer
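The fan-out/gather bullet can be sketched with `asyncio.gather`. Here `fake_worker` is a hypothetical stand-in that sleeps instead of calling a model, so the numbers only illustrate that total latency tracks the slowest call rather than the sum.

```python
import asyncio
import time


async def fake_worker(name, delay_s):
    """Stand-in for an agent call; sleeps instead of calling a model."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def fan_out(tasks):
    """Run independent agent calls concurrently; results keep input order."""
    return await asyncio.gather(*(fake_worker(n, d) for n, d in tasks))


def timed_fan_out(tasks):
    """Return (results, elapsed_seconds) for one fan-out/gather round."""
    start = time.perf_counter()
    results = asyncio.run(fan_out(tasks))
    return results, time.perf_counter() - start
```

This only helps when the calls really are independent; a worker that needs another worker's output still has to wait for it.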
Cost
- Model routing per role (Flash for triage, Pro for synthesis)
- Aggressive context pruning between agents — don't pass the whole history downstream
- Cache tool results (semantic cache for retrieval, exact cache for deterministic tools)
- Per-agent token budgets + circuit breakers
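A rough sketch of three of the cost levers above: role-based routing, a per-agent token budget, and an exact cache for a deterministic tool. All names are hypothetical, and the `flash` and `pro` strings are tier labels taken from the notes, not real model identifiers. Semantic caching for retrieval is not shown.

```python
import functools

# Hypothetical role-to-tier table; labels mirror the notes, not real model IDs.
MODEL_BY_ROLE = {"triage": "flash", "synthesis": "pro"}


def route_model(role, default="flash"):
    """Pick a model tier for a role, falling back to the cheap tier."""
    return MODEL_BY_ROLE.get(role, default)


class BudgetExceeded(Exception):
    """Raised when an agent would spend past its token cap."""


class TokenBudget:
    """Tracks tokens spent by one agent and refuses charges past the cap."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.used + tokens} > {self.limit}")
        self.used += tokens


@functools.lru_cache(maxsize=1024)
def convert_units(value, factor):
    """Deterministic tool: same inputs give the same output, so an exact cache is safe."""
    return value * factor
```

Checking the budget before each model call, rather than after, keeps a looping agent from overshooting the cap by one large request.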
Observability — the hardest one for multi-agent
- Distributed tracing across agent boundaries (one trace ID per user request, spans per agent + per tool call)
- Per-agent token + cost attribution (which agent is burning budget?)
- Outcome metrics (did the workflow complete? was the final answer correct?) separately from process metrics (which agents fired, in what order, how long each took)
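To make the tracing and attribution bullets concrete, here is a small in-process sketch: one trace ID per request, one span per agent or tool call, and a token tally per agent. The `Trace` class is a hypothetical stand-in for a real tracing backend, not any library's API.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One trace per user request; spans and token counts hang off it."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)
    tokens_by_agent: dict = field(default_factory=lambda: defaultdict(int))

    @contextmanager
    def span(self, agent, name):
        """Record how long one agent step or tool call took."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped step raises.
            self.spans.append({
                "trace_id": self.trace_id,
                "agent": agent,
                "name": name,
                "duration_s": time.perf_counter() - start,
            })

    def add_tokens(self, agent, tokens):
        self.tokens_by_agent[agent] += tokens

    def top_spender(self):
        """Answer 'which agent is burning budget?' for this request."""
        return max(self.tokens_by_agent, key=self.tokens_by_agent.get)
```

The span list covers the process side (which agents fired, in what order, how long each took); outcome metrics would be recorded separately, keyed by the same trace ID.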
Evaluation
- End-to-end task success rate is the north star
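As a small illustration of the success-rate metric, assuming each run is logged as a dict with hypothetical `completed` and `correct` flags:

```python
def task_success_rate(runs):
    """Fraction of runs that both completed and gave a correct final answer."""
    if not runs:
        raise ValueError("no runs to score")
    passed = sum(1 for r in runs if r["completed"] and r["correct"])
    return passed / len(runs)
```

A run that finishes with a wrong answer counts as a failure here, which keeps completion rate from masquerading as success rate.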