Model routing

Adaptive, cost-aware routing: try a cheap model first and escalate to a stronger one only when the result isn't good enough. You pay for the big model only when you need it.

A model string

The simplest way to specify a model is a 'provider/model' string. The prefix selects the provider package, which is imported lazily.

model: 'anthropic/claude-opus-4-8'
model: 'openai/gpt-4o'
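
To make the format concrete, here is a minimal sketch of how such a string could be split into its two parts. parseModelString and ModelRef are hypothetical names for illustration, not part of Redtuma, and splitting at the first slash is an assumption about how the prefix is read.

```typescript
// Hypothetical helper, not a Redtuma API: splits 'provider/model' at the first slash.
interface ModelRef {
  provider: string
  model: string
}

function parseModelString(spec: string): ModelRef {
  const slash = spec.indexOf('/')
  // Reject strings with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`expected 'provider/model', got '${spec}'`)
  }
  return { provider: spec.slice(0, slash), model: spec.slice(slash + 1) }
}

const ref = parseModelString('anthropic/claude-opus-4-8')
console.log(ref.provider) // anthropic
console.log(ref.model) // claude-opus-4-8
```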

Tiered routing

Wrap a list of tiers with tieredModel. Each tier has an accept predicate that decides whether its result is good enough; if not, Redtuma escalates to the next tier. The last tier is always accepted.

routing.ts

import { Agent } from '@redtuma/core/agent'
import { tieredModel } from '@redtuma/core'

const agent = new Agent({
  id: 'assistant',
  instructions: 'Answer the question.',
  model: tieredModel({
    tiers: [
      {
        model: 'anthropic/claude-haiku-4-5',
        accept: (r) => r.text.length > 0 && !r.text.includes('I cannot'),
      },
      { model: 'anthropic/claude-opus-4-8' }, // fallback, always accepted
    ],
    onEscalate: ({ from, to }) => console.log(`escalated ${from} -> ${to}`),
  }),
})
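
The escalation rule can be sketched as a plain loop. This illustrates the behaviour described above and is not Redtuma's actual implementation: runTiers, Tier, Result, and Routed are hypothetical names, the model call is a synchronous stand-in, and treating a non-final tier without an accept predicate as accepted is an assumption.

```typescript
// Sketch of the escalation rule; not Redtuma's internals.
interface Result {
  text: string
}

interface Tier {
  model: string
  accept?: (r: Result) => boolean // typically omitted on the last tier
}

interface Routed {
  result: Result
  routing: { tier: number; attempts: number }
}

// `call` stands in for a model invocation. It is synchronous only to keep
// the sketch short; a real call would be asynchronous.
function runTiers(
  tiers: Tier[],
  call: (model: string) => Result,
  onEscalate?: (e: { from: string; to: string }) => void,
): Routed {
  if (tiers.length === 0) throw new Error('at least one tier is required')
  for (let i = 0; i < tiers.length; i++) {
    const tier = tiers[i]
    const result = call(tier.model)
    const isLast = i === tiers.length - 1
    // The last tier is accepted unconditionally.
    // Assumption: a tier without a predicate is also accepted.
    if (isLast || !tier.accept || tier.accept(result)) {
      return { result, routing: { tier: i, attempts: i + 1 } }
    }
    onEscalate?.({ from: tier.model, to: tiers[i + 1].model })
  }
  throw new Error('unreachable')
}

// A fake model: the cheap tier refuses, the strong tier answers.
const fake = (model: string): Result =>
  model === 'cheap' ? { text: 'I cannot help with that.' } : { text: 'Here is the answer.' }

const out = runTiers(
  [
    { model: 'cheap', accept: (r) => r.text.length > 0 && !r.text.includes('I cannot') },
    { model: 'strong' },
  ],
  fake,
)
console.log(out.routing) // { tier: 1, attempts: 2 }
```

Each escalation costs one extra call, so a predicate that rejects too eagerly can end up more expensive than calling the strong model directly.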

Inspecting the decision

When a tiered model produces the result, generate reports which tier was accepted and how many attempts it took in the result's routing field.

const res = await agent.generate('Summarize this.')
res.routing // { tier: 0, attempts: 1 }  — the cheap tier was enough
            // { tier: 1, attempts: 2 }  — escalated to the strong model
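
Collected over many calls, these values show how often the cheap tier is enough. The sketch below assumes only the { tier, attempts } shape shown above; summarize and RoutingInfo are hypothetical names, not Redtuma APIs.

```typescript
// Hypothetical helper: aggregates routing info gathered from several generate calls.
interface RoutingInfo {
  tier: number
  attempts: number
}

function summarize(routings: RoutingInfo[]) {
  const total = routings.length
  // Any tier above 0 means at least one escalation happened.
  const escalated = routings.filter((r) => r.tier > 0).length
  const attempts = routings.reduce((sum, r) => sum + r.attempts, 0)
  return {
    total,
    escalationRate: total === 0 ? 0 : escalated / total,
    meanAttempts: total === 0 ? 0 : attempts / total,
  }
}

const stats = summarize([
  { tier: 0, attempts: 1 },
  { tier: 0, attempts: 1 },
  { tier: 1, attempts: 2 },
  { tier: 0, attempts: 1 },
])
console.log(stats) // { total: 4, escalationRate: 0.25, meanAttempts: 1.25 }
```

An escalation rate close to 1 suggests the accept predicate is too strict or the first tier is too weak for the workload.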

Good accept predicates

Reject empty output or refusals.
Reject when structured output fails to parse / validate.
Gate on a confidence score your tool returns.
Gate on length, format, or required keywords.
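
The checks above can be written as small composable predicates. This is a sketch under assumptions: the result is taken to expose only a text field, as in the routing example, the confidence score is assumed to arrive as a field in JSON output, and every helper name here is hypothetical.

```typescript
// Hypothetical predicate helpers; only `text` is assumed on the result.
interface Result {
  text: string
}
type Accept = (r: Result) => boolean

// Reject empty output or common refusal phrasings.
const notEmptyOrRefusal: Accept = (r) => {
  const t = r.text.trim()
  return t.length > 0 && !/I cannot|I can't|I am unable/i.test(t)
}

// Accept only if the text parses as JSON and passes a caller-supplied check.
const parsesAs = (isValid: (value: unknown) => boolean): Accept => (r) => {
  try {
    return isValid(JSON.parse(r.text))
  } catch (_e) {
    return false
  }
}

// Gate on required keywords (case-insensitive) and on length.
const hasKeywords = (...words: string[]): Accept => (r) =>
  words.every((w) => r.text.toLowerCase().includes(w.toLowerCase()))

const lengthBetween = (min: number, max: number): Accept => (r) =>
  r.text.length >= min && r.text.length <= max

// Combine several predicates; all must pass.
const allOf = (...checks: Accept[]): Accept => (r) => checks.every((c) => c(r))

// Assumption: the model returns JSON carrying a numeric `confidence` field.
const confident = parsesAs((v) => {
  if (typeof v !== 'object' || v === null) return false
  const c = (v as { confidence?: unknown }).confidence
  return typeof c === 'number' && c >= 0.8
})

const accept = allOf(notEmptyOrRefusal, confident)
console.log(accept({ text: '{"answer":"42","confidence":0.93}' })) // true
console.log(accept({ text: '{"answer":"?","confidence":0.4}' })) // false
console.log(accept({ text: 'I cannot answer that.' })) // false
```

Keeping predicates cheap and deterministic matters, since they run on every result from a non-final tier.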

Streaming

A stream starts delivering output before the full result exists, so it can't be judged and retried on a stronger tier. For that reason, stream always uses the cheapest tier; adaptive escalation applies only to generate.
