Routing modes

A Smart Balancer runs in one of two modes. The mode is a field on the balancer and determines the strategy the platform uses to choose a destination for each request.

| Mode | How it routes |
| --- | --- |
| intent_classifier | The router model reads the prompt and the candidate route descriptions and picks the best match. Best for many specialised instances where the right backend depends on what the user is asking about. |
| ordered | Routes are tried in priority order. The first healthy instance wins. Best for failover between primary and backup, or for simple weighted routing where the choice does not depend on the prompt content. |

Choosing between them

Use intent_classifier when the choice depends on the prompt content: for example, a support router that distinguishes billing questions from technical bugs. The router model reads the prompt and the route intents and picks the best match. The cost is latency: every request pays for one extra router-model call before the chosen backend runs, so budget for it in your SLOs.
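The intent_classifier flow can be illustrated with a minimal Python sketch. It is not the platform's implementation: the `Route` class, the `pick_by_intent` function, the prompt wording, and the fallback to the first route are all assumptions made for illustration, and the router model is replaced by a keyword-matching stand-in so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Route:
    name: str    # hypothetical route identifier
    intent: str  # description of what the route handles


def pick_by_intent(
    prompt: str,
    routes: Sequence[Route],
    router_model: Callable[[str], str],
) -> Route:
    """Ask the router model to name the best route for this prompt."""
    if not routes:
        raise ValueError("at least one route is required")
    listing = "\n".join(f"- {r.name}: {r.intent}" for r in routes)
    question = (
        "Pick the route whose description best matches the user prompt.\n"
        f"Routes:\n{listing}\n"
        f"User prompt: {prompt}\n"
        "Answer with the route name only."
    )
    answer = router_model(question).strip()  # the one extra model call
    by_name = {r.name: r for r in routes}
    # Assumption: fall back to the first route if the answer is not a known name.
    return by_name.get(answer, routes[0])


def fake_router_model(question: str) -> str:
    """Stand-in for a real router model: a keyword check on the user prompt."""
    user_prompt = question.split("User prompt: ", 1)[1].splitlines()[0]
    return "billing" if "invoice" in user_prompt.lower() else "technical"


routes = [
    Route("billing", "Questions about invoices, charges, and refunds"),
    Route("technical", "Bug reports and technical troubleshooting"),
]
chosen = pick_by_intent("Why was my invoice charged twice?", routes, fake_router_model)
print(chosen.name)  # billing
```

The call to `router_model` is the extra hop that adds latency to every request, which is why a small, fast model is attractive in that position.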

Use ordered when the choice does not depend on the prompt: a primary/backup failover between two regions, for example. The first enabled route with a healthy destination wins; if it goes down, the next one is tried. No extra LLM call is involved.
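The ordered selection rule can be sketched in a few lines of Python. This is an illustrative model only: the `OrderedRoute` class, its field names, and the convention that a lower `priority` number is tried first are assumptions, and the `healthy` flag stands in for whatever health check the platform performs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OrderedRoute:
    name: str       # hypothetical route identifier
    priority: int   # assumption: lower number is tried first
    enabled: bool
    healthy: bool   # stand-in for the platform's health check result


def pick_ordered(routes: Sequence[OrderedRoute]) -> Optional[OrderedRoute]:
    """Return the first enabled route with a healthy destination, or None."""
    for route in sorted(routes, key=lambda r: r.priority):
        if route.enabled and route.healthy:
            return route
    return None


failover = [
    OrderedRoute("primary-region", priority=1, enabled=True, healthy=False),
    OrderedRoute("backup-region", priority=2, enabled=True, healthy=True),
]
selected = pick_ordered(failover)
print(selected.name if selected else "no route available")  # backup-region
```

Because the decision uses only route state and never the prompt, no model call is needed to make it.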

The router model in intent_classifier mode can be a regular Compute instance, a small public model, or a target group of small public models. A small public model (such as a GPT-OSS 20B variant) is usually the right pick for a classifier.