Smart Balancers

A Smart Balancer is a serving endpoint that routes prompts across multiple backends. Use one to send prompts to specialised backends by intent, to fail over from a primary to a backup, or to share a pool of instances across many clients. A balancer is exposed under the same OpenAI-compatible base URL as a regular instance.

Three pieces

Smart Balancers build on two other primitives: serving endpoints (a name bound to a single instance or balancer) and target groups (a named pool of instances with weights and priorities). Create those first if your balancer needs them.
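To make "weights and priorities" more concrete, here is a hypothetical sketch of how a target group could pick one instance. The selection rules are assumptions, not the documented algorithm. It assumes a lower priority number is preferred (1 = primary, 2 = backup) and that the pool only falls through to a worse tier when every member of the better tier is unhealthy. Within a tier, it picks at random in proportion to weight. The member format `name:weight:priority:healthy` and the function name `pick_instance` are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of target-group selection; NOT the product's documented algorithm.
# Assumed member format: "name:weight:priority:healthy", e.g. "gpu-a:3:1:yes".
# Assumed rules: lower priority number wins (1 = primary, 2 = backup), and within
# the best tier that still has healthy members, pick randomly in proportion to weight.

pick_instance() {
  local member name weight prio healthy best="" total=0 point

  # Pass 1: find the best (lowest) priority that has at least one healthy member.
  for member in "$@"; do
    IFS=: read -r name weight prio healthy <<< "$member"
    if [[ $healthy == yes ]] && { [[ -z $best ]] || (( prio < best )); }; then
      best=$prio
    fi
  done
  if [[ -z $best ]]; then
    return 1  # no healthy member in any tier
  fi

  # Pass 2: total the weights of the healthy members in that tier.
  for member in "$@"; do
    IFS=: read -r name weight prio healthy <<< "$member"
    if [[ $healthy == yes ]] && (( prio == best )); then
      total=$(( total + weight ))
    fi
  done
  if (( total <= 0 )); then
    return 1  # weights are assumed to be positive integers
  fi

  # Pass 3: draw a point in [0, total) and walk the tier until a member covers it.
  point=$(( RANDOM % total ))
  for member in "$@"; do
    IFS=: read -r name weight prio healthy <<< "$member"
    if [[ $healthy == yes ]] && (( prio == best )); then
      if (( point < weight )); then
        echo "$name"
        return 0
      fi
      point=$(( point - weight ))
    fi
  done
}
```

For example, `pick_instance gpu-a:3:1:yes gpu-b:1:1:no cpu-backup:1:2:yes` always prints `gpu-a`. Because `gpu-b` is down, `gpu-a` is the only healthy priority-1 member. `cpu-backup` would only be chosen once every priority-1 member is unhealthy, which is the primary/backup failover pattern described above.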

Concepts

| Concept | Description |
| --- | --- |
| Routing mode | intent_classifier (an LLM picks the best route) or ordered (deterministic, top-to-bottom). |
| Router model | A small, fast chat instance that the intent classifier uses to score routes. Required for intent_classifier mode. |
| Route | A name, an intent, a destination, and a priority. In ordered mode, routes are evaluated one by one in priority order; in intent_classifier mode, they are scored in parallel. |
| Destination | An instance, a target_group, or a public_model. Target groups let you share a pool of instances across many balancers. |
| Default route | The fallback when no route matches. Always set one so that OpenAI clients still get a valid response. |
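The ordered routing mode and the default route can be illustrated with a small sketch: check routes top-to-bottom by priority, take the first match, and otherwise fall back to the default. How a route "matches" in ordered mode is not specified here. This sketch assumes a case-insensitive regex on the prompt and assumes a lower priority number is checked first. The route strings, destination names (`docs-qa`, `billing-pool`, `general-chat`), and `route_prompt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ordered routing: check routes top-to-bottom by priority,
# return the first match, otherwise fall back to the default route.
# Assumptions: routes are "priority;name;pattern;destination" strings, a route
# "matches" when its lowercase regex matches the lowercased prompt, and a lower
# priority number is evaluated first. The real balancer's matching may differ.

ROUTES=(
  "20;docs;how (do|can) i;instance:docs-qa"
  "10;billing;invoice|refund|charge;target_group:billing-pool"
)
DEFAULT_ROUTE="default;public_model:general-chat"

route_prompt() {
  local lower priority name pattern destination
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  # Sort numerically on the priority field so array order does not matter.
  while IFS=';' read -r priority name pattern destination; do
    if [[ $lower =~ $pattern ]]; then
      echo "$name -> $destination"
      return 0
    fi
  done < <(printf '%s\n' "${ROUTES[@]}" | sort -t';' -k1,1n)

  # No route matched: the default route still yields a valid destination.
  IFS=';' read -r name destination <<< "$DEFAULT_ROUTE"
  echo "$name -> $destination"
}
```

`route_prompt 'How do I get a refund?'` matches both routes. Billing has priority 10 and is checked first, so the function prints `billing -> target_group:billing-pool`. `route_prompt 'Tell me a joke'` matches nothing and lands on the default route. In intent_classifier mode, the router model would score the routes instead of taking the first textual match.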

Calling a balancer

Once enabled, a balancer accepts the same OpenAI payload as a regular instance; set the model field to the balancer's serving name. Each request adds one router-model call on top of the chosen backend's response time, so budget for that extra latency in your SLOs.

curl
curl https://api.qdiv0.com/v1/chat/completions \
  -H "Authorization: Bearer $QDIV0_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "support-router",
    "messages": [{"role": "user", "content": "Where is my May invoice?"}]
  }'
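To see how much the balancer adds before budgeting for it, you can time the same request through the balancer and through one of its backends directly. This is a minimal sketch. It uses curl's standard `-w '%{time_total}'` write-out variable and assumes `QDIV0_API_KEY` is exported. The helper names `time_completion` and `compare_latency`, and the backend serving name `support-instance` in the usage below, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare end-to-end latency through a balancer vs. a regular instance.
# Assumptions: QDIV0_API_KEY is exported; "support-instance" in the usage below
# is a hypothetical serving name for one of the balancer's backends.

time_completion() {
  # -s silences progress, -o /dev/null drops the body, and -w prints
  # curl's total transfer time in seconds.
  curl -s -o /dev/null -w '%{time_total}\n' \
    https://api.qdiv0.com/v1/chat/completions \
    -H "Authorization: Bearer $QDIV0_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$1\", \"messages\": [{\"role\": \"user\", \"content\": \"Where is my May invoice?\"}]}"
}

compare_latency() {
  local via_balancer direct
  via_balancer=$(time_completion "$1")
  direct=$(time_completion "$2")
  # Bash arithmetic is integer-only, so let awk do the subtraction.
  awk -v b="$via_balancer" -v d="$direct" \
    'BEGIN { printf "balancer %.3fs, direct %.3fs, overhead %.3fs\n", b, d, b - d }'
}
```

Run it as `compare_latency support-router support-instance`. A single pair of requests is noisy, so repeat the measurement and compare at the percentile your SLO uses before settling on a latency budget.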

Where to go next