Smart Balancers

A Smart Balancer is a serving endpoint that routes prompts across multiple backends. Use it to specialise by intent, to fail over between a primary and a backup, or to share a pool of instances across many clients. The balancer is exposed under the same OpenAI-compatible base URL as a regular instance.

Three pieces

Smart Balancers sit on top of two other primitives: serving endpoints (a name + a single instance or balancer) and target groups (a named pool of instances with weights and priorities). Build those first if you need them.

Concepts

Concept	Description
Routing mode	intent_classifier (an LLM picks the best route) or ordered (deterministic, top-to-bottom).
Router model	Small, fast chat instance that the intent classifier uses to score routes. Required for intent_classifier mode.
Route	A name + intent + destination + priority. Routes are evaluated in priority order for ordered mode, in parallel for intent_classifier.
Destination	instance, target_group, or public_model. Target groups let you share a pool of instances across many balancers.
Default route	The fallback when no route matches. Always set one, so that the OpenAI client still gets a valid response.
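
The ordered mode and the default route can be pictured with a small sketch. This is not the balancer's implementation: the `Route` fields mirror the table above, but the `kind:name` destination labels, the `BackendUnavailable` exception, the `send` callback, and the rule that a lower priority number is tried first are all assumptions made for illustration. It also assumes a route is skipped when its backend cannot serve the request.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Route:
    name: str
    intent: str       # description of the traffic this route is meant for
    destination: str  # hypothetical "kind:name" label, e.g. "instance:primary"
    priority: int     # assumption: a lower number is tried first


class BackendUnavailable(Exception):
    """Hypothetical error raised when a destination cannot serve a request."""


def route_ordered(
    routes: List[Route],
    default: Route,
    send: Callable[[str], str],
) -> Tuple[str, str]:
    """Try routes top-to-bottom by priority, then fall back to the default."""
    for route in sorted(routes, key=lambda r: r.priority):
        try:
            return route.name, send(route.destination)
        except BackendUnavailable:
            continue  # deterministic: move on to the next route in order
    return default.name, send(default.destination)


# Stand-in transport: the primary is down, so the backup answers.
routes = [
    Route("backup", "any request", "instance:backup", priority=2),
    Route("primary", "any request", "instance:primary", priority=1),
]
default = Route("default", "fallback", "public_model:general", priority=99)


def fake_send(destination: str) -> str:
    if destination == "instance:primary":
        raise BackendUnavailable(destination)
    return f"served by {destination}"


chosen, reply = route_ordered(routes, default, fake_send)
print(chosen, "->", reply)  # backup -> served by instance:backup
```

Because the default route is tried last and unconditionally, the caller always gets an answer from some backend, which is why the table recommends always setting one.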

Calling a balancer

Once enabled, a balancer accepts the same OpenAI payload as a regular instance. The model field is the balancer's serving name. End-to-end latency is the cost of one extra router-model call plus the response time of the chosen backend; budget for it in your SLOs.

curl

curl https://api.qdiv0.com/v1/chat/completions \
  -H "Authorization: Bearer $QDIV0_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "support-router",
    "messages": [{"role": "user", "content": "Where is my May invoice?"}]
  }'
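
The same request can be sketched in Python with only the standard library. The URL, headers, and payload are taken from the curl example; the helper names `build_chat_request` and `ask` are made up for this sketch, and reading the reply as JSON assumes the balancer answers with a JSON body as a regular OpenAI-style endpoint would.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"  # base URL from the curl example


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; `model` is the balancer's serving name."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(model: str, prompt: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (performs a network call)."""
    request = build_chat_request(model, prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `ask("support-router", "Where is my May invoice?", os.environ["QDIV0_API_KEY"])`. The timeout is a client-side choice, and it should leave room for the router-model call as well as the backend.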

Where to go next

Routing modes →

API and create payload →

Create a balancer →

Firewalls →