Smart Balancers
A Smart Balancer is a serving endpoint that routes prompts across multiple backends. Use it to specialise by intent, to fail over between a primary and a backup, or to share a pool of instances across many clients. The balancer is exposed under the same OpenAI-compatible base URL as a regular instance.
Three pieces
Concepts
| Concept | Description |
|---|---|
| Routing mode | intent_classifier (an LLM picks the best route) or ordered (deterministic, top-to-bottom). |
| Router model | Small, fast chat instance that the intent classifier uses to score routes. Required for intent_classifier mode. |
| Route | A name + intent + destination + priority. Routes are evaluated in priority order for ordered mode, in parallel for intent_classifier. |
| Destination | instance, target_group, or public_model. Target groups let you share a pool of instances across many balancers. |
| Default route | The fallback when no route matches. Always set one, so that the OpenAI client still gets a valid response. |
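To make the table concrete, here is a minimal sketch of how the two routing modes and the default route could fit together. It is not the balancer's actual implementation. The `Route` field types, the "lower priority number goes first" ordering, the `is_available` health check, the destination string format, and the `min_score` cut-off are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Route:
    # Field names mirror the Concepts table; the types are assumptions.
    name: str
    intent: str
    destination: str  # illustrative format, e.g. "instance:billing-llm"
    priority: int     # assumption: a lower number is evaluated first


def pick_ordered(routes: Sequence[Route],
                 is_available: Callable[[Route], bool],
                 default: Route) -> Route:
    """Ordered mode: walk routes top to bottom and take the first usable one."""
    for route in sorted(routes, key=lambda r: r.priority):
        if is_available(route):
            return route
    return default  # no route was usable, so fall back to the default route


def pick_by_intent(routes: Sequence[Route],
                   scores: Mapping[str, float],
                   default: Route,
                   min_score: float = 0.5) -> Route:
    """intent_classifier mode: take the best-scored route, else the default.

    `scores` stands in for the router model's output (route name -> score).
    `min_score` is a hypothetical cut-off, not a documented setting.
    """
    candidates = [r for r in routes if scores.get(r.name, 0.0) >= min_score]
    if not candidates:
        return default
    return max(candidates, key=lambda r: scores.get(r.name, 0.0))


# Hypothetical routes used only to exercise the two functions.
primary = Route("primary", "billing questions", "instance:billing-llm", 1)
backup = Route("backup", "billing questions", "target_group:billing-pool", 2)
fallback = Route("default", "anything else", "public_model:general-llm", 99)
```

In this sketch the default route is passed in explicitly, so both functions always return something the caller can forward the prompt to, which is the behaviour the "Default route" row asks for.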
Calling a balancer
Once enabled, a balancer accepts the same OpenAI payload as a regular instance. The model field is the balancer's serving name. Latency is the cost of one extra router-model call plus the chosen backend; budget for it in your SLOs.
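For callers working from Python rather than the shell, the request described above could be assembled roughly as follows. This is a sketch using only the standard library. The base URL, serving name, and `QDIV0_API_KEY` variable are taken from the curl example in this section; the helper names `build_chat_request` and `send` are hypothetical, and the response is assumed to be OpenAI-style JSON.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"  # base URL taken from the curl example


def build_chat_request(serving_name: str, user_text: str,
                       api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a balancer."""
    payload = {
        # The model field carries the balancer's serving name,
        # not the name of any backend behind it.
        "model": serving_name,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the JSON body (assumed OpenAI-style)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

A call such as `send(build_chat_request("support-router", "Where is my May invoice?", os.environ["QDIV0_API_KEY"]))` would then perform the request (with `import os` added). The timeout should leave room for both the router-model call and the chosen backend.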
    curl https://api.qdiv0.com/v1/chat/completions \
      -H "Authorization: Bearer $QDIV0_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "support-router",
        "messages": [{"role": "user", "content": "Where is my May invoice?"}]
      }'

Where to go next
Routing modes →
intent_classifier vs ordered: how to pick the right strategy.
API and create payload →
The HTTP surface for balancers, target groups, and serving endpoints, plus the create payload.
Create a balancer →
Walk through the create wizard end to end.
Firewalls →
Pre-prompt evaluation for the routes that need it.