Inference
The inference plane accepts OpenAI-compatible chat completion requests and routes them to the right backend: a Compute instance you launched, a Public Model operated by QDivZero, or a Smart Balancer endpoint. The same call shape works against all three; only the model field changes. Optional fields let you attach a firewall and enable tools.
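The sketch below shows that call shape without an SDK, using only the Python standard library. It builds the JSON body and a POST to `/chat/completions` but does not send it. It assumes bearer-token auth in the `Authorization` header, which is what the OpenAI SDK sends. `build_payload` and `build_request` are hypothetical helper names, not part of any QDivZero client.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.qdiv0.com/v1"


def build_payload(
    model: str,
    messages: list[dict[str, str]],
    stream: bool = False,
    firewall: Optional[str] = None,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: model and messages are required; the rest is added only when set."""
    payload: dict[str, Any] = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    if firewall is not None:
        payload["firewall"] = firewall
    if tools:
        payload["tools"] = tools
    return payload


def build_request(payload: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Hypothetical helper: wrap the payload in a POST request (constructed, not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as sent by the OpenAI SDK.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "Hello"}]
    # One request shape for every backend; only the model identifier differs.
    for model in ("qwen35-demo", "public:deepseek-v3.2-european", "router-prod"):
        req = build_request(build_payload(model, msgs), api_key="your-api-key")
        print(req.get_method(), req.full_url, json.loads(req.data)["model"])
    # To send one: urllib.request.urlopen(req), then json.load() the response.
```

Because optional fields are omitted unless set, the smallest body this produces is just `model` and `messages`.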
OpenAI-compatible
Existing calls to /v1/chat/completions work without changes; just point base_url to https://api.qdiv0.com/v1.
Request shape
The minimum request is a model and a messages array. Everything else is optional: streaming, the firewall, and the list of tools. Each optional capability is covered in its own guide below.
```json
{
  "model": "<serving-name | public:model | balancer-name>",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": false,
  "firewall": "<optional slug>",
  "tools": [
    { "type": "search" },
    { "type": "flexible_vector_database_search", "vector_database": "support-kb", "query": "...", "top_k": 5 }
  ]
}
```
Choosing the backend
The model field picks the backend. The SDK, the authentication, and the response shape stay the same; only the identifier changes.
| Backend | Model identifier | Billing |
|---|---|---|
| Compute instance | your-serving-name | Per active GPU hour |
| Public model | public:<catalog-name> | Per input / output token |
| Smart balancer | balancer-serving-name | Per active GPU hour of the matched route |
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.qdiv0.com/v1",
    api_key="your-api-key",
)

# Compute instance
client.chat.completions.create(
    model="qwen35-demo",
    messages=[{"role": "user", "content": "Hello"}],
)

# Public model
client.chat.completions.create(
    model="public:deepseek-v3.2-european",
    messages=[{"role": "user", "content": "Hello"}],
)

# Smart balancer (uses the balancer's serving name)
client.chat.completions.create(
    model="router-prod",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Native tools and extra_body
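The sketch below shows how the two kinds of fields split. It assumes the OpenAI Python SDK v1, whose `create()` accepts an `extra_body` dict and merges it into the request JSON. `build_create_kwargs` and the firewall slug `support-policy` are hypothetical. The tool entries mirror the request shape shown earlier.

```python
from typing import Any, Optional


def build_create_kwargs(
    model: str,
    prompt: str,
    firewall: Optional[str] = None,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for client.chat.completions.create()."""
    kwargs: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools:
        # Native SDK parameter, sent as the top-level "tools" field.
        kwargs["tools"] = tools
    if firewall is not None:
        # Not an OpenAI parameter, so it rides in extra_body,
        # which the SDK merges into the JSON request body.
        kwargs["extra_body"] = {"firewall": firewall}
    return kwargs


if __name__ == "__main__":
    kwargs = build_create_kwargs(
        "router-prod",
        "Summarize today's outage reports",
        firewall="support-policy",  # hypothetical firewall slug
        tools=[
            {"type": "search"},
            {
                "type": "flexible_vector_database_search",
                "vector_database": "support-kb",
                "query": "outage",
                "top_k": 5,
            },
        ],
    )
    print(kwargs["extra_body"])
    # With a configured client: client.chat.completions.create(**kwargs)
```

The SDK's type hints describe OpenAI's own tool types, so a static type checker may flag entries such as `{"type": "search"}` even though the dicts are passed through as JSON.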
Any client that targets /v1/chat/completions works. With the OpenAI SDK, pass tools through the native tools parameter and non-standard fields such as firewall through extra_body.
Where to go next
Attaching a firewall →
Pre-prompt policy evaluation with rule catalogs and LLM judges.
Enabling tools →
Web search, browser, and vector database search the model can invoke.
Streaming →
Stream Server-Sent Events and aggregate usage for billing.
Error handling →
HTTP status codes for the chat completions API.