Inference

The inference plane accepts OpenAI-compatible chat completions and routes them to the right backend — a Compute instance you launched, a Public Model operated by QDivZero, or a Smart Balancer endpoint. The same call shape works against all three; you only change the model field. Optional fields let you attach a firewall and enable tools.

OpenAI-compatible

Every example in this section uses the standard OpenAI request shape. The OpenAI Python SDK, the official Node SDK, and any HTTP client that targets /v1/chat/completions work without changes — just point base_url to https://api.qdiv0.com/v1.
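For clients that do not use an SDK, the same call can be made as a plain HTTP POST. The sketch below builds the request with Python's standard library. It assumes the API key is sent as a Bearer token in the Authorization header, as the OpenAI API does; this section does not state the auth scheme explicitly. The helper names and the model name are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST to /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            # Assumption: Bearer auth, mirroring the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_chat_request(
    "your-api-key",
    "your-serving-name",  # placeholder model identifier
    [{"role": "user", "content": "Hello"}],
)
print(req.get_method(), req.full_url)
```

Calling send(req) performs the actual network request; building and sending are kept separate so the request can be inspected first.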

Request shape

The minimum request is a model and a messages array. Everything else is optional: streaming, the firewall, and the list of tools. Each optional capability is covered in its own guide below.

request.json

{
  "model": "<serving-name | public:model | balancer-name>",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "stream": false,
  "firewall": "<optional slug>",
  "tools": [
    { "type": "search" },
    { "type": "flexible_vector_database_search", "vector_database": "support-kb", "query": "...", "top_k": 5 }
  ]
}
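One way to keep requests minimal is to build the body from the two required fields and add optional ones only when they are set. This is a sketch, not a platform helper: build_request and the firewall slug are hypothetical, and it assumes that omitting an optional field is equivalent to leaving it at its default.

```python
import json


def build_request(model, messages, stream=False, firewall=None, tools=None):
    """Return a request body, omitting optional fields that are unset."""
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    if firewall is not None:
        body["firewall"] = firewall
    if tools:
        body["tools"] = tools
    return body


# Minimum request: only model and messages.
minimal = build_request(
    "your-serving-name",
    [{"role": "user", "content": "Hello"}],
)

# Same request with a firewall and one tool attached.
full = build_request(
    "your-serving-name",
    [{"role": "user", "content": "Hello"}],
    firewall="my-firewall-slug",  # hypothetical slug
    tools=[{"type": "search"}],
)

print(json.dumps(minimal, indent=2))
```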

Choosing the backend

The model field picks the backend. The SDK, the authentication, and the response shape stay the same; only the identifier changes.

Backend	Model identifier	Billing
Compute instance	your-serving-name	Per active GPU hour
Public model	public:<catalog-name>	Per input / output token
Smart balancer	balancer-serving-name	Per active GPU hour of the matched route

backends.py

from openai import OpenAI

client = OpenAI(
    base_url="https://api.qdiv0.com/v1",
    api_key="your-api-key",
)

# Compute instance
client.chat.completions.create(
    model="qwen35-demo",
    messages=[{"role": "user", "content": "Hello"}],
)

# Public model
client.chat.completions.create(
    model="public:deepseek-v3.2-european",
    messages=[{"role": "user", "content": "Hello"}],
)

# Smart balancer (uses the balancer's serving name)
client.chat.completions.create(
    model="router-prod",
    messages=[{"role": "user", "content": "Hello"}],
)

Native tools and extra_body

The platform speaks the OpenAI chat completions shape verbatim, so any client that targets /v1/chat/completions works. With the OpenAI SDK, use the native tools parameter and pass non-standard fields like firewall through extra_body.
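A minimal sketch of that split with the OpenAI Python SDK: tools goes in the SDK's own tools argument, while firewall travels in extra_body, which the SDK merges into the JSON body it sends. The helper names and the firewall slug are hypothetical, and the tool entry reuses the search type from the request shape above; whether the SDK's type hints accept that tool type is not covered here.

```python
def create_kwargs(model, messages, firewall=None, tools=None):
    """Split fields into native SDK arguments and extra_body."""
    kwargs = {"model": model, "messages": messages}
    if tools:
        # Standard field: passed through the SDK's native parameter.
        kwargs["tools"] = tools
    if firewall is not None:
        # Non-standard field: merged into the request body by the SDK.
        kwargs["extra_body"] = {"firewall": firewall}
    return kwargs


def ask(api_key, **kwargs):
    """Send the request; needs the openai package and network access."""
    from openai import OpenAI  # imported lazily so create_kwargs has no dependency

    client = OpenAI(base_url="https://api.qdiv0.com/v1", api_key=api_key)
    return client.chat.completions.create(**kwargs)


kwargs = create_kwargs(
    "qwen35-demo",
    [{"role": "user", "content": "Hello"}],
    firewall="my-firewall-slug",  # hypothetical slug
    tools=[{"type": "search"}],
)
print(sorted(kwargs))
```

Calling ask("your-api-key", **kwargs) would then issue the request against the platform.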

Where to go next

Attaching a firewall →

Enabling tools →

Streaming →

Error handling →