Compute
QDivZero Compute is the managed inference platform. It defines the runtimes, the pricing model, the OpenAI-compatible contract, the observability surface, and the day-to-day operations on the instances you launch against it. One concept, one place.
What Compute solves
- Skip provider procurement and contract negotiation — capacity is provisioned on demand.
- Avoid vendor lock-in — the smart scheduler can move workloads to whichever provider has the best cost / latency at the moment of launch.
- Stop guessing your inference bill — every active instance bills a fixed hourly rate.
- Ship faster — the OpenAI contract means your existing tools, SDKs, and evaluation pipelines keep working unchanged.
Capability matrix
| Capability | Detail |
|---|---|
| Model deployment | Hugging Face repos validated against runtime requirements before launch |
| Multi-provider routing | Smart scheduler picks capacity across providers for cost, availability, and latency |
| OpenAI-compatible contract | Exposed at /v1 with chat, completions, embeddings, models, and streaming semantics |
| Capacity pricing | Pay per active GPU hour. No token meters, no per-request fees. |
| Scheduled operations | Cron-based start/stop rules per instance, evaluated by the scheduler worker |
| Security controls | Regional pinning, verified runtime images, and pre-prompt firewalls |
| Resource observability | CPU, RAM, GPU, VRAM, disk, and temperature samples on running instances |
| Embeddings workloads | Text-only and multimodal (text + image) embedding instances |
OpenAI-compatible contract
Every running instance answers at https://api.qdiv0.com/v1 using the standard OpenAI REST contract. Authentication is a Bearer API key; the model id is the instance serving name. Streaming, function calling, JSON mode, and stop sequences are supported when the underlying runtime exposes them.
| Operation | Method | Endpoint |
|---|---|---|
| List models | GET | /v1/models |
| Chat completions | POST | /v1/chat/completions |
| Legacy completions | POST | /v1/completions |
| Embeddings | POST | /v1/embeddings |
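The operations table maps directly onto request construction. The following is a minimal sketch using only the Python standard library. The `OPERATIONS` mapping and the `build_request` helper are illustrative names, not part of any QDivZero SDK, and the function only builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"

# Operation name -> (HTTP method, path under /v1), mirroring the table above.
# The operation names are hypothetical labels chosen for this sketch.
OPERATIONS = {
    "list_models": ("GET", "/models"),
    "chat_completions": ("POST", "/chat/completions"),
    "completions": ("POST", "/completions"),
    "embeddings": ("POST", "/embeddings"),
}


def build_request(operation, api_key, payload=None):
    """Build (but do not send) an authenticated request for one operation."""
    method, path = OPERATIONS[operation]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # POST operations carry a JSON body.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

Passing the result to `urllib.request.urlopen` would send it; keeping construction separate makes the method and path choices easy to inspect.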

    curl https://api.qdiv0.com/v1/chat/completions \
      -H "Authorization: Bearer $QDIV0_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "<serving-name>",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

Pricing model
Compute bills capacity, not tokens. The model below defines how every instance is charged; the Billing page describes the ledger mechanics.
| Term | Definition |
|---|---|
| Hourly rate | Locked at launch from the catalog price snapshot. Provider cost changes do not affect active instances. |
| Capacity tier | secure (guaranteed capacity, higher rate) or community (spot or shared capacity, lower rate, may be reclaimed). |
| Billable unit | One billable second = one second the instance spent in running state. Billed in EUR cents, no rounding tricks. |
| Grace window | If the balance goes negative, new launches are blocked but existing instances keep running until the next top-up. |
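The arithmetic behind these terms can be sketched as follows. This is an illustration under stated assumptions, not the ledger implementation: the function names are hypothetical, the cost is kept as an exact fraction because the rounding policy is not specified here, and a balance of exactly zero is assumed to still allow launches.

```python
from fractions import Fraction


def cost_in_cents(hourly_rate_cents: int, running_seconds: int) -> Fraction:
    """Exact cost in EUR cents for the seconds spent in running state.

    The hourly rate is the one locked at launch; each billable second
    costs 1/3600 of it. No rounding is applied in this sketch.
    """
    return Fraction(hourly_rate_cents * running_seconds, 3600)


def launches_allowed(balance_cents: int) -> bool:
    """Grace window rule: a negative balance blocks new launches only.

    Assumption: a balance of exactly zero is not treated as negative.
    """
    return balance_cents >= 0
```

For instance, an instance locked at 250 cents per hour that runs for 90 minutes (5400 seconds) accrues exactly 375 cents.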
Observability surface
Compute collects a small, opinionated set of signals from every instance. The retention is short for high-cardinality signals and indefinite for things that should never be lost.
| Signal | Retention | Where to find it |
|---|---|---|
| Cost usage | 24h, 7d, 30d, 90d | Instance detail → Usage tab |
| Resource metrics (CPU, RAM, GPU, VRAM, disk, temperature) | Last 24h | Instance detail → Usage tab, range = 24h |
| Schedule rule history | Indefinite | Instance detail → Schedule tab |
| Runtime config snapshot | Indefinite | Instance detail → Settings tab |
| Failure reason + remediation hint | Indefinite | Instance detail → Details tab (failed state) |
The instances list
The Instances page is the day-to-day surface for the models you have deployed. It lists every instance in the account, with state, serving name, and creation time, and exposes the actions you can take on each one.
The list auto-refreshes while any instance is pending. State changes are visible within one poll cycle, and debits from the billing scheduler show up on the Usage tab.
| Column | What it shows |
|---|---|
| name | Display name set at launch. Falls back to a short ID when empty. |
| state | Current lifecycle state (running, pending, stopped, failed, non_bootstrap). See the launch flow page for the full state machine. |
| serving_name | Public model name used by the OpenAI-compatible endpoint. |
| created | Wall-clock time when the instance was created in the account. |
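A client that wants the same behavior as the auto-refreshing list can poll until an instance leaves the pending state. The sketch below is a generic polling loop, not a documented QDivZero API: `fetch_state` is a hypothetical callable you would supply (this page does not document an endpoint for reading instance state), and the poll interval and limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_settled(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns something other than 'pending'.

    Returns the first non-pending state (for example 'running' or 'failed')
    and raises TimeoutError if the instance is still pending after max_polls.
    """
    for attempt in range(max_polls):
        state = fetch_state()
        if state != "pending":
            return state
        # Wait one poll cycle before asking again, except after the last try.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"still pending after {max_polls} polls")
```

Injecting `sleep` keeps the loop easy to test without real delays.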
Per-instance actions
Start
Resumes a stopped instance. Smart mode re-runs the provider selection.
Stop
Stops the billing meter. The pod may be reclaimed by the provider.
Settings
Edit name, description, serving name, GPU preferences, and smart constraints.
Playground
Open a built-in chat or embeddings playground against the live endpoint.
Delete
Removes the instance and any associated schedule rules. Billed hours are not refunded.
Bulk operations
The checkbox column supports multi-select. Use it to stop or delete many instances at once.
Stop many
Stops every selected instance in a single transaction. Useful before a planned maintenance window.
Delete many
Removes every selected instance after a confirmation dialog. Schedule rules are deleted along with each instance.
Usage tracking
The Usage tab on each instance tracks two parallel series: cost in EUR and billable seconds of active time. Summary cards at the top of the tab show the month-to-date total, the average daily cost, and the active hours.
24h
Last 24 hours with hourly buckets. Includes the resource samples (CPU, RAM, GPU, VRAM, disk, and temperature).
7d
Last 7 days with hourly buckets for the first day, daily for the rest.
30d
Last 30 days, daily buckets. Recommended for monthly forecasting.
90d
Last 90 days, daily buckets. Used for long-term trend analysis.
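The summary cards on the Usage tab can be reproduced from the two series it tracks. The following sketch assumes daily buckets that each carry a cost in EUR cents and a count of billable seconds. The `DailyUsage` type and `summarize` function are hypothetical, and the average is assumed to be taken over the days supplied, which may differ from how the product computes it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DailyUsage:
    cost_cents: int        # cost for the day, in EUR cents
    billable_seconds: int  # seconds spent in running state that day


def summarize(days: List[DailyUsage]) -> Dict[str, float]:
    """Derive total cost, average daily cost, and active hours."""
    total_cents = sum(day.cost_cents for day in days)
    total_seconds = sum(day.billable_seconds for day in days)
    return {
        "total_eur": total_cents / 100,
        # Assumption: average over the number of buckets provided.
        "avg_daily_eur": total_cents / len(days) / 100 if days else 0.0,
        "active_hours": total_seconds / 3600,
    }
```

Passing only the buckets for the current month yields the month-to-date figures.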
Built-in playground
The Playground button on a running instance opens a sheet with three panels: chat, embeddings, and (for embeddings instances) a multimodal input that takes an image URL.
Chat
Send an OpenAI-shaped request to the running instance. The dropdown lets you pick a firewall to evaluate each prompt.

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.qdiv0.com/v1",
        api_key="your-api-key",
    )

    resp = client.chat.completions.create(
        model="<serving-name>",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

Embeddings (text)
Single string or list of strings. The same endpoint serves multimodal models with an image_url field.
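
    import os

    import requests

    # Read the API key from the environment, as in the curl example.
    api_key = os.environ["QDIV0_API_KEY"]

    resp = requests.post(
        "https://api.qdiv0.com/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "<serving-name>", "input": "hello world"},
    )
    # Print the first 8 dimensions of the first embedding vector.
    print(resp.json()["data"][0]["embedding"][:8], "...")

Embeddings (multimodal)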
Qwen3-VL-Embedding and other multimodal models accept an image alongside the text.

    import os

    import requests

    # Read the API key from the environment, as in the curl example.
    api_key = os.environ["QDIV0_API_KEY"]

    resp = requests.post(
        "https://api.qdiv0.com/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "qwen3-vl-2b-demo",
            # One text part and one image part, embedded together.
            "input": [
                {"type": "text", "text": "diagram description"},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
            ],
        },
    )

Schedule rules
The Schedule tab on the instance detail page stores cron-based start and stop rules. Rules are evaluated by the scheduler worker using the rule's own timezone. The last execution (success or error) is shown inline. See Settings for the full cron syntax and supported actions.
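To make "evaluated in the rule's own timezone" concrete, here is a minimal sketch of a due-check. It is not the scheduler worker's code: `rule_is_due` and `field_matches` are hypothetical helpers, only `*` and comma-separated integers are handled, and all five fields are simply ANDed together, which is a simplification of standard cron semantics. The supported syntax is documented in Settings.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Fixed-offset stand-in for a rule timezone; zoneinfo.ZoneInfo("Europe/Berlin")
# can be passed instead to follow daylight-saving changes.
CET = timezone(timedelta(hours=1))


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*' and comma-separated integers only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def rule_is_due(cron: str, rule_tz: tzinfo, now_utc: datetime) -> bool:
    """Check a 5-field cron expression against now_utc in the rule's timezone.

    now_utc must be timezone-aware.
    """
    minute, hour, day, month, weekday = cron.split()
    local = now_utc.astimezone(rule_tz)
    cron_weekday = local.isoweekday() % 7  # cron convention: 0 = Sunday
    return (
        field_matches(minute, local.minute)
        and field_matches(hour, local.hour)
        and field_matches(day, local.day)
        and field_matches(month, local.month)
        and field_matches(weekday, cron_weekday)
    )
```

With this sketch, a weekday rule `0 8 * * 1,2,3,4,5` in a UTC+1 timezone is due at 07:00 UTC on a Monday, while the same expression evaluated in UTC is not.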
Where to go next
Launch a new instance →
The full launch wizard, field by field, with the runtime presets and smart-mode defaults.
Launch flow in detail →
Pre-flight checks, instance states, failure reasons, and recovery patterns.
Edit instance settings →
Rename, retarget GPUs, and adjust smart constraints.
Billing →
How the hourly rate ends up on the ledger and the monthly invoice.