Compute

QDivZero Compute is the managed inference platform. It defines the runtimes, the pricing model, the OpenAI-compatible contract, the observability surface, and the day-to-day operations on the instances you launch against it. One concept, one place.

What Compute solves

Skip provider procurement and contract negotiation: capacity is provisioned on demand.
Avoid vendor lock-in: the smart scheduler can place workloads on whichever provider offers the best cost and latency at the moment of launch.
Stop guessing your inference bill: every active instance is billed at a fixed hourly rate.
Ship faster: the OpenAI-compatible contract means your existing tools, SDKs, and evaluation pipelines keep working unchanged.

Capability matrix

Capability	Detail
Model deployment	Hugging Face repos validated against runtime requirements before launch
Multi-provider routing	Smart scheduler picks capacity across providers for cost, availability, and latency
OpenAI-compatible contract	Exposed at /v1 with chat, completions, embeddings, models, and streaming semantics
Capacity pricing	Pay per active GPU hour. No token meters, no per-request fees.
Scheduled operations	Cron-based start/stop rules per instance, evaluated by the scheduler worker
Security controls	Regional pinning, verified runtime images, and pre-prompt firewalls
Resource observability	CPU, RAM, GPU, VRAM, disk, and temperature samples on running instances
Embeddings workloads	Text-only and multimodal (text + image) embedding instances

OpenAI-compatible contract

Every running instance answers at https://api.qdiv0.com/v1 using the standard OpenAI REST contract. Authentication uses a Bearer API key, and the model id is the instance's serving name. Streaming, function calling, JSON mode, and stop sequences are supported when the underlying runtime exposes them.

Operation	Method	Endpoint
List models	GET	/v1/models
Chat completions	POST	/v1/chat/completions
Legacy completions	POST	/v1/completions
Embeddings	POST	/v1/embeddings

curl

curl https://api.qdiv0.com/v1/chat/completions \
  -H "Authorization: Bearer $QDIV0_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<serving-name>",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
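
The same contract can be exercised from Python without an SDK. The sketch below only builds the requests for the four operations in the table; it does not send them. The helper name build_request is a hypothetical illustration, not part of any QDivZero SDK, and the placeholder key stands in for the QDIV0_API_KEY value used in the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"

# Method and path for each operation in the endpoint table.
OPERATIONS = {
    "list_models": ("GET", "/models"),
    "chat": ("POST", "/chat/completions"),
    "completions": ("POST", "/completions"),
    "embeddings": ("POST", "/embeddings"),
}


def build_request(operation, api_key, payload=None):
    """Build (but do not send) a request for one OpenAI-compatible operation.

    Hypothetical helper for illustration only.
    """
    method, path = OPERATIONS[operation]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if method == "POST":
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request(
    "chat",
    api_key="your-api-key",
    payload={
        "model": "<serving-name>",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the request shape easy to inspect.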

Pricing model

Compute bills capacity, not tokens. The model below defines how every instance is charged; the Billing page describes the ledger mechanics.

Term	Definition
Hourly rate	Locked at launch from the catalog price snapshot. Provider cost changes do not affect active instances.
Capacity tier	secure (guaranteed capacity, higher rate) or community (spot or shared capacity, lower rate, may be reclaimed).
Billable unit	One billable second = one second the instance spent in running state. Billed in EUR cents, no rounding tricks.
Grace window	If the balance goes negative, new launches are blocked but existing instances keep running until the next top-up.
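
The arithmetic behind these terms can be sketched as follows. This assumes the locked hourly rate is applied pro rata to each billable second; the rate used in the example is hypothetical, and rounding is deliberately left out because the rounding rule is not specified here.

```python
from fractions import Fraction


def cost_cents(hourly_rate_cents, running_seconds):
    """Exact cost in EUR cents for time spent in the running state.

    Assumption: the hourly rate locked at launch is charged pro rata
    per billable second. No rounding is applied.
    """
    if running_seconds < 0:
        raise ValueError("running_seconds must be non-negative")
    return Fraction(hourly_rate_cents) * running_seconds / 3600


def can_launch(balance_cents):
    """Grace-window rule: a negative balance blocks new launches only."""
    return balance_cents >= 0


# Hypothetical rate: 250 cents (EUR 2.50) per hour, running for 90 minutes.
print(cost_cents(250, 90 * 60))  # 375
print(can_launch(-120))          # False
```

Using Fraction keeps the result exact, so any rounding policy can be applied once at the end rather than per second.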

Observability surface

Compute collects a small, opinionated set of signals from every instance. Retention is short for high-cardinality signals such as resource metrics, and indefinite for records that should never be lost, such as schedule history and failure reasons.

Signal	Retention	Where to find it
Cost usage	24h, 7d, 30d, 90d	Instance detail → Usage tab
Resource metrics (CPU, RAM, GPU, VRAM, disk, temperature)	Last 24h	Instance detail → Usage tab, range = 24h
Schedule rule history	Indefinite	Instance detail → Schedule tab
Runtime config snapshot	Indefinite	Instance detail → Settings tab
Failure reason + remediation hint	Indefinite	Instance detail → Details tab (failed state)

The instances list

The Instances page is the day-to-day surface for the models you have deployed. It lists every instance in the account, with state, serving name, and creation time, and exposes the actions you can take on each one.

The list auto-refreshes

The Instances list polls every 10 seconds. New instances appear as they leave pending, state changes are visible within one poll cycle, and debits from the billing scheduler show up on the Usage tab.

Column	What it shows
name	Display name set at launch. Falls back to a short ID when empty.
state	Current lifecycle state (running, pending, stopped, failed, non_bootstrap). See the launch flow page for the full state machine.
serving_name	Public model name used by the OpenAI-compatible endpoint.
created	Wall-clock time when the instance was created in the account.
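
A client that needs to wait for an instance can mirror the list's 10-second polling. The sketch below polls a caller-supplied function until a serving name appears. It assumes that GET /v1/models reports the serving names of running instances, which this page implies but does not state; wait_for_serving_name and fetch_model_ids are hypothetical names.

```python
import time


def wait_for_serving_name(fetch_model_ids, serving_name, interval=10.0,
                          timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until `serving_name` is reported; return False on timeout.

    `fetch_model_ids` is any callable returning an iterable of model ids,
    for example a wrapper around GET /v1/models (an assumption of this sketch).
    """
    deadline = clock() + timeout
    while True:
        if serving_name in set(fetch_model_ids()):
            return True
        # Give up if another full interval would overshoot the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)


# Offline demonstration: a fake fetcher that succeeds on the third poll.
responses = iter([[], [], ["my-model"]])
found = wait_for_serving_name(
    lambda: next(responses), "my-model", sleep=lambda seconds: None
)
print(found)
```

The sleep and clock parameters are injectable so the loop can be tested without waiting in real time.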

Per-instance actions

Start

Resumes a stopped instance. Smart mode re-runs the provider selection.

Stop

Stops the billing meter. The pod may be reclaimed by the provider.

Settings

Edit name, description, serving name, GPU preferences, and smart constraints.

Playground

Open a built-in chat or embeddings playground against the live endpoint.

Delete

Removes the instance and any associated schedule rules. Billed hours are not refunded.

Bulk operations

The checkbox column supports multi-select. Use it to stop or delete many instances at once.

Stop many

Stops every selected instance in a single transaction. Useful before a planned maintenance window.

Delete many

Removes the instances. Prompts a confirmation dialog. Schedule rules are deleted with the instance.

Usage tracking

The Usage tab on each instance tracks two parallel series: cost in EUR and billable seconds of active time. Summary cards at the top of the tab show the month-to-date total, the average daily cost, and the active hours.

24h

Last 24 hours with hourly buckets. Includes CPU, RAM, GPU, VRAM, disk, and temperature resource samples.

7d

Last 7 days with hourly buckets for the first day, daily for the rest.

30d

Last 30 days, daily buckets. Recommended for monthly forecasting.

90d

Last 90 days, daily buckets. Used for long-term trend analysis.
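
As an illustration of how the summary cards relate to the daily buckets, the sketch below derives a month-to-date total, an average daily cost, and active hours. The DailyBucket shape and the choice to average over elapsed days of the month are assumptions made for this example; the platform's own calculation is not documented here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyBucket:
    day: date
    cost_cents: int        # cost series, in EUR cents
    billable_seconds: int  # active-time series


def month_to_date_summary(buckets, today):
    """Return (total EUR, average daily EUR, active hours) for the current month."""
    current = [
        b for b in buckets
        if (b.day.year, b.day.month) == (today.year, today.month) and b.day <= today
    ]
    total_eur = sum(b.cost_cents for b in current) / 100
    active_hours = sum(b.billable_seconds for b in current) / 3600
    days_elapsed = today.day  # assumption: average over days so far, including today
    return (total_eur, total_eur / days_elapsed, active_hours)


buckets = [
    DailyBucket(date(2024, 4, 30), 900, 3600),  # previous month, ignored
    DailyBucket(date(2024, 5, 1), 600, 7200),
    DailyBucket(date(2024, 5, 2), 300, 3600),
]
print(month_to_date_summary(buckets, today=date(2024, 5, 3)))  # (9.0, 3.0, 3.0)
```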

Why only 24h for resource samples?

Resource samples come from the runtime sidecar. Retention is kept short to control platform cost while still covering enough history to debug throughput issues. The cost and active-hour series are retained for longer.

Built-in playground

The Playground button on a running instance opens a sheet with three panels: chat, embeddings, and (for embeddings instances) a multimodal input that takes an image URL.

Chat

Send an OpenAI-shaped request to the running instance. The dropdown lets you pick a firewall to evaluate each prompt.

chat.py

from openai import OpenAI

# Point the OpenAI SDK at the Compute endpoint.
client = OpenAI(
    base_url="https://api.qdiv0.com/v1",
    api_key="your-api-key",
)

# The model id is the instance's serving name.
resp = client.chat.completions.create(
    model="<serving-name>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
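
Streaming is listed as supported when the underlying runtime exposes it. A hedged sketch of consuming a streamed reply with the same SDK follows. stream_text is a hypothetical helper; it takes the client as a parameter so it can be exercised without network access, and it only works if the instance's runtime actually supports streaming.

```python
def stream_text(client, model, prompt):
    """Yield text fragments from a streamed chat completion.

    `client` is expected to be an `openai.OpenAI` instance configured as in
    chat.py. Hypothetical helper for illustration only.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage, given the `client` from chat.py:
#   for piece in stream_text(client, "<serving-name>", "Hello"):
#       print(piece, end="", flush=True)
```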

Embeddings (text)

Accepts a single string or a list of strings. The same endpoint also serves multimodal models, which take an additional image_url input item.

embeddings.py

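import os

import requests

# Read the key from the environment rather than hard-coding it.
api_key = os.environ["QDIV0_API_KEY"]

resp = requests.post(
    "https://api.qdiv0.com/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "<serving-name>", "input": "hello world"},
)
resp.raise_for_status()  # surface HTTP errors before parsing the body
# Print the first 8 dimensions of the first embedding vector.
print(resp.json()["data"][0]["embedding"][:8], "...")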
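
Because the endpoint accepts a list of strings, one request can embed several texts. The sketch below assumes the standard OpenAI response shape (a data list whose items carry index and embedding) and shows how the returned vectors might be compared. The helper names and the hand-written sample response are illustrative, not real output from the platform.

```python
import math


def vectors_in_input_order(response_json):
    """Extract embeddings from an OpenAI-shaped response, ordered by `index`."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written stand-in for the JSON returned by POST /v1/embeddings with
# json={"model": "<serving-name>", "input": ["first text", "second text"]}.
sample = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
    ]
}
first, second = vectors_in_input_order(sample)
print(cosine_similarity(first, second))  # 0.0 for orthogonal vectors
```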

Embeddings (multimodal)

Qwen3-VL-Embedding and other multimodal models accept an image alongside the text.

embeddings-multimodal.py

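import os

import requests

# Read the key from the environment rather than hard-coding it.
api_key = os.environ["QDIV0_API_KEY"]

resp = requests.post(
    "https://api.qdiv0.com/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "qwen3-vl-2b-demo",
        # The input combines a text part and an image part.
        "input": [
            {"type": "text", "text": "diagram description"},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    },
)
resp.raise_for_status()  # surface HTTP errors early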

Schedule rules

The Schedule tab on the instance detail page stores cron-based start and stop rules. Rules are evaluated by the scheduler worker using the rule's own timezone. The last execution (success or error) is shown inline. See Settings for the full cron syntax and supported actions.
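
To illustrate what evaluating a rule in its own timezone means, the sketch below computes the next firing time of a simplified daily rule and converts it to UTC. It handles only a fixed hour and minute, not the full cron syntax documented under Settings, and next_daily_run is a hypothetical helper rather than the scheduler worker's implementation.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(hour, minute, rule_tz, now_utc):
    """Next firing of a daily HH:MM rule, interpreted in `rule_tz`, as UTC."""
    now_local = now_utc.astimezone(rule_tz)
    candidate = now_local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_local:
        # Today's slot has already passed in the rule's timezone.
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)


# A fixed UTC+2 offset keeps the example dependency-free; a named zone such as
# zoneinfo.ZoneInfo("Europe/Rome") can be passed in the same way.
rule_tz = timezone(timedelta(hours=2))
now = datetime(2024, 6, 3, 18, 30, tzinfo=timezone.utc)  # 20:30 in the rule's zone
print(next_daily_run(19, 0, rule_tz, now))  # 2024-06-04 17:00:00+00:00
```

The same 19:00 rule would fire at a different UTC instant under another timezone, which is why each rule carries its own.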