Compute

QDivZero Compute is the managed inference platform. It defines the runtimes, the pricing model, the OpenAI-compatible contract, the observability surface, and the day-to-day operations on the instances you launch against it. One concept, one place.

What Compute solves

  • Skip provider procurement and contract negotiation — capacity is provisioned on demand.
  • Avoid vendor lock-in — the smart scheduler can move workloads to whichever provider has the best cost / latency at the moment of launch.
  • Stop guessing your inference bill — every active instance bills a fixed hourly rate.
  • Ship faster — the OpenAI contract means your existing tools, SDKs, and evaluation pipelines keep working unchanged.

Capability matrix

Capability | Detail
Model deployment | Hugging Face repos validated against runtime requirements before launch
Multi-provider routing | Smart scheduler picks capacity across providers for cost, availability, and latency
OpenAI-compatible contract | Exposed at /v1 with chat, completions, embeddings, models, and streaming semantics
Capacity pricing | Pay per active GPU hour. No token meters, no per-request fees.
Scheduled operations | Cron-based start/stop rules per instance, evaluated by the scheduler worker
Security controls | Regional pinning, verified runtime images, and pre-prompt firewalls
Resource observability | CPU, RAM, GPU, VRAM, disk, and temperature samples on running instances
Embeddings workloads | Text-only and multimodal (text + image) embedding instances

OpenAI-compatible contract

Every running instance answers at https://api.qdiv0.com/v1 using the standard OpenAI REST contract. Authentication is a Bearer API key; the model id is the instance serving name. Streaming, function calling, JSON mode, and stop sequences are supported when the underlying runtime exposes them.

Operation | Method | Endpoint
List models | GET | /v1/models
Chat completions | POST | /v1/chat/completions
Legacy completions | POST | /v1/completions
Embeddings | POST | /v1/embeddings

curl
curl https://api.qdiv0.com/v1/chat/completions \
  -H "Authorization: Bearer $QDIV0_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<serving-name>",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
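
The endpoint table above can also be exercised without an SDK. The following is a minimal sketch using only the Python standard library; the helper names (build_request, list_models_request, chat_request) are hypothetical and not part of any QDivZero client, and the request body assumes the standard OpenAI chat shape shown in the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"


def build_request(path, api_key, body=None):
    """Build (but do not send) a request for an OpenAI-compatible endpoint.

    Hypothetical helper: a body means POST with JSON, no body means GET.
    """
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    method = "GET" if body is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def list_models_request(api_key):
    """GET /v1/models"""
    return build_request("/models", api_key)


def chat_request(api_key, serving_name, prompt, stream=False):
    """POST /v1/chat/completions; the model id is the instance serving name."""
    body = {
        "model": serving_name,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return build_request("/chat/completions", api_key, body)
```

To send a request, pass the returned object to urllib.request.urlopen and decode the JSON response.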

Pricing model

Compute bills capacity, not tokens. The model below defines how every instance is charged; the Billing page describes the ledger mechanics.

Term | Definition
Hourly rate | Locked at launch from the catalog price snapshot. Provider cost changes do not affect active instances.
Capacity tier | secure (guaranteed capacity, higher rate) or community (spot or shared capacity, lower rate, may be reclaimed).
Billable unit | One billable second = one second the instance spent in running state. Billed in EUR cents, no rounding tricks.
Grace window | If the balance goes negative, new launches are blocked but existing instances keep running until the next top-up.
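
As a rough illustration of the terms above, the sketch below derives a charge from a locked hourly rate and the seconds spent in running state, and mirrors the grace-window rule. The function names are hypothetical, and the source does not say how fractions of a cent are handled, so this sketch keeps exact Decimal values and applies no rounding.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600


def cost_eur_cents(hourly_rate_cents: int, running_seconds: int) -> Decimal:
    """Charge in EUR cents for the seconds an instance spent in running state.

    Assumption: the hourly rate is prorated linearly per billable second.
    """
    if hourly_rate_cents < 0 or running_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return Decimal(hourly_rate_cents) * Decimal(running_seconds) / Decimal(SECONDS_PER_HOUR)


def can_launch(balance_cents: Decimal) -> bool:
    """Grace window: only a negative balance blocks new launches."""
    return balance_cents >= 0


# Example: 90 minutes at a locked rate of 250 cents per hour.
charge = cost_eur_cents(250, 90 * 60)
```

Because the rate is locked at launch, the same hourly_rate_cents value would apply for the whole life of the instance in this model.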

Observability surface

Compute collects a small, opinionated set of signals from every instance. Retention is short for high-cardinality signals such as resource metrics, and indefinite for records that should never be lost, such as schedule history, runtime config snapshots, and failure reasons.

Signal | Retention | Where to find it
Cost usage | 24h, 7d, 30d, 90d | Instance detail → Usage tab
Resource metrics (CPU, RAM, GPU, VRAM, disk, temperature) | Last 24h | Instance detail → Usage tab, range = 24h
Schedule rule history | Indefinite | Instance detail → Schedule tab
Runtime config snapshot | Indefinite | Instance detail → Settings tab
Failure reason + remediation hint | Indefinite | Instance detail → Details tab (failed state)

The instances list

The Instances page is the day-to-day surface for the models you have deployed. It lists every instance in the account, with state, serving name, and creation time, and exposes the actions you can take on each one.

The list auto-refreshes

The Instances list polls every 10 seconds. New instances appear as they leave pending, state changes are visible within one poll cycle, and debits from the billing scheduler show up on the Usage tab.

Column | What it shows
name | Display name set at launch. Falls back to a short ID when empty.
state | Current lifecycle state (running, pending, stopped, failed, non_bootstrap). See the launch flow page for the full state machine.
serving_name | Public model name used by the OpenAI-compatible endpoint.
created | Wall-clock time when the instance was created in the account.
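
A script that waits for an instance to leave pending could follow the same 10-second cadence as the list. The sketch below is illustrative only: this section does not describe an API for reading instance state, so fetch_state is a hypothetical callable that you would supply, and the timeout value is an arbitrary assumption.

```python
import time
from typing import Callable

# Lifecycle states named in the state column above.
LIFECYCLE_STATES = {"running", "pending", "stopped", "failed", "non_bootstrap"}


def wait_while_pending(
    fetch_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance is no longer pending and return its state.

    fetch_state is a hypothetical callable returning the current state string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        if state not in LIFECYCLE_STATES:
            raise ValueError(f"unexpected state: {state!r}")
        if state != "pending":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("instance is still pending")
        sleep(poll_seconds)
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting.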

Per-instance actions

Start

Resumes a stopped instance. Smart mode re-runs the provider selection.

Stop

Stops the billing meter. The pod may be reclaimed by the provider.

Settings

Edit name, description, serving name, GPU preferences, and smart constraints.

Playground

Open a built-in chat or embeddings playground against the live endpoint.

Delete

Removes the instance and any associated schedule rules. Billed hours are not refunded.

Bulk operations

The checkbox column supports multi-select. Use it to stop or delete many instances at once.

Stop many

Stops every selected instance in a single transaction. Useful before a planned maintenance window.

Delete many

Removes every selected instance after a confirmation dialog. Schedule rules are deleted with each instance.

Usage tracking

The Usage tab on each instance tracks two parallel series: cost in EUR and billable seconds of active time. Summary cards at the top of the tab show the month-to-date total, the average daily cost, and the active hours.

24h

Last 24 hours with hourly buckets. Includes the resource samples (CPU, RAM, GPU, VRAM, disk, temperature).

7d

Last 7 days with hourly buckets for the first day, daily for the rest.

30d

Last 30 days, daily buckets. Recommended for monthly forecasting.

90d

Last 90 days, daily buckets. Used for long-term trend analysis.

Why only 24h for resource samples?

Resource samples come from the runtime sidecar. Retention is kept short to control platform cost while still covering enough history to debug throughput issues. Cost and active-hour series have longer retention.
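
The four ranges can be summarized as a small lookup. This sketch is one reading of the descriptions above, not the platform's implementation: it assumes the 7d range consists of one day of hourly buckets plus six daily buckets, and the names RANGES, bucket_count, and includes_resource_samples are hypothetical.

```python
from datetime import timedelta

# Each range is a list of (span, bucket width) segments.
RANGES = {
    "24h": [(timedelta(hours=24), timedelta(hours=1))],
    "7d": [
        (timedelta(days=1), timedelta(hours=1)),  # one day at hourly resolution
        (timedelta(days=6), timedelta(days=1)),   # remaining days at daily resolution
    ],
    "30d": [(timedelta(days=30), timedelta(days=1))],
    "90d": [(timedelta(days=90), timedelta(days=1))],
}


def bucket_count(range_key: str) -> int:
    """Number of buckets a chart for the given range would contain."""
    return sum(span // width for span, width in RANGES[range_key])


def includes_resource_samples(range_key: str) -> bool:
    """Resource samples are retained for the last 24 hours only."""
    return range_key == "24h"
```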

Built-in playground

The Playground button on a running instance opens a sheet with three panels: chat, embeddings, and (for embeddings instances) a multimodal input that takes an image URL.

Chat

Send an OpenAI-shaped request to the running instance. The dropdown lets you pick a firewall to evaluate each prompt.

chat.py
from openai import OpenAI

# Point the official OpenAI SDK at the Compute endpoint.
client = OpenAI(
    base_url="https://api.qdiv0.com/v1",
    api_key="your-api-key",
)

# The model id is the serving name of the running instance.
resp = client.chat.completions.create(
    model="<serving-name>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
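
When streaming is enabled, OpenAI-compatible servers return server-sent events: lines of the form `data: {json}` carrying incremental deltas, terminated by `data: [DONE]`. The sketch below parses such lines into text fragments. It assumes the instance follows that standard chunk shape, which depends on the underlying runtime exposing streaming; iter_stream_text is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion stream lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Example with canned lines instead of a live HTTP response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
streamed = "".join(iter_stream_text(sample))
```

With the OpenAI SDK, passing stream=True to client.chat.completions.create handles this parsing for you; the manual parser is mainly useful with raw HTTP clients.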

Embeddings (text)

The input is a single string or a list of strings. The same endpoint also serves multimodal models, which accept an image_url field.

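embeddings.py
import os

import requests

# Read the key from the environment rather than hard-coding it.
api_key = os.environ["QDIV0_API_KEY"]

resp = requests.post(
    "https://api.qdiv0.com/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "<serving-name>", "input": "hello world"},
)
resp.raise_for_status()  # surface HTTP errors before parsing the body
print(resp.json()["data"][0]["embedding"][:8], "...")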
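
When a list of strings is sent, the standard OpenAI embeddings response returns one entry per input, each with an index and an embedding. The sketch below reorders the vectors by index and compares two of them with cosine similarity. It assumes the instance returns that standard shape; both helper names are hypothetical.

```python
import math
from typing import Any, Mapping, Sequence


def vectors_in_input_order(response: Mapping[str, Any]) -> list:
    """Return the embedding vectors sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Example with a hand-written response in place of resp.json().
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
    ]
}
first, second = vectors_in_input_order(fake_response)
```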

Embeddings (multimodal)

Qwen3-VL-Embedding and other multimodal models accept an image alongside the text.

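embeddings-multimodal.py
import os

import requests

# Read the key from the environment rather than hard-coding it.
api_key = os.environ["QDIV0_API_KEY"]

resp = requests.post(
    "https://api.qdiv0.com/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "qwen3-vl-2b-demo",
        # Text and image parts are sent together in one input list.
        "input": [
            {"type": "text", "text": "diagram description"},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    },
)
resp.raise_for_status()  # surface HTTP errors before parsing the body
print(resp.json()["data"][0]["embedding"][:8], "...")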
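
The multimodal input above is a list of typed parts, so it can be assembled by a small helper. This is a sketch only: multimodal_input and embeddings_body are hypothetical names, and the restriction to http(s) URLs is an assumption made here for safety, not a documented platform rule.

```python
from urllib.parse import urlparse


def multimodal_input(text: str, image_url: str) -> list:
    """Build the text + image parts in the shape shown in the example above."""
    parsed = urlparse(image_url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {image_url!r}")
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


def embeddings_body(serving_name: str, text: str, image_url: str) -> dict:
    """JSON body for POST /v1/embeddings on a multimodal instance."""
    return {"model": serving_name, "input": multimodal_input(text, image_url)}
```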

Schedule rules

The Schedule tab on the instance detail page stores cron-based start and stop rules. Rules are evaluated by the scheduler worker using the rule's own timezone. The last execution (success or error) is shown inline. See Settings for the full cron syntax and supported actions.
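
To illustrate what evaluating a rule in its own timezone means, the sketch below converts an instant to the rule's timezone before matching it against a five-field cron expression. It is not the scheduler worker's implementation: the supported syntax here is a reduced subset chosen for illustration (`*`, integers, comma lists, and ranges), all five fields must match, and day-of-week uses 0 for Sunday. The authoritative syntax is the one documented under Settings.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field against a value (subset: *, N, N-M, comma lists)."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def rule_fires(cron_expr: str, rule_tz: tzinfo, instant: datetime) -> bool:
    """True if the rule matches the given instant, seen in the rule's timezone."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    minute, hour, day, month, weekday = cron_expr.split()
    local = instant.astimezone(rule_tz)
    return (
        _field_matches(minute, local.minute)
        and _field_matches(hour, local.hour)
        and _field_matches(day, local.day)
        and _field_matches(month, local.month)
        and _field_matches(weekday, local.isoweekday() % 7)  # Sunday -> 0
    )


# Monday 2024-06-03 06:00 UTC is 08:00 in a UTC+2 zone.
instant = datetime(2024, 6, 3, 6, 0, tzinfo=timezone.utc)
plus_two = timezone(timedelta(hours=2))
fires_in_rule_tz = rule_fires("0 8 * * 1-5", plus_two, instant)
fires_in_utc = rule_fires("0 8 * * 1-5", timezone.utc, instant)
```

The same instant matches the weekday 08:00 rule in the UTC+2 zone but not in UTC, which is why the rule's timezone matters. A zoneinfo.ZoneInfo instance can be passed as rule_tz to account for daylight saving time.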

Where to go next