Run any model without per-token billing.

The Compute engine abstracts GPU selection, provider routing, and runtime configuration behind one API. Pick a model, deploy, and start serving at predictable hourly rates.


From model to API in three steps.

Pick a model

Choose any Hugging Face model. The control plane validates VRAM requirements and prepares the runtime. No manual GPU configuration.
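As a rough illustration of what a VRAM check involves, the memory needed just to hold a model's weights is the parameter count times the bytes per parameter. The sketch below is a rule-of-thumb estimate, not the control plane's actual validation logic. The function names, the bytes-per-parameter table, and the flat 20% overhead for KV cache, activations, and runtime buffers are all assumptions. For a mixture-of-experts model such as Qwen 3.6 35B A3B, it is the total parameter count, not the active one, that has to fit in memory.

```python
# Rough weight-memory estimate for serving a model (rule of thumb, not Compute's actual check).

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def estimate_vram_gb(params_billion: float, dtype: str = "bf16", overhead: float = 1.2) -> float:
    """Estimate GPU memory in GB (1e9 bytes) for weights times an assumed overhead factor.

    `overhead` is a hypothetical flat multiplier; real requirements depend on
    context length, batch size, and the serving runtime.
    """
    # 1e9 params * bytes/param / 1e9 bytes/GB cancels to params_billion * bytes/param.
    weights_gb = params_billion * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead


def fits(params_billion: float, gpu_vram_gb: float, dtype: str = "bf16") -> bool:
    """Return True if the estimated footprint fits in the given GPU memory."""
    return estimate_vram_gb(params_billion, dtype) <= gpu_vram_gb


if __name__ == "__main__":
    # All 35B weights of an MoE model must be resident, even if only ~3B are active per token.
    print(round(estimate_vram_gb(35, "bf16"), 1))  # 84.0
    print(fits(35, 80, "bf16"))  # False
    print(fits(35, 80, "int8"))  # True
```

Under these assumptions, a 35B model in bf16 would not fit on a single 80 GB GPU but would in int8, which is the kind of decision the control plane makes for you.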

Compute deploys it

The scheduler selects capacity across available providers based on cost, availability, and latency. Runtime, networking, and scaling are provisioned automatically.
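One way to picture this step: filter provider offers down to those that can host the model, then rank what remains. The sketch below is a hypothetical illustration, not the actual scheduler. The `Offer` fields, the trusted-region and verification filters (mirroring the security controls described further down), and the cheapest-then-lowest-latency ranking are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """Hypothetical capacity offer from one provider."""
    provider: str
    region: str
    vram_gb: float
    price_per_hour: float  # EUR/h
    latency_ms: float      # measured round trip to the provider
    available: bool
    verified: bool         # passed runtime verification checks


def select_offer(
    offers: list[Offer],
    required_vram_gb: float,
    allowed_regions: set[str],
    require_verified: bool = True,
) -> Optional[Offer]:
    """Pick the cheapest eligible offer, breaking price ties by lower latency."""
    eligible = [
        o for o in offers
        if o.available
        and o.vram_gb >= required_vram_gb
        and o.region in allowed_regions
        and (o.verified or not require_verified)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda o: (o.price_per_hour, o.latency_ms))


# Example data: provider, region, VRAM, EUR/h, latency, available, verified.
OFFERS = [
    Offer("a", "eu-west", 80, 1.40, 20, True, True),
    Offer("b", "eu-central", 80, 1.10, 35, True, True),
    Offer("c", "us-east", 80, 0.90, 90, True, True),   # outside the allowed regions
    Offer("d", "eu-west", 80, 1.10, 15, True, False),  # not verified
]

if __name__ == "__main__":
    best = select_offer(OFFERS, required_vram_gb=70, allowed_regions={"eu-west", "eu-central"})
    print(best.provider if best else "no capacity")  # b
```

Ranking by a tuple keeps the policy explicit: price decides first, and latency only matters between equally priced offers. A real scheduler would likely weigh these factors differently.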

Call the endpoint

Your model is exposed as an OpenAI-compatible endpoint. Use the Python SDK or any HTTP client. Same API, any model.
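Because the contract is OpenAI-compatible, a plain HTTP client can call the endpoint without the SDK. The sketch below uses only the Python standard library. It assumes the standard OpenAI `/chat/completions` route under the `https://api.qdiv0.com/v1` base URL used in the quickstart and Bearer-token authentication as in the OpenAI API; `build_chat_request` and `chat` are hypothetical helper names, and the 60-second timeout is arbitrary.

```python
import json
import urllib.request

BASE_URL = "https://api.qdiv0.com/v1"  # base URL from the quickstart


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (assumes standard route and Bearer auth)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key: str, model: str, prompt: str) -> str:
    """Send the request and return the first choice's message content."""
    request = build_chat_request(api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With a real key and a deployed model name, `chat("your-api-key", "your-model", "Hello world")` would return the reply text from the same response shape the OpenAI SDK parses.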

Capacity pricing. No token meters.

Pay a fixed hourly rate for each model. Scale up when you need more capacity. Stop when you don't. Your invoice is the sum of active compute hours.

Example workload

Qwen 3.6 35B A3B MTP

8 h/day · 20 days/month

0,25 €/h

Comparable capability to GPT-5.4 mini for general and agentic tasks.

Estimated monthly: 40 €/month

Calculated at 8 hours/day, 20 working days/month. Actual costs depend on your usage pattern and can be lower with scheduled start/stop rules.

Example workload

Qwen3-VL-Embedding-2B

8 h/day · 20 days/month

0,10 €/h

Multimodal embedding model. Generate embeddings from text and images at predictable cost.

Estimated monthly: 16 €/month

Calculated at 8 hours/day, 20 working days/month. Actual costs depend on your usage pattern and can be lower with scheduled start/stop rules.

Example workload

DeepSeek V4 Flash

8 h/day · 20 days/month

1,11 €/h

Comparable capability to Grok 4.3 High for reasoning and instruction-following workloads.

Estimated monthly: 178 €/month

Calculated at 8 hours/day, 20 working days/month. Actual costs depend on your usage pattern and can be lower with scheduled start/stop rules.

Example workload

Kimi K2.6

8 h/day · 20 days/month

13 €/h

Comparable capability to Opus 4.7 for complex, long-context enterprise workloads.

Estimated monthly: 2.080 €/month

Calculated at 8 hours/day, 20 working days/month. Actual costs depend on your usage pattern and can be lower with scheduled start/stop rules.
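Each monthly figure above is the hourly rate multiplied by active hours (8 h/day × 20 days = 160 h). The sketch below reproduces that arithmetic and models an invoice as the sum of rate × running hours across deployed models. The function names and the rounding to whole euros are assumptions chosen to match the displayed estimates; `Decimal` avoids floating-point drift in money amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly rates (EUR/h) of the example workloads above.
RATES = {
    "Qwen 3.6 35B A3B MTP": Decimal("0.25"),
    "Qwen3-VL-Embedding-2B": Decimal("0.10"),
    "DeepSeek V4 Flash": Decimal("1.11"),
    "Kimi K2.6": Decimal("13"),
}


def monthly_estimate(rate_per_hour: Decimal, hours_per_day: int = 8, days_per_month: int = 20) -> Decimal:
    """Hourly rate times active hours, rounded to whole euros."""
    cost = rate_per_hour * hours_per_day * days_per_month
    return cost.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def invoice(active_hours: dict[str, Decimal]) -> Decimal:
    """Sum over models of rate * hours the instance was actually running."""
    return sum((RATES[model] * hours for model, hours in active_hours.items()), Decimal("0"))


if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name}: {monthly_estimate(rate)} EUR/month")
```

For example, running DeepSeek V4 Flash for 50 hours and the embedding model for 160 hours in a month would come to 55,50 € + 16,00 € = 71,50 € under these rates.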

External provider estimates

Token-priced equivalents.

Estimated cost per user at 15M input and 10M output tokens per month. Output tokens are usually the more expensive side.

Claude 3.7 Sonnet

200k context

$195.00

USD / user / month

GPT-5.4 mini

OpenAI fast tier

$56.25

USD / user / month

GPT-5.5

OpenAI standard tier

$375.00

USD / user / month

Opus 4.7

Anthropic reasoning tier

$975.00

USD / user / month

QDivZero pricing does not grow with the number of users; the only limit is the compute capacity you deploy.

Performance comparisons sourced from Artificial Analysis. Prices are indicative and depend on actual resource allocation.
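Token-priced cost is paid again for every user, which is what makes it grow with team size. The sketch below reproduces the per-user arithmetic for Claude 3.7 Sonnet, assuming Anthropic's list rates of $3 per million input tokens and $15 per million output tokens; rates for the other models are left as parameters rather than guessed. The team size of 20 is an illustrative assumption.

```python
def token_cost_usd(input_mtok: float, output_mtok: float, in_rate: float, out_rate: float) -> float:
    """Monthly cost in USD; volumes in millions of tokens, rates in USD per million tokens."""
    return input_mtok * in_rate + output_mtok * out_rate


def team_token_cost_usd(users: int, per_user_usd: float) -> float:
    """Token pricing grows linearly with the number of users."""
    return users * per_user_usd


# Assumed Claude 3.7 Sonnet list rates: $3/M input, $15/M output.
per_user = token_cost_usd(15, 10, in_rate=3.0, out_rate=15.0)
print(per_user)                           # 195.0
print(team_token_cost_usd(20, per_user))  # 3900.0
```

Output dominates here: 10M output tokens cost $150 of the $195, while 15M input tokens cost $45.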

Security? First, please.

Regional controls, verified capacity, and private-by-default operations for production AI workloads. Add Firewall when policy must sit in the request path.


Regions
Trusted regions
European deployments for GDPR-sensitive workloads, with the option to disable unverified servers.
Runtime
Verified runtime
QDivZero Agent, mTLS-authenticated traffic, and GPU security module checks before capacity is accepted.
Privacy
Private by default
We do not store conversation logs.

Start when work starts. Stop when it doesn't.

Create cron rules to turn instances on before traffic and off after hours. Lower the invoice, cut idle energy use, and keep operations predictable.

Lower the invoice
Pay only while compute is actually running.
Reduce idle energy use
Avoid overnight and weekend GPUs consuming power without work.
Keep capacity ready
Bring instances up before users, jobs, or office hours begin.

Example schedule for one instance (time-based start and stop rules, enabled):

Start · UTC · 0 9 * * 1-5 (09:00, Monday to Friday)
Stop · UTC · 0 19 * * 1-5 (19:00, Monday to Friday)
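These two rules keep the instance running from 09:00 to 19:00 UTC on weekdays. The sketch below computes weekly running hours from such a start/stop pair. It handles only a simplified cron subset (fixed minute and hour, `*` for day of month and month, and a day-of-week list or range), and `parse_simple_cron` and `weekly_running_hours` are hypothetical helpers, not part of the Compute API.

```python
def parse_simple_cron(expr: str) -> tuple[int, int, frozenset[int]]:
    """Parse 'minute hour * * day-of-week' into (minute, hour, days); 0 and 7 both mean Sunday."""
    minute, hour, dom, month, dow = expr.split()
    if dom != "*" or month != "*":
        raise ValueError("this sketch only supports '*' for day of month and month")
    if dow == "*":
        days = set(range(7))
    else:
        days = set()
        for part in dow.split(","):
            if "-" in part:
                first, last = (int(x) for x in part.split("-"))
                days.update(range(first, last + 1))
            else:
                days.add(int(part))
    return int(minute), int(hour), frozenset(d % 7 for d in days)


def weekly_running_hours(start_expr: str, stop_expr: str) -> float:
    """Hours per week between a start rule and a same-day stop rule firing on the same days."""
    start_min, start_hour, start_days = parse_simple_cron(start_expr)
    stop_min, stop_hour, stop_days = parse_simple_cron(stop_expr)
    if start_days != stop_days:
        raise ValueError("start and stop rules must fire on the same days")
    start = start_hour * 60 + start_min
    stop = stop_hour * 60 + stop_min
    if stop <= start:
        raise ValueError("this sketch assumes the stop time is later on the same day")
    return len(start_days) * (stop - start) / 60


hours = weekly_running_hours("0 9 * * 1-5", "0 19 * * 1-5")
print(hours)         # 50.0
print(hours * 0.25)  # 12.5 (EUR per week at the EUR 0.25/h example rate)
```

Fifty running hours a week, against 168 for an always-on instance, is roughly 70% fewer billed hours.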

What Compute already includes.

Deployment, routing, pricing, security, scheduling, and API compatibility in one runtime.

Model deployment

Pick a model and let Compute validate runtime requirements and prepare capacity for serving.

Multi-provider routing

The scheduler selects capacity across available providers for cost, availability, and latency.

OpenAI-compatible API

Expose deployed models behind the standard OpenAI contract so existing clients keep working.

Scheduled operations

Use cron-based start and stop rules so workloads run when needed and stay off when they do not.

Capacity pricing

Pay for compute hours instead of tokens, with pricing that stays predictable as usage grows.

Security controls

Apply regional controls, verified runtime checks, and private-by-default operations for production workloads.

OpenAI SDK. Your models. Our infrastructure.

Point existing OpenAI-compatible code at QDivZero and run production AI workloads on capacity-priced infrastructure.


quickstart.py

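```python
from openai import OpenAI

# Point the standard OpenAI client at the QDivZero endpoint.
client = OpenAI(
    base_url="https://api.qdiv0.com/v1",
    api_key="your-api-key",
)

# "your-model" is a placeholder for the name of a model you have deployed.
response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "user", "content": "Hello world"}
    ],
)

print(response.choices[0].message.content)
```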

Deploy your first model in minutes.

Compute abstracts infrastructure decisions so you can focus on your product. Start with a model and add more products as your workload grows.
