Public Models

Public models are pre-deployed inference endpoints operated by QDivZero. You call them by name, with no launch wizard, GPU selection, or scheduler to configure, and pay per token consumed. Under the hood, the platform routes each request across the providers that host the requested model.

Public models vs. Compute instances

A Compute instance is a model you deploy against a Hugging Face repo, billed per active GPU hour, with a serving name you pick. A public model is a model QDivZero deploys, billed per token, with a fixed name from the catalog. Use Compute for private or experimental models; use public models for production traffic against the most common frontier models without managing capacity.

Catalog

The table below lists the current catalog. The platform adds models as providers expose them. Prices are in EUR per million tokens, with input and output tokens priced separately.

Model	Workload	European	Input / M	Output / M
deepseek-v3.2	chat	No	€0.35	€1.43
deepseek-v3.2-european	chat	Yes	€0.81	€2.18
gpt-oss-120b-european	chat	Yes	€0.34	€1.37
gpt-oss-safeguard-120b-european	chat	Yes	€0.34	€1.37
gpt-oss-20b-european	chat	Yes	€0.09	€0.39
gpt-oss-safeguard-20b-european	chat	Yes	€0.09	€0.39
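
Because input and output tokens are priced separately, the charge for a workload is the sum of the two token counts, each multiplied by its per-million rate. The sketch below is only an illustration of that arithmetic using the prices from the table above. The function name `estimate_cost_eur` is hypothetical, and rounding or billing granularity on the actual invoice is not covered here.

```python
from decimal import Decimal

# Prices in EUR per million tokens (input, output), copied from the catalog table.
PRICES = {
    "deepseek-v3.2": (Decimal("0.35"), Decimal("1.43")),
    "deepseek-v3.2-european": (Decimal("0.81"), Decimal("2.18")),
    "gpt-oss-120b-european": (Decimal("0.34"), Decimal("1.37")),
    "gpt-oss-safeguard-120b-european": (Decimal("0.34"), Decimal("1.37")),
    "gpt-oss-20b-european": (Decimal("0.09"), Decimal("0.39")),
    "gpt-oss-safeguard-20b-european": (Decimal("0.09"), Decimal("0.39")),
}

MILLION = Decimal(1_000_000)


def estimate_cost_eur(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the per-token charge in EUR (hypothetical helper, no rounding applied)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


if __name__ == "__main__":
    # 2M input tokens and 500k output tokens on the smallest European model.
    print(estimate_cost_eur("gpt-oss-20b-european", 2_000_000, 500_000))
```

Decimal arithmetic is used instead of floats so that small per-token amounts add up without binary rounding drift.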

Variants ending in -european

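European variants are pinned to a single EU region, so traffic stays inside the EU for data residency. They are billed at a different rate than the global variant of the same model: for example, deepseek-v3.2-european costs €0.81 (input) and €2.18 (output) per million tokens, against €0.35 and €1.43 for deepseek-v3.2.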
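
Since EU residency is signalled by the `-european` suffix in the model name, client code can pick the right variant by name alone. The following is a minimal sketch under that assumption. The helper names `is_european` and `european_variant` are hypothetical, and the name list is copied from the catalog above rather than fetched from the platform.

```python
from typing import Optional

# Model names copied from the catalog table; not fetched from any API.
CATALOG = [
    "deepseek-v3.2",
    "deepseek-v3.2-european",
    "gpt-oss-120b-european",
    "gpt-oss-safeguard-120b-european",
    "gpt-oss-20b-european",
    "gpt-oss-safeguard-20b-european",
]

EU_SUFFIX = "-european"


def is_european(model: str) -> bool:
    """True if the name follows the -european naming convention."""
    return model.endswith(EU_SUFFIX)


def european_variant(model: str) -> Optional[str]:
    """Return the EU-pinned catalog name for a model, or None if none is listed."""
    candidate = model if is_european(model) else model + EU_SUFFIX
    return candidate if candidate in CATALOG else None


if __name__ == "__main__":
    print(european_variant("deepseek-v3.2"))
    print([name for name in CATALOG if is_european(name)])
```

Returning `None` rather than falling back to a global variant keeps a residency requirement from being silently dropped when no EU-pinned model exists.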

Where to go next

Calling public models →

Pricing model →

Use in Smart Balancers →

Inference →