Launch flow

A detailed walkthrough of what happens between clicking Launch on a new instance and the model answering its first request. Use this guide to debug launches that hang, fail, or land in the wrong state.

Pre-flight

Before the launch request is sent, the platform verifies:

  • the model repo exists and is public, and its architecture is supported by the chosen runtime preset
  • the runtime image is allowed by the account policy (for example, the policy may require verified or signed images)
  • the account balance meets the launch threshold, i.e. it is high enough to bill at least one hour of runtime
  • if Smart mode is used, at least one provider matches the constraints (region, max price, capacity tier)
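The checks above can be sketched as a single function that collects every failing condition. This is a hypothetical client-side model, not the platform's implementation. The `LaunchRequest` and `Provider` fields, the single `image_allowed_by_policy` flag, and the rule that the balance must cover one hour at `hourly_price` are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Provider:
    # Hypothetical provider record used for Smart-mode matching.
    region: str
    hourly_price: float
    capacity_tier: str


@dataclass
class LaunchRequest:
    # Hypothetical fields standing in for what the launch screen collects.
    repo_exists: bool
    repo_is_public: bool
    architecture: str
    preset_architectures: set[str]
    image_allowed_by_policy: bool
    hourly_price: float
    balance: float
    smart_mode: bool = False
    region: str | None = None
    max_price: float | None = None
    capacity_tier: str | None = None
    providers: list[Provider] = field(default_factory=list)


def matching_providers(req: LaunchRequest) -> list[Provider]:
    """Providers that satisfy every Smart-mode constraint that was set."""
    return [
        p for p in req.providers
        if (req.region is None or p.region == req.region)
        and (req.max_price is None or p.hourly_price <= req.max_price)
        and (req.capacity_tier is None or p.capacity_tier == req.capacity_tier)
    ]


def preflight_errors(req: LaunchRequest) -> list[str]:
    """Return the failed checks; an empty list means the launch can be sent."""
    errors = []
    if not (req.repo_exists and req.repo_is_public):
        errors.append("model repo missing or not public")
    if req.architecture not in req.preset_architectures:
        errors.append("architecture not supported by the runtime preset")
    if not req.image_allowed_by_policy:
        errors.append("runtime image not allowed by account policy")
    # Assumed threshold: the balance must cover one hour at the hourly price.
    if req.balance < req.hourly_price:
        errors.append("balance below the one-hour launch threshold")
    if req.smart_mode and not matching_providers(req):
        errors.append("no provider matches the Smart-mode constraints")
    return errors
```

The function returns every failure rather than stopping at the first one. This lets a caller report all blocking issues at once.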

VRAM is estimated, not measured

The launch screen shows an estimate derived from the model config and quantization. Actual VRAM usage also depends on the exact weights, the runtime's batcher, and the chosen context size. If the estimate is close to the GPU's memory limit, choose the next GPU tier up.
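A rough estimator shows why context size moves the number so much. This is not the platform's estimator. Weight memory is counted as parameters × bits ÷ 8. KV-cache memory is counted as 2 × layers × KV heads × head dimension × context × batch × bytes per element. A flat overhead multiplier stands in for activations and runtime buffers; that multiplier and the 90% "too close" cutoff are assumptions, not platform values.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone."""
    return n_params * bits_per_param / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, batch_size: int = 1,
                 bytes_per_elem: int = 2) -> float:
    """KV cache: one key and one value vector per layer, KV head, and token."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem) / GIB


def estimate_vram_gib(weights: float, kv_cache: float,
                      overhead: float = 1.1) -> float:
    # `overhead` is an assumed flat multiplier for activations, the CUDA
    # context, and batcher buffers; real runtimes vary.
    return (weights + kv_cache) * overhead


def needs_next_tier(estimate_gib: float, gpu_gib: float,
                    max_utilization: float = 0.9) -> bool:
    # Hypothetical rule of thumb: more than 90% of GPU memory is "too close".
    return estimate_gib > max_utilization * gpu_gib
```

As an illustration, take a model shaped roughly like Llama-2-7B: about 7B parameters, 32 layers, 32 KV heads, and head dimension 128. In FP16 at a 4096-token context, `kv_cache_gib(32, 32, 128, 4096)` is 2 GiB and the total estimate lands near 16.5 GiB. That exceeds a 16 GiB card but fits comfortably on a 24 GiB one. Doubling the context to 8192 doubles the KV cache to 4 GiB.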

Instance states

State          Meaning
non_bootstrap  Initial state. The instance is being registered with the provider.
pending        The provider accepted the request and is allocating capacity. Typical duration: 30 s to 5 min.
running        The model is loaded and the OpenAI-compatible endpoint is serving traffic.
stopped        The user stopped the instance. The provider may still hold the underlying pod until it is deleted.
failed         An error occurred that the instance cannot recover from automatically. The instance is preserved with a `failure_reason` for inspection.
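When a launch seems to hang, polling the state against an explicit deadline is more reliable than watching the UI. The sketch below treats `non_bootstrap` and `pending` as transient and the other three states as settled. `get_state` is a hypothetical callable that you would connect to whatever API or CLI exposes the instance state. The 600 s default timeout is an assumption, set at twice the upper end of the typical pending duration.

```python
import time
from typing import Callable

# State names come from the table above; the transient/settled split is
# an assumption made for this sketch.
TRANSIENT = {"non_bootstrap", "pending"}
SETTLED = {"running", "stopped", "failed"}


def wait_for_launch(get_state: Callable[[], str],
                    timeout_s: float = 600.0,
                    interval_s: float = 10.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the instance leaves the transient states or time runs out.

    `get_state` is hypothetical: any callable returning the current state.
    `clock` and `sleep` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in SETTLED:
            return state
        if state not in TRANSIENT:
            raise ValueError(f"unknown instance state: {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"instance still {state!r} after {timeout_s}s")
        sleep(interval_s)
```

A `TimeoutError` while the instance is still `pending` points at provider capacity rather than the model. A quick transition to `failed` points at the `failure_reason` values listed in the next section.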

Common failure reasons

Reason             What it means
OOM_KILLED         The runtime exceeded available VRAM. Pick a larger GPU or reduce the context size.
PROVIDER_TIMEOUT   The provider did not respond within the allocation window. The scheduler will retry on the next start.
IMAGE_PULL_FAILED  The runtime image could not be pulled. This is usually a transient provider issue; retry.
MODEL_LOAD_ERROR   The model files failed integrity checks. Confirm the repo and quantization are supported.
BILLING_BLOCKED    The account balance is below the launch threshold. Top up and try again.
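The platform already surfaces its own one-line hint for a failed instance. If you script around launches, a client-side lookup like the hypothetical one below can sort reasons into "retry as is" and "change something first". That split is one reading of the table above, not a field the platform returns.

```python
# Hints transcribed from the failure-reason table. The boolean means
# "retrying unchanged is reasonable"; it is an interpretation, not an API field.
FAILURE_REASONS = {
    "OOM_KILLED": (False, "Pick a larger GPU or reduce the context size."),
    "PROVIDER_TIMEOUT": (True, "Start again; the scheduler retries."),
    "IMAGE_PULL_FAILED": (True, "Usually a transient provider issue; retry."),
    "MODEL_LOAD_ERROR": (False, "Confirm the repo and quantization are supported."),
    "BILLING_BLOCKED": (False, "Top up and try again."),
}


def triage(failure_reason: str) -> str:
    """Turn a failure_reason into a one-line next step."""
    entry = FAILURE_REASONS.get(failure_reason)
    if entry is None:
        # Reasons not in the table fall through to the generic escalation path.
        return f"Unknown reason {failure_reason!r}: see Troubleshooting or contact support."
    retry_as_is, hint = entry
    prefix = "RETRY" if retry_as_is else "CHANGE FIRST"
    return f"{prefix}: {hint}"
```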

Recovery patterns

  1. If the instance is in the `failed` state, open it and read `failure_reason`. The platform also surfaces a one-line remediation hint.
  2. Most provider-side failures are transient. Stop and start the instance again; the scheduler will re-route it.
  3. For OOM_KILLED, delete the instance and re-launch it with a larger GPU or a smaller context size.
  4. For billing blocks, open the Top up page, add funds, and confirm the ledger entry appears before retrying the launch.
  5. Still stuck? Use the Troubleshooting guide for diagnostic flows, or contact support from the instance detail page.
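Steps 2 through 5 can be combined into a bounded recovery loop. The loop restarts only for transient, provider-side reasons. It stops immediately for reasons that need a configuration or billing change, and it escalates after a few attempts. `get_failure_reason` and `restart` are hypothetical callables to connect to your own tooling. The attempt count and backoff delays are assumptions, not platform defaults.

```python
from __future__ import annotations

import time
from typing import Callable

# Reasons the failure table describes as transient / provider-side.
TRANSIENT_REASONS = {"PROVIDER_TIMEOUT", "IMAGE_PULL_FAILED"}


def recover(get_failure_reason: Callable[[], str | None],
            restart: Callable[[], None],
            max_attempts: int = 3,
            base_delay_s: float = 30.0,
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Restart transient failures with exponential backoff; escalate otherwise.

    `get_failure_reason` (hypothetical) returns None once the instance is
    healthy. Returns "recovered", "needs_config_change", or "contact_support".
    """
    for attempt in range(max_attempts):
        reason = get_failure_reason()
        if reason is None:
            return "recovered"
        if reason not in TRANSIENT_REASONS:
            # OOM, billing, and model-load errors need a change first (steps 3-4).
            return "needs_config_change"
        restart()  # step 2: stop and start; the scheduler re-routes
        sleep(base_delay_s * 2 ** attempt)  # waits 30 s, 60 s, 120 s, ...
    # Out of attempts: one last check before escalating (step 5).
    return "recovered" if get_failure_reason() is None else "contact_support"
```

Capping the number of attempts keeps a persistent provider problem from turning into an endless restart cycle.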