Firewalls

A firewall evaluates every prompt before it reaches the model. Rules are LLM-based — you write a short classification prompt and the firewall calls a judge instance to score the input. The resulting decision is one of block, allow, or audit.

Why an LLM judge

Rule-based pattern matching is brittle and easy to bypass. A small judge model gives nuanced decisions for prompts the rule author did not anticipate. The judge is a separate instance so you can size it independently and pay only for the firewall traffic.

How it fits together

Three primitives work together. A rule is a classification prompt the judge runs against the user message. A firewall is a named bundle of rules plus a mode (block / allow / audit) and a judge instance. An OpenAI client attaches a firewall by slug; the platform evaluates it before the request reaches the model.

The full configuration shape, the rule catalog, and the HTTP surface are covered in the guides below.

Firewalls

How it fits together

Where to go next

Rules →

Firewall configuration →

API surface →

Create a firewall →