Extensions & quirks
The long tail of "almost spec" behavior: things every client eventually trips over, even though no single behavior is a hard violation.
Implication tests (planned)
Beyond per-endpoint shape checks, there are implication checks the prober treats as a third phase: properties that should hold across endpoints if a server is internally consistent. Examples:
- list → retrieve. If /v1/models lists id, then /v1/models/{id} should return the same id (or 404, but never a different id).
- chat ↔ completion logprobs. If /v1/chat/completions accepts logprobs: true, then a basic chat call with logprobs: true should have choices[0].logprobs populated. Servers that accept the parameter but never populate the field violate the implication.
- streaming finish_reason. A streamed chat must emit a final delta with finish_reason set and then [DONE]; a stream that ends without a finish_reason chunk leaves clients hanging.
- temperature 0 determinism. Two calls with temperature: 0, seed: 42 should return identical outputs. A server that doesn't honor seed will fail this even when both responses are individually spec-valid.
These don't all run today (some need stable-output models that aren't worth assuming), but the catalog reserves them as a future Phase C.
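As an illustration of what one such implication check could look like, here is a minimal sketch of the list → retrieve case. The function name, its arguments, and the choice to grade any status other than 200 or 404 as FAIL are assumptions made for this example, not the prober's actual code.

```python
from typing import Optional


def check_list_retrieve(listed_id: str, status: int, body: Optional[dict]) -> str:
    """Grade the list -> retrieve implication for one model id.

    listed_id: an id taken from the /v1/models listing.
    status, body: HTTP status and parsed JSON returned by /v1/models/{id}.
    The verdict strings reuse the catalog's PASS/FAIL vocabulary.
    """
    if status == 404:
        # Tolerated by the implication: absent, but not a different id.
        return "PASS"
    if status == 200 and isinstance(body, dict) and body.get("id") == listed_id:
        return "PASS"
    # A different id, a malformed body, or an unexpected status (assumed FAIL).
    return "FAIL"
```

The check takes already-fetched data rather than doing I/O itself, so the same function can be run against recorded responses.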
Quirks worth flagging
Streaming gotchas
- No [DONE] sentinel. Real bug; clients hang.
- Trailing whitespace deltas. delta.content: " " after a finish_reason chunk. Spec doesn't allow content after the final reason, but a few servers do this.
- Heartbeat / keepalive frames. Some servers send : keep-alive\n comment lines mid-stream. Spec-allowed, but this breaks a hand-rolled SSE parser that expects only data: frames.
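The three streaming quirks above can be detected in a single pass over the raw SSE lines. The sketch below is one possible way to do that; the function name and the report keys are hypothetical, and it assumes a single-choice stream (with n > 1, a finish_reason on one choice followed by content on another would be misreported).

```python
import json
from typing import Iterable


def audit_chat_stream(lines: Iterable[str]) -> dict:
    """Walk raw SSE lines from a streamed chat and report the quirks above."""
    report = {
        "saw_done": False,
        "saw_finish_reason": False,
        "content_after_finish": False,
        "comment_frames": 0,
    }
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # a blank line only separates SSE events
        if line.startswith(":"):
            report["comment_frames"] += 1  # keepalive comment, legal SSE
            continue
        if not line.startswith("data:"):
            continue  # other SSE fields (event:, id:, retry:) are ignored here
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            report["saw_done"] = True
            break
        chunk = json.loads(payload)  # raises on malformed JSON
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Content arriving in a chunk after the finish_reason chunk.
            if report["saw_finish_reason"] and delta.get("content"):
                report["content_after_finish"] = True
            if choice.get("finish_reason") is not None:
                report["saw_finish_reason"] = True
    return report
```

A stream that returns saw_done as False is the "clients hang" case; a nonzero comment_frames count is informational only, since comment lines are allowed.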
Tool calls
- Function vs. tool naming. The legacy function_call field is deprecated; servers should emit tool_calls. Many OSS servers still emit function_call for backward compatibility: spec-discouraged but not invalid. Catalog tracks this as a WARN.
- Tool argument JSON drift. tool_calls[i].function.arguments is a string (JSON-as-text). Some servers emit it as a parsed object, which breaks every client that does JSON.parse(args).
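A client that wants to survive both quirks can normalize on read. The following is a rough sketch of that idea; both helper names are made up for this example, and treating an empty arguments string as an empty object is an assumption rather than spec behavior.

```python
import json
from typing import Any


def parse_tool_arguments(arguments: Any) -> dict:
    """Return tool-call arguments as a dict, whichever form the server sent."""
    if isinstance(arguments, dict):
        return arguments  # non-spec: the server already parsed the JSON
    if isinstance(arguments, str):
        parsed = json.loads(arguments or "{}")  # empty string treated as {}
        if not isinstance(parsed, dict):
            raise ValueError("tool arguments must decode to a JSON object")
        return parsed
    raise TypeError(f"unexpected arguments type: {type(arguments).__name__}")


def extract_tool_calls(message: dict) -> list:
    """Collect tool calls, folding a legacy function_call into the same shape."""
    calls = list(message.get("tool_calls") or [])
    legacy = message.get("function_call")
    if legacy and not calls:
        calls.append({"type": "function", "function": legacy})
    return calls
```

With this in place, downstream code can always read call["function"]["arguments"] through parse_tool_arguments and never needs to care which server produced the message.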
response_format
Three values land:
- {type: "text"}: the default.
- {type: "json_object"}: model must produce valid JSON. Most OSS servers approximate by injecting a system prompt; few fail-fast on invalid output the way OpenAI does.
- {type: "json_schema", json_schema: {...}}: schema-constrained generation. vLLM and llama.cpp implement this via grammar-based sampling. Many shims either ignore the schema or hard-error.
The prober doesn't currently send response_format (cost: needs a
real model loaded that supports JSON mode). It's an excellent
follow-up for a Phase B+ check.
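If such a check were added, the core of a json_object assertion could be as small as the sketch below. The helper name is hypothetical, and it assumes the check only needs to confirm that the assistant content decodes to a JSON object; schema validation for json_schema is out of scope here.

```python
import json
from typing import Optional


def is_valid_json_object(content: Optional[str]) -> bool:
    """True if the assistant content parses as a single JSON object."""
    try:
        return isinstance(json.loads(content), dict)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all, or content was None.
        return False
```

A server that approximates JSON mode with a system prompt tends to fail this on replies that wrap the object in prose.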
Authentication
OpenAI uses Authorization: Bearer <key>. Most OSS servers either
ignore the header or accept any non-empty bearer. A server that also
accepts ?api_key=<key> URL parameters or X-API-Key headers is
adding extension behavior — spec-allowed but worth knowing about for
client fallbacks.
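For a client that wants to try those fallbacks in order, one possible shape is to generate the attempts up front. The function name and the ordering (bearer first, URL parameter last) are assumptions for this sketch; the query-parameter form comes last because keys in URLs tend to end up in access logs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_attempts(url: str, key: str) -> list:
    """Return (url, headers) pairs to try in order, spec form first."""
    parts = urlsplit(url)
    # Preserve any existing query parameters and append api_key.
    query = parse_qsl(parts.query, keep_blank_values=True) + [("api_key", key)]
    url_with_key = urlunsplit(parts._replace(query=urlencode(query)))
    return [
        (url, {"Authorization": f"Bearer {key}"}),  # OpenAI form
        (url, {"X-API-Key": key}),                  # header extension
        (url_with_key, {}),                         # URL-parameter extension
    ]
```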
WebSocket: /v1/realtime
OpenAI's Realtime API uses a WebSocket bidirectional event stream
rather than HTTP-SSE. aioc speaks WS for any catalog row with
protocol="ws" set; today that's only /v1/realtime but the path
generalizes.
Probe behavior:
- Phase A = "did the upgrade succeed?". Grading mirrors HTTP:
  - 101 Switching Protocols (accepted) → PASS
  - 404 on the upgrade → FAIL (route absent)
  - 401 / 403 → WARN, "auth required": endpoint exists, we just don't have a bearer. Pass --openai-api-key (or the model: action input's sibling openai-api-key:) to authenticate.
  - Anything else → FAIL with the upgrade status
  - Connection refused → SKIP (same as HTTP liveness short-circuit)
- Phase B sends a session.update event with empty modalities and waits up to 10s for a session.created event back. Grading:
  - session.created received → PASS
  - Connected, other events seen, no session.created → WARN
  - Connected, no events at all in budget → WARN
- The probe sets openai-beta: realtime=v1 on every upgrade, the beta opt-in request header OpenAI's server gates on. Servers that don't care about it ignore it.
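The grading rules above reduce to two small pure functions. This is a sketch of the decision table only, not the probe's real code: the function names are invented here, and representing "connection refused" as a status of None is an assumption.

```python
from typing import Optional, Sequence


def grade_upgrade(status: Optional[int]) -> str:
    """Phase A: grade the HTTP status of the WebSocket upgrade.

    status is None when the connection was refused.
    """
    if status is None:
        return "SKIP"  # same as the HTTP liveness short-circuit
    if status == 101:
        return "PASS"  # upgrade accepted
    if status == 404:
        return "FAIL"  # route absent
    if status in (401, 403):
        return "WARN"  # auth required; the endpoint exists
    return "FAIL"      # anything else, reported with the upgrade status


def grade_session(event_types: Sequence[str]) -> str:
    """Phase B: grade the event types received within the wait budget."""
    # Both "other events only" and "no events at all" grade as WARN.
    return "PASS" if "session.created" in event_types else "WARN"
```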
OSS-server status: essentially nobody implements the Realtime API
shape today. Forks that ship a WS event surface (Hume EVI, Sesame
CSM, in-house S2S) almost always use their own event vocabulary;
HT-compat could pin a vendor-neutral version in a future v1.1 if a
reference implementation emerges. For now the catalog probes the
OpenAI shape as kind="ext".
Context compaction (/v1/responses/compact)
OpenAI shipped server-side compaction in 2026 as part of the Responses
API; Codex CLI's compact_remote.rs calls it on long sessions to
shed history without losing model latent state. Two surfaces:
- Standalone: POST /v1/responses/compact with {model, input: [ResponseItem...], tools, parallel_tool_calls, ...}. Returns {output: [...]} where one item is {type: "compaction", encrypted_content: "<opaque-AES-blob>"}. The client passes the blob back as part of the next /v1/responses input array; the server decrypts and restores latent state.
- Inline: POST /v1/responses with context_management: {type: "compaction", compact_threshold: <int>}. When the rendered token count crosses the threshold the server emits the same compaction output item in-stream, with no separate call.
encrypted_content is intentionally opaque (prevents client-side
tampering / prompt injection on summaries; carries internal state
markers OpenAI doesn't expose). Clients MUST NOT inspect or modify
it.
OSS-server status: essentially nobody implements this today. The
catalog probes it as kind="ext"; expect ❌ on every non-OpenAI
endpoint. Implementing it server-side without OpenAI's encryption
key requires either:
- A plaintext fallback (drops the tamper-resistance property but preserves the shape and the "shed-history" semantics — useful enough to be worth a try for forks).
- Or a public-key envelope the fork controls (much heavier).
If/when an OSS implementation emerges, the matrix flips ⚠️ (shape ok, no encryption) or ✅ (full equivalence).
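On the client side, the standalone round trip described earlier in this section amounts to finding the compaction item and carrying it forward untouched. The sketch below assumes the whole item (not just the encrypted_content string) is placed at the front of the next input array; that placement, and both helper names, are assumptions for illustration rather than documented behavior.

```python
from typing import Optional


def find_compaction_item(output: list) -> Optional[dict]:
    """Return the first {type: "compaction"} item from a compact response."""
    for item in output:
        if isinstance(item, dict) and item.get("type") == "compaction":
            return item
    return None


def next_input(compact_response: dict, new_items: list) -> list:
    """Build the input array for the next /v1/responses call.

    The compaction item is passed through as-is: the client never
    inspects or modifies encrypted_content.
    """
    item = find_compaction_item(compact_response.get("output", []))
    if item is None:
        raise ValueError("no compaction item in response output")
    return [item] + list(new_items)
```

Because the item is forwarded by reference and never rewritten, the same client code would work against a fork that uses a plaintext fallback.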
Error envelope
The OpenAI error shape:
{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "invalid_value"
  }
}
Servers that respond to a 400 with {"detail": "..."} (FastAPI
default) are deviating; the catalog calls this a WARN because
every Python OSS server starts there until someone wires up the
canonical envelope.
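A shape check for the error body might look like the sketch below. Only the WARN for a FastAPI-style detail body comes from the text above; the function name, requiring a string message for PASS, and grading everything else as FAIL are assumptions for this example.

```python
def grade_error_body(body: object) -> str:
    """Grade the parsed JSON body of a 4xx response against the envelope."""
    if isinstance(body, dict):
        err = body.get("error")
        if isinstance(err, dict) and isinstance(err.get("message"), str):
            return "PASS"  # canonical {"error": {"message": ...}} envelope
        if "detail" in body:
            return "WARN"  # FastAPI default shape
    return "FAIL"          # assumed verdict for any other shape
```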
Where to add yours
Open a PR adding to docs/spec/extensions.md with:
- The behavior, in one or two sentences.
- The server(s) that do it.
- Whether the catalog should flag it as WARN or stay silent.