Implementations

Per-server deep-dives. Each page covers what the server actually does relative to the canonical spec — what's there, what's missing, what's extended, and what to watch out for in production.

These pages are opinionated and current as of the date in the page footer. OSS servers move fast; if a deviation listed here doesn't match what your build does, treat that as a bug in the page and please open a PR.

Pages

OpenAI (reference) — the moving target.
llama.cpp — llama-server, llama-cpp-python.
vLLM — full surface, fast.
Ollama — opinionated subset.
LM Studio — desktop-first.
TabbyAPI — ExLlama-based.

How we score deviations

Each page summarizes the server's behavior against three questions:

What does aioc probe say? A snapshot of the latest known run.
Where does it deviate? Spec-vs-actual differences worth flagging, with the catalog kind (core / ext / ours) so you can tell defects from "didn't implement".
What does it add? Genuine value-adds outside spec — e.g. llama.cpp's cache_prompt, vLLM's batched logprobs.
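As a rough illustration of how the catalog kind helps when reading a deviation list, the sketch below tallies deviations per kind. The `Deviation` record, its fields, and the sample entries are hypothetical and are not the format aioc actually emits; only the three kind names (core / ext / ours) come from this page.

```python
from collections import Counter
from dataclasses import dataclass

# The three catalog kinds named on this page.
KINDS = ("core", "ext", "ours")


@dataclass(frozen=True)
class Deviation:
    """Hypothetical record for one spec-vs-actual difference."""
    server: str
    feature: str
    kind: str  # one of KINDS
    note: str = ""

    def __post_init__(self):
        # Reject kinds that are not in the catalog.
        if self.kind not in KINDS:
            raise ValueError(f"unknown catalog kind: {self.kind!r}")


def count_by_kind(deviations):
    """Return a dict with one count per catalog kind, zero-filled."""
    counts = Counter(d.kind for d in deviations)
    return {kind: counts.get(kind, 0) for kind in KINDS}


# Placeholder data, not real probe results.
sample = [
    Deviation("example-server", "feature-a", "core", "placeholder"),
    Deviation("example-server", "feature-b", "ext", "placeholder"),
    Deviation("example-server", "feature-c", "ext", "placeholder"),
]

summary = count_by_kind(sample)
print(summary)
```

Grouping by kind before counting keeps a raw deviation total from being read as a quality score, which matches the intent of describing behavior rather than ranking servers.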

A high-deviation server isn't automatically worse — sometimes the deviation is a feature. The point of these pages is to describe behavior, not to rank servers.

See also