Implementations
Per-server deep-dives. Each page covers what the server actually does relative to the canonical spec: what's there, what's missing, what's extended, and what to watch out for in production.
These pages are opinionated and current as of the date in the page footer. OSS servers move fast; if a deviation described here doesn't match what your build does, treat it as a documentation bug and please open a PR.
Pages
- OpenAI (reference): the moving target.
- llama.cpp: llama-server, llama-cpp-python.
- vLLM: full surface, fast.
- Ollama: opinionated subset.
- LM Studio: desktop-first.
- TabbyAPI: Exllama-based.
How we score deviations
Each page summarizes the server's behavior against three questions:
- What does aioc probe say? A snapshot of the latest known run.
- Where does it deviate? Spec-vs-actual differences worth flagging, each tagged with its catalog kind (core, ext, or ours) so you can tell genuine defects apart from features the server simply didn't implement.
- What does it add? Genuine value-adds outside the spec, e.g. llama.cpp's cache_prompt or vLLM's batched logprobs.
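To make the three-question summary concrete, here is a minimal Python sketch of how a per-server summary could be modeled. Every class and field name below is hypothetical and is not the schema that aioc or these pages actually use; only the three questions and the core/ext/ours tags come from the text above. The sketch just shows why keeping the kind on each deviation is useful: grouping by it separates defects from unimplemented features.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class CatalogKind(Enum):
    """Catalog kinds as named on this page; what each kind means is defined
    by the catalog itself, not by this sketch."""
    CORE = "core"
    EXT = "ext"
    OURS = "ours"


@dataclass
class Deviation:
    # Hypothetical record for one spec-vs-actual difference.
    check: str
    kind: CatalogKind
    note: str = ""


@dataclass
class ServerSummary:
    # Hypothetical shape mirroring the three questions.
    server: str
    probe_snapshot: dict[str, str]  # What does aioc probe say?
    deviations: list[Deviation] = field(default_factory=list)  # Where does it deviate?
    additions: list[str] = field(default_factory=list)  # What does it add?

    def deviations_by_kind(self) -> dict[CatalogKind, list[Deviation]]:
        """Group deviations by catalog kind, preserving their original order."""
        grouped: dict[CatalogKind, list[Deviation]] = defaultdict(list)
        for deviation in self.deviations:
            grouped[deviation.kind].append(deviation)
        # Only kinds that actually occur appear as keys.
        return dict(grouped)
```

Keeping the kind on each record, rather than splitting deviations into separate lists up front, lets the same data feed both a per-server page and a cross-server comparison.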
A high-deviation server isn't automatically worse; sometimes the deviation is a feature. The point of these pages is to describe behavior, not to rank servers.
See also
- Compatibility matrix: at-a-glance comparison.