LM Studio

Desktop-first inference UI; ships an OpenAI-compatible HTTP server behind a "Local Server" toggle.

Surface (current)

Endpoint                        Status  Notes
/v1/models                              Lists the loaded model and downloaded models
/v1/models/{id}                 ⚠️      Often returns 404; LM Studio has been improving this
/v1/chat/completions                    Tools, JSON mode (template-dependent)
/v1/chat/completions (stream)
/v1/completions                         Maintained
/v1/embeddings                          When an embedding model is loaded
/v1/audio/*
/v1/images/*
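
For the streaming row, a minimal sketch of how a client might read the response, assuming LM Studio follows the usual OpenAI-style server-sent-events framing (lines beginning with "data:", terminated by "data: [DONE]"). The helper name iter_content_deltas is hypothetical, not part of any library.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines of a streamed chat completion.

    `lines` is any iterable of str or bytes lines, e.g. an HTTP response body
    read line by line. Assumes the common "data: {...}" / "data: [DONE]" framing.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

Joining the yielded fragments gives the full assistant message. Tool-call deltas are ignored in this sketch.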

Notes

  • GUI required by default. In the standard setup the server runs only while the LM Studio app is open and the toggle is on. A headless server mode exists but is less mature than llama.cpp's.
  • Model identity. LM Studio uses HF-style ids verbatim (e.g. mistralai/Mistral-7B-Instruct-v0.2). Make sure your client sends the exact id.
  • Probe carefully. LM Studio's server is slower to warm up than bare llama-server; after startup it can take several seconds before it accepts requests.
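
The last two notes can be combined into a small startup check: poll /v1/models until the server answers, then confirm the id you plan to send is listed exactly. This is a sketch under assumptions, not LM Studio's documented procedure. The base URL (port 1234) is assumed to be the local default and should be adjusted to your setup; fetch_models, wait_for_models and has_exact_id are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: address of the local server; change it to match your configuration.
BASE_URL = "http://localhost:1234/v1"


def fetch_models(base_url=BASE_URL, timeout=5.0):
    """GET /v1/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)


def wait_for_models(fetch=fetch_models, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `fetch` until it succeeds; raise RuntimeError after `attempts` failures."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (urllib.error.URLError, ConnectionError, TimeoutError) as exc:
            last_error = exc  # server not accepting requests yet
            if attempt < attempts - 1:
                sleep(delay)
    raise RuntimeError(f"server not ready after {attempts} attempts") from last_error


def has_exact_id(models_body, wanted):
    """True only if `wanted` matches a listed id character for character."""
    return any(m.get("id") == wanted for m in models_body.get("data", []))
```

A client might call wait_for_models() once at startup and refuse to proceed if has_exact_id(body, "mistralai/Mistral-7B-Instruct-v0.2") is false, rather than discovering the mismatch on the first chat request.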

See also