Inference runtime

llama.cpp

The C++ inference engine under most of the local stack

macOS · Linux · Windows · iOS · AndroidMIT

Georgi Gerganov's llama.cpp is the lingua franca of local inference — pure C/C++ with CUDA / Metal / OpenCL / Vulkan backends, GGUF model format, and a wire-compatible server mode. Most desktop wrappers (Ollama, LM Studio, KoboldCPP, etc.) ship it under the hood.

Report an issue with llama.cpp

Posts to your status feed

Pick the closest match below, edit the body, and post. Your report carries the #llama-cpp tag automatically so it surfaces here + in the trending-tags rail. Reporting also follows llama.cpp so you’ll get status updates.

Down Very Slow Hallucinating Refusing Prompts Rate-Limited Other

llama.cpp

Report an issue with llama.cpp

Recent coverage

Tag aliases

Tags2

Community tags

Edit history

Community status