LocalGPT Assistant – Docker Model Runner Edition
Disclaimer: At the time of recording this module, Docker Model Runner only works on Apple Silicon (M-series) workstations.
You're part of the "LLM Enablement" team within a developer tools org. Your squad is exploring new ways to empower AI/ML developers to build locally-hosted applications without depending on cloud APIs. You're assigned to evaluate Docker Model Runner and demonstrate its practical integration into an existing multi-service AI app stack.
Integrate Docker Model Runner as the local LLM backend for an existing ChatGPT-style app (FastAPI + Gradio), replacing external API calls.
This will allow:
Full local inference using open-source LLMs (e.g. SmolLM, TinyLlama, Gemma)
Reproducible deployment using Docker Compose
Offline or air-gapped development
You are given the following repository:
```
localgpt/
├── LICENSE
├── README.md
├── app/
│   ├── main.py            # FastAPI backend for prompt handling
│   ├── Dockerfile
│   └── requirements.txt
├── ui/
│   ├── app.py             # Gradio frontend
│   ├── Dockerfile
│   └── requirements.txt
└── docker-compose.yaml    # [To be created by YOU]
```
Before integrating, understand how Docker Model Runner works:
Enable it in Docker Desktop:
Settings > Features in Development > Enable Docker Model Runner
Restart Docker Desktop
Try out basic commands:
```bash
docker model pull ai/smollm2
docker model run ai/smollm2 "How do you work?"
```
Observe how it pulls, loads, and responds with no external API involved.
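Model Runner also exposes an OpenAI-compatible HTTP API. The sketch below probes it from the host with Python; it assumes you have enabled host-side TCP support in Docker Desktop and that the default port (12434) and the llama.cpp engine path apply to your setup, so adjust those values if your configuration differs.

```python
# Probe Docker Model Runner's OpenAI-compatible API from the host.
# Assumption: host-side TCP support is enabled and the default port 12434 is in use.
import requests

BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"  # adjust if your setup differs

payload = {
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "How do you work?"}],
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
resp.raise_for_status()

# OpenAI-style responses put the assistant's reply in choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```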
Update app/main.py to interact with the Model Runner's OpenAI-compatible endpoint:
```python
import requests
from fastapi import FastAPI, Request

app = FastAPI()

# Docker Model Runner's OpenAI-compatible endpoint, reachable from inside containers.
LLM_URL = "http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions"

@app.post("/chat")
async def chat(req: Request):
    data = await req.json()
    prompt = data.get("prompt")
    payload = {
        "model": "ai/smollm2",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    # Relay the prompt to the local model and return the raw OpenAI-style response.
    response = requests.post(LLM_URL, json=payload)
    return response.json()
```
This allows FastAPI to relay prompts to the local LLM using a standard OpenAI-style API.
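The handler above returns the Model Runner response verbatim. If you would rather have the backend return only the generated text, the reply can be pulled out of the standard OpenAI-style response shape (choices[0].message.content); a small, optional helper sketch:

```python
# Optional: extract just the generated text from an OpenAI-style chat completion.
def extract_reply(completion: dict) -> str:
    try:
        return completion["choices"][0]["message"]["content"]
    except (KeyError, IndexError):
        # Fall back to the raw payload if the shape is unexpected.
        return str(completion)

# Inside the /chat handler you could then return:
#     return {"response": extract_reply(response.json())}
```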
Create a docker-compose.yaml that:
Starts the Docker Model Runner model provider
Boots the FastAPI and UI containers in the correct sequence
Automatically injects model metadata into the dependent services (see the sketch after the Compose file below)
version: "3" services: model: provider: type: model options: model: ai/smollm2 fastapi: build: context: ./app ports: - "8000:8000" depends_on: - model ui: build: context: ./ui ports: - "8501:8501" depends_on: - fastapi
Note: No changes are needed in the UI; it communicates with the FastAPI backend as before.
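The repository's ui/app.py is not reproduced in this module. For orientation only, a minimal Gradio frontend that relays prompts to the FastAPI backend might look like the sketch below; the fastapi hostname, the /chat route, and the response parsing are assumptions based on the Compose file and the backend handler above.

```python
# Minimal Gradio frontend sketch: forwards the user's prompt to the FastAPI backend.
# "fastapi" matches the Compose service name; adjust if your service is named differently.
import gradio as gr
import requests

BACKEND_URL = "http://fastapi:8000/chat"

def ask(prompt: str) -> str:
    resp = requests.post(BACKEND_URL, json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    data = resp.json()
    # The backend currently returns the raw OpenAI-style chat completion.
    return data["choices"][0]["message"]["content"]

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="LocalGPT Assistant")
demo.launch(server_name="0.0.0.0", server_port=8501)
```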
Bring the stack up:

```bash
docker compose up --build
```
Visit:
http://localhost:8501 → UI
http://localhost:8000/chat → API
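Once the stack is up, you can smoke-test the backend without going through the UI. A minimal check from the host, assuming the published port and the {"prompt": ...} body used by the handler in app/main.py:

```python
# Quick smoke test of the FastAPI relay once the Compose stack is running.
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"prompt": "Summarize what Docker Model Runner does."},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # raw OpenAI-style chat completion from the local model
```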
By the end of this project, you'll:
✅ Understand how Docker Model Runner manages and runs LLMs locally
✅ Replace hosted LLM APIs with local inference endpoints
✅ Learn how to package model providers in Docker Compose
✅ Build confidence in open-source model deployment workflows