Project Brief: Integrate Local LLMs with Docker Model Runner

📌 Project Name: LocalGPT Assistant – Docker Model Runner Edition

Disclaimer: At the time of recording this module, Docker Model Runner only works on Apple Silicon (M-series) Macs.

👥 Team Context

You're part of the "LLM Enablement" team within a developer tools org. Your squad is exploring new ways to empower AI/ML developers to build locally-hosted applications without depending on cloud APIs. You're assigned to evaluate Docker Model Runner and demonstrate its practical integration into an existing multi-service AI app stack.

🎯 Mission Objective

Integrate Docker Model Runner as the local LLM backend for an existing ChatGPT-style app (FastAPI + Gradio), replacing external API calls.

This will allow the app to run entirely on your machine: no external API calls, no API keys or usage costs, and prompts that never leave your local environment.

🧱 Provided Project Structure

You are given the following repository:

localgpt/
├── LICENSE
├── README.md
├── app/
│   ├── main.py              # FastAPI backend for prompt handling
│   ├── Dockerfile
│   └── requirements.txt
├── ui/
│   ├── app.py               # Gradio frontend
│   ├── Dockerfile
│   └── requirements.txt
└── docker-compose.yaml      # [To be created by YOU]

🧩 Your Deliverables

๐Ÿ” Phase 1: Learn and Explore Docker Model Runner

Before integrating, understand how Docker Model Runner works:

  1. ✅ Enable it in Docker Desktop (Settings > Features in development > Enable Docker Model Runner).

  2. ✅ Try out basic commands:

docker model pull ai/smollm2
docker model run ai/smollm2 "How do you work?"

👉 Observe how it pulls, loads, and responds with no external API involved.
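
If you also enable host-side TCP access for Model Runner in Docker Desktop, you can call its OpenAI-compatible API directly from your machine. The sketch below is optional and assumes the commonly used default port 12434; verify the port in your Docker Desktop settings before relying on it.

# Minimal sketch: query Model Runner's OpenAI-compatible API from the host.
# Assumes host-side TCP access is enabled; the port is an assumption,
# adjust it to match your Docker Desktop settings.
import requests

BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "How do you work?"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])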

๐Ÿ› ๏ธ Phase 2: Modify FastAPI to Use Local Model

Update app/main.py to interact with the Model Runner's OpenAI-compatible endpoint:

import requests
from fastapi import FastAPI, Request

app = FastAPI()

# Model Runner exposes an OpenAI-compatible API to containers at this hostname.
LLM_URL = "http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions"

@app.post("/chat")
async def chat(req: Request):
    data = await req.json()
    prompt = data.get("prompt")
    payload = {
        "model": "ai/smollm2",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    }
    # Relay the prompt to the local model and return the OpenAI-style response as-is.
    response = requests.post(LLM_URL, json=payload)
    response.raise_for_status()
    return response.json()

✅ This allows FastAPI to relay prompts to the local LLM using a standard OpenAI-style API.
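
Once the backend container is running (published on port 8000, per the compose file in Phase 3), you can smoke-test the /chat route from the host. This is a convenience check, not part of the deliverables:

# Smoke test for the FastAPI /chat route; assumes the backend is published on port 8000.
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"prompt": "Summarize what Docker Model Runner does in one sentence."},
    timeout=120,
)
resp.raise_for_status()
# The backend returns the raw OpenAI-style payload, so the reply is under choices[0].
print(resp.json()["choices"][0]["message"]["content"])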

โš™๏ธ Phase 3: Write Docker Compose Spec

Create a docker-compose.yaml that defines three services: the model provider, the FastAPI backend, and the Gradio UI.

version: "3"

services:
  model:
    provider:
      type: model
      options:
        model: ai/smollm2

  fastapi:
    build:
      context: ./app
    ports:
      - "8000:8000"
    depends_on:
      - model

  ui:
    build:
      context: ./ui
    ports:
      - "8501:8501"
    depends_on:
      - fastapi
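
If you'd rather not hard-code the endpoint in app/main.py, an optional refactor like the sketch below reads it from an environment variable that you set on the fastapi service yourself (the name LLM_URL here is illustrative; Compose does not define it for you):

# Hypothetical tweak to app/main.py: allow the Model Runner URL to be
# overridden via an environment variable, falling back to the default.
import os

LLM_URL = os.environ.get(
    "LLM_URL",
    "http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions",
)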

📌 Note: No changes are needed in the UI; it communicates with the FastAPI backend as before.
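
For orientation only, a minimal Gradio frontend wired to the backend could look roughly like the sketch below; the actual ui/app.py in the repo may differ, and the fastapi hostname and port 8501 are taken from the compose file above.

# Rough sketch of a Gradio UI that forwards prompts to the FastAPI backend.
# Illustrative only; the provided ui/app.py may be implemented differently.
import gradio as gr
import requests

BACKEND_URL = "http://fastapi:8000/chat"  # service name from docker-compose

def ask(prompt: str) -> str:
    resp = requests.post(BACKEND_URL, json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="LocalGPT Assistant")
demo.launch(server_name="0.0.0.0", server_port=8501)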

🧪 How to Run the Stack

docker compose up --build

Visit:

http://localhost:8501 for the Gradio UI
http://localhost:8000/docs for the FastAPI backend's interactive API docs

🧠 Learning Goals

By the end of this project, you'll:

✅ Understand how Docker Model Runner manages and runs LLMs locally

✅ Replace hosted LLM APIs with local inference endpoints

✅ Learn how to package model providers in Docker Compose

✅ Build confidence in open-source model deployment workflows