LocalMind is a chat interface for Ollama. The premise is that if you are running a local LLM to keep your conversations private, the last thing you want is a frontend that phones home or stores your history in someone else’s cloud.
Everything in LocalMind runs on your machine. The React frontend talks to a local FastAPI proxy which talks to Ollama. History is in SQLite. Nothing leaves localhost.
Why wrap Ollama at all
Ollama has a usable API and even a basic web UI. You could use it directly. I wrapped it for a few reasons.
The Ollama API is designed for one-shot completions. Getting streaming responses into a React chat interface requires some plumbing: an async generator on the backend, SSE or WebSockets to push chunks to the frontend, and careful state management so tokens append to the right message as they arrive. Doing that plumbing once in a FastAPI proxy is cleaner than rebuilding it in every client.
The proxy also gives you a place to add things Ollama does not have by default: conversation history, usage logging, model aliases, and request middleware.
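As one illustration of what that layer buys you, here is a minimal sketch of model aliasing: stable names in the UI that map to whatever tag is actually pulled in Ollama. The alias names and the `resolve_model` helper are hypothetical, not taken from LocalMind's source.

```python
# Hypothetical alias table: UI-facing names -> concrete Ollama tags.
# The specific tags here are illustrative assumptions.
MODEL_ALIASES = {
    "default": "llama3.2:3b",
    "coder": "qwen2.5-coder:7b",
}

def resolve_model(name: str) -> str:
    """Map a UI-facing alias to a concrete Ollama model tag.

    Unknown names pass through unchanged, so users can still type
    an exact tag like "mistral:7b".
    """
    return MODEL_ALIASES.get(name, name)
```

Because the lookup falls through to the input, adding aliases never breaks users who address models by their real tags.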
Streaming tokens to the UI
Ollama supports streaming completions via its stream: true option. The FastAPI proxy reads from the Ollama response stream and forwards chunks as server-sent events:
    import httpx

    # Ollama's default local endpoint for completions.
    OLLAMA_URL = "http://localhost:11434/api/generate"

    async def stream_completion(prompt: str, model: str):
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", OLLAMA_URL, json={
                "model": model, "prompt": prompt, "stream": True
            }) as response:
                async for line in response.aiter_lines():
                    if line:
                        # Wrap each JSON chunk in SSE framing.
                        yield f"data: {line}\n\n"
On the React side, EventSource reads the stream and appends each token to the active message in the conversation state. The key implementation detail is that an empty assistant message is added to state before the first token arrives, which avoids a flash where the UI shows nothing while waiting for the first chunk.
Conversation history
Each conversation is a row in SQLite. Messages are stored as a JSON array in a text column, which is simple and good enough for local use. I did not bother with a proper messages table and foreign keys because the access pattern is always “load all messages for conversation X” and SQLite handles that fine with a single query.
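A minimal sketch of that one-table layout, with messages as a JSON array in a text column. The column names are assumptions; the read-modify-write of the whole array is exactly the simplicity trade-off described above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in practice
conn.execute(
    "CREATE TABLE conversations ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " messages TEXT NOT NULL DEFAULT '[]')"
)

def append_message(conv_id: int, role: str, content: str) -> None:
    # Read the whole JSON array, append, write it back. Fine for a
    # single local user; a multi-writer app would want a real
    # messages table instead.
    (raw,) = conn.execute(
        "SELECT messages FROM conversations WHERE id = ?", (conv_id,)
    ).fetchone()
    messages = json.loads(raw)
    messages.append({"role": role, "content": content})
    conn.execute(
        "UPDATE conversations SET messages = ? WHERE id = ?",
        (json.dumps(messages), conv_id),
    )

conn.execute("INSERT INTO conversations (id, title) VALUES (1, 'demo')")
append_message(1, "user", "hello")
```

Loading a conversation is then the single query the access pattern calls for: one SELECT, one `json.loads`.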
The system prompt is prepended to every request. Users can edit it per-conversation, which is useful for setting the model’s persona or constraints.
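A hedged sketch of what prepending might look like, assuming the proxy builds an Ollama-style messages list (a `system`-role message first, then history, then the new user turn). The helper name is hypothetical.

```python
def build_messages(system_prompt: str, history: list[dict], user_input: str) -> list[dict]:
    """Assemble the message list for one request.

    The per-conversation system prompt, if set, always goes first;
    an empty prompt is simply omitted.
    """
    msgs = (
        [{"role": "system", "content": system_prompt}] if system_prompt else []
    )
    msgs.extend(history)
    msgs.append({"role": "user", "content": user_input})
    return msgs
```

Keeping this assembly in one function means editing the system prompt mid-conversation takes effect on the very next request.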
Model switching
LocalMind queries Ollama’s /api/tags endpoint on load to get the list of available models. Switching models mid-conversation is allowed but the history stays in the UI as context for the user. Whether to send the full history to the new model is a user setting. Some models have short context windows where this would cause problems; others benefit from the full context.
What it is not
LocalMind is not a LangChain integration or a RAG system. It is a thin, clean shell around a single Ollama instance. If you want retrieval, tool use, or multi-model pipelines, LLamaPG or a more capable frontend is the right tool.
The constraint of doing less is what keeps it fast and easy to understand. The whole FastAPI backend is under 300 lines.
Source: github.com/zraisan/localmind