Ollama: The Fastest Way to Run LLMs Locally - Complete 2026 Guide

Install Ollama on any platform, pull models in one command, and serve an OpenAI-compatible REST API - all without sending data to the cloud.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 1, 2026

7 min read

// tags

#ollama #local-llm #self-hosted #privacy #gpu

Pulling and Running Your First Model

Once installed, pull a model from the Ollama model library:

ollama pull llama3.1:8b
ollama run llama3.1:8b

The run command opens an interactive chat. Type /bye to exit. For a one-shot query:

ollama run llama3.1:8b "Summarize the CAP theorem in two sentences"
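The same one-shot query can be driven from a script. Below is a minimal sketch, assuming the ollama binary is on your PATH and the model has already been pulled; the helper names build_run_command and ask are illustrative and not part of Ollama.

```python
import subprocess


def build_run_command(model, prompt):
    """Return the argument list for a one-shot `ollama run` call."""
    # Passing a list (not a shell string) avoids quoting problems in the prompt.
    return ["ollama", "run", model, prompt]


def ask(model, prompt):
    """Run the one-shot query and return the model's reply as text."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Calling ask("llama3.1:8b", "Summarize the CAP theorem in two sentences") should return the same text the CLI prints, provided the daemon is running.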

REST API at localhost:11434

Ollama exposes a REST API automatically when the daemon is running:

curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.1:8b","prompt":"What is PagedAttention?","stream":false}'

The API docs cover /api/chat, /api/embed (which supersedes the older /api/embeddings), /api/pull, and more.
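When "stream" is left at its default of true, /api/generate answers with one JSON object per line, each carrying a "response" fragment and a "done" flag. The sketch below shows one way to call the endpoint from Python with only the standard library and stitch those fragments together. The function names are illustrative, and the URL assumes the default local address used in this article.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address (assumption)


def build_generate_request(model, prompt, stream=True):
    """Build a POST request for /api/generate without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": stream})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_stream(lines):
    """Join the "response" fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is marked done
    return "".join(parts)


def generate(model, prompt):
    """Send the prompt to a running Ollama daemon and return the full text."""
    request = build_generate_request(model, prompt, stream=True)
    with urllib.request.urlopen(request) as response:
        # An HTTP response object iterates line by line.
        return collect_stream(response)
```

Calling generate("llama3.1:8b", "What is PagedAttention?") requires the daemon to be running. The two helpers can be exercised without it.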

OpenAI-Compatible Endpoint

Ollama also serves a /v1/ endpoint that is compatible with the core of the OpenAI API, including chat completions. Most libraries and tools that work with OpenAI will therefore work with Ollama once you change base_url:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Hello"}]}'

In Python with the openai package:

pip install openai

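from openai import OpenAI

# Point the client at the local Ollama server. The client requires an API key,
# but Ollama ignores its value, so any placeholder string works.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Same call shape as against the hosted OpenAI API; only the model name differs.
response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "What is entropy?"}],
)
print(response.choices[0].message.content)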
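For token-by-token output, the OpenAI-style endpoint can stream its reply as server-sent events when the request sets "stream": true. The sketch below parses that stream with only the standard library, assuming the usual OpenAI framing of "data: {...}" lines that end with "data: [DONE]". The function names are illustrative.

```python
import json
import urllib.request


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(model, prompt, base_url="http://localhost:11434/v1"):
    """Print a streamed reply from a running Ollama server as it arrives."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for fragment in iter_stream_text(response):
            print(fragment, end="", flush=True)
    print()
```

With the openai package, passing stream=True to client.chat.completions.create gives the same fragments without manual parsing; the standard-library version is shown only to make the wire format visible.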

Modelfile: Custom System Prompts and Defaults

A Modelfile lets you bake in a system prompt, adjust temperature, or set a stop sequence. You then build a named model from it with ollama create:

cat > Modelfile <<'EOF'
FROM llama3.1:8b
SYSTEM "You are a concise technical writer. Never use bullet points."
PARAMETER temperature 0.3
EOF
ollama create tech-writer -f Modelfile
ollama run tech-writer
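If you maintain several custom models, generating the Modelfile text from a script keeps them consistent. The sketch below renders the FROM, SYSTEM, and PARAMETER instructions used above; render_modelfile is a hypothetical helper, not part of Ollama, and it writes the system prompt in the triple-quoted form so that it can span multiple lines.

```python
def render_modelfile(base, system, temperature=None, stop=()):
    """Return Modelfile text for a base model with a baked-in system prompt."""
    if '"""' in system:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError('system prompt must not contain """')
    lines = [f"FROM {base}", f'SYSTEM """{system}"""']
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    for sequence in stop:
        # Each stop sequence gets its own PARAMETER line.
        lines.append(f'PARAMETER stop "{sequence}"')
    return "\n".join(lines) + "\n"
```

Write the returned text to a file named Modelfile and run ollama create tech-writer -f Modelfile as before.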

Docker Compose Example

For a containerised setup with NVIDIA GPU support (the host needs the NVIDIA Container Toolkit installed):

version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  ollama_data:
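After the container starts, the API can take a moment to come up. The sketch below polls GET /api/tags, which lists the models stored locally, until the server answers. It assumes the port mapping from the Compose file above; the function names and retry defaults are illustrative.

```python
import json
import time
import urllib.request


def fetch_model_names(base_url="http://localhost:11434"):
    """Return the names of locally available models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]


def wait_until_ready(fetch=fetch_model_names, attempts=10, delay=1.0):
    """Poll until the server answers; return the model list or raise TimeoutError."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # Covers refused connections, URL errors, and socket timeouts.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise TimeoutError(f"Ollama did not respond after {attempts} attempts")
```

A fresh volume has no models yet, so an empty list from wait_until_ready() still means the server is up. Pull a model inside the container before sending prompts.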

Memory Requirements by Model

Model	Quantization	VRAM / RAM
Llama 3.1 8B	Q4_K_M	~5 GB
Llama 3.1 70B	Q4_K_M	~40 GB
Mistral 7B	Q4_K_M	~4.5 GB
Gemma 2 27B	Q4_K_M	~16 GB

CPU-only inference works but is 3 to 10x slower. For interactive use, aim for at least a 16 GB M-series Mac, or an NVIDIA GPU with enough VRAM to hold the model you plan to run (see the table above).
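The figures in the table can be approximated with simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below assumes roughly 4.8 bits per weight for Q4_K_M (an approximation, since that format mixes block types) and a flat 1 GB allowance for the KV cache and runtime, which in practice grows with context length. Both defaults are assumptions rather than values published by Ollama.

```python
def estimate_weights_gb(params_billion, bits_per_weight=4.8):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion, available_gb, overhead_gb=1.0, bits_per_weight=4.8):
    """Return True if weights plus a flat overhead fit in the available memory."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= available_gb
```

Under these assumptions an 8B model comes out near 4.8 GB and a 70B model near 42 GB, in line with the table. Treat the result as a lower bound and leave headroom for long contexts.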

Ollama is the quickest path to a working local LLM today. Pair it with Open WebUI for a ChatGPT-style interface, or point any OpenAI SDK at localhost:11434/v1 to drop it into an existing app.
