LM Studio: Run and Serve Local LLMs With a GUI and API Server

LM Studio gives you a polished desktop GUI for downloading GGUF models, tuning GPU layers, and serving an OpenAI-compatible local API - no command line required.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 10, 2026

6 min read

// tags

#lm-studio #local-llm #gguf #gui #api-server


Downloading Models

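Use the built-in search to browse the Hugging Face Hub directly. LM Studio surfaces GGUF-format models from well-known publishers such as TheBloke and Bartowski. Select a quantization level - Q4_K_M is a good default - and click Download. Recent releases store models under ~/.lmstudio/models by default (older releases used ~/.cache/lm-studio/models); the My Models tab shows the exact folder on your machine.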

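LM Studio loads two model formats: GGUF (through llama.cpp, on every platform) and MLX (Apple Silicon native). At the time of writing, GPTQ checkpoints are not among the formats it loads directly.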

GPU Layer Offloading

In the model settings panel, the GPU Layers slider controls how many transformer layers are offloaded to VRAM. Every layer moved off the CPU speeds up inference, so raise the slider as far as your VRAM allows while leaving headroom for the context (KV cache) - LM Studio shows a live VRAM meter. A rule of thumb:

Hardware	GPU layers (Llama 3.1 8B, Q4_K_M)
8 GB VRAM	~22 layers (partial offload)
16 GB VRAM	All 32 layers
Apple M2 24 GB unified	All 32 layers
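The slider value can be approximated with simple arithmetic. The sketch below is a rough estimate, not how LM Studio computes its meter: it assumes the model file is spread evenly across the layers and that a fixed amount of VRAM is reserved for the KV cache, the desktop, and other apps. The function name and the reserve figure are illustrative assumptions.

```python
import math


def layers_that_fit(model_gb, n_layers, vram_gb, reserve_gb):
    """Estimate how many transformer layers fit in VRAM.

    Assumptions (not LM Studio's own logic):
    - weights are split evenly across layers (ignores embeddings/output head)
    - reserve_gb covers the KV cache, runtime buffers, and other GPU users
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    fit = math.floor(usable_gb / per_layer_gb)
    # Clamp to the valid slider range.
    return max(0, min(n_layers, fit))


# Example: a ~4.9 GB Q4_K_M file with 32 layers on a 16 GB card,
# keeping 2 GB free for context and the desktop.
print(layers_that_fit(model_gb=4.9, n_layers=32, vram_gb=16.0, reserve_gb=2.0))
```

The reserve dominates the result: a long context window grows the KV cache, so the same card holds fewer layers. Treat the number as a starting point and confirm against the VRAM meter.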

Chat Playground

The Chat tab gives you a full conversation interface with message history, system prompt editor, and generation parameters (temperature, top-p, repeat penalty). You can save and load presets - useful for comparing prompting strategies across models.

Local API Server

Enable the server from the Local Server tab. LM Studio starts an OpenAI-compatible HTTP server on localhost:1234. Any app that already uses the OpenAI client can treat it as a drop-in replacement by changing the base URL:

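from openai import OpenAI

# Point the client at the local server instead of api.openai.com.
# The client requires a non-empty api_key; the value is a placeholder here.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="local-model",  # placeholder name, see the note below
    messages=[{"role": "user", "content": "What is GGUF?"}],
)
print(response.choices[0].message.content)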

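The model name "local-model" is a placeholder - with a single model loaded, LM Studio routes /v1/chat/completions requests to that model. If you load several models at once, pass the identifier returned by GET /v1/models instead.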
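For apps that cannot take the openai package as a dependency, the same endpoint can be called over plain HTTP. The sketch below uses only the Python standard library and assumes the server is running on the default port with a model loaded; build_chat_request and ask are illustrative helper names, not part of any LM Studio API.

```python
import json
import urllib.request

# Default address from the section above; change it if you picked another port.
BASE_URL = "http://localhost:1234/v1"


def build_chat_request(prompt, model="local-model", temperature=0.7):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,  # placeholder name, as in the client example
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    # The response follows the OpenAI chat completion shape.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is GGUF?"))` prints the model's reply. Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the wire.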

Hardware Requirements by Model Size

Model Size	Min RAM (Q4_K_M)	Recommended
7B	6 GB	8 GB VRAM
13B	10 GB	16 GB VRAM
34B	24 GB	24 GB VRAM
70B	44 GB	2x24 GB or A100 80 GB
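Figures like these come from simple arithmetic: parameter count times bits per weight, plus headroom for the context and runtime. The sketch below is an approximation under two labeled assumptions: that Q4_K_M averages about 4.85 bits per weight, and that roughly 2 GB of overhead is enough for a modest context. Real GGUF files vary by architecture, so read the results as ballpark values.

```python
# Assumption: approximate average for Q4_K_M; the exact figure varies by model.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def weight_size_gb(params_billion, bits_per_weight=Q4_K_M_BITS_PER_WEIGHT):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def min_ram_gb(params_billion, overhead_gb=2.0):
    """Weights plus an assumed fixed overhead for KV cache and runtime buffers."""
    return weight_size_gb(params_billion) + overhead_gb


for size in (7, 13, 34, 70):
    print(f"{size}B: weights ~{weight_size_gb(size):.1f} GB, "
          f"minimum ~{min_ram_gb(size):.1f} GB")
```

Swapping in a different bits-per-weight value gives a quick estimate for other quantization levels before you commit to a multi-gigabyte download.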

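On a recent laptop with a capable GPU or Apple Silicon, a 7B or 8B model at Q4_K_M typically runs at 15-40 tokens/sec - fast enough for interactive use. Apple M-series chips with unified memory have an edge when a model is too large for a discrete card's VRAM: the GPU can address most of system memory, so no layers fall back to the CPU or cross the PCIe bus. When the model fits entirely in VRAM, a discrete NVIDIA card with higher memory bandwidth is usually faster.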

LM Studio is updated frequently - check the changelog for new features like multi-model serving and MLX acceleration on Apple Silicon.
