Why Chroma for Prototyping
Chroma is one of the quickest ways to add a vector store to a Python LLM app. It runs in-memory or on-disk inside your own process: no Docker and no separate server, only a pip install. When you are ready for production, switch to client-server mode; the collection API stays the same.
Installation
pip install chromadb
In-Memory Mode
import chromadb
client = chromadb.Client() # ephemeral, in-memory
collection = client.create_collection("docs")
collection.add(
documents=["PagedAttention manages KV cache as pages.", "HNSW is a graph ANN index."],
ids=["doc1", "doc2"],
)
results = collection.query(
query_texts=["how does LLM memory work?"],
n_results=2,
)
print(results["documents"])
By default, Chroma embeds documents and queries with all-MiniLM-L6-v2 (the chromadb default embedding function), so you do not need to manage embeddings yourself. The model is downloaded the first time it is used.
Persistent Mode
client = chromadb.PersistentClient(path="./chroma_db")
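Data persists across process restarts in the specified directory. This can be enough for a single-server deployment, but how far it scales depends on your hardware and embedding size, so benchmark with your own data before relying on it for millions of documents.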
Metadata Filtering
collection.add(
documents=["LangGraph is a state machine framework.", "Instructor adds Pydantic to LLMs."],
metadatas=[{"category": "agents"}, {"category": "tooling"}],
ids=["doc3", "doc4"],
)
results = collection.query(
query_texts=["build an agent"],
where={"category": "agents"},
n_results=1,
)
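The where filter uses MongoDB-style operators: comparison ($eq, $ne, $gt, $gte, $lt, $lte), set membership ($in, $nin), and logical ($and, $or). A bare value such as {"category": "agents"} is shorthand for $eq.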
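To make the operator semantics concrete, here is a toy evaluator written in plain Python. It is an illustration of how such filters compose, not Chroma's implementation; the matches function, the "year" field, and the rule that a missing key never matches are all assumptions of this sketch.

```python
import operator
from typing import Any

# Maps each operator name to a function of (stored value, filter value).
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the MongoDB-style filter `where`."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, clause) for clause in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, clause) for clause in condition):
                return False
        else:
            if key not in metadata:
                return False  # assumption of this sketch: missing keys never match
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # bare value is shorthand for $eq
            value = metadata[key]
            if not all(_COMPARATORS[op](value, target) for op, target in condition.items()):
                return False
    return True


records = [
    {"id": "doc3", "metadata": {"category": "agents", "year": 2024}},
    {"id": "doc4", "metadata": {"category": "tooling", "year": 2023}},
]
where = {
    "$and": [
        {"category": {"$in": ["agents", "tooling"]}},
        {"year": {"$gt": 2023}},
    ]
}
selected = [r["id"] for r in records if matches(r["metadata"], where)]
print(selected)
```

The same dictionary shape can be passed as the where argument of a query; the evaluator only shows which records such a filter would keep.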
Custom Embedding Function
Use any embedding model by wrapping it:
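from chromadb import EmbeddingFunction
from sentence_transformers import SentenceTransformer

class LocalEmbedder(EmbeddingFunction):
    def __init__(self):
        # Load the model once; every call reuses it.
        self.model = SentenceTransformer("all-mpnet-base-v2")

    def __call__(self, input: list[str]) -> list[list[float]]:
        # encode() returns a NumPy array; convert it to plain lists of floats.
        return self.model.encode(input).tolist()

# Use a new name: create_collection raises an error if "docs" already exists.
collection = client.create_collection("docs_mpnet", embedding_function=LocalEmbedder())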
LangChain Integration
pip install langchain-chroma langchain-openai
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma(
collection_name="docs",
embedding_function=OpenAIEmbeddings(),
persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
Client-Server Mode for Production
chroma run --host 0.0.0.0 --port 8000
Connect from your app:
client = chromadb.HttpClient(host="localhost", port=8000)
Chroma vs Qdrant vs Pinecone
| Feature | Chroma | Qdrant | Pinecone |
|---|---|---|---|
| Self-host | Yes | Yes | No |
| In-process | Yes | No | No |
| Hybrid search | No | Yes | Yes |
| Production scale | Medium | High | High |
Full documentation at docs.trychroma.com.