Why Qdrant for Production RAG
Most vector databases are fine for prototyping. Qdrant is built for production: it is written in Rust for memory safety and performance, uses HNSW (Hierarchical Navigable Small World) graphs for low-latency approximate nearest neighbor search, and supports hybrid search (sparse + dense vectors) natively. It also exposes a rich filtering API over payload metadata, so you can restrict similarity searches to specific users, documents, or time ranges during the search itself, rather than discarding non-matching results afterward.
Core Concepts
- Collection: a named set of points (analogous to a table)
- Point: an ID + one or more vectors + an optional payload
- Payload: arbitrary JSON metadata attached to each point
- Vector: a float array (dense) or a set of index/value pairs (sparse)
- HNSW index: the default index; configured per collection
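To make these terms concrete, here is a minimal pure-Python sketch that models points as an ID, a vector, and a payload, then runs a brute-force cosine search restricted by a payload condition. It is not how Qdrant works internally (Qdrant uses an HNSW index rather than a linear scan), and `Point` and `search` are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    """Illustrative stand-in for a Qdrant point: ID + vector + payload."""
    id: int
    vector: list
    payload: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(collection, query, must=None, limit=3):
    """Brute-force search: keep points whose payload matches `must`, then rank."""
    must = must or {}
    candidates = [
        p for p in collection
        if all(p.payload.get(k) == v for k, v in must.items())
    ]
    scored = [(cosine(p.vector, query), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# A "collection" is just a named set of points; here, a plain list.
docs = [
    Point(1, [1.0, 0.0], {"source": "technical-docs"}),
    Point(2, [0.0, 1.0], {"source": "technical-docs"}),
    Point(3, [1.0, 0.1], {"source": "blog"}),
]

hits = search(docs, query=[1.0, 0.0], must={"source": "technical-docs"}, limit=2)
hit_ids = [p.id for _, p in hits]
```

Because the payload condition is applied before ranking, point 3 never competes for a slot in the results, which is the behavior the filtering API is meant to give you.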
Docker Setup
docker run -d -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant
The REST API is available at http://localhost:6333. The dashboard UI is at http://localhost:6333/dashboard.
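A quick way to confirm the container is reachable is to list collections over REST. The sketch below uses only the standard library and assumes the default local URL from the Docker command; `list_collections` and its `opener` parameter are hypothetical helpers, with `opener` included so the function can be exercised without a live server.

```python
import json
from urllib.request import urlopen


def list_collections(base_url="http://localhost:6333", opener=urlopen):
    """Return collection names from the REST API's GET /collections endpoint."""
    with opener(f"{base_url}/collections") as response:
        body = json.load(response)
    # The response wraps the data in a "result" object
    return [c["name"] for c in body["result"]["collections"]]
```

Calling `list_collections()` against a fresh instance should return an empty list; after the Python SDK steps that follow, it should include `"docs"`.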
Python SDK
pip install qdrant-client sentence-transformers
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer
client = QdrantClient("localhost", port=6333)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
# Create collection
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
# Insert documents
texts = ["PagedAttention manages KV cache as pages.", "HNSW is a graph-based ANN index."]
vectors = encoder.encode(texts).tolist()
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=v, payload={"text": t, "source": "technical-docs"})
        for i, (v, t) in enumerate(zip(vectors, texts))
    ],
)
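# Search with payload filter (applied during the vector search, not afterward)
from qdrant_client.models import FieldCondition, Filter, MatchValue

query_vector = encoder.encode("how does memory management work in LLMs").tolist()
# query_points replaces client.search, which is deprecated in recent qdrant-client releases
results = client.query_points(
    collection_name="docs",
    query=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="technical-docs"))]
    ),
    limit=3,
).points
for r in results:
    print(r.score, r.payload["text"])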
Hybrid Search (Dense + Sparse)
Qdrant supports named vectors, so each point can carry both a dense embedding and a sparse (for example, BM25) vector:
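from qdrant_client import models

# Assumes "docs" was created with a dense vector named "dense" and a sparse
# vector named "sparse"; dense_query is a list of floats and sparse_query is a
# models.SparseVector(indices=[...], values=[...]) computed beforehand.
results = client.query_points(
    collection_name="docs",
    prefetch=[
        # dense semantic search
        models.Prefetch(query=dense_query, using="dense", limit=20),
        # sparse keyword search
        models.Prefetch(query=sparse_query, using="sparse", limit=20),
    ],
    # fuse both candidate lists with Reciprocal Rank Fusion
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=5,
).points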
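Reciprocal Rank Fusion (RRF) combines the dense and sparse rankings by scoring each document as the sum of 1/(k + rank) over the lists it appears in, so a hit that ranks well in either search surfaces near the top.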
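For intuition, RRF can be sketched in a few lines of plain Python. This is an illustrative client-side version, not Qdrant's implementation: `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is a conventional constant assumed here rather than a value taken from Qdrant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # IDs ranked by dense similarity
sparse_hits = ["b", "d", "a"]  # IDs ranked by sparse (keyword) score
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that appear in both lists ("a" and "b") outrank those found by only one search, and RRF needs only ranks, so the dense and sparse scores never have to be put on a common scale.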
Distance Metrics
| Metric | Use case |
|---|---|
| Cosine | Text embeddings (default) |
| Dot Product | When vectors are pre-normalized |
| Euclidean | Image embeddings |
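The relationship between these metrics is easy to check numerically. The sketch below uses plain Python with made-up two-dimensional vectors, and shows in particular why dot product is interchangeable with cosine once vectors are normalized to unit length.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

cos_ab = cosine(a, b)      # depends only on direction
dot_unit = dot(ua, ub)     # equals cosine once vectors are unit length
dist_ab = euclidean(a, b)  # sensitive to magnitude as well as direction
```

Here `cos_ab` and `dot_unit` agree, while `dist_ab` would change if either vector were rescaled.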
Qdrant Cloud
For production, Qdrant Cloud provides a managed cluster with automatic backups, horizontal scaling, and a free 1 GB tier. Switch by changing the client URL:
client = QdrantClient(
    url="https://your-cluster.qdrant.io",
    api_key="YOUR_API_KEY",
)
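In practice you will probably want to keep the API key out of source code. One possible approach, sketched below, reads the connection settings from environment variables and falls back to the local Docker instance. The variable names `QDRANT_URL` and `QDRANT_API_KEY` and the helper `qdrant_connection_kwargs` are assumptions made for this example, not names defined by Qdrant.

```python
import os


def qdrant_connection_kwargs(env=None):
    """Build QdrantClient keyword arguments from environment variables.

    QDRANT_URL and QDRANT_API_KEY are hypothetical variable names chosen for
    this sketch; the local Docker URL is used when QDRANT_URL is unset.
    """
    env = os.environ if env is None else env
    kwargs = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs


local = qdrant_connection_kwargs(env={})
cloud = qdrant_connection_kwargs(
    env={"QDRANT_URL": "https://your-cluster.qdrant.io", "QDRANT_API_KEY": "YOUR_API_KEY"}
)
# Usage (requires qdrant-client): client = QdrantClient(**qdrant_connection_kwargs())
```

The same code then runs unchanged against the local container and the managed cluster, with only the environment differing.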
Full documentation is available at qdrant.tech.