A vector database stores numerical representations of data (called embeddings) and enables one specific operation: find the items most similar to this query. You give it a question, a document, or an image. It gives you back the most semantically similar items in its store. This operation is nearest neighbor search, which in practice is almost always performed approximately (approximate nearest neighbor search), and it is the core operation that powers semantic search, RAG systems, recommendation engines, and duplicate detection.
Regular SQL databases cannot do this efficiently. A SQL index is optimized for exact matches and range queries. It cannot answer "what items are conceptually close to this query?" without scanning every row, which becomes impractical at millions or billions of embeddings.
Why Regular Databases Cannot Do This
SQL databases store data in structured tables and use B-tree or hash indexes to find rows matching exact conditions: WHERE user_id = 42 or WHERE created_at > '2026-01-01'. A B-tree index keeps values sorted, so a lookup is a logarithmic-time tree traversal that handles both exact and range queries efficiently. A hash index handles exact matches only.
Similarity search is fundamentally different. An embedding is a vector in a high-dimensional space (typically 384 to 3,072 dimensions). Similarity means geometric closeness: two vectors are similar if they point in roughly the same direction (measured by cosine similarity) or lie close together in the space (measured by Euclidean distance).
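To make the two measures concrete, here is a minimal sketch in plain Python. The function names and the toy 3-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions and are usually handled with a numerical library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (assumption: 3 dimensions for readability).
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]  # same direction, different length
orthogonal = [0.0, 1.0, 0.0]      # unrelated direction

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
print(euclidean_distance(query, same_direction))
```

Note that the two measures can disagree: `same_direction` is a perfect match by cosine similarity but sits at a nonzero Euclidean distance, because cosine similarity ignores vector length.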
Finding the 10 most similar vectors in a collection of 1 million 1,536-dimensional embeddings exactly requires comparing the query vector against all 1 million stored vectors and returning the top 10 by similarity score. A naive implementation does exactly that. The result is exact, but each query costs roughly 1.5 billion multiply-add operations, and that cost grows linearly with the collection. Vector databases use approximate nearest neighbor (ANN) index structures such as HNSW and IVF to return the top 10 results in milliseconds, at the cost of occasionally missing the absolute closest match.
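The trade-off between exhaustive and approximate search can be sketched in a few lines. The code below is a toy illustration, not how any particular database implements IVF: it picks random stored vectors as centroids (real IVF indexes typically learn centroids with k-means), and the names `build_ivf`, `ivf_top_k`, and `n_probe` are assumptions chosen for this example.

```python
import heapq
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k):
    """Score every stored vector; return the k best (index, score) pairs."""
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

def build_ivf(vectors, centroids):
    """Assign each vector to the bucket of its most similar centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        best = max(range(len(centroids)),
                   key=lambda c: cosine_similarity(v, centroids[c]))
        buckets[best].append(i)
    return buckets

def ivf_top_k(query, vectors, centroids, buckets, k, n_probe=1):
    """Search only the n_probe buckets whose centroids best match the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine_similarity(query, centroids[c]),
                    reverse=True)
    candidates = [i for c in ranked[:n_probe] for i in buckets[c]]
    scored = ((i, cosine_similarity(query, vectors[i])) for i in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
centroids = random.sample(vectors, 10)  # assumption: random centroids, not k-means
buckets = build_ivf(vectors, centroids)
query = [random.gauss(0, 1) for _ in range(8)]

exact = exact_top_k(query, vectors, 5)
approx = ivf_top_k(query, vectors, centroids, buckets, 5, n_probe=3)
print("exact :", [i for i, _ in exact])
print("approx:", [i for i, _ in approx])
```

With `n_probe=3` the approximate search scores only the vectors in 3 of the 10 buckets, so it does less work but may miss a true neighbor that landed in an unprobed bucket. Raising `n_probe` to the number of centroids scans every bucket and recovers the exact result.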
Core Operations
The operations a vector database provides:
Insert: store a document or chunk alongside its embedding vector.
Search: given a query vector, return the top-k most similar stored vectors and their associated documents. This is the primary operation.
Filter: combine similarity search with metadata filters. "Find the 10 most similar chunks to this query, but only from documents tagged as legal contracts from 2024."
Delete and Update: remove or replace stored items.
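The four operations above can be sketched as a tiny in-memory store. `TinyVectorStore` is a hypothetical class written for this illustration; it does exact search over a Python dict rather than using an ANN index, and its exact-match `where` filter is a simplification of the richer filter languages real vector databases offer.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Hypothetical in-memory store: exact search, no ANN index."""

    def __init__(self):
        self._items = {}  # id -> (vector, document, metadata)

    def insert(self, item_id, vector, document, metadata=None):
        # Inserting an existing id replaces the item, which doubles as update.
        self._items[item_id] = (vector, document, metadata or {})

    def delete(self, item_id):
        self._items.pop(item_id, None)

    def search(self, query, k, where=None):
        """Return up to k (id, score, document) tuples, best first.

        `where` is an optional dict of metadata key/value pairs that
        every returned item must match exactly.
        """
        results = []
        for item_id, (vector, document, metadata) in self._items.items():
            if where and any(metadata.get(key) != value for key, value in where.items()):
                continue  # skip items that fail the metadata filter
            results.append((item_id, _cosine(query, vector), document))
        results.sort(key=lambda r: r[1], reverse=True)
        return results[:k]

store = TinyVectorStore()
store.insert("c1", [1.0, 0.0], "Termination clause", {"type": "legal", "year": 2024})
store.insert("c2", [0.9, 0.1], "Cancellation policy", {"type": "support", "year": 2024})
store.insert("c3", [0.0, 1.0], "Office lunch menu", {"type": "legal", "year": 2023})

print(store.search([1.0, 0.0], k=2))
print(store.search([1.0, 0.0], k=2, where={"type": "legal", "year": 2024}))
```

The second call shows filtered search: only items whose metadata matches both conditions are scored, so `c2` is excluded even though it is geometrically close to the query.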
Use Cases
Semantic search. Instead of matching keywords, match meaning. A user searching "how do I cancel my subscription" should find results about "account termination" and "ending a plan" even if those exact words are not in the query. Vector search finds results based on semantic similarity, not keyword overlap.
RAG document retrieval. The retrieval step in a RAG system: embed the user's question, search the vector database for relevant document chunks, pass them to the LLM. This is the most common production use case for vector databases today.
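A minimal sketch of that retrieval step follows. Everything here is a stand-in: `toy_embed` counts words from a small fixed vocabulary instead of calling a real embedding model, the chunks are embedded on the fly instead of being stored in a vector database ahead of time, and `build_prompt` only assembles text rather than calling an LLM.

```python
import math
import re

# Assumption: a tiny fixed vocabulary in place of a learned embedding space.
VOCAB = ["cancel", "subscription", "refund", "invoice", "password", "reset"]

def toy_embed(text):
    """Stand-in for a real embedding model: counts vocabulary words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def retrieve(question, chunks, k=2):
    """Embed the question and return the k most similar chunks."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble the text that would be passed to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "To cancel a subscription, open Billing and choose Cancel.",
    "To reset a password, use the reset link on the login page.",
    "Each invoice is emailed on the first day of the month.",
]
question = "How do I cancel my subscription?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Because the toy embedding only matches literal vocabulary words, it cannot connect "cancel" with "termination" the way a real embedding model would. The shape of the pipeline (embed, search, assemble context) is the part that carries over.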
Recommendation systems. Store embeddings of products, articles, or users based on their features or behavior. Find the most similar items to what a user has engaged with.
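One simple way to sketch this is to average the embeddings of the items a user engaged with and rank the remaining items by similarity to that average. The item vectors below are made-up toy values, and averaging is only one of several possible ways to build a user profile.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(item_vectors, engaged_ids, k):
    """Rank items the user has not engaged with by similarity to their profile."""
    profile = mean_vector([item_vectors[i] for i in engaged_ids])
    candidates = [i for i in item_vectors if i not in engaged_ids]
    candidates.sort(key=lambda i: cosine(profile, item_vectors[i]), reverse=True)
    return candidates[:k]

# Toy item embeddings (assumption: hand-picked values for illustration).
items = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail shoes": [0.8, 0.2, 0.1],
    "yoga mat": [0.3, 0.9, 0.0],
    "blender": [0.0, 0.1, 0.9],
}
print(recommend(items, engaged_ids={"running shoes"}, k=2))
```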
Duplicate detection. Find near-duplicate documents, images, or records by embedding them and searching for high-similarity matches.
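As a rough sketch, near-duplicates are pairs whose similarity exceeds a threshold. The 0.95 threshold and the toy vectors are assumptions for illustration; a suitable threshold depends on the embedding model and the data.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_near_duplicates(embeddings, threshold=0.95):
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs

# Toy embeddings: two nearly identical invoices and one unrelated document.
docs = {
    "invoice_v1": [0.70, 0.70, 0.10],
    "invoice_v2": [0.71, 0.69, 0.12],
    "newsletter": [0.10, 0.20, 0.95],
}
for id_a, id_b, score in find_near_duplicates(docs):
    print(id_a, id_b, round(score, 3))
```

Comparing every pair is quadratic in the number of items. With a vector database, the usual approach is to query each item's nearest neighbors and keep only the matches above the threshold.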
Anomaly detection. Items far from all other items (low similarity to their nearest neighbors) are candidates for anomalies.
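One way to sketch this idea is to score each item by its average similarity to its k nearest neighbors and flag items that score below a cutoff. The values `k=2` and `threshold=0.5` are arbitrary assumptions for this toy data set.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def neighbor_score(item_id, embeddings, k=2):
    """Mean cosine similarity between an item and its k nearest neighbors."""
    sims = sorted(
        (cosine(embeddings[item_id], vec)
         for other, vec in embeddings.items() if other != item_id),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top)

def find_anomalies(embeddings, k=2, threshold=0.5):
    """Flag items whose nearest neighbors are all dissimilar to them."""
    return [i for i in embeddings if neighbor_score(i, embeddings, k) < threshold]

# Toy embeddings: three clustered points and one far from the cluster.
points = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.95, 0.05],
    "outlier": [-0.2, 1.0],
}
print(find_anomalies(points))
```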
Major Options Compared
ChromaDB
ChromaDB (trychroma.com) is the simplest option for local development and small projects. It runs as an in-memory database (for dev) or writes to a local file system (for persistence). No separate service to install, no API keys, just a Python library.
import chromadb

# Persist the collection to a local directory
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_docs")

# Store a document alongside its precomputed embedding
collection.add(
    documents=["The quick brown fox"],
    embeddings=[[0.1, 0.2, 0.3, ...]],  # your embedding vector
    ids=["doc1"],
)

# Return the 5 stored items closest to the query embedding
results = collection.query(
    query_embeddings=[[0.15, 0.18, 0.31, ...]],
    n_results=5,
)
Best for: local development, prototypes, collections under a few million items, teams without infrastructure for a managed service.
Limitations: horizontal scaling is limited. Very large production workloads need careful performance tuning and add operational complexity.
Pinecone
Pinecone (pinecone.io) is a fully managed vector database. You do not run any infrastructure; you create an index through the API and query it. It scales automatically and is designed to keep query latency low at large scale.
Best for: production applications where you do not want to manage infrastructure, collections in the hundreds of millions to billions range, teams without dedicated ML infrastructure experience.
Limitations: the free tier is limited, so production use is effectively paid. Data is stored on Pinecone's infrastructure, which may not be acceptable for sensitive data requirements.
Weaviate
Weaviate (weaviate.io) is open-source with a self-hosted or managed cloud option. It has strong features for combining vector search with structured data filtering, and it supports multiple embedding model integrations natively.
Best for: teams who need self-hosted control over their data, applications that heavily combine vector search with structured metadata filters, teams comfortable managing containerized services.
pgvector
pgvector (github.com/pgvector/pgvector) is a Postgres extension that adds vector similarity search to a standard Postgres database.
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536) -- dimension for text-embedding-3-small
);

-- Create an index for fast similarity search
-- (IVFFlat builds its lists from existing rows, so create it after loading data)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Search for the 5 most similar documents
-- (<=> is cosine distance, so 1 - distance is cosine similarity)
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
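On the application side, the query embedding has to be passed as the $1 parameter. A minimal sketch, assuming the embedding is sent in pgvector's text format (a bracketed, comma-separated list); the helper names are hypothetical, and `cosine_distance` simply mirrors in Python what the <=> operator computes so the similarity column is easier to interpret.

```python
import math

def to_pgvector_literal(vector):
    """Format a Python list as pgvector text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

def cosine_distance(a, b):
    """Mirror of the <=> operator: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

query_embedding = [0.1, 0.2, 0.3]  # toy vector; the table above expects 1536 values
param = to_pgvector_literal(query_embedding)  # value to bind as $1
print(param)

# Distance 0 means same direction, 1 means orthogonal, 2 means opposite.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))
```

Some Postgres drivers have pgvector adapters that accept lists or arrays directly, in which case the string conversion is unnecessary.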
Best for: teams already running Postgres who do not want to add another service. Keeps all data in one database. Excellent choice for collections up to tens of millions of vectors.
Limitations: approximate nearest neighbor performance is slower than dedicated vector databases at very large scale. Requires Postgres operational expertise.
Qdrant
Qdrant (qdrant.tech) is an open-source vector database written in Rust, known for fast performance and rich filtering capabilities. Available as a self-hosted service or managed cloud.
Best for: high-throughput production workloads, teams who want open-source control with strong performance, applications needing sophisticated payload filtering alongside vector search.
How to Choose
The decision comes down to four factors:
Dataset size:
- Under 1 million vectors: ChromaDB or pgvector
- 1 million to 100 million: Weaviate, Qdrant, or pgvector
- Over 100 million: Pinecone, Weaviate, or Qdrant Cloud
Self-hosting requirement:
- Must self-host (data privacy, compliance): pgvector, Weaviate self-hosted, Qdrant self-hosted
- Managed is fine: Pinecone, Weaviate Cloud, Qdrant Cloud
Existing infrastructure:
- Already on Postgres: pgvector is the lowest-friction choice
- Starting fresh: ChromaDB for prototyping, then evaluate Pinecone or Qdrant for production
Development stage:
- Prototype: ChromaDB — zero setup, easy to switch later
- Production: evaluate based on scale, cost, and hosting requirements
The most common pattern: start with ChromaDB for prototyping and development. When you need to move to production, evaluate Pinecone (if managed is fine) or Qdrant/Weaviate (if self-hosted is required).
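The rules in this section can be written down as a small function. This is only a restatement of the guidance above, not a benchmark-backed selector: the function name, the parameter names, and the choice to list pgvector first for teams already on Postgres are assumptions made for this sketch.

```python
def recommend_vector_db(num_vectors, must_self_host, on_postgres, prototype):
    """Return candidate databases following the rules of thumb in this section."""
    if prototype:
        return ["ChromaDB"]  # zero setup, easy to switch later

    # Dataset size narrows the field first.
    if num_vectors < 1_000_000:
        options = ["ChromaDB", "pgvector"]
    elif num_vectors <= 100_000_000:
        options = ["Weaviate", "Qdrant", "pgvector"]
    else:
        options = ["Pinecone", "Weaviate", "Qdrant"]

    # Pinecone is managed only, so it drops out when self-hosting is required.
    if must_self_host:
        options = [o for o in options if o != "Pinecone"]

    # Teams already on Postgres get pgvector listed first when it is viable.
    if on_postgres and "pgvector" in options:
        options = ["pgvector"] + [o for o in options if o != "pgvector"]
    return options

print(recommend_vector_db(num_vectors=500_000, must_self_host=False,
                          on_postgres=True, prototype=False))
print(recommend_vector_db(num_vectors=500_000_000, must_self_host=True,
                          on_postgres=False, prototype=False))
```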