Embedding models convert text into numerical vectors that capture semantic meaning, enabling semantic search, similarity ranking, and retrieval-augmented generation (RAG). For most RAG and semantic search use cases, open source embedding models now match or come close to OpenAI's text-embedding-3-small on standard benchmarks, with no per-request API fees when run locally (you pay only for your own compute). The best open source embedding models for general English text are all-MiniLM-L6-v2 for speed-sensitive applications, all-mpnet-base-v2 for higher quality at moderate latency, and BAAI/bge-m3 for state-of-the-art quality plus multilingual support. The right choice depends on your latency requirements, language needs, and whether you run the model locally or call it through an API.
Here is the complete comparison.
How Embedding Models Work
An embedding model takes a string of text as input and returns a vector (an array of floating-point numbers) as output. Together, the vector's dimensions encode semantic information: texts with similar meaning produce vectors that point in similar directions in the vector space (high cosine similarity), while texts with unrelated meanings produce vectors that are far apart (low cosine similarity).
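Cosine similarity is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. It ranges from -1 to 1, and values near 1 mean the vectors point the same way. A minimal pure-Python sketch, using made-up 3-dimensional vectors in place of real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(round(cosine_similarity(cat, kitten), 3))   # close to 1.0: similar direction
print(round(cosine_similarity(cat, invoice), 3))  # close to 0.0: unrelated direction
```

Real embeddings have hundreds or thousands of dimensions, but the computation is identical.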
Vector dimensionality varies by model: all-MiniLM-L6-v2 produces 384-dimensional vectors, all-mpnet-base-v2 768-dimensional, BGE-M3 1024-dimensional, and text-embedding-3-small 1536-dimensional by default. Higher dimensionality does not necessarily mean better quality, but it does mean more storage per vector and more computation per similarity comparison.
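To make the storage side of that trade-off concrete, here is a back-of-the-envelope sketch. It assumes each dimension is stored as an uncompressed 32-bit float (4 bytes) and ignores index structures and metadata, so it covers raw vectors only; the one-million-document corpus is a hypothetical size.

```python
BYTES_PER_FLOAT32 = 4

# Default output dimensionality of the models discussed above.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "BAAI/bge-m3": 1024,
    "text-embedding-3-small": 1536,
}


def raw_vector_bytes(dims: int, num_vectors: int) -> int:
    """Bytes for num_vectors float32 vectors, excluding index and metadata overhead."""
    return dims * BYTES_PER_FLOAT32 * num_vectors


NUM_DOCS = 1_000_000  # hypothetical corpus size
for name, dims in MODEL_DIMS.items():
    gb = raw_vector_bytes(dims, NUM_DOCS) / 1e9
    print(f"{name:<24} {dims:>5} dims  {gb:.2f} GB per {NUM_DOCS:,} vectors")
```

Under these assumptions, going from 384 to 1536 dimensions quadruples raw storage (about 1.54 GB versus 6.14 GB per million vectors), and an exact similarity scan performs four times as many multiply-adds per comparison.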
The primary use case in AI applications is retrieval: you embed your documents and store the vectors in a vector database (Chroma, Pinecone, Weaviate, pgvector). At query time, you embed the user's query with the same model and find the most similar document vectors. Those documents become the context you pass to the LLM. This is the core of RAG.
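The indexing and query steps can be sketched in plain Python. This is a minimal in-memory stand-in for a vector database, not how Chroma, Pinecone, Weaviate, or pgvector are implemented: it performs an exact (brute-force) cosine scan, whereas production stores typically use approximate nearest-neighbor indexes. The embedding function is a hypothetical bag-of-words placeholder so the example runs without downloading a model, and names such as `InMemoryVectorStore` and `make_bow_embedder` are illustrative only.

```python
import math
import re
from typing import Callable

Vector = list[float]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def make_bow_embedder(corpus: list[str]) -> Callable[[str], Vector]:
    """HYPOTHETICAL stand-in for a real embedding model: word counts over the corpus vocabulary.

    It captures word overlap only, not meaning; swap in a real model for semantic search.
    """
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    index = {word: i for i, word in enumerate(vocab)}

    def embed(text: str) -> Vector:
        vec = [0.0] * len(index)
        for tok in tokenize(text):
            if tok in index:  # words outside the corpus vocabulary are ignored
                vec[index[tok]] += 1.0
        return vec

    return embed


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat an all-zero vector as "no similarity"


class InMemoryVectorStore:
    """Keeps (vector, document) pairs and answers queries with an exact cosine scan."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.vectors: list[Vector] = []
        self.documents: list[str] = []

    def add(self, documents: list[str]) -> None:
        # Indexing time: embed each document once and store its vector.
        for doc in documents:
            self.vectors.append(self.embed(doc))
            self.documents.append(doc)

    def query(self, text: str, top_k: int = 2) -> list[tuple[float, str]]:
        # Query time: embed the query with the SAME embedder, then rank by similarity.
        q = self.embed(text)
        scored = [(cosine(q, v), doc) for v, doc in zip(self.vectors, self.documents)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


docs = [
    "Chroma is an open source vector database.",
    "pgvector adds vector similarity search to Postgres.",
    "Bananas are rich in potassium.",
]
store = InMemoryVectorStore(make_bow_embedder(docs))
store.add(docs)

hits = store.query("Postgres vector similarity search", top_k=2)
# The retrieved documents become the context passed to the LLM.
context = "\n".join(doc for _, doc in hits)
print(context)
```

Replacing `make_bow_embedder(docs)` with a real model's encode function is what turns this from keyword matching into semantic search; the invariant that matters is that documents and queries are embedded by the same model.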
The sentence-transformers Library
sentence-transformers (GitHub: UKPLab/sentence-transformers, 16k+ stars) is the standard Python library for running open source embedding models locally. It wraps Hugging Face Transformers with a simpler API for generating sentence embeddings.
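```python
from sentence_transformers import SentenceTransformer

# Downloads the model from the Hugging Face Hub on first use, then loads it from the local cache
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["This is an example sentence", "Each sentence is converted"]
# encode() returns a NumPy array with one row (embedding) per input sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384): 2 sentences, 384 dimensions each
```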
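The library handles tokenization, batching, and pooling automatically. Whether the output vectors are normalized to unit length depends on the model's configuration; pass normalize_embeddings=True to encode() to guarantee it. For most use cases, this is all you need.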
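Normalization matters because once every vector has unit length (an L2 norm of 1), cosine similarity reduces to a plain dot product, so a vector store can rank by inner product and get the same ordering as cosine. A small pure-Python check of that identity, using arbitrary hypothetical vectors:

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]  # hypothetical raw embedding, length 5
b = [1.0, 2.0, 2.0]  # hypothetical raw embedding, length 3

# Without normalization the dot product (11.0) mixes in vector length.
print(dot(a, b))

a_unit, b_unit = l2_normalize(a), l2_normalize(b)
# For unit vectors, the plain dot product equals cosine similarity (11 / 15).
print(math.isclose(dot(a_unit, b_unit), cosine_similarity(a, b)))  # True
```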