The 512-Token Bottleneck
Many popular open embedding models, including all-MiniLM-L6-v2 and BGE-base, accept at most 512 tokens per input (all-MiniLM-L6-v2 truncates at 256 word pieces by default). OpenAI's text-embedding-3-small is the exception among common choices, with an 8191-token limit, but it is a hosted, closed model. For short sentences and paragraphs, 512 tokens is fine. For full documents, meeting transcripts, or long legal clauses, it forces chunking strategies that split semantic context across chunk boundaries.
nomic-embed-text-v1.5 raises that limit to 8192 tokens while, according to Nomic's reported benchmarks, remaining competitive on short-text tasks.
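To make the cost of a 512-token limit concrete, here is a minimal sketch of the sliding-window chunking that such a limit forces. The function name `chunk_tokens`, the 64-token overlap, and the 2000-token stand-in document are illustrative assumptions, not part of any library; integers stand in for real tokenizer output.

```python
def chunk_tokens(tokens, max_len, overlap=0):
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary appears in both neighbouring chunks.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not tokens:
        return []
    stride = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


# Hypothetical 2000-token document; integers stand in for token ids.
document = list(range(2000))

short_context = chunk_tokens(document, max_len=512, overlap=64)
long_context = chunk_tokens(document, max_len=8192)

print(len(short_context))  # vectors needed at a 512-token limit
print(len(long_context))   # vectors needed at an 8192-token limit
```

With these assumed numbers, the 512-token limit yields five overlapping chunks (and five vectors to store and search), while the 8192-token limit embeds the document in a single pass.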
Matryoshka Embeddings
Traditional embedding models produce vectors of a single fixed dimension (e.g., 768). Matryoshka Representation Learning (MRL) trains a model so that the first N dimensions of the full vector are themselves a usable embedding, at some cost in accuracy as N shrinks. nomic-embed-text-v1.5 supports 512-, 256-, 128-, and 64-dim slices of its 768-dim output:
import numpy as np
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
# Full 768-dim embedding
texts = ["search_query: How does the attention mechanism work in transformers?"]
embeddings_768 = model.encode(texts)
# 256-dim Matryoshka slice: layer-normalize, truncate, then L2-renormalize.
# The layer-norm step follows the model card's recipe; eps=1e-5 mirrors the PyTorch default.
mean = embeddings_768.mean(axis=1, keepdims=True)
var = embeddings_768.var(axis=1, keepdims=True)
normed = (embeddings_768 - mean) / np.sqrt(var + 1e-5)
embeddings_256 = normed[:, :256]
embeddings_256 = embeddings_256 / np.linalg.norm(embeddings_256, axis=1, keepdims=True)
print(f"Full embedding shape: {embeddings_768.shape}")
print(f"Compressed embedding shape: {embeddings_256.shape}")
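The practical payoff of a smaller slice is index size. The arithmetic below is a rough sketch that assumes float32 storage (4 bytes per dimension), a hypothetical corpus of one million vectors, and no index overhead; the helper name `index_size_bytes` is illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def index_size_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings of the given dimension."""
    return num_vectors * dim * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
for dim in (768, 512, 256, 128):
    size_gb = index_size_bytes(num_vectors, dim) / 1e9
    print(f"{dim:>3}-dim: {size_gb:.2f} GB")
```

Under these assumptions the 256-dim slice needs a third of the raw storage of the full 768-dim vector (roughly 1.02 GB versus 3.07 GB), before any compression or index structures are taken into account.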
Task Prefix Convention
nomic-embed expects a task prefix on every input to signal the embedding use case:
# For retrieval queries
query_embedding = model.encode(["search_query: What causes transformer models to hallucinate?"])
# For documents being indexed
doc_embedding = model.encode(["search_document: Hallucination in LLMs occurs when..."])
# For classification features
class_embedding = model.encode(["classification: Machine learning is a subset of artificial intelligence"])
# For clustering
cluster_embedding = model.encode(["clustering: Machine learning is a subset of artificial intelligence"])
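Because a missing or mismatched prefix does not raise an error (the model still returns a vector), it can help to centralize prefixing in one place. The helper below is a minimal sketch for the retrieval case; `add_prefix` and its duplicate-prefix check are illustrative assumptions, not part of sentence-transformers or the Nomic tooling.

```python
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def add_prefix(texts, prefix):
    """Prepend a task prefix to each text, skipping texts that already have it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


queries = add_prefix(["What causes transformer models to hallucinate?"], QUERY_PREFIX)
documents = add_prefix(
    [
        "Hallucination in LLMs occurs when...",
        "search_document: Already prefixed text is left unchanged.",
    ],
    DOCUMENT_PREFIX,
)

print(queries)
print(documents)
```

The resulting lists can be passed directly to `model.encode(...)`, which keeps query-side and document-side prefixes from drifting apart across a codebase.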
MTEB Performance
nomic-embed-text-v1.5 reaches an MTEB (Massive Text Embedding Benchmark) average score of about 62.3, level with OpenAI's text-embedding-3-small (also about 62.3) and substantially better than all-MiniLM-L6-v2 (about 56.3). The key differentiator is that it pairs this quality with an 8192-token context in an open model.
nomic-embed-vision
Nomic also released nomic-embed-vision, a CLIP-style image embedding model aligned to the same embedding space as nomic-embed-text. Images and text can therefore be embedded into one vector space and compared directly for cross-modal retrieval, without a separate multimodal model:
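import torch
import torch.nn.functional as F
from PIL import Image
from sentence_transformers import SentenceTransformer
from transformers import AutoImageProcessor, AutoModel
# The vision checkpoint is loaded with transformers here, following its model card
processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1.5")
vision_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-vision-v1.5", trust_remote_code=True)
text_model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
with torch.no_grad():
    image_out = vision_model(**processor(Image.open("diagram.png").convert("RGB"), return_tensors="pt")).last_hidden_state
image_emb = F.normalize(image_out[:, 0], p=2, dim=1).numpy()  # CLS token, unit length, shape (1, 768)
text_emb = text_model.encode(["search_query: neural network architecture diagram"], normalize_embeddings=True)
similarity = image_emb @ text_emb.T  # cosine similarity, shape (1, 1)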
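Once image and text vectors live in one space, cross-modal retrieval reduces to ranking candidates by cosine similarity. The sketch below uses tiny hand-written 3-dim vectors in place of real 768-dim embeddings; the file names, the vector values, and the `rank_by_cosine` helper are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for image embeddings (hypothetical values).
image_embeddings = {
    "diagram.png": [0.9, 0.1, 0.0],
    "cat_photo.jpg": [0.0, 0.2, 0.9],
    "flowchart.png": [0.7, 0.6, 0.1],
}
# Toy stand-in for the embedding of a text query.
query_embedding = [1.0, 0.2, 0.0]

ranking = rank_by_cosine(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

For these toy values the ranking comes out as `diagram.png`, then `flowchart.png`, then `cat_photo.jpg`. With real embeddings, the same ranking step would typically be handled by a vector index rather than a Python loop.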
Deployment With Ollama
ollama pull nomic-embed-text
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "search_document: Your document text here"
}'
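The same request can be issued from Python. The sketch below uses only the standard library and assumes the endpoint, payload fields, and default port from the curl call above, plus a response body that carries the vector under an `embedding` key; the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # same endpoint as the curl call


def build_request(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Build the POST request without sending it."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, timeout=30.0):
    """Send the request to a running Ollama server and return the vector.

    Assumes the response JSON has an "embedding" field.
    """
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return json.loads(response.read())["embedding"]


# Example (requires a local Ollama server with the model already pulled):
# vector = embed("search_document: Your document text here")
# print(len(vector))
```

Note that the task prefix travels inside the prompt string, so the caller remains responsible for adding `search_document: ` or `search_query: ` before sending.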
Full Openness Under Apache 2.0
Unlike most embedding models, whose training data and pipeline are proprietary, nomic-embed releases its training code, training data details, and weights under Apache 2.0. That permits auditing, fine-tuning, and commercial deployment, subject only to the license's attribution and notice terms.