OpenAI text-embedding-3: The New Embedding Models and When to Use Each

OpenAI's text-embedding-3-small and text-embedding-3-large introduce Matryoshka representation learning - you can truncate dimensions without retraining, cutting storage costs while keeping most retrieval quality.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 16, 2026

8 min read

// tags

#openai-embeddings #text-embedding-3 #rag #mteb #vector-search

Matryoshka Representation Learning

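The headline technical feature is Matryoshka embeddings: the model is trained so that the first N dimensions of the full vector (3072 dimensions for text-embedding-3-large, 1536 for text-embedding-3-small) are nearly as useful as the whole vector. This means you can shorten vectors without retraining by passing the dimensions parameter when you request an embedding. Queries and documents must use the same dimension count to be comparable.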

from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text: str, dimensions: int = 1536) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
        dimensions=dimensions,  # truncate here, not post-hoc
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

query_emb = get_embedding("How do transformers handle long sequences?", dimensions=256)
doc_emb = get_embedding("Attention mechanisms scale quadratically with sequence length.", dimensions=256)

print(f"Similarity: {cosine_similarity(query_emb, doc_emb):.4f}")

Using 256 dimensions instead of 1536 reduces vector storage by 6x while retaining roughly 92% of retrieval quality on most benchmarks.
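
If vectors are already stored at full size, they can also be shortened after the fact, as long as the truncated vector is rescaled to unit length before it is compared. The sketch below illustrates that step along with the storage arithmetic. It uses only the standard library; the helper names (shorten_embedding, storage_bytes), the 4-bytes-per-value float32 assumption, and the stand-in vector are all illustrative and not part of the OpenAI SDK.

```python
import math


def shorten_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * dims * bytes_per_value


# Stand-in for a stored 1536-dimension vector (not real model output).
full = [math.sin(i + 1) for i in range(1536)]
short = shorten_embedding(full, 256)

print(len(short))                              # 256
print(storage_bytes(1_000_000, 1536))          # 6144000000 bytes, about 6.1 GB
print(storage_bytes(1_000_000, 256))           # 1024000000 bytes, about 1.0 GB
```

For one million vectors, the raw footprint drops from about 6.1 GB to about 1.0 GB, which is where the 6x figure comes from. Index overhead in a real vector store is not included in this estimate.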

Migration from Ada-002

The embeddings are not backward compatible - ada-002 vectors and text-embedding-3 vectors live in different spaces and cannot be compared. If you are migrating a production vector database:

1. Keep ada-002 running for existing queries
2. Re-embed your entire corpus with text-embedding-3-small
3. Update your vector store index
4. Cut over traffic and deprecate ada-002

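For Pinecone, create a new index with the new model's dimension count (1536 for small, 3072 for large, or whatever value you pass as dimensions). text-embedding-3-small and ada-002 both default to 1536 dimensions, but their vectors still cannot share an index. For pgvector, alter the column or create a new one; a new column lets old and new vectors coexist until cutover.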
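
The re-embedding step can be sketched as a batched loop that builds the new vectors next to the old ones. This is a minimal illustration, not a production migration script: reembed_corpus, chunks, and fake_embed_batch are hypothetical names, the corpus is an in-memory dict, and the embedder is passed in as a function so the loop can be exercised without network access.

```python
from typing import Callable, Iterator

# Any function that maps a batch of texts to one vector per text.
EmbedBatch = Callable[[list[str]], list[list[float]]]


def chunks(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reembed_corpus(
    docs: dict[str, str],
    embed_batch: EmbedBatch,
    batch_size: int = 100,
) -> dict[str, list[float]]:
    """Return new vectors keyed by document id; the old index is left untouched."""
    new_vectors: dict[str, list[float]] = {}
    for id_batch in chunks(list(docs), batch_size):
        vectors = embed_batch([docs[doc_id] for doc_id in id_batch])
        if len(vectors) != len(id_batch):
            raise RuntimeError("embedder returned a different number of vectors than inputs")
        new_vectors.update(zip(id_batch, vectors))
    return new_vectors


def fake_embed_batch(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in embedder so the sketch runs offline."""
    return [[float(len(text)), 1.0] for text in texts]


corpus = {"doc-1": "first document", "doc-2": "second", "doc-3": "third one"}
new_index = reembed_corpus(corpus, fake_embed_batch, batch_size=2)
print(sorted(new_index))
```

In a real migration, embed_batch would presumably wrap client.embeddings.create with model="text-embedding-3-small" and a list of strings as input, and the results would be written to the new Pinecone index or pgvector column rather than a dict. Retry and rate-limit handling are left out here.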

When Voyage or Cohere Beat OpenAI

Voyage-3 consistently leads MTEB for English retrieval tasks - if maximum retrieval accuracy is the priority and you can afford slightly more complex integration, Voyage is worth testing.
Cohere embed-multilingual-v3 dominates when you need 100+ languages - OpenAI's multilingual performance is good but not best-in-class.
OpenAI wins on simplicity (one SDK, one billing account) and latency (well-optimized inference infrastructure).

Model	MTEB Average	Dimensions	Cost/1M tokens
text-embedding-3-large	64.6	3072	$0.13
text-embedding-3-small	62.3	1536	$0.02
text-embedding-ada-002	61.0	1536	$0.10
Cohere embed-v3	64.5	1024	$0.10
Voyage-3	67.1	1024	$0.06
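
To turn the price column into a budget figure, multiply the per-million-token rate by the corpus size. The sketch below uses the prices exactly as listed in the table above; the 500-million-token corpus is a made-up example, and embedding_cost is an illustrative helper rather than part of any SDK.

```python
# Prices in USD per 1M tokens, copied from the comparison table.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "text-embedding-ada-002": 0.10,
    "Cohere embed-v3": 0.10,
    "Voyage-3": 0.06,
}


def embedding_cost(model: str, total_tokens: int) -> float:
    """One-time cost in USD of embedding `total_tokens` tokens with `model`."""
    return PRICE_PER_MILLION_TOKENS[model] * total_tokens / 1_000_000


corpus_tokens = 500_000_000  # hypothetical corpus size

for model in PRICE_PER_MILLION_TOKENS:
    print(f"{model}: ${embedding_cost(model, corpus_tokens):,.2f}")
```

At these rates, embedding the hypothetical corpus costs about $10 with text-embedding-3-small and about $65 with text-embedding-3-large. Prices change, so check each provider's current pricing before relying on these numbers.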
