What Makes Embed v3 Different
Many embedding APIs encode every input the same way. Cohere Embed v3 does not. It takes an explicit input_type parameter that changes how the model encodes text based on its intended use. This is more than a minor API detail. Cohere presents it as a key reason Embed v3 performs well on asymmetric retrieval tasks, where queries are short and documents are long.
The input_type Parameter
```python
import cohere

co = cohere.Client(api_key="YOUR_COHERE_KEY")

# Encoding a user search query
query_emb = co.embed(
    texts=["What is the capital of France?"],
    model="embed-english-v3.0",
    input_type="search_query",
    embedding_types=["float"],
).embeddings.float_[0]  # the v5 Python SDK exposes the "float" field as float_

# Encoding documents to store in a vector database
doc_embs = co.embed(
    texts=[
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float"],
).embeddings.float_
```
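To see the two input types working together, the documents can be ranked against the query by cosine similarity. The sketch below assumes NumPy and uses a hypothetical `rank_by_cosine` helper. Toy 3-dimensional vectors stand in for the real 1024-dimensional embeddings so that it runs offline.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_index, cosine_similarity) pairs, best match first."""
    q = np.asarray(query_vec, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    # Normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # descending similarity
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for a query embedding and two document embeddings.
toy_query = [0.9, 0.1, 0.0]
toy_docs = [[0.8, 0.2, 0.1], [0.0, 0.3, 0.9]]
ranked = rank_by_cosine(toy_query, toy_docs)
print(ranked)
```

With the vectors from the API calls above, the call would be `rank_by_cosine(query_emb, doc_embs)`.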
The four valid values are:
- search_query: for user queries at retrieval time
- search_document: for documents being indexed
- classification: for text classification tasks
- clustering: for topic clustering and deduplication
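Use search_document when indexing and search_query when querying. Swapping them, or using one type for both sides, leaves retrieval quality on the table. For v3 models the parameter is required, so every embed call has to make this choice explicitly.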
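Because the right input_type depends on which side of the search a text is on, it can help to route every embed call through one helper. The sketch below is a hypothetical wrapper: `embed_in_batches`, `FakeClient`, and `VALID_INPUT_TYPES` are names introduced here, not part of the Cohere SDK. It validates input_type and splits large corpora into batches of 96, which we understand to be the per-request text limit of Cohere's embed endpoint. It reads `resp.embeddings.float_`, assuming the v5 Python SDK's field naming, and the fake client lets the batching logic run without an API key.

```python
from types import SimpleNamespace

# Assumption: 96 is Cohere's documented maximum number of texts per embed call.
MAX_TEXTS_PER_CALL = 96
VALID_INPUT_TYPES = {"search_query", "search_document", "classification", "clustering"}


def embed_in_batches(co, texts, input_type, model="embed-english-v3.0",
                     batch_size=MAX_TEXTS_PER_CALL):
    """Embed `texts` with one fixed input_type, splitting into API-sized batches."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"unknown input_type: {input_type!r}")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        resp = co.embed(
            texts=batch,
            model=model,
            input_type=input_type,
            embedding_types=["float"],
        )
        vectors.extend(resp.embeddings.float_)
    return vectors


class FakeClient:
    """Offline stand-in for cohere.Client that returns 2-dim dummy vectors."""

    def __init__(self):
        self.batch_sizes = []

    def embed(self, texts, model, input_type, embedding_types):
        self.batch_sizes.append(len(texts))
        dummy = [[float(len(t)), 0.0] for t in texts]
        return SimpleNamespace(embeddings=SimpleNamespace(float_=dummy))


fake = FakeClient()
doc_vectors = embed_in_batches(fake, [f"doc {i}" for i in range(200)], "search_document")
print(fake.batch_sizes)  # [96, 96, 8]
print(len(doc_vectors))  # 200
```

Making input_type a required argument with no default forces each call site to state whether it is indexing or querying. In real use you would pass the `co` client created earlier instead of `FakeClient()`.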
Model Variants
- embed-english-v3.0 — 1024 dimensions, English only, highest English performance
- embed-multilingual-v3.0 — 1024 dimensions, 108 languages, within 2-3 points of English-only on most tasks
Both produce 1024-dimensional embeddings and are priced at $0.10 per 1M tokens.
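At the quoted price, embedding cost scales linearly with total tokens. Here is a worked sketch. `embedding_cost_usd` is a hypothetical helper, and the 500-token average document length is an illustrative assumption, since real corpora vary widely.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.10  # price quoted above for both v3 models


def embedding_cost_usd(num_docs, avg_tokens_per_doc,
                       price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Estimated one-off cost of embedding a corpus, in US dollars."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 10M documents averaging 500 tokens each (5 billion tokens).
cost = embedding_cost_usd(10_000_000, 500)
print(f"${cost:,.2f}")  # $500.00
```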
Compressed Embedding Formats
Cohere Embed v3 is one of the first production embedding APIs to natively support compressed output formats:
```python
# int8 compression: 4x storage reduction, ~1% quality loss
doc_embs_int8 = co.embed(
    texts=["Your document text here"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["int8"],
).embeddings.int8

# binary compression: 32x storage reduction, ~3-5% quality loss
# "ubinary" packs 8 dimensions into each unsigned byte, so 1024 dims -> 128 values.
doc_embs_binary = co.embed(
    texts=["Your document text here"],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["ubinary"],
).embeddings.ubinary
```
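Binary embeddings are typically compared with Hamming distance (the count of differing bits), which is much cheaper than float dot products. The sketch below assumes NumPy and re-implements sign-based binary quantization locally so that it runs offline. The helpers `to_ubinary` and `hamming_distances` are hypothetical names. We have not verified that this bit layout matches Cohere's `ubinary` output bit for bit, so in practice you would compare only vectors that all came from the API.

```python
import numpy as np


def to_ubinary(float_vecs):
    """Sign-quantize float vectors: a bit is 1 where the component is > 0.

    Bits are packed 8 per uint8, so 1024 dims become 128 bytes per vector.
    """
    arr = np.asarray(float_vecs, dtype=np.float32)
    return np.packbits(arr > 0, axis=-1)


def hamming_distances(query_bits, doc_bits):
    """Count differing bits between one packed query and each packed document."""
    xor = np.bitwise_xor(doc_bits, query_bits)  # broadcasts the query over rows
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


# Random stand-ins for 1024-dim float embeddings.
rng = np.random.default_rng(0)
docs = rng.standard_normal((4, 1024)).astype(np.float32)
query = docs[2] + 0.01 * rng.standard_normal(1024).astype(np.float32)  # near doc 2

doc_bits = to_ubinary(docs)
query_bits = to_ubinary(query)
dists = hamming_distances(query_bits, doc_bits)
print(doc_bits.shape, doc_bits.dtype)  # (4, 128) uint8
print(int(np.argmin(dists)))           # 2
```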
For a corpus of 10 million documents at 1024 dimensions, raw vector storage works out to:
- float32: ~41 GB
- int8: ~10 GB
- binary: ~1.3 GB
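In binary format, the raw vectors for a 10M-document corpus fit in the RAM of a standard cloud VM. This enables in-memory vector search without specialized hardware, although index structures and metadata add overhead on top of the raw vectors.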
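The storage figures follow directly from bytes per dimension: 4 for float32, 1 for int8, and 1/8 for packed binary. Here is a worked sketch. `raw_storage_gb` is a hypothetical helper, and it counts only the raw vectors, not the ANN index structures, IDs, or metadata a real vector database adds.

```python
BYTES_PER_DIM = {"float32": 4.0, "int8": 1.0, "ubinary": 1 / 8}


def raw_storage_gb(num_vectors, dims=1024, fmt="float32"):
    """Raw vector bytes only, in decimal gigabytes (10**9 bytes)."""
    return num_vectors * dims * BYTES_PER_DIM[fmt] / 1e9


for fmt in BYTES_PER_DIM:
    print(fmt, round(raw_storage_gb(10_000_000, fmt=fmt), 2))
# float32 40.96
# int8 10.24
# ubinary 1.28
```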
MTEB Comparison
| Model | MTEB English Retrieval | Language support |
|---|---|---|
| Cohere embed-english-v3.0 | 55.0 | English only |
| Cohere embed-multilingual-v3.0 | 54.1 | 108 languages |
| OpenAI text-embedding-3-large | 55.4 | Good |
| Voyage-3 | 58.1 | Limited |
On raw MTEB English retrieval, Voyage-3 leads this comparison and OpenAI text-embedding-3-large edges slightly ahead of Cohere. Cohere's multilingual coverage (108 languages) and native compressed formats are what make it a strong candidate for international enterprise deployments.
Weaviate Integration
```python
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_WEAVIATE_URL",
    auth_credentials=Auth.api_key("YOUR_WEAVIATE_KEY"),
    headers={"X-Cohere-Api-Key": "YOUR_COHERE_KEY"},  # used by the Cohere vectorizer
)

try:
    # Assumes the "Document" collection was created with the text2vec-cohere
    # vectorizer, so Weaviate calls Cohere to embed the query text for you.
    collection = client.collections.get("Document")
    results = collection.query.near_text(
        query="machine learning optimization techniques",
        limit=5,
    )
    for obj in results.objects:
        print(obj.properties)
finally:
    client.close()
```