The Problem With Dense Retrieval
Dense retrieval models (DPR, ColBERT, nomic-embed) encode queries and documents into dense vectors and retrieve via approximate nearest neighbor search. They handle semantic similarity well but struggle with exact term matching and typically require a dedicated vector index or database. BM25, the classic TF-IDF variant, does the opposite: it offers exact term matching with fast inverted index lookup but no semantic generalization.
SPLADE (SParse Lexical AnD Expansion model) aims to combine the strengths of both: sparse representations that work with standard inverted indexes, but with learned weights and vocabulary expansion that capture semantics BM25 misses.
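To make the vocabulary-mismatch problem concrete, here is a minimal sketch of BM25 scoring over two toy documents. It assumes whitespace tokenization and one common BM25 formulation with default-style parameters (k1 = 1.2, b = 0.75); the function name `bm25_scores` and the example sentences are illustrative, not taken from any particular library.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 formulation."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact match only: absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the car would not start this morning",
    "my automobile engine fails to turn over",
]
scores = bm25_scores("automobile repair", docs)
print(scores)  # the first document scores zero despite being on topic
```

The first document is about the same problem but shares no terms with the query, so it receives no score at all. Learned term expansion is meant to close exactly this gap.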
How SPLADE Produces Sparse Representations
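SPLADE takes a BERT model and repurposes its masked language model (MLM) head. Instead of predicting masked tokens, it uses the MLM head to score every vocabulary token at each input position, applies a log(1 + ReLU(x)) transformation, and max-pools across positions to produce non-negative term weights. Because the transformation is monotonic, pooling before or after it gives the same result. The ReLU only guarantees non-negativity; the sparsity itself is learned, since the model is trained with a sparsity regularizer (FLOPS) that pushes most weights to zero: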
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

model_name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def encode_splade(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        output = model(**inputs)
    logits = output.logits  # [1, seq_len, vocab_size]
    # Apply log(1 + ReLU), zero out padding positions, then max-pool across the sequence
    mask = inputs["attention_mask"].unsqueeze(-1)  # [1, seq_len, 1]
    vec = (torch.log1p(torch.relu(logits)) * mask).max(dim=1).values.squeeze(0)
    return vec  # [vocab_size]

query_vec = encode_splade("transformer self-attention mechanism explained")

# Most weights are 0; non-zero entries correspond to vocabulary terms
nonzero = (query_vec > 0).sum().item()
print(f"Non-zero terms: {nonzero} out of {len(query_vec)}")  # typically 20-200

# See which terms were expanded
nonzero_indices = query_vec.nonzero(as_tuple=True)[0]
terms = [(tokenizer.decode([idx.item()]), query_vec[idx].item())
         for idx in nonzero_indices]
print(sorted(terms, key=lambda x: -x[1])[:10])
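The pooling step is easier to follow on a tiny hand-made example. The sketch below reproduces the log(1 + ReLU(x)) transformation and the max over positions with the standard library only, assuming a made-up 2-position, 4-term logits matrix; `splade_pool` and `toy_logits` are illustrative names, not part of any SPLADE codebase.

```python
import math

def splade_pool(logits):
    """logits: one row per input position, each row holding one score per vocabulary term."""
    vocab_size = len(logits[0])
    weights = []
    for j in range(vocab_size):
        # log(1 + ReLU(x)) at every position, then the max over positions
        weights.append(max(math.log1p(max(row[j], 0.0)) for row in logits))
    return weights

# Hypothetical logits: 2 input positions, 4 vocabulary terms
toy_logits = [
    [2.0, -1.0, 0.0, -3.0],
    [0.5, -0.2, 1.0, -0.1],
]
weights = splade_pool(toy_logits)
print(weights)
```

Terms whose logits are never positive at any position end up with weight exactly zero, which is why they can be dropped from the index entirely.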
Query Expansion in Practice
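For the query "transformer self-attention", SPLADE might assign non-zero weights to "transformer", "attention", "self", "mechanism", "architecture", "neural", "encoder", "head", "bert", and "query". Because the weights are defined over BERT's WordPiece vocabulary, every expansion term is a single vocabulary token, so a compound such as "multi-head" would show up as separate pieces rather than as one term. This expansion is what gives SPLADE semantic generalization: the inverted index sees expanded term lists, not just the original query terms.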
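As a rough illustration of how expanded term lists interact with an inverted index, the sketch below scores documents by the dot product of query and document term weights. All weights here are hypothetical values chosen for the example, not actual SPLADE output, and `search` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical term weights, not real model output
docs = {
    "d1": {"attention": 1.8, "transformer": 1.5, "encoder": 0.9},
    "d2": {"convolution": 1.7, "image": 1.2, "encoder": 0.4},
}
query = {"transformer": 1.6, "attention": 1.4, "encoder": 0.3, "bert": 0.2}

# Build the inverted index: term -> list of (doc_id, weight) postings
index = defaultdict(list)
for doc_id, terms in docs.items():
    for term, weight in terms.items():
        index[term].append((doc_id, weight))

def search(query_weights, index):
    """Accumulate query_weight * doc_weight over the postings of each query term."""
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

results = search(query, index)
print(results)
```

Only the postings lists of terms present in the query are touched, which is the same access pattern BM25 uses. The difference is that both sides carry learned weights and expansion terms such as "encoder".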
BEIR Benchmark Results
On BEIR (a heterogeneous retrieval benchmark covering 18 datasets):
- BM25: 43.0 average nDCG@10
- SPLADE-v2: 51.1 average nDCG@10
- ColBERT v2: 52.0 average nDCG@10
- SPLADE++: 54.1 average nDCG@10
On these figures, SPLADE++ edges out ColBERT v2 by about two points while remaining compatible with inverted index infrastructure.
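For readers unfamiliar with the metric, nDCG@10 for a single query can be computed roughly as follows; the benchmark figure is then the mean over queries and datasets. This is a minimal sketch assuming graded relevance labels and linear gain (some formulations use 2^rel - 1 instead); the function names and the example labels are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """Normalize the ranking's DCG by the DCG of an ideal ordering of all judged documents."""
    ideal_dcg = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Relevance labels of the retrieved documents, in ranked order (hypothetical)
ranked = [0, 2, 1, 0]
# Relevance labels of every judged-relevant document for this query (hypothetical)
judged = [2, 1]
score = ndcg_at_k(ranked, judged)
print(round(score, 4))
```

A ranking that places the most relevant documents first scores 1.0. Pushing them down the list lowers the score logarithmically with rank.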
Integration With Elasticsearch
# Index mapping: store SPLADE weights in a sparse_vector field
index_body = {
    "mappings": {
        "properties": {
            "splade_embedding": {"type": "sparse_vector"},
            "content": {"type": "text"}
        }
    }
}

# At query time, convert SPLADE output to a {feature: weight} dict for a sparse_vector query
query_text = "transformer self-attention mechanism explained"
splade_vec = encode_splade(query_text)
nonzero_ids = splade_vec.nonzero(as_tuple=True)[0]
query_dict = {
    str(idx.item()): splade_vec[idx].item()  # token id (as a string) is the feature name
    for idx in nonzero_ids
}
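The dictionaries above still need to be wrapped into request bodies. The sketch below shows one plausible shape for an indexed document and for a search request, built as plain Python dicts so it runs without a cluster. It assumes a recent Elasticsearch 8.x release in which the `sparse_vector` query accepts a precomputed `query_vector`; check the documentation for your version. The helper `to_es_features` and the toy weights are hypothetical.

```python
def to_es_features(weights_by_token_id):
    """Drop zero weights and stringify token ids for use as feature names."""
    return {str(tid): float(w) for tid, w in weights_by_token_id.items() if w > 0}

# Hypothetical SPLADE weights keyed by token id
doc_weights = {2003: 1.42, 3086: 0.87, 7592: 0.0}
query_weights = {2003: 1.10, 3086: 0.35}

# Document to index: raw text plus its sparse expansion
document = {
    "content": "An introduction to self-attention in transformers.",
    "splade_embedding": to_es_features(doc_weights),
}

# Search request body (assumed shape of the sparse_vector query)
search_body = {
    "query": {
        "sparse_vector": {
            "field": "splade_embedding",
            "query_vector": to_es_features(query_weights),
        }
    }
}
```

Documents and queries must use the same feature naming scheme (here, stringified token ids); otherwise no terms will match at search time.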
When to Choose SPLADE
Use SPLADE when: you already have Elasticsearch or OpenSearch infrastructure (no vector DB migration needed), interpretability matters (you can see which terms drove retrieval), or your domain has specialized terminology that benefits from expansion. Use dense retrieval when cross-lingual search, image-text retrieval, or semantic similarity beyond vocabulary overlap is the primary requirement.