SPLADE: Sparse Neural Retrieval That Beats BM25 With Learned Weights

SPLADE uses BERT's masked language model head to produce sparse, interpretable retrieval representations that outperform BM25 while remaining compatible with inverted index infrastructure.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 14, 2026

7 min read

// tags

#splade#sparse-retrieval#bm25#bert#information-retrieval


The Problem With Dense Retrieval

Dense retrieval models such as DPR and nomic-embed encode each query and document into a single dense vector and retrieve via approximate nearest neighbor search; ColBERT is a multi-vector (late-interaction) variant of the same idea. They handle semantic similarity well but can struggle with exact term matching (rare names, identifiers, codes) and typically need a vector index or vector database. BM25, the classic TF-IDF-style ranking function, does the opposite: exact term matching with fast inverted index lookup, but no semantic generalization.
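
To make the exact-match limitation concrete, here is a minimal BM25 sketch in plain Python. The whitespace tokenizer, the parameter values k1=1.2 and b=0.75, and the two example documents are assumptions chosen for illustration, not a production implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(d) for d in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact match only: absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the car engine would not start",
    "an automobile motor that fails to turn over",
]
scores = bm25_scores("car engine", docs)
```

The second document describes the same problem with different words, so it shares no terms with the query and its score stays at zero. That is the gap learned expansion is meant to close.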

SPLADE (SParse Lexical AnD Expansion model) aims to combine the strengths of both: sparse representations that work with standard inverted indexes, with learned term weights and vocabulary expansion that capture semantics BM25 misses.
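
As a rough sketch of why sparse learned weights fit an inverted index, the example below stores each term's postings as (doc_id, weight) pairs and scores by accumulating weight products. The document ids, terms, and weights are hypothetical stand-ins for what a trained model would produce.

```python
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return index

def search(index, query_vector, top_k=3):
    """Accumulate query_weight * doc_weight over the postings of each query term."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: -item[1])[:top_k]

# Hypothetical learned weights; a real model would produce these.
doc_vectors = {
    "d1": {"attention": 1.5, "transformer": 1.0, "encoder": 0.5},
    "d2": {"tree": 1.5, "forest": 1.0},
}
index = build_inverted_index(doc_vectors)
results = search(index, {"attention": 2.0, "encoder": 1.0})
```

Only the postings lists of the query's non-zero terms are touched, which is the same access pattern BM25 relies on.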

How SPLADE Produces Sparse Representations

SPLADE takes a BERT model and repurposes its masked language model (MLM) head. Instead of predicting masked tokens, it uses the MLM head to score every vocabulary token at each input position, applies a log(1 + ReLU(x)) transformation so the scores are non-negative, and aggregates across positions via max-pooling. The ReLU, together with a sparsity regularizer used during training, leaves most vocabulary weights at exactly zero:

from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

model_name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

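def encode_splade(text):
    """Return a vocabulary-sized vector of non-negative SPLADE term weights."""
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        output = model(**inputs)
    logits = output.logits  # [1, seq_len, vocab_size]
    # Apply log(1 + ReLU), zero out padding positions, then max-pool over the sequence
    mask = inputs["attention_mask"].unsqueeze(-1)  # [1, seq_len, 1]
    vec = (torch.log1p(torch.relu(logits)) * mask).max(dim=1).values.squeeze(0)
    return vec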

query_vec = encode_splade("transformer self-attention mechanism explained")
# Most weights are 0; non-zero entries correspond to vocabulary terms
nonzero = (query_vec > 0).sum().item()
print(f"Non-zero terms: {nonzero} out of {len(query_vec)}")  # typically 20-200

# See which terms were expanded
nonzero_indices = query_vec.nonzero(as_tuple=True)[0]
terms = [(tokenizer.decode([idx.item()]), query_vec[idx.item()].item())
         for idx in nonzero_indices]
print(sorted(terms, key=lambda x: -x[1])[:10])
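
The transformation can also be traced by hand on a tiny example. The sketch below uses made-up logits for 3 input positions and a 4-term vocabulary, with plain Python in place of tensors, to show where the zeros come from.

```python
import math

def splade_pool(logits):
    """Apply log(1 + ReLU(x)) to each logit, then max-pool over positions."""
    vocab_size = len(logits[0])
    weights = [0.0] * vocab_size
    for position in logits:
        for j, x in enumerate(position):
            activated = math.log1p(max(x, 0.0))
            weights[j] = max(weights[j], activated)
    return weights

# Toy logits: 3 input positions, vocabulary of 4 terms (values are made up).
toy_logits = [
    [-2.0, 0.0, 1.0, -0.5],
    [-1.0, 3.0, -4.0, -0.1],
    [-3.0, 0.5, 0.0, -2.0],
]
weights = splade_pool(toy_logits)
```

Terms 0 and 3 never receive a positive logit at any position, so their weights are exactly zero; terms 1 and 2 keep the log-scaled value of their largest logit.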

Query Expansion in Practice

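For the query "transformer self-attention", SPLADE might assign non-zero weights to terms such as "transformer", "attention", "self", "mechanism", "architecture", "neural", "encoder", "multi", "head", "bert", and "query" (an illustrative list, not actual model output). Because the output dimensions are BERT WordPiece tokens, a compound like "multi-head" shows up as separate pieces. This expansion is what gives SPLADE semantic generalization: the inverted index sees expanded term lists, not just the original query terms.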
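
A small sketch of the effect, assuming hypothetical weights: the same document is scored against a query restricted to its literal words and against an expanded version of that query. The `sparse_dot` helper is illustrative, not part of any library.

```python
def sparse_dot(query_weights, doc_weights):
    """Dot product over the terms the two sparse vectors share."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

# All weights below are hypothetical, chosen only to illustrate the effect.
literal_query = {"transformer": 1.0, "self": 0.5, "attention": 1.0}
expanded_query = {**literal_query, "encoder": 0.5, "bert": 0.5}

# A document that never uses the literal query words.
doc = {"bert": 1.0, "encoder": 1.0, "layer": 0.5}

literal_score = sparse_dot(literal_query, doc)
expanded_score = sparse_dot(expanded_query, doc)
```

With only the literal terms the document is invisible to the query; the two expansion terms are enough to give it a non-zero score. In practice documents are expanded in the same way, which widens the overlap further.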

BEIR Benchmark Results

On BEIR (a heterogeneous retrieval benchmark covering 18 datasets):

BM25: 43.0 average nDCG@10
SPLADE-v2: 51.1 average nDCG@10
ColBERT v2: 52.0 average nDCG@10
SPLADE++: 54.1 average nDCG@10

On these averages, SPLADE++ scores above ColBERT v2 while remaining compatible with inverted index infrastructure.
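
For readers unfamiliar with the metric, here is a simplified sketch of nDCG@10 for a single query. It assumes linear gain and builds the ideal ranking from the retrieved list only; benchmark tooling computes the ideal from all judged documents, so treat this as an illustration of the formula rather than a reproduction of the BEIR evaluation.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k ranked relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the ranking divided by the DCG of the ideal ordering of the same labels."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance labels in ranked order: 1 = relevant, 0 = not relevant.
perfect = ndcg_at_k([1, 1, 0, 0])
swapped = ndcg_at_k([0, 1, 1, 0])
```

Pushing a relevant document from rank 1 to rank 3 lowers the score because each position is discounted by the logarithm of its rank. The benchmark figure is this value averaged over queries and then over datasets.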

Integration With Elasticsearch

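# Index mapping: store SPLADE weights in a sparse_vector field
index_mapping = {
    "mappings": {
        "properties": {
            "splade_embedding": {"type": "sparse_vector"},
            "content": {"type": "text"}
        }
    }
}

# At query time, convert the SPLADE output to a {term_id: weight} dict.
# Documents must be indexed with the same key scheme (here: vocabulary ids as strings).
query_text = "transformer self-attention mechanism explained"
splade_vec = encode_splade(query_text)
nonzero_ids = splade_vec.nonzero(as_tuple=True)[0]  # 1-D tensor of non-zero indices
query_dict = {
    str(idx.item()): splade_vec[idx].item()
    for idx in nonzero_ids
}

# Body for the sparse_vector query with a precomputed query vector
# (available in recent Elasticsearch 8.x releases; check your version)
search_body = {
    "query": {
        "sparse_vector": {"field": "splade_embedding", "query_vector": query_dict}
    }
}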
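
The indexing side can be sketched in the same spirit. The helper names below are hypothetical, and using vocabulary ids as feature names is an assumption made so that feature names stay simple strings; the field names `content` and `splade_embedding` come from the mapping above.

```python
def to_sparse_features(weights_by_term_id, min_weight=0.0):
    """Keep strictly positive weights and use string term ids as feature names."""
    return {
        str(term_id): float(weight)
        for term_id, weight in weights_by_term_id.items()
        if weight > min_weight
    }

def build_index_document(content, weights_by_term_id):
    """Document body matching the mapping fields `content` and `splade_embedding`."""
    return {
        "content": content,
        "splade_embedding": to_sparse_features(weights_by_term_id),
    }

# Hypothetical term ids and weights standing in for real SPLADE output.
doc = build_index_document(
    "Self-attention lets each token attend to every other token.",
    {2969: 1.25, 3086: 2.0, 19081: 0.0},
)
```

Dropping zero weights before indexing keeps each stored document as small as the model's sparsity allows. Raising `min_weight` is one simple way to trade a little recall for a smaller index.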

When to Choose SPLADE

Use SPLADE when: you already have Elasticsearch or OpenSearch infrastructure (no vector DB migration needed), interpretability matters (you can see which terms drove retrieval), or your domain has specialized terminology that benefits from expansion. Use dense retrieval when cross-lingual search, image-text retrieval, or semantic similarity beyond vocabulary overlap is the primary requirement.
