What are the best practices for mitigating the Lost in the Middle effect?

Best practices include: (1) placing the most relevant information at the beginning or end of the context, (2) limiting the number of retrieved chunks to roughly 4-8, (3) re-ranking and reordering passages so the top result comes first and the second-best comes last, (4) using sliding-window or hierarchical summarization for long documents, and (5) fine-tuning with position-aware data if possible.

How much does it cost to mitigate the Lost in the Middle effect?

The Lost in the Middle effect is a research finding, not a product, so it has no direct cost. Mitigating it, however, may involve additional engineering effort (e.g., re-ranking, chunking) and potentially higher API costs if you make multiple calls for summarization. In many applications, the resulting accuracy gains can justify these expenses.

Is the Lost in the Middle finding still relevant in 2026?

Yes, the finding remains highly relevant in 2026 as context windows continue to grow. Newer models like Gemini 1.5 Pro (1M tokens) still exhibit position bias, though some improvements have been made. Practitioners should still apply mitigations like strategic reordering and chunking to ensure reliable performance in long-context applications.

Lost in the Middle: Why LLMs Struggle With Long Contexts (2025)

The U-Shaped Performance Curve

Performance was highest when the relevant document appeared at the very beginning (primacy effect) or the very end (recency effect) of the context, and lowest when it was in the middle, dropping 15-25 percentage points in accuracy compared with the same model given the document in the first position.

With 20 documents, models that scored 70%+ accuracy when the relevant document came first dropped below 50% when it sat in the middle. That is a meaningful loss in real-world usefulness, even though the whole context fit within the model's window.
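To check whether your own model shows this U-shaped curve, you can run a simplified version of the experiment: keep the question fixed, move the document containing the answer through every slot among distractors, and record accuracy per slot. This is a minimal sketch, assuming a hypothetical `ask_model(question, docs)` callable that wraps your LLM and returns its reply; the substring match used as the correctness check is a deliberate simplification.

```python
import random
from typing import Callable, Sequence

def accuracy_by_position(
    questions: Sequence[tuple[str, str, str]],
    distractors: Sequence[str],
    ask_model: Callable[[str, list[str]], str],
    num_docs: int = 20,
    seed: int = 0,
) -> dict[int, float]:
    """Place the gold document at each slot 0..num_docs-1 in turn and return
    the fraction of questions answered correctly at each slot.

    questions: (question, gold_doc, expected_answer) triples.
    ask_model: hypothetical wrapper around your LLM; returns the reply text.
    """
    if len(distractors) < num_docs - 1:
        raise ValueError("need at least num_docs - 1 distractors")
    if not questions:
        raise ValueError("need at least one question")
    rng = random.Random(seed)
    accuracy: dict[int, float] = {}
    for pos in range(num_docs):
        correct = 0
        for question, gold_doc, expected in questions:
            # Fill the other slots with randomly chosen distractors.
            docs = rng.sample(list(distractors), num_docs - 1)
            docs.insert(pos, gold_doc)
            reply = ask_model(question, docs)
            # Crude correctness check: the expected answer appears in the reply.
            if expected.lower() in reply.lower():
                correct += 1
        accuracy[pos] = correct / len(questions)
    return accuracy
```

Plotting the returned values against position should make any primacy or recency bias visible for your specific model and task.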

Why This Happens

One common hypothesis is that attention patterns learned during pretraining correlate with position. Documents at the beginning are visible to every subsequent token under causal attention, so they take part in the computation of every later position. Documents at the end sit closest to the question and the tokens being generated, where attention tends to be strongest. Documents in the middle receive proportionally less attention on average.

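Additionally, middle-position tokens tend to receive lower attention weights during autoregressive generation, which reduces their effective influence on the output. This is a matter of attention weight, not access: the KV cache keeps keys and values for every position, and all of them are read at each decoding step.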
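The visibility argument for early positions can be made concrete with a causal mask: token j is visible only to tokens at position j and later, so the first token can be attended to by every position while the last is visible only to itself. This toy count in plain Python involves no model at all. Visibility is not the same as attention weight, and a middle token is also visible to everything after it, so the count illustrates at most part of the primacy effect and none of the recency effect.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def visibility_per_position(mask: list[list[bool]]) -> list[int]:
    """For each key position j, count how many query positions can attend to it."""
    n = len(mask)
    return [sum(mask[i][j] for i in range(n)) for j in range(n)]

print(visibility_per_position(causal_mask(6)))  # [6, 5, 4, 3, 2, 1]
```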

Implications for RAG Systems

This finding has direct implications for how to build RAG pipelines:

Put the most relevant chunks first or last, not in the middle of your retrieved context
Prefer fewer, higher-quality passages over many mediocre ones; adding irrelevant context to the middle actively hurts
Re-rank retrieved passages and place top-ranked results at context boundaries

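```python
def reorder_for_lost_in_middle(passages: list[str], scores: list[float]) -> list[str]:
    """
    Reorder passages to avoid the 'lost in the middle' effect.
    Highest-scored passage first, second highest last, the rest in between
    (in descending score order).
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")

    # Sort by score only, so ties keep their original order instead of
    # falling back to comparing passage text.
    paired = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    sorted_passages = [p for _, p in paired]

    if len(sorted_passages) <= 2:
        return sorted_passages

    # Best passage first, second best last, rest in the middle
    result = [sorted_passages[0]]
    result.extend(sorted_passages[2:])   # middle passages
    result.append(sorted_passages[1])    # second best at the end
    return result
```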
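The function above keeps the remaining passages in descending order, so the weakest passages end up just before the second-best one. A common variant, similar in spirit to LangChain's `LongContextReorder` document transformer, alternates placement so relevance falls off toward the center and the least relevant passage lands in the middle. A minimal sketch of that variant, under the hypothetical name `reorder_alternating`:

```python
def reorder_alternating(passages: list[str], scores: list[float]) -> list[str]:
    """Place passages so relevance decreases toward the middle:
    ranks 1, 3, 5, ... fill from the front; ranks 2, 4, 6, ... fill from the back."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # Stable sort on score alone, highest first.
    ranked = [p for _, p in sorted(zip(scores, passages), key=lambda x: x[0], reverse=True)]
    front: list[str] = []
    back: list[str] = []
    for i, passage in enumerate(ranked):
        if i % 2 == 0:
            front.append(passage)   # ranks 1, 3, 5, ...
        else:
            back.append(passage)    # ranks 2, 4, 6, ...
    # Reverse `back` so rank 2 ends up in the very last slot.
    return front + back[::-1]

print(reorder_alternating(["A", "B", "C", "D", "E"], [5, 4, 3, 2, 1]))
# ['A', 'C', 'E', 'D', 'B']
```

Both orderings put the top two passages at the boundaries; the alternating one additionally keeps the third- and fourth-best passages as close to the edges as possible.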

How to Structure Prompts for Long Contexts

Beyond RAG, this affects any long-context task:

Put the most critical instructions at the beginning and end of the system prompt
Put examples and background in the middle (they are less likely to be precisely recalled)
For summarization tasks, consider chunked approaches that process the document in windows rather than all at once
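One way to apply these rules is a small template that states the critical instructions first, puts background and examples in the middle, and restates the instructions just before the question. The section labels and the choice to repeat the instructions verbatim are assumptions for illustration, not a prescribed format; `build_long_context_prompt` is a hypothetical helper.

```python
def build_long_context_prompt(instructions: str, background: list[str], question: str) -> str:
    """Critical instructions at both boundaries, background and examples in the middle."""
    parts = [
        "INSTRUCTIONS:\n" + instructions.strip(),
        "BACKGROUND:\n" + "\n\n".join(b.strip() for b in background),
        # Restate the instructions right before the question so they also sit near the end.
        "REMINDER OF INSTRUCTIONS:\n" + instructions.strip(),
        "QUESTION:\n" + question.strip(),
    ]
    return "\n\n".join(parts)
```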

Model Comparisons

Claude 1.3 (100k context) showed a less severe U-shape than GPT-3.5-turbo, possibly reflecting differences in training, although the paper does not establish a cause. GPT-4 was more robust than GPT-3.5. Llama 2 models showed the most severe degradation. The pattern was consistent across models but varied in magnitude.

Practical Mitigations for Practitioners

1. Chunk and Reorder Strategically

When building a RAG pipeline, don't just concatenate retrieved chunks in descending order of relevance. Instead, place the most relevant chunk at the very beginning, the second most relevant at the very end, and the rest in between. This simple reordering can improve accuracy in multi-document QA tasks, though the size of the gain depends on the model, the retriever, and how many chunks are included.

2. Limit Context Size

If your model supports 100k tokens, you might be tempted to fill it all. But the "lost in the middle" effect suggests that beyond a certain point, adding more context actually hurts performance. For many tasks, a context of 4-8 well-chosen chunks (roughly 2-4k tokens) outperforms a context of 20+ chunks (10k+ tokens).
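A minimal sketch of enforcing such a limit: keep the highest-scoring chunks until either a chunk count or a token budget is reached. The defaults of 8 chunks and 4,000 tokens mirror the rough figures above and are starting points to tune, not values from the paper. Token counts use a whitespace split as a stand-in for a real tokenizer, which is an assumption you should replace with your model's tokenizer.

```python
def select_chunks(
    chunks: list[str],
    scores: list[float],
    max_chunks: int = 8,
    max_tokens: int = 4000,
) -> list[str]:
    """Keep the highest-scoring chunks within a chunk-count and token budget."""
    if len(chunks) != len(scores):
        raise ValueError("chunks and scores must have the same length")
    ranked = sorted(zip(scores, chunks), key=lambda x: x[0], reverse=True)
    selected: list[str] = []
    used = 0
    for _, chunk in ranked:
        if len(selected) >= max_chunks:
            break
        cost = len(chunk.split())  # rough token estimate; swap in a real tokenizer
        if used + cost > max_tokens:
            continue  # too long for the remaining budget; try a shorter, lower-ranked chunk
        selected.append(chunk)
        used += cost
    return selected
```

The selected chunks can then be passed to a reordering step so the best ones sit at the context boundaries.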

3. Use Sliding Window or Hierarchical Summarization

For long documents, consider processing them in overlapping windows and then summarizing each window before feeding the summaries into the final prompt. This reduces the effective context length and can mitigate position bias.
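A minimal sketch of the two-stage approach, assuming a caller-supplied `summarize(text) -> str` function (for example, one LLM call per window; here it is a hypothetical placeholder). Window size and overlap are counted in whitespace-separated words as a rough stand-in for tokens.

```python
from typing import Callable

def sliding_windows(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size`, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    windows: list[list[str]] = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return windows

def hierarchical_summary(
    text: str,
    summarize: Callable[[str], str],
    size: int = 800,
    overlap: int = 100,
) -> str:
    """Summarize each window, then summarize the concatenated window summaries."""
    words = text.split()
    partials = [summarize(" ".join(w)) for w in sliding_windows(words, size, overlap)]
    return summarize("\n".join(partials))
```

The overlap keeps facts that straddle a window boundary from being split in half; the final call sees only short summaries, so no single detail has to survive a long middle section.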

4. Fine-tune with Position-Aware Data

If you're fine-tuning an LLM for a specific long-context task, include training examples where the answer appears in various positions, especially the middle. This can help the model learn to attend uniformly across the context.
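A minimal sketch of generating such data from one question: build one example per position, so the answer-bearing document appears at the start, the end, and every slot in between. The `prompt`/`completion` field names and the `Document [i]:` labels are assumptions; adapt them to your fine-tuning format. To emphasize the middle, you could duplicate the examples whose `gold_position` falls away from both ends.

```python
import random

def make_position_varied_examples(
    question: str,
    answer: str,
    gold_doc: str,
    distractors: list[str],
    num_docs: int = 10,
    seed: int = 0,
) -> list[dict]:
    """One training example per gold-document position 0..num_docs-1."""
    if len(distractors) < num_docs - 1:
        raise ValueError("need at least num_docs - 1 distractors")
    rng = random.Random(seed)
    examples = []
    for pos in range(num_docs):
        docs = rng.sample(distractors, num_docs - 1)
        docs.insert(pos, gold_doc)  # the answer-bearing document goes in slot `pos`
        context = "\n\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))
        examples.append({
            "prompt": f"{context}\n\nQuestion: {question}",
            "completion": answer,
            "gold_position": pos,
        })
    return examples
```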

Real-World Example: Customer Support Chat

Imagine a customer support chatbot that receives a long conversation history. The most recent messages (at the end) get strong attention, and the initial problem description (at the beginning) also gets attention. But a key detail mentioned in the middle of the conversation might be lost. To mitigate, you can:

Extract the initial problem statement and place it at the beginning of the prompt
Summarize the middle portion into a concise bullet list
Place the most recent query at the end

This structure helps ensure that critical information is not buried.
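The three steps above can be sketched as a function that rebuilds the prompt from a message list. It assumes messages are dicts with `role` and `content` keys, oldest first, and a hypothetical `summarize_to_bullets` callable (in practice, likely an LLM call) that condenses the middle of the conversation; messages after the latest user query are not included.

```python
from typing import Callable

def restructure_conversation(
    messages: list[dict],
    summarize_to_bullets: Callable[[list[str]], list[str]],
) -> str:
    """Initial problem first, middle of the conversation condensed to bullets,
    latest user query last."""
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if not user_idx:
        raise ValueError("conversation has no user messages")
    first_i, last_i = user_idx[0], user_idx[-1]
    if first_i == last_i:
        # Only one user message: nothing sits in the middle.
        return "Current question:\n" + messages[last_i]["content"]

    # Everything strictly between the first and the latest user message.
    middle = [f'{m["role"]}: {m["content"]}' for m in messages[first_i + 1:last_i]]
    bullets = summarize_to_bullets(middle) if middle else []

    parts = ["Initial problem:\n" + messages[first_i]["content"]]
    if bullets:
        parts.append("Key details so far:\n" + "\n".join(f"- {b}" for b in bullets))
    parts.append("Current question:\n" + messages[last_i]["content"])
    return "\n\n".join(parts)
```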

The Cost of Ignoring Lost in the Middle

In production RAG systems, ignoring this effect can lead to:

Lower answer accuracy, especially for complex queries requiring multiple pieces of evidence
Increased hallucination when the model guesses based on incomplete context
Poor user experience when the model misses key details

By contrast, applying the simple reordering strategy can improve accuracy without any model changes.

Future Directions

Since the original paper, several follow-up works have explored:

Position-aware attention mechanisms that weight all positions equally
Training methods that expose models to more diverse position distributions
Hybrid retrieval that combines sparse and dense methods to reduce the number of chunks needed

As context windows grow to millions of tokens (e.g., Gemini 1.5 Pro), the "lost in the middle" problem may become even more pronounced, making these mitigations essential.

Conclusion

The "Lost in the Middle" paper is a must-read for anyone building LLM applications. It reveals a fundamental limitation of current Transformer architectures: position bias. By understanding and mitigating this effect, you can build more reliable RAG systems, better prompts, and more robust long-context applications. The key takeaway: treat context boundaries as prime real estate, and don't let your most important information get lost in the middle.

Mahmudul Haque Qudrati

Related Articles

When to Fine-Tune an LLM (And When to Rely on RAG Instead)

LLM Knowledge Cutoffs: What They Mean and How to Work Around Them

Context Stuffing vs RAG: When to Put Everything in Context

The U-Shaped Performance Curve

Why This Happens

Implications for RAG Systems

How to Structure Prompts for Long Contexts

Model Comparisons

Further Reading

Practical Mitigations for Practitioners

1. Chunk and Reorder Strategically

2. Limit Context Size

3. Use Sliding Window or Hierarchical Summarization

4. Fine-tune with Position-Aware Data

Real-World Example: Customer Support Chat

The Cost of Ignoring Lost in the Middle

Future Directions

Conclusion

Frequently Asked Questions

What is Lost in the Middle: Why LLMs Struggle With Long Contexts?

How does Lost in the Middle: Why LLMs Struggle With Long Contexts work?

What are the best practices for Lost in the Middle: Why LLMs Struggle With Long Contexts?

How much does Lost in the Middle: Why LLMs Struggle With Long Contexts cost?

Is Lost in the Middle: Why LLMs Struggle With Long Contexts worth it in 2026?

The workspace your team
actually needs

Lost in the Middle: Why LLMs Struggle With Long Contexts

The Counterintuitive Finding

The Experimental Setup

AI & ML insights, weekly

Mahmudul Haque Qudrati

Related Articles

When to Fine-Tune an LLM (And When to Rely on RAG Instead)

LLM Knowledge Cutoffs: What They Mean and How to Work Around Them

Context Stuffing vs RAG: When to Put Everything in Context

The U-Shaped Performance Curve

Why This Happens

Implications for RAG Systems

How to Structure Prompts for Long Contexts

Model Comparisons

Further Reading

Practical Mitigations for Practitioners

1. Chunk and Reorder Strategically

2. Limit Context Size

3. Use Sliding Window or Hierarchical Summarization

4. Fine-tune with Position-Aware Data

Real-World Example: Customer Support Chat

The Cost of Ignoring Lost in the Middle

Future Directions

Conclusion

Frequently Asked Questions

What is Lost in the Middle: Why LLMs Struggle With Long Contexts?

How does Lost in the Middle: Why LLMs Struggle With Long Contexts work?

What are the best practices for Lost in the Middle: Why LLMs Struggle With Long Contexts?

How much does Lost in the Middle: Why LLMs Struggle With Long Contexts cost?

Is Lost in the Middle: Why LLMs Struggle With Long Contexts worth it in 2026?

The workspace your teamactually needs

The workspace your team
actually needs