What Is InternLM 2.5?
InternLM 2.5 is developed by Shanghai AI Laboratory, one of China's leading AI research institutes. The 20B parameter model (released July 2024) is positioned as a multilingual, long-context alternative to Qwen 2.5 and Yi-1.5, with particular strength on Chinese-language benchmarks and a 1M token context window.
The 1M Token Context Window
Very few models offer a genuine 1M token context, and the ones that do often degrade severely at long contexts. InternLM 2.5 20B is one of the models that maintains meaningful retrieval quality beyond 128k tokens.
Practical applications of 1M context:
- Analyzing an entire codebase at once (most production repos fit in 1M tokens)
- Reading a full academic literature review (50–100 papers worth of text)
- Processing a year of customer support tickets for trend analysis
- Long-running agentic conversations that accumulate substantial context
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "internlm/internlm2_5-20b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
response, history = model.chat(
tokenizer,
"Explain the differences between attention mechanisms in transformers",
history=[],
)
print(response)
Chain-of-Thought Reasoning
InternLM 2.5 was trained with explicit chain-of-thought (CoT) supervision, meaning it performs better on multi-step reasoning tasks when you prompt it to think step by step. On Chinese mathematical reasoning benchmarks:
- CMath (Chinese math): InternLM 2.5 20B scores above Qwen 2.5 14B and competitive with Qwen 2.5 32B
- CEVAL (Chinese comprehensive benchmark): scores 78.3%, above Yi-1.5 34B's 77.4% at less than half the parameter count
Tool Calling
The model includes native function calling support compatible with the OpenAI tool use format. This makes it drop-in compatible with agent frameworks like LangChain and LlamaIndex:
tools = [
{
"type": "function",
"function": {
"name": "get_stock_price",
"description": "Get the current price of a stock by ticker symbol",
"parameters": {
"type": "object",
"properties": {
"ticker": {"type": "string", "description": "Stock ticker (e.g. AAPL)"},
},
"required": ["ticker"],
},
},
}
]
Multilingual Coverage
InternLM 2.5 supports 10+ languages with particular strength in:
- Chinese (Simplified and Traditional)
- English
- Japanese
- Korean
- Multiple Southeast Asian languages
This is weaker coverage than Cohere's embed-multilingual-v3 (108 languages) but stronger than most models of this parameter count on Asian languages specifically.
Deployment with LMDeploy
Shanghai AI Lab maintains LMDeploy, an optimized inference framework for InternLM models with TurboMind backend:
pip install lmdeploy
lmdeploy serve api_server internlm/internlm2_5-20b-chat --server-port 23333
This exposes an OpenAI-compatible endpoint. LMDeploy achieves roughly 2–3x higher throughput than naive HuggingFace generation through continuous batching and KV cache optimization.