InternVL 2: The Open-Source VLM Approaching GPT-4V on Benchmarks

Shanghai AI Lab's InternVL2-26B scores 61.2% on MMMU - within 2 points of GPT-4V - using a 6B vision encoder and dynamic high-resolution image tiling.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 30, 2026

7 min read

// tags

#internvl2#vision-language#shanghai-ai-lab#mmmu#open-source

FIG. ART-26

7 min read

“

InternVL 2: The Open-Source VLM Approaching GPT-4V on Benchmarks

// reading plan

sections

381

words

min read

// Developer Tools

Open Code Review – An AI-powered code review CLI tool: A Practical Overview

Open Code Review is an open-source CLI tool from Alibaba that uses AI to review code changes. It runs locally, supports multiple LLMs, and costs about $0.01 per review. Here's a practical breakdown.

4 min read

// Machine Learning

ONNX: Export Any ML Model and Run It Anywhere

Dynamic High-Resolution Tiling

InternVL2 processes high-resolution images by dynamically splitting them into tiles of up to 448×448 pixels each. A 4K image can be represented with up to 40 tiles, preserving fine details in dense text, charts, and technical schematics without resizing artifacts.

import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL2-26B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("OpenGVLab/InternVL2-26B", trust_remote_code=True)

image = Image.open("technical_diagram.png")
question = "<image>
Describe all the components and their connections in this diagram."

response = model.chat(tokenizer, image, question, generation_config={"max_new_tokens": 512})
print(response)

Benchmark Results Across Sizes

Model	MMMU	DocVQA	ChartQA
InternVL2-2B	36.3%	86.9%	76.2%
InternVL2-8B	51.2%	91.6%	83.3%
InternVL2-26B	61.2%	92.9%	87.2%
GPT-4V	63.1%	88.4%	78.5%

Note that InternVL2 outperforms GPT-4V on DocVQA and ChartQA while being within 2 points on MMMU.

Production Deployment With LMDeploy

For high-throughput serving, LMDeploy provides an optimized backend for InternVL2:

pip install lmdeploy
lmdeploy serve api_server OpenGVLab/InternVL2-26B --tp 2 --port 8080

This enables tensor-parallel serving across multiple GPUs with an OpenAI-compatible API.

Choosing a Size

InternVL2-8B fits on a single A100 40GB and covers most document/chart tasks adequately. InternVL2-26B is worth the additional GPU memory for scientific paper understanding, dense OCR, and math-heavy visuals. The 76B variant is for research labs with multi-GPU infrastructure.

InternVL 2: The Open-Source VLM Approaching GPT-4V on Benchmarks

Related Articles

Open Code Review – An AI-powered code review CLI tool: A Practical Overview

Closing the Gap With GPT-4V

Architecture: InternViT-6B + InternLM2

Dynamic High-Resolution Tiling

Benchmark Results Across Sizes

Production Deployment With LMDeploy

Choosing a Size

Links

The workspace your team
actually needs

AI & ML insights, weekly

Mahmudul Haque Qudrati

ONNX: Export Any ML Model and Run It Anywhere

Supervised Learning Explained: How Models Learn from Labeled Examples

InternVL 2: The Open-Source VLM Approaching GPT-4V on Benchmarks

Related Articles

Open Code Review – An AI-powered code review CLI tool: A Practical Overview

Closing the Gap With GPT-4V

Architecture: InternViT-6B + InternLM2

Dynamic High-Resolution Tiling

Benchmark Results Across Sizes

Production Deployment With LMDeploy

Choosing a Size

Links

The workspace your teamactually needs

AI & ML insights, weekly

Mahmudul Haque Qudrati

ONNX: Export Any ML Model and Run It Anywhere

Supervised Learning Explained: How Models Learn from Labeled Examples

The workspace your team
actually needs