OpenHermes 2.5: Mistral Fine-Tuned on 1M Synthetic GPT-4 Conversations

Nous Research's OpenHermes 2.5 demonstrates that one million carefully curated synthetic conversations can produce a 7B instruction model that rivals much larger open-weight models.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 15, 2026

7 min read

// tags

#openhermes #nous-research #synthetic-data #gpt-4 #instruction-following


ChatML Format

OpenHermes 2.5 uses the ChatML prompt format, which provides clean structure for multi-turn dialogue:

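from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and weights from the Hugging Face Hub.
model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# ChatML wraps each turn in <|im_start|>{role} ... <|im_end|>.
# The prompt ends with an open assistant turn for the model to complete.
prompt = """<|im_start|>system
You are a helpful coding assistant. Be concise.<|im_end|>
<|im_start|>user
Write a Python function to flatten a nested list.<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to have any effect.
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))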
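
A multi-turn conversation follows the same pattern, with one <|im_start|>{role} ... <|im_end|> block per turn and an open assistant turn at the end. Below is a minimal sketch of a prompt builder, assuming the layout shown in the prompt above. The helper name build_chatml_prompt and the follow-up messages are illustrative and not part of transformers or the model's own tooling.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        # One block per turn: opening tag plus role, content, closing tag.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful coding assistant. Be concise."},
    {"role": "user", "content": "Write a Python function to flatten a nested list."},
    {"role": "assistant", "content": "def flatten(xs): ..."},
    {"role": "user", "content": "Now make it handle tuples too."},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

If the checkpoint's tokenizer ships a chat template, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) should produce an equivalent string and is usually the safer choice, since it stays in sync with the model's own configuration.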

Running With Ollama

For local deployment, Ollama provides a quantized version:

ollama pull openhermes
ollama run openhermes "Explain the CAP theorem in simple terms."
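
The pulled model can also be called from code through Ollama's local HTTP API. The sketch below assumes the Ollama server is running on its default port (11434) and that the openhermes model has already been pulled. The helper names build_request and generate are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="openhermes"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="openhermes"):
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]


# Example (needs a running Ollama server):
# print(generate("Explain the CAP theorem in simple terms."))
```

Setting "stream" to False asks for the full reply in a single JSON object, which keeps the client simple at the cost of waiting for the whole generation to finish.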

GPQA and Benchmark Position

GPQA (Graduate-Level Google-Proof Q&A) is a benchmark specifically designed to be hard for models trained on internet data, requiring actual reasoning rather than pattern matching. OpenHermes 2.5 scores competitively on it for its size. The model also consistently ranks in the top tier of open 7B instruction models on coding benchmarks like HumanEval and reasoning benchmarks like ARC-Challenge.

System Prompt Flexibility

One of the model's practical strengths is how well it responds to varied system prompts. Unlike models trained on narrow chat formats, OpenHermes 2.5 reliably adopts personas, follows domain-specific constraints, and maintains instruction-following across long multi-turn sessions. This makes it particularly useful for roleplay applications, domain-specific assistants, and structured output generation.
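
For structured output generation, a common pattern is to state the output contract in the system prompt and then validate every reply before using it. The sketch below illustrates that pattern under assumptions: the schema in REQUIRED_KEYS, the wording of SYSTEM_PROMPT, and the helper parse_structured_reply are all hypothetical, and canned strings stand in for model output.

```python
import json

REQUIRED_KEYS = ("language", "summary")  # hypothetical schema for illustration

# System prompt that states the output contract explicitly.
SYSTEM_PROMPT = (
    "You are a code review assistant. "
    "Reply with a single JSON object containing exactly these keys: "
    + ", ".join(REQUIRED_KEYS)
    + ". Do not add any text outside the JSON."
)


def parse_structured_reply(reply, required_keys=REQUIRED_KEYS):
    """Return the parsed dict, or None if the reply breaks the contract."""
    try:
        data = json.loads(reply.strip())
    except json.JSONDecodeError:
        return None
    # Reject non-objects and objects with missing or extra keys.
    if not isinstance(data, dict) or set(data) != set(required_keys):
        return None
    return data


# Canned replies stand in for model output here.
good = '{"language": "Python", "summary": "Flattens a nested list."}'
bad = "Sure! Here is the JSON you asked for."
print(parse_structured_reply(good))
print(parse_structured_reply(bad))
```

Returning None on any violation lets the caller decide whether to retry the generation or fall back, rather than passing malformed output downstream.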

Data Volume vs. Data Quality

The lesson from OpenHermes 2.5 is nuanced: 1M examples worked here because they were diverse and filtered, not simply because of the count. Teams attempting to replicate this approach should budget more time for data curation than for training, since the training run itself is relatively cheap on modern hardware.
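
To make "filtered" concrete, here is a minimal sketch of the kind of curation pass a team might start with: dropping duplicate prompts, very short answers, and boilerplate refusals. This is not Nous Research's actual pipeline. The thresholds, the refusal marker, and the function names are assumptions chosen for illustration.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivially different prompts match."""
    return " ".join(text.lower().split())


def curate(examples, min_response_chars=20, refusal_markers=("as an ai language model",)):
    """Drop duplicate prompts, very short answers, and boilerplate refusals."""
    seen = set()
    kept = []
    for example in examples:
        prompt, response = example["prompt"], example["response"]
        key = hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # prompt already kept once
        if len(response.strip()) < min_response_chars:
            continue  # answer too short to be useful
        if any(marker in response.lower() for marker in refusal_markers):
            continue  # canned refusal
        seen.add(key)
        kept.append(example)
    return kept


raw = [
    {"prompt": "What is the CAP theorem?",
     "response": "A distributed store can guarantee at most two of consistency, availability, and partition tolerance."},
    {"prompt": "what is  the CAP theorem?",
     "response": "It describes a trade-off between consistency, availability, and partition tolerance."},
    {"prompt": "Flatten a list", "response": "ok"},
    {"prompt": "Tell me a joke",
     "response": "As an AI language model, I cannot tell jokes about that topic."},
]
curated = curate(raw)
print(len(curated))
```

A real pipeline would go further, for example with near-duplicate detection and per-category balancing to preserve diversity, but even a pass like this shows why curation time tends to dominate.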
