Aya 23: Cohere's Multilingual LLM Fine-Tuned Across 23 Languages

Cohere For AI fine-tuned Aya 23 on 204k human-written multilingual prompts to create an instruction model that serves low-resource languages most commercial LLMs ignore.

Mahmudul Haque Qudrati

CEO & ML Engineer

April 20, 2026

7 min read

// tags

#aya-23 #cohere #multilingual #low-resource #instruction-tuning

The Aya Dataset

The foundation of Aya 23's fine-tuning is the Aya Dataset: 204,000 human-written and human-verified prompt-completion pairs across 65 languages, of which Aya 23 uses a 23-language subset. Unlike synthetic multilingual datasets generated by translating English examples, these pairs were written by native speakers working in their own languages, so they capture idiomatic expressions, cultural context, and language-specific reasoning patterns.
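As a rough illustration of what restricting a multilingual dataset to a language subset could look like, the sketch below filters prompt-completion records by language. The record layout (`inputs`, `targets`, `language` keys), the `AYA_23_LANGUAGES` set, and the `filter_to_subset` helper are assumptions made for this example, not the pipeline Cohere For AI used. The language names follow the list published for Aya 23; the labels in the released dataset may be spelled differently.

```python
# Hypothetical sketch: keep only records whose language is in the Aya 23 subset.
# The record keys and language labels below are assumptions for illustration.

AYA_23_LANGUAGES = frozenset({
    "Arabic", "Chinese", "Czech", "Dutch", "English", "French", "German",
    "Greek", "Hebrew", "Hindi", "Indonesian", "Italian", "Japanese",
    "Korean", "Persian", "Polish", "Portuguese", "Romanian", "Russian",
    "Spanish", "Turkish", "Ukrainian", "Vietnamese",
})


def filter_to_subset(records, languages=AYA_23_LANGUAGES):
    """Return the records whose 'language' value is in `languages`."""
    return [r for r in records if r.get("language") in languages]


# Toy records standing in for dataset rows (not real Aya Dataset entries).
sample = [
    {"inputs": "Що таке фотосинтез?", "targets": "...", "language": "Ukrainian"},
    {"inputs": "Usanisinuru ni nini?", "targets": "...", "language": "Swahili"},
    {"inputs": "¿Qué es la fotosíntesis?", "targets": "...", "language": "Spanish"},
]

kept = filter_to_subset(sample)
print([r["language"] for r in kept])
```

Keeping the language set as a named constant makes the subset explicit and easy to swap if a different language selection is needed.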

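# Requires the transformers, torch, and accelerate packages (device_map="auto" relies on accelerate)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint's precision; device_map="auto" places layers on available devices
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Ukrainian example: "Explain quantum computing in simple terms."
messages = [{"role": "user", "content": "Поясніть квантові обчислення простими словами."}]
# Apply the model's chat template and append the prompt for the assistant turn
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=300)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))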

Model Sizes and License

Aya 23 comes in 8B and 35B variants, both released under the non-commercial CC-BY-NC 4.0 license. The 35B model substantially outperforms the 8B on complex reasoning tasks across all 23 languages, while the 8B is small enough to deploy on a single A100.
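A back-of-the-envelope way to sanity-check the single-GPU claim is to multiply the parameter count by the bytes stored per parameter. The sketch below does that for the weights only. The nominal parameter counts (8e9 and 35e9), the 80 GB A100 capacity, and the `weight_memory_gb` helper are assumptions for illustration; real deployments also need memory for the KV cache, activations, and framework overhead.

```python
# Rough weight-only memory estimate. Ignores KV cache, activations, and overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

# Nominal parameter counts taken from the model names (assumed, not exact).
MODELS = {"aya-23-8B": 8e9, "aya-23-35B": 35e9}

A100_GB = 80  # assumed 80 GB card; a 40 GB A100 variant also exists


def weight_memory_gb(num_params, dtype):
    """Gigabytes (10^9 bytes) needed to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, num_params in MODELS.items():
    for dtype in BYTES_PER_PARAM:
        gb = weight_memory_gb(num_params, dtype)
        verdict = "fits" if gb < A100_GB else "does not fit"
        print(f"{name} {dtype}: ~{gb:.0f} GB of weights, {verdict} on one {A100_GB} GB A100")
```

Under these assumptions the 8B weights take about 16 GB in 16-bit precision, leaving most of the card free for the KV cache, while the 35B weights take about 70 GB and leave little headroom without quantization.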

Benchmark Comparisons

Against multilingual competitors on WMT translation and multilingual reasoning:

Aya 23 8B surpasses mT0-13B and BLOOMZ-7B on multilingual instruction-following
The 35B variant outperforms Aya 101 (the predecessor) by an average of 6.6% on discriminative and 4.1% on generative tasks
Performance on low-resource languages shows the largest gains over baselines

Practical Applications

Teams building multilingual customer support, document summarization for international markets, or government services that must serve non-English speakers will find Aya 23 more capable than an English-only model wrapped in a translation layer. Because the model follows instructions natively in each language, it avoids the compounding errors of translate-then-reason pipelines.
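To make the compounding-error argument concrete, the sketch below multiplies per-stage success rates for a three-stage pipeline: translate into English, reason in English, translate back. The rates are hypothetical numbers chosen for illustration, not measurements of any model, and the calculation assumes that errors in each stage are independent.

```python
from math import prod


def pipeline_success(stage_rates):
    """Probability that every stage succeeds, assuming independent errors."""
    return prod(stage_rates)


# Hypothetical per-stage success rates (illustrative, not measured values).
translate_in = 0.95
reason_in_english = 0.90
translate_out = 0.95

pipeline = pipeline_success([translate_in, reason_in_english, translate_out])
print(f"translate-then-reason pipeline: {pipeline:.3f}")
```

In this toy setup every stage succeeds at least 90% of the time, yet the chained pipeline succeeds only about 81% of the time, which is the bar a model answering directly in the target language would need to clear.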
