Whisper Large v3: OpenAI's Best Open-Source Speech Recognition Model

Whisper Large v3 reduces word error rates across all 99 supported languages and adds word-level timestamps, making it the default choice for production speech recognition pipelines.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 9, 2026

7 min read

// tags

#whisper #openai #speech-recognition #asr #multilingual

Python Transcription With faster-whisper

The faster-whisper library runs Whisper on the CTranslate2 inference backend, which makes it up to 4x faster than the original Whisper implementation:

from faster_whisper import WhisperModel

model = WhisperModel(
    "large-v3",
    device="cuda",
    compute_type="float16"
)

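segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,
    word_timestamps=True,
    language="en"  # forces English; omit this argument to auto-detect
)

# With language set explicitly, info.language simply echoes that choice
print(f"Language: {info.language} ({info.language_probability:.2f})")

# segments is a generator: transcription runs as you iterate over it
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
    for word in segment.words:
        print(f"  Word: '{word.word}' at {word.start:.2f}s")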
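The segment timestamps printed above map directly onto subtitle formats. The following is a minimal sketch, assuming each segment can be reduced to a (start, end, text) tuple as in the loop above. The helper names `format_srt_time` and `to_srt` are hypothetical and are not part of faster-whisper.

```python
from typing import Iterable, Tuple


def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)
```

With the transcription loop above, one possible call is `to_srt((s.start, s.end, s.text) for s in segments)`.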

Speaker Diarization With pyannote

Combine Whisper with pyannote.audio for "who said what when" transcripts:

from pyannote.audio import Pipeline
from faster_whisper import WhisperModel

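# Requires a Hugging Face token and approved access to the model.
# use_auth_token is the pyannote.audio 3.x argument; newer releases may name it differently.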
diarize_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN"
)

whisper_model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Run diarization
diarization = diarize_pipeline("audio.wav")

# Run transcription with word timestamps
segments, _ = whisper_model.transcribe("audio.wav", word_timestamps=True)

# Align speakers to words; see the pyannote documentation for full alignment logic
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"Speaker {speaker}: {turn.start:.1f}s to {turn.end:.1f}s")
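The loop above only prints speaker turns. Attaching a speaker to each transcribed word needs an extra overlap step. The following is a minimal sketch of one simple heuristic: give each word to the speaker whose turn overlaps it the most. It assumes turns and words have been flattened to plain tuples. The name `assign_speakers` is hypothetical, and this is not pyannote's own alignment logic.

```python
from typing import List, Optional, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker label)
Word = Tuple[float, float, str]  # (start, end, word text)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn are labelled None.
    """
    labelled = []
    for w_start, w_end, text in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(w_start, w_end, t_start, t_end)
            if shared > best_overlap:
                best_overlap, best_speaker = shared, speaker
        labelled.append((best_speaker, text))
    return labelled
```

The inputs can be built from the objects in the code above: `turns = [(t.start, t.end, spk) for t, _, spk in diarization.itertracks(yield_label=True)]` and `words = [(w.start, w.end, w.word) for s in segments for w in s.words]`. A linear scan over turns is fine for short recordings. For long ones, an interval index would be faster.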

Whisper.cpp for CPU Inference

For environments without GPU access, Whisper.cpp provides CPU-only inference. Large v3 with Q5_0 quantization runs at roughly 10x real time on an M2 MacBook Pro (1 hour of audio in about 6 minutes). The whisper-cpp Python bindings make it straightforward to call from an existing Python pipeline.
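The "10x real time" figure converts to wall-clock time with one division. The following is a small sketch of that arithmetic. The function name `processing_minutes` is hypothetical, and the speed factor is the rough figure quoted above, not a measured benchmark.

```python
def processing_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Estimate wall-clock minutes to transcribe audio at a given speed factor.

    speed_factor is how many times faster than real time the model runs,
    e.g. 10.0 for "10x real time".
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# One hour of audio at roughly 10x real time
print(processing_minutes(60, 10))
```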

Batch Transcription for Long Audio

For files over 30 minutes, split the audio into 25-30 second segments with a 2-second overlap so that no word is cut at a boundary. Whisper processes audio in 30-second windows, and long recordings decoded without careful windowing are where repetition loops and hallucinated text tend to appear. faster-whisper handles the windowing itself when chunk_length is set appropriately.
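The chunking described above comes down to computing overlapping time spans. The following is a minimal sketch, assuming you slice the audio yourself rather than relying on the library. The name `chunk_spans` and its defaults (30-second chunks, 2-second overlap) are illustrative choices taken from the figures in the text.

```python
from typing import List, Tuple


def chunk_spans(
    total_seconds: float,
    chunk_seconds: float = 30.0,
    overlap_seconds: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) spans that cover the audio with a fixed overlap."""
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    step = chunk_seconds - overlap_seconds  # how far each new chunk advances
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break  # the final chunk reached the end of the audio
        start += step
    return spans


# A 70-second file yields three chunks, each sharing 2 seconds with the next
print(chunk_spans(70.0))
```

Each span can then be cut from the decoded waveform and transcribed separately. Words that fall inside an overlap appear in two chunks and need de-duplicating when the results are merged.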

Production throughput on a single A10G GPU is approximately 300 minutes of audio per hour with Large v3 at float16, or roughly 5x real time.
