Instructor: Extract Structured Data From Any LLM With Pydantic

Instructor wraps any LLM with Pydantic validation and automatic retries — turning unreliable JSON mode into type-safe structured extraction that actually works.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 22, 2026

7 min read

// tags

#instructor #pydantic #structured-output #extraction #python

Why JSON Mode Is Not Enough

OpenAI's JSON mode guarantees syntactically valid JSON, but it does not guarantee that the JSON matches your schema. You might get {"name": null} when you expected {"name": "Alice"}, or an extra key that breaks your downstream parser. JSON mode itself does not validate against your schema, does not retry, and gives no error message when the shape is wrong. You are left writing bespoke parsing logic for every model and every use case.
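To make the gap concrete, here is a minimal sketch that uses only Pydantic, with no API call. The `Person` model, the `check` helper, and both payloads are hand-written assumptions standing in for what a model might return in JSON mode.

```python
import json

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


# Hand-written stand-ins for JSON-mode output (not real model responses).
good_payload = '{"name": "Alice", "age": 30}'
bad_payload = '{"name": null, "age": 30}'

# Both strings are syntactically valid JSON, which is all JSON mode promises.
json.loads(good_payload)
json.loads(bad_payload)


def check(payload: str) -> list[str]:
    """Return the names of fields that fail schema validation (empty if valid)."""
    try:
        Person.model_validate_json(payload)
        return []
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]


print(check(good_payload))  # []
print(check(bad_payload))   # ['name']
```

The second payload parses as JSON but is rejected by the schema, which is the case JSON mode alone does not catch.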

Instructor solves this by combining Pydantic models with automatic retry on validation errors. Define the model you want, and Instructor re-prompts until the LLM produces a response that validates, or raises once max_retries attempts are exhausted.
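The retry idea can be sketched in a few lines. This is an illustration of the general pattern, not Instructor's actual implementation: `extract_with_retries` and `fake_llm` are hypothetical names, and the "LLM" is a stub that answers incorrectly on its first call.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


def extract_with_retries(
    call_llm: Callable[[list[str]], str],
    prompt: str,
    max_retries: int = 3,
) -> Person:
    """Call the model until its JSON validates, feeding errors back each time."""
    messages = [prompt]
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(messages)
        try:
            return Person.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Append the validation error so the next attempt can correct it.
            messages.append(f"Fix these validation errors and try again:\n{exc}")
    raise RuntimeError(f"no valid response after {max_retries} attempts") from last_error


# Hypothetical stand-in for an LLM: wrong on the first call, right on the second.
def fake_llm(messages: list[str]) -> str:
    if len(messages) == 1:
        return '{"name": "Alice", "age": "unknown"}'
    return '{"name": "Alice", "age": 30}'


person = extract_with_retries(fake_llm, "Alice is 30 years old.")
print(person)  # name='Alice' age=30
```

The key design point is that the validation error text goes back into the conversation, so the model sees exactly which field was wrong.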

Installation

pip install instructor

Basic Extraction

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    email: str | None = None

person = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Person,
    messages=[{"role": "user", "content": "John Doe is 34 and works at john@acme.com"}],
)
print(person)  # name='John Doe' age=34 email='john@acme.com'

The return value is a fully validated Pydantic model, not a dict or a string.

Automatic Retry on Validation Errors

Add field-level validators and Instructor handles retries automatically:

from pydantic import field_validator

class CVData(BaseModel):
    name: str
    years_experience: int
    skills: list[str]

    @field_validator("years_experience")
    @classmethod
    def must_be_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("years_experience must be non-negative")
        return v

cv = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=CVData,
    messages=[{"role": "user", "content": "Jane has 5 years exp in Python, SQL, and ML."}],
    max_retries=3,
)

If the model returns -5 for years_experience, Instructor sends the Pydantic error back to the model and asks it to fix the value, giving up once the max_retries=3 limit is reached.
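To see what the model would be told, it helps to trigger the validator directly. The sketch below runs Pydantic on a hand-written bad payload (an assumption, not a real model response) and prints the error location and message that validation produces.

```python
from pydantic import BaseModel, ValidationError, field_validator


class CVData(BaseModel):
    name: str
    years_experience: int
    skills: list[str]

    @field_validator("years_experience")
    @classmethod
    def must_be_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("years_experience must be non-negative")
        return v


# Hand-written stand-in for a bad model response.
bad = {"name": "Jane", "years_experience": -5, "skills": ["Python"]}

error = None
try:
    CVData.model_validate(bad)
except ValidationError as exc:
    # Keep the first (and only) error so it can be inspected after the block.
    error = exc.errors()[0]

print(error["loc"])  # ('years_experience',)
print(error["msg"])  # Value error, years_experience must be non-negative
```

A specific, readable message in the `ValueError` matters here, since that text is what the model gets to work with on the next attempt.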

Multi-Provider Support

Instructor is not limited to the OpenAI client. It provides equivalent wrappers for other provider SDKs, for example Anthropic:

pip install instructor anthropic

import anthropic
import instructor

client = instructor.from_anthropic(anthropic.Anthropic())
# response_model works the same way; note that Anthropic calls also require max_tokens

Instructor also works with Groq, Ollama (through the openai client with a custom base_url), Google Gemini, Mistral, and more.

Partial Streaming

Stream partial Pydantic objects as they are generated:

for partial_person in client.chat.completions.create_partial(
    model="gpt-4o-mini",
    response_model=Person,
    messages=[{"role": "user", "content": "Alice is 28, alice@example.com"}],
):
    print(partial_person)  # name='Alice' age=None email=None → ... → fully populated

Practical Example: Search Query Extraction

class SearchQuery(BaseModel):
    intent: str
    keywords: list[str]
    date_range: str | None = None
    max_results: int = 10

query = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SearchQuery,
    messages=[{"role": "user", "content": "Find Python ML papers from last year, top 5"}],
)
# SearchQuery(intent='research', keywords=['Python', 'ML'], date_range='last year', max_results=5)

Full documentation at python.useinstructor.com.
