Reliably parsing LLM outputs is one of the most underestimated challenges in production AI applications. You ask for JSON, you get JSON plus an explanation paragraph. You ask for a list, you get a list with inconsistent formatting. You add schema instructions, and the success rate goes from 70% to 85%, which is still not good enough. Here is the full spectrum of approaches and where each lands on reliability.
The Reliability Spectrum
Not all output formats are equally parseable. The reliability of structured output is a function of how much you constrain the model's response format.
Level 0: Free-form prose (0% parseable programmatically). No format instruction, for example "Describe the sentiment of this review." The model writes a paragraph. You cannot parse this without another model call.
Level 1: Format instruction in prose (50-70% reliable). "Respond with only JSON." The model usually does, but occasionally adds a preamble ("Here is the JSON you requested:") or wraps the JSON in a markdown code block.
Level 2: JSON schema in prompt (80-88% reliable). You provide the exact JSON schema the model should follow, with field names and types. Reliability improves significantly but still fails when the model cannot determine a value and generates an explanation instead of null.
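As a rough illustration of the schema-in-prompt approach, the sketch below embeds a schema in the prompt, tells the model explicitly to use null for unknown values, and checks a reply against the expected fields. The schema, the field names, and the `build_prompt` and `check_reply` helpers are assumptions made for this example; they are not part of any provider API.

```python
import json

# Hypothetical schema for a review-sentiment task (illustrative field names).
REVIEW_SCHEMA = {
    "sentiment": "string, one of: positive, negative, neutral",
    "confidence": "number between 0 and 1, or null if unknown",
    "product_name": "string, or null if the review does not name a product",
}


def build_prompt(review: str) -> str:
    """Embed the schema in the prompt and say how to handle unknown values."""
    return (
        "Analyze the review below and respond with only a JSON object "
        "matching this schema:\n"
        f"{json.dumps(REVIEW_SCHEMA, indent=2)}\n"
        "If a value cannot be determined, use null. Do not explain.\n\n"
        f"Review: {review}"
    )


def check_reply(reply: str) -> dict:
    """Parse a reply and verify it has exactly the expected keys."""
    data = json.loads(reply)  # raises json.JSONDecodeError on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(REVIEW_SCHEMA) - set(data)
    extra = set(data) - set(REVIEW_SCHEMA)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return data
```

Spelling out the null rule targets the failure mode described above, where the model explains itself instead of leaving a field empty. The key check catches the replies that still slip through.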
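Level 3: Function calling or tool use (95-99% reliable). Native function calling (OpenAI), tool use (Anthropic), or function declarations (Google) route the model's structured output through a dedicated channel instead of the message text. When a specific tool call is forced, the model returns function arguments rather than free-form text. Without a strict mode, though, the arguments can still occasionally be malformed or miss a required field, which is why this level stops short of 100%.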
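A minimal sketch of the application side of tool use follows. The tool definition mirrors the shape of OpenAI's Chat Completions `tools` parameter; other providers use a similar but not identical layout. The tool name `record_sentiment`, its fields, and the `parse_tool_arguments` helper are hypothetical, and no API call is made here.

```python
import json

# Tool definition in the style of OpenAI's Chat Completions "tools" parameter.
# The tool name and fields are hypothetical, chosen for this example.
RECORD_SENTIMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "record_sentiment",
        "description": "Record the sentiment of a product review.",
        "parameters": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"],
                },
                "confidence": {"type": ["number", "null"]},
            },
            "required": ["sentiment", "confidence"],
        },
    },
}


def parse_tool_arguments(arguments: str, tool: dict) -> dict:
    """Decode a tool call's arguments string and check it against the tool.

    Arguments typically arrive as a JSON-encoded string, so they still have
    to be decoded and checked before use.
    """
    args = json.loads(arguments)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    params = tool["function"]["parameters"]
    missing = [name for name in params.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    # Only enum constraints are checked here; a full validator would do more.
    for name, spec in params["properties"].items():
        if name in args and "enum" in spec and args[name] not in spec["enum"]:
            raise ValueError(f"{name!r} must be one of {spec['enum']}")
    return args
```

Even at this level the arguments are treated as untrusted input: decoding and a required-field check cover the small share of calls that come back malformed.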
Level 4: Structured output with schema validation (99%+ reliable, plus type safety). OpenAI's Structured Outputs feature (available in gpt-4o-2024-08-06 and later) and similar provider features constrain generation so that the response conforms to the JSON schema you supply. The output is restricted to schema-valid tokens as it is generated, rather than being checked and retried afterwards. The remaining failures are cases such as refusals or responses cut off by the token limit. Combine this with Zod or Pydantic validation in your application for end-to-end type safety.
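The two halves of this level can be sketched as follows. The request side is a `response_format` payload in the style of OpenAI's Structured Outputs, where strict mode expects every property to be listed as required and `additionalProperties` set to false. The application side would normally be a Pydantic or Zod model; to stay dependency-free, this sketch uses a standard-library dataclass as a stand-in. The schema name, the fields, the 0 to 1 range for `confidence`, and the `parse_review` helper are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Request side: a response_format payload in the style of OpenAI's
# Structured Outputs. The schema name and fields are hypothetical.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "review_sentiment",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"],
                },
                "confidence": {"type": ["number", "null"]},
            },
            "required": ["sentiment", "confidence"],
            "additionalProperties": False,
        },
    },
}


# Application side: a dataclass standing in for a Pydantic model.
@dataclass(frozen=True)
class ReviewSentiment:
    sentiment: str
    confidence: Optional[float]

    def __post_init__(self) -> None:
        if self.sentiment not in ("positive", "negative", "neutral"):
            raise ValueError(f"unexpected sentiment: {self.sentiment!r}")
        value = self.confidence
        if value is not None:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError("confidence must be a number or None")
            # Assumed business rule for this example, not part of the schema.
            if not 0 <= value <= 1:
                raise ValueError("confidence must be between 0 and 1")


def parse_review(content: str) -> ReviewSentiment:
    """Validate the message content again on the application side."""
    data = json.loads(content)
    # Unexpected or missing keys raise TypeError from the dataclass constructor.
    return ReviewSentiment(**data)
```

The second validation pass is not redundant: the schema guarantees shape, while the application model can enforce rules a JSON schema in strict mode may not express, such as the value range assumed here.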
Level 1: Extracting JSON from Markdown Code Blocks
When you ask a model to return JSON and it wraps it in a markdown code block, you can extract the content with a regular expression:
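import json
import re
from typing import Union


def extract_json(text: str) -> Union[dict, list]:
    # Try to parse the whole response as JSON first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Look for JSON in a markdown code block (with or without a "json" tag)
    match = re.search(r'```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```', text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    # Look for any JSON object or array in the text. raw_decode parses one
    # complete value starting at a given offset, so nested braces are handled.
    decoder = json.JSONDecoder()
    for candidate in re.finditer(r'[{\[]', text):
        try:
            value, _ = decoder.raw_decode(text, candidate.start())
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError(f"Could not extract JSON from response: {text[:200]}")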
This handles the three most common cases: valid JSON response, JSON in a code block, and JSON embedded in prose.