What Sonar Does Differently
Standard LLM APIs return text generated from training data with a fixed knowledge cutoff. Perplexity Sonar returns answers grounded in live web search results, with citations returned alongside the answer. The model does not just search and summarize: it integrates the search results into coherent prose with numbered references such as [1] that point to the returned source URLs.
This replaces a common pattern: LLM + search tool + citation extraction + response formatting. Sonar does all of it in one API call.
Sonar vs Sonar Pro
| | Sonar | Sonar Pro |
|---|---|---|
| Underlying LLM | Smaller Sonar model | Larger Sonar model |
| Search sources | Standard index | Expanded sources, more recent |
| Price | $5/1000 searches + $1/1M input and output tokens | $5/1000 searches + $3/1M input tokens, $15/1M output tokens |
| Citations | Yes | Yes, more thorough |
| Context window | 127k tokens | 200k tokens |
For most use cases, Sonar is sufficient. Use Sonar Pro when you need maximum coverage on a fast-moving topic or when citation accuracy is critical (legal research, medical information, financial news).
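The rule of thumb above can be encoded as a small routing helper, so the choice lives in one place instead of being hard-coded at every call site. This is a minimal sketch. `pick_model` and its two flags are hypothetical names; only the model IDs `"sonar"` and `"sonar-pro"` come from the API.

```python
def pick_model(fast_moving_topic: bool = False, citation_critical: bool = False) -> str:
    """Return a Perplexity model ID: "sonar" by default, "sonar-pro" when
    coverage of a fast-moving topic or citation accuracy matters most."""
    if fast_moving_topic or citation_critical:
        return "sonar-pro"
    return "sonar"


# Example: legal research is citation-critical, so it is routed to Sonar Pro.
legal_model = pick_model(citation_critical=True)
general_model = pick_model()
```

Defaulting to the cheaper model and escalating only on an explicit flag keeps costs predictable.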
OpenAI-Compatible API
```python
import os

from openai import OpenAI

# Drop-in swap from OpenAI: same SDK, different API key and base URL
client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "system",
            "content": "You are a research assistant. Cite your sources.",
        },
        {
            "role": "user",
            "content": "What are the latest benchmark results for GPT-4.1 vs Claude 3.7?",
        },
    ],
)

answer = response.choices[0].message.content

# Citations are a Perplexity extension to the response, not part of the
# OpenAI schema, so read them defensively in case the field is missing
citations = getattr(response, "citations", None) or []

print(answer)
for i, citation in enumerate(citations, 1):
    print(f"[{i}] {citation}")
```
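Printing the citations as a separate list leaves the reader to match each [n] in the answer by hand. A small post-processing step can turn the markers into links instead. This is a minimal sketch that assumes the markers look like `[1]`, `[2]`, … and are 1-based indexes into `citations`, which is the numbering the print loop above relies on. `link_citations` is a hypothetical helper, not part of either SDK.

```python
import re


def link_citations(answer: str, citations: list[str]) -> str:
    """Turn [n] markers into Markdown links to citations[n - 1].

    Markers with no matching citation are left unchanged.
    """

    def replace(match: re.Match) -> str:
        index = int(match.group(1))
        if 1 <= index <= len(citations):
            return f"[[{index}]]({citations[index - 1]})"
        return match.group(0)

    return re.sub(r"\[(\d+)\]", replace, answer)


# Example with stand-in data shaped like `answer` and `citations` above.
linked = link_citations(
    "Sonar cites its sources [1][2]. Stray marker [5].",
    ["https://a.example/post", "https://b.example/report"],
)
```

With the real response, you would call `link_citations(answer, citations)`. Leaving out-of-range markers alone is deliberate: rewriting them would point readers at a source that was never returned.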
The search_recency_filter Parameter
For time-sensitive queries, the search_recency_filter parameter restricts Sonar's search to sources from a specific time window:
```python
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {"role": "user", "content": "What AI models were released this week?"},
    ],
    # Perplexity-specific parameters go in extra_body, which the OpenAI SDK
    # merges into the JSON request body
    extra_body={
        "search_recency_filter": "week",  # "month", "week", "day", or "hour"
        "return_images": False,
        "return_related_questions": True,
    },
)
```
Setting search_recency_filter to "day" limits search results to sources from the past 24 hours. This matters for market monitoring, breaking news summarization, and competitive intelligence.
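Because `extra_body` is an untyped dict, a typo such as `"weak"` passes through the SDK unchecked. A small builder can catch bad values before the request is sent. This is a minimal sketch. `build_search_options` and `VALID_RECENCY` are hypothetical names, and the allowed values are copied from the comment in the request example above.

```python
from typing import Optional

# Allowed windows, copied from the comment in the request example above.
VALID_RECENCY = ("month", "week", "day", "hour")


def build_search_options(
    recency: Optional[str] = None,
    related_questions: bool = False,
    images: bool = False,
) -> dict:
    """Build an extra_body dict for a Sonar request, rejecting unknown recency windows locally."""
    options = {"return_images": images, "return_related_questions": related_questions}
    if recency is not None:
        if recency not in VALID_RECENCY:
            raise ValueError(
                f"search_recency_filter must be one of {VALID_RECENCY}, got {recency!r}"
            )
        options["search_recency_filter"] = recency
    return options


# Example: a breaking-news query limited to the past day.
news_options = build_search_options("day", related_questions=True)
```

The result is passed straight through, for example `client.chat.completions.create(model="sonar", messages=..., extra_body=news_options)`. If the recency argument is omitted, the filter key is left out entirely rather than sent as an empty value.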
Use Cases
Competitive analysis: "Summarize the key announcements from [competitor] in the last 30 days" — run this on a schedule and store results for trend tracking.
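For trend tracking, each scheduled run needs to be stored with a timestamp so that later runs can be compared. This is a minimal sketch using an append-only JSON Lines file. The function names, the record fields and the file format are illustrative choices, not part of the Sonar API. The scheduler itself (cron, a task queue, etc.) is left out.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def store_result(path: Path, query: str, answer: str, citations: list[str]) -> dict:
    """Append one Sonar result to a JSON Lines file, stamped with the current UTC time."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "answer": answer,
        "citations": citations,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_results(path: Path, query: str) -> list[dict]:
    """Return every stored record for `query`, oldest first (the file is append-only)."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if r["query"] == query]


# Example: two scheduled runs of the same query, written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "competitor.jsonl"
    store_result(log, "acme announcements", "First summary [1]", ["https://acme.example/a"])
    store_result(log, "acme announcements", "Second summary [1]", ["https://acme.example/b"])
    history = load_results(log, "acme announcements")
```

Storing the citations next to the answer keeps every summary traceable to the sources Sonar used on that day.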
News monitoring: "What are the top stories about AI regulation this week?" — replaces a custom pipeline of news API + LLM summarization + deduplication.
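The deduplication step can be approximated by tracking which source URLs earlier runs already cited. This is a minimal sketch that assumes deduplication by citation URL. It catches the same article cited in two runs, but not two outlets covering the same story. `normalize_url` and `new_sources` are hypothetical helpers.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, and drop the fragment and any trailing slash on the path."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def new_sources(citations: list[str], seen: set[str]) -> list[str]:
    """Return citations not seen in earlier runs (in order) and record them in `seen`."""
    fresh = []
    for url in citations:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh
```

In a real monitor, `seen` would be persisted between runs, for example in the same store as the answers.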
Live data extraction: "What is the current valuation of [company] according to recent news?" This works when the information is too recent to appear in a model's training data.
Research grounding: When building a RAG system on proprietary documents, use Sonar to supplement with real-time web context that your documents may not cover.
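One way to combine the two sources is to send only the public question to Sonar, retrieve internal chunks from your own index, and merge both into the prompt for your own model. This is a minimal sketch of that final merge step. `build_grounded_messages`, the `[D#]` labels and the system-prompt wording are hypothetical. The web references reuse Sonar's own [n] numbering, so markers already in the Sonar answer still line up with the listed URLs.

```python
def build_grounded_messages(
    question: str,
    internal_chunks: list[str],
    web_answer: str,
    web_citations: list[str],
) -> list[dict]:
    """Combine internal RAG chunks and a Sonar answer into one prompt for your own LLM."""
    lines = ["Internal documents:"]
    lines += [f"[D{i}] {chunk}" for i, chunk in enumerate(internal_chunks, 1)]
    lines += ["", "Web context (from Sonar):", web_answer]
    # Keep Sonar's 1-based numbering so [n] markers in web_answer still match.
    lines += [f"[{i}] {url}" for i, url in enumerate(web_citations, 1)]
    system = (
        "Answer from the context below. Prefer the internal documents and use the "
        "web context only for facts they do not cover. Cite internal documents as "
        "[D#] and web sources as [#]."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": "\n".join(lines) + f"\n\nQuestion: {question}"},
    ]
```

Keeping the internal chunks out of the Sonar request means proprietary text only goes to the model you already trust with it.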
When to Use Sonar vs Standard RAG
Use Sonar when:
- Data changes frequently (news, prices, regulations, product releases)
- You do not have a corpus to index
- Setup time matters (Sonar is ready in one API call; RAG requires indexing infrastructure)
Use standard RAG when:
- Data is proprietary: it is either not on the public web (so Sonar's search cannot reach it) or not allowed to be sent to a third-party API
- You need exact retrieval from specific documents
- Volume is high enough that $5/1000 searches is cost-prohibitive
- You need strict citation to specific internal documents
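Whether per-search pricing is cost-prohibitive depends on volume, so it helps to estimate monthly spend before choosing. This is a minimal sketch of such an estimate. Prices are parameters rather than constants, so you can plug in the current figures from the pricing table or Perplexity's pricing page. `searches_per_request` is an assumption you set yourself. `estimate_monthly_cost` is a hypothetical helper.

```python
def estimate_monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    price_per_1k_searches: float,
    price_per_m_input: float,
    price_per_m_output: float,
    searches_per_request: float = 1.0,
    days: int = 30,
) -> float:
    """Estimate monthly spend in dollars: search fees plus per-token charges."""
    requests = requests_per_day * days
    search_cost = requests * searches_per_request / 1000 * price_per_1k_searches
    token_cost = requests * (
        input_tokens * price_per_m_input + output_tokens * price_per_m_output
    ) / 1_000_000
    return search_cost + token_cost


# Illustrative figures only: 10,000 requests/day, 500 input and 800 output tokens each.
cost = estimate_monthly_cost(
    10_000, 500, 800,
    price_per_1k_searches=5.0, price_per_m_input=1.0, price_per_m_output=1.0,
)
```

For these example numbers, the search fees come to $1,500 and the token charges to $390. That gap is why search volume, rather than answer length, usually drives the cost comparison with a self-hosted RAG index.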