Built for RAG From the Ground Up
Most LLMs treat retrieval-augmented generation as an afterthought — you inject documents into the context and hope the model cites them correctly. Cohere's Command R+ is different: it was trained specifically to generate grounded answers with inline citations that link directly to source passages.
This matters for enterprise applications where users need to verify AI answers against source documents — legal research, compliance, financial analysis, customer support with knowledge bases.
Model Specs
- 104 billion parameters
- 128k token context window
- Native RAG mode with citation generation
- Multi-step tool use (web search, database queries, custom APIs)
- Available via Cohere API or self-hosted with model weights
- Pricing: $2.50/1M input, $10.00/1M output
Basic RAG With Citations
import cohere
co = cohere.Client(api_key="your-cohere-api-key")
# Documents to ground the response
documents = [
{
"id": "policy-doc-1",
"text": "Employees may take up to 20 days of paid vacation per year. Unused days may be carried over up to 10 days maximum.",
"title": "HR Policy Manual Section 4.2"
},
{
"id": "policy-doc-2",
"text": "Sick leave is separate from vacation and limited to 10 days per calendar year with medical certification required beyond 3 consecutive days.",
"title": "HR Policy Manual Section 4.3"
}
]
response = co.chat(
model="command-r-plus-08-2024",
message="How many vacation days do I get and can I carry them over?",
documents=documents,
)
print(response.text)
print("
Citations:")
for citation in response.citations:
print(f" [{citation.start}:{citation.end}] -> {citation.document_ids}")
The model returns the answer plus precise character-level citations mapping each claim to its source document.
Multi-Step Tool Use
Command R+ can orchestrate multi-step research flows — search the web, query a database, call an API — before synthesizing a final answer:
tools = [
{
"name": "search_documents",
"description": "Search internal document database",
"parameter_definitions": {
"query": {"description": "Search query", "type": "str", "required": True}
}
},
{
"name": "lookup_employee",
"description": "Look up employee records by ID",
"parameter_definitions": {
"employee_id": {"description": "Employee ID", "type": "str", "required": True}
}
}
]
response = co.chat(
model="command-r-plus-08-2024",
message="What is Sarah Johnson's (ID: EMP-4821) remaining vacation balance?",
tools=tools,
force_single_step=False # Allow multi-step
)
# Response contains tool_calls to execute
for tool_call in response.tool_calls:
print(f"Call: {tool_call.name}({tool_call.parameters})")
Enterprise Connector
For teams using SharePoint, Confluence, Salesforce, or Google Drive, Cohere's connector framework integrates Command R+ with existing enterprise search — no custom RAG pipeline needed:
response = co.chat(
model="command-r-plus-08-2024",
message="Find all customer complaints about billing from last quarter.",
connectors=[{"id": "salesforce-connector"}]
)
Self-Hosting
Model weights are available on HuggingFace for teams requiring on-premises deployment. The model requires approximately 210GB of GPU memory in BF16 — 3× A100 80GB or equivalent.
When to Use Command R+ vs GPT-4o
Use Command R+ when:
- Your application requires verified citations on every claim
- You're building enterprise document Q&A where auditability matters
- Multi-step tool use with structured output is critical
- You need self-hosted deployment with full weights available
Use GPT-4o when:
- General-purpose reasoning and generation is the priority
- Vision capabilities are required
- You prioritize OpenAI's ecosystem and tooling
Summary
Command R+ is the best model for enterprise RAG applications where citation accuracy and source traceability are non-negotiable. The citation API simplifies what typically requires complex post-processing. Full docs at docs.cohere.com and weights at HuggingFace.