CrewAI: Building Multi-Agent Systems in Python

CrewAI lets you define agents with roles, assign them tasks, and have them collaborate. Here is when multi-agent beats single-agent, and when it does not.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 17, 2026

9 min read

// tags

#crewai #multi-agent #ai-agents #llm

CrewAI is a Python framework for building multi-agent systems where multiple LLM-powered agents collaborate to complete a task. Each agent has a role, goal, and backstory. Tasks are assigned to agents. A crew is a group of agents that work together, passing outputs from one agent to the next, delegating sub-tasks, and collectively producing a final output. CrewAI is genuinely useful for workflows where parallel research, specialized reasoning, or sequential processing by domain-specific agents produces better results than a single agent. It is not useful for simple tasks, and it is expensive on token count: a five-agent crew processing a task will often use 5-20x the tokens of a single well-prompted agent.

The Core Concepts

Agent: An LLM-powered worker with a specific role, goal, and backstory.

from crewai import Agent

researcher = Agent(
    role="AI Research Analyst",
    goal="Find accurate, current information about AI topics",
    backstory="You are an expert AI researcher who reads and synthesizes technical literature.",
    verbose=True,
    allow_delegation=False,
    llm="gpt-4o-mini"
)

Task: A unit of work assigned to an agent.

from crewai import Task

research_task = Task(
    description="Research the current state of open source LLMs in 2026. Focus on: top models by capability, performance benchmarks, and licensing.",
    agent=researcher,
    expected_output="A structured summary covering at least 5 models with benchmark scores and licensing info"
)

Crew: The group of agents working together.

from crewai import Crew, Process

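# writer, reviewer, writing_task, and review_task are assumed to be defined
# the same way as researcher and research_task above.
crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, writing_task, review_task],
    process=Process.sequential,  # Tasks run in the order listed
    verbose=True
)

result = crew.kickoff()
print(result.raw)  # Raw text of the final task's output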

Process Types

Sequential: Tasks run in order. Each task's output is available to subsequent tasks. Simplest to reason about.

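Hierarchical: A manager agent (powered by an LLM) decides which agents to use, assigns tasks, and synthesizes results. In CrewAI, this mode requires a manager_llm or manager_agent to be set on the Crew. It is more flexible than sequential, but less predictable and more expensive.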

For most use cases, start with sequential. Hierarchical processes add complexity that is rarely worth it until you have validated that sequential does not meet your needs.
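To make the sequential data flow concrete, here is a minimal plain-Python sketch. It does not use CrewAI and is not how the library is implemented; `run_sequential` and the three stand-in step functions are hypothetical names. It only illustrates the idea that each step can see the outputs of the steps before it.

```python
from typing import Callable, List

# A "step" stands in for an agent executing a task: it receives the
# outputs of all earlier steps and returns its own output string.
Step = Callable[[List[str]], str]


def run_sequential(steps: List[Step]) -> List[str]:
    """Run steps in order, passing earlier outputs to each later step."""
    outputs: List[str] = []
    for step in steps:
        # Pass a copy so a step cannot mutate the shared history.
        outputs.append(step(list(outputs)))
    return outputs


def research(previous: List[str]) -> str:
    return "notes: three candidate models"


def write(previous: List[str]) -> str:
    return f"draft based on [{previous[-1]}]"


def review(previous: List[str]) -> str:
    return f"final version of [{previous[-1]}]"


outputs = run_sequential([research, write, review])
print(outputs[-1])
```

Because every step is an ordinary function call in a fixed order, a failure can be traced to one position in the list, which is the main reason sequential is the easier starting point.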

Tools Integration

Agents can use tools (functions they can call during execution):

from crewai_tools import SerperDevTool

# SerperDevTool reads its API key from the SERPER_API_KEY environment variable
search_tool = SerperDevTool()  # Google search via Serper API

researcher = Agent(
    role="Research Analyst",
    goal="Find current information",
    backstory="You research AI topics using web search.",
    tools=[search_tool],
    llm="gpt-4o-mini"
)

Built-in CrewAI tools: web search (via Serper), website scraping, file read/write, code execution, GitHub search. You can also define custom tools as Python functions.
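A custom tool starts as an ordinary, well-documented Python function. The sketch below shows only that function; `count_words` is a hypothetical example, and how you register it with an agent (a decorator or a tool base class) depends on your installed CrewAI version, so check its documentation before wiring it in.

```python
def count_words(text: str) -> int:
    """Return the number of whitespace-separated words in `text`.

    A clear docstring matters: tool-calling frameworks typically surface
    the description to the LLM so it can decide when to call the tool.
    """
    return len(text.split())


print(count_words("CrewAI agents can call tools"))
```

Keeping the logic in a plain function also means you can unit test it without running an agent or spending tokens.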

Real Example: Content Creation Crew

A practical crew for creating a blog post:

from crewai import Agent, Task, Crew, Process

# Agents
researcher = Agent(
    role="Content Researcher",
    goal="Research the topic thoroughly and gather accurate facts",
    backstory="Expert at finding and verifying technical information",
    llm="gpt-4o-mini"
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, accurate, and engaging technical content",
    backstory="Experienced technical writer who explains complex topics simply",
    llm="gpt-4o-mini"
)

editor = Agent(
    role="Editor",
    goal="Ensure accuracy, clarity, and proper structure",
    backstory="Meticulous editor who catches errors and improves readability",
    llm="gpt-4o-mini"
)

# Tasks
research_task = Task(
    description="Research: What are the top 5 open source LLMs in 2026? Include benchmark scores.",
    agent=researcher,
    expected_output="Structured research notes with model names, benchmarks, and sources"
)

writing_task = Task(
    description="Write a 1000-word blog post using the research provided. Focus on practical developer use cases.",
    agent=writer,
    expected_output="A complete blog post draft",
    context=[research_task]  # Has access to research output
)

editing_task = Task(
    description="Review the blog post for accuracy, clarity, and structure. Provide the final version.",
    agent=editor,
    expected_output="Final edited blog post ready to publish",
    context=[writing_task]
)

# Run
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential
)

result = crew.kickoff()
print(result.raw)  # Final edited post from the last task

Limitations

Non-deterministic outputs. Multi-agent systems amplify the non-determinism of individual LLM calls. The same crew run twice can produce significantly different outputs. For applications requiring consistent output format, add output validation and retry logic.
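One way to add validation and retry logic is a small wrapper around whatever produces the crew's output. This is a sketch under assumptions: `run_with_validation` is a hypothetical helper, the expected format is a JSON object with known keys, and the `run` callable stands in for something like a function that calls the crew and returns its raw text.

```python
import json
from typing import Callable, Optional, Set


def run_with_validation(
    run: Callable[[], str], required_keys: Set[str], max_attempts: int = 3
) -> dict:
    """Call `run` until it returns a JSON object with all required keys."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = run()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = required_keys - set(data)
        if not missing:
            return data
        last_error = f"attempt {attempt}: missing keys {sorted(missing)}"
    raise ValueError(f"no valid output after {max_attempts} attempts; {last_error}")


# Stand-in for a real crew run: the first response is malformed, the second is valid.
_responses = iter(["not json", '{"title": "Open LLMs", "body": "..."}'])
result = run_with_validation(lambda: next(_responses), {"title", "body"})
print(result["title"])
```

Each retry is a full crew run, so a retry limit doubles as a cost cap.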

Token cost. Each agent call is a full LLM inference, so a 3-agent sequential crew makes at least three LLM calls for one task. If your base task costs 1,000 tokens, the crew costs roughly 3,000-10,000 tokens, because agents also receive the previous agents' outputs as context, which grows through the pipeline. As a rough planning figure, budget for 5-15x token usage compared to a single-agent approach, with larger crews toward the upper end.
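The growth in context can be estimated with simple arithmetic. The sketch below uses a deliberately simplified model that is an assumption, not a measurement: every agent gets the same prompt size, produces the same output size, and sees all earlier outputs. `estimate_sequential_tokens` is a hypothetical helper.

```python
def estimate_sequential_tokens(
    num_agents: int, prompt_tokens: int, output_tokens: int
) -> int:
    """Estimate total tokens when every agent sees all earlier outputs."""
    total = 0
    for position in range(num_agents):
        # Context grows by one agent's output at each step of the pipeline.
        context_tokens = position * output_tokens
        total += prompt_tokens + context_tokens + output_tokens
    return total


single = estimate_sequential_tokens(1, prompt_tokens=500, output_tokens=500)
crew = estimate_sequential_tokens(3, prompt_tokens=500, output_tokens=500)
print(single, crew, crew / single)
```

Under these assumptions the 3-agent crew uses 4,500 tokens against 1,000 for a single agent. Real runs add role and backstory prompts, tool calls, and retries, so treat the result as a lower bound.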

Debugging difficulty. When the crew produces a bad output, identifying which agent failed and why requires reading through verbose logs. The abstraction makes debugging harder than a standard Python call stack.
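A per-task summary is often quicker to scan than the verbose logs. The sketch below assumes the crew result exposes a `tasks_output` list whose items carry `agent` and `raw` fields, as recent CrewAI versions document; verify this against your installed version. `summarize_task_outputs` is a hypothetical helper, and the example uses stand-in objects so it runs without CrewAI or an API key.

```python
from types import SimpleNamespace
from typing import List


def summarize_task_outputs(result, preview_chars: int = 60) -> List[str]:
    """Return one 'agent: preview' line per task output for quick inspection."""
    lines = []
    for index, task_output in enumerate(result.tasks_output, start=1):
        preview = task_output.raw[:preview_chars].replace("\n", " ")
        lines.append(f"{index}. {task_output.agent}: {preview}")
    return lines


# Stand-in result with the assumed shape (not a real CrewAI object).
fake_result = SimpleNamespace(tasks_output=[
    SimpleNamespace(agent="Content Researcher", raw="Notes on five models"),
    SimpleNamespace(agent="Technical Writer", raw="Draft blog post"),
])
for line in summarize_task_outputs(fake_result):
    print(line)
```

Reading the outputs in pipeline order usually shows where quality dropped: the first task whose preview looks wrong is the place to start.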

Speed. Sequential agent execution means latency stacks: 3 agents at 3 seconds each = 9+ seconds minimum. Parallel execution (for example, marking independent tasks with async_execution=True) helps but adds complexity.

When CrewAI Beats Single Agents

Research tasks requiring breadth and depth simultaneously. A research crew with a searcher, a summarizer, and a fact-checker produces better research than a single agent because the specialization focuses each agent on one part of the problem.

Content workflows with distinct creation and review phases. The writer/editor pattern works well because editing is a genuinely different task from writing, and separate agents with different prompts perform better than a single agent trying to do both.

Tasks requiring parallel information gathering. If your task needs information from 5 sources simultaneously, 5 parallel agents can gather and summarize them concurrently. A single agent would process them sequentially.
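The latency benefit of concurrent gathering can be illustrated with the standard library alone. This sketch does not use CrewAI: `fetch_summary` is a hypothetical stand-in for one agent's work, and the sleep simulates network and LLM latency.

```python
import asyncio
import time
from typing import List


async def fetch_summary(source: str, delay: float = 0.1) -> str:
    """Stand-in for one agent gathering and summarizing a source."""
    await asyncio.sleep(delay)  # simulates network and LLM latency
    return f"summary of {source}"


async def gather_all(sources: List[str]) -> List[str]:
    # asyncio.gather runs the coroutines concurrently and
    # returns results in the same order as the inputs.
    return await asyncio.gather(*(fetch_summary(s) for s in sources))


sources = ["docs", "changelog", "benchmarks", "forum", "paper"]
start = time.perf_counter()
summaries = asyncio.run(gather_all(sources))
elapsed = time.perf_counter() - start
print(summaries)
print(f"elapsed: {elapsed:.2f}s")
```

With five sources, the concurrent run takes roughly the time of the slowest call, where a sequential loop would take the sum of all five. Token cost is unchanged, so concurrency buys speed, not savings.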

Keep Reading

DSPy Declarative LLM Guide — An alternative approach to multi-step LLM pipelines
LangChain vs LlamaIndex Comparison — Other frameworks in the LLM application space
Cutting LLM API Costs — Managing the token costs that multi-agent systems amplify
