Inside Athenic: How We Built a Multi-Agent Research System
Deep dive into Athenic's multi-agent architecture: how we orchestrate research agents across data sources to deliver comprehensive market intelligence in minutes.


TL;DR
When we started building Athenic, we knew single-agent LLMs couldn't deliver the research quality startups need. A single GPT-4 call can't simultaneously search the web, query structured databases, parse documents, and synthesise it all into a coherent report.
Multi-agent systems solve this by deploying specialist agents, each optimised for one task, that collaborate to deliver comprehensive results. Think of it like a research team: one person handles web search, another analyses documents, a third synthesises findings, and a coordinator ensures everyone stays aligned.
Here's how we architected Athenic's multi-agent research system, the technical challenges we faced, and the design decisions that let us deliver startup intelligence in minutes instead of hours.
Key takeaways
- Multi-agent systems outperform monolithic LLMs for complex tasks requiring diverse skills (web search, structured data analysis, document parsing).
- Our architecture: Orchestrator routes tasks → Specialists execute in parallel → Synthesiser aggregates → Quality validator ensures accuracy.
- Key challenge: Agent coordination overhead. Solution: Shared context layer + asynchronous execution with dependency tracking.
Traditional approach (single LLM):
User: "Research competitor X's pricing, recent funding, and customer sentiment. Compare to our product."
Single GPT-4 call: attempts web search, hallucinates data, provides shallow analysis. Accuracy: ~60–70%.
Why it fails:
- No live access to the web or internal systems, so the model fabricates figures it cannot retrieve.
- One context window juggling several jobs at once yields shallow treatment of each.
- No citations, so users cannot verify the claims.
Athenic approach:
User: "Research competitor X's pricing, recent funding, and customer sentiment. Compare to our product."
Orchestrator agent: Breaks into sub-tasks:
- Web research agent: Find competitor pricing page, scrape tiers.
- Funding agent: Query Crunchbase API for latest funding round.
- Sentiment agent: Scrape Twitter, Reddit, G2 for customer feedback.
- Internal agent: Pull our pricing from database.
- Synthesis agent: Aggregate findings, generate comparison report.
Result: Comprehensive report with citations, delivered in 12 minutes. Accuracy: 92%.
Why it works:
- Each specialist is narrowly optimised, so every sub-task gets deep, focused treatment.
- Independent sub-tasks run in parallel, so total time is minutes rather than hours.
- Every finding is tied to a source, so the final report ships with citations.
[Diagram: the single-agent pipeline runs Web → Data → Docs → Synth → Output one stage at a time, taking 20+ minutes; the multi-agent pipeline runs all stages simultaneously and finishes in about 5 minutes.]
"The companies winning with AI agents aren't the ones with the most sophisticated models. They're the ones who've figured out the governance and handoff patterns between human and machine." - Dr. Elena Rodriguez, VP of Applied AI at Google DeepMind
1. Orchestrator Agent
2. Specialist Agents: each agent owns a narrow domain (web research, databases, documents, external APIs, sentiment).
3. Synthesis Agent
4. Quality Agent
5. Shared Context Layer
[Diagram: the user query flows to the Orchestrator Agent, which fans work out to five specialists, Web Agent, DB Agent, Doc Agent, API Agent, and Sentiment Agent, all connected through a Shared Context Layer backed by Supabase.]
Role: Task planner and coordinator.
Inputs: User query, conversation history.
Outputs: Task decomposition plan, agent assignments.
Example:
User query: "Research Notion's pricing strategy and compare to ours."
Orchestrator plan:
Tasks:
1. Web Agent: scrape Notion's pricing page → extract tiers, features, prices.
2. Database Agent: query our pricing table → get our tiers.
3. Synthesis Agent: compare Notion vs us → generate a markdown table plus analysis.
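To make the decomposition concrete, here is a hypothetical plan payload (the schema is illustrative, not Athenic's actual format); the `depends_on` edges are what the task graph shown later consumes:

```json
{
  "job_id": "research-notion-pricing",
  "tasks": [
    {"id": "notion_pricing", "agent": "web", "goal": "Scrape Notion pricing tiers", "depends_on": []},
    {"id": "our_pricing", "agent": "database", "goal": "Fetch our pricing tiers", "depends_on": []},
    {"id": "comparison", "agent": "synthesis", "goal": "Compare tiers and recommend positioning", "depends_on": ["notion_pricing", "our_pricing"]}
  ]
}
```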
Tech stack:
Role: Search the web, scrape pages, extract structured data.
Tools:
Example task:
"Find Stripe's latest funding round amount and date."
Execution:
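The execution steps were elided above; purely as a hypothetical sketch (`search_web`, `fetch_page`, and `extract_funding_json` are illustrative names, not Athenic's real tools), the flow might be:

```python
# Hypothetical flow, not Athenic's actual tool names
query = "Stripe latest funding round amount date"
urls = search_web(query, top_k=5)             # candidate source URLs
pages = [fetch_page(u) for u in urls]         # raw HTML/text per page
result = extract_funding_json(pages)          # LLM-backed structured extraction
```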
Output:
```json
{
  "company": "Stripe",
  "funding_round": "Series I",
  "amount_usd": 6500000000,
  "date": "2023-03-14",
  "investors": ["Thrive Capital", "General Catalyst"],
  "sources": ["https://crunchbase.com/...", "https://techcrunch.com/..."]
}
```
Role: Query internal databases (CRM, analytics, knowledge base).
Tools:
Example task:
"How many customers signed up last month?"
Execution:
```sql
SELECT COUNT(*) FROM customers
WHERE created_at >= '2025-07-01' AND created_at < '2025-08-01';
```
Result: `{"count": 127}`.
Safety: queries are sandboxed (read-only access, row-level security).
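The source doesn't show the sandboxing setup, but as a sketch of what "read-only access, row-level security" typically means in Postgres/Supabase (the role, policy, and setting names here are illustrative):

```sql
-- Illustrative only: a read-only role the DB agent connects as
CREATE ROLE db_agent_readonly NOLOGIN;
GRANT USAGE ON SCHEMA public TO db_agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO db_agent_readonly;

-- Row-level security: the agent only sees rows for its own organisation
ALTER TABLE customers ENABLE ROW LEVEL SECURITY;
CREATE POLICY agent_org_isolation ON customers
  FOR SELECT TO db_agent_readonly
  USING (org_id = current_setting('app.current_org')::uuid);
```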
Role: Parse and analyse uploaded documents (PDFs, DOCX, spreadsheets).
Tools:
Example task:
"Extract key metrics from this investor deck PDF."
Execution:
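The execution steps were elided here too; as a hypothetical sketch (pypdf is our assumed extraction library, and `extract_metrics_with_llm` is an illustrative name, neither confirmed by the source):

```python
from pypdf import PdfReader

def extract_deck_metrics(pdf_path):
    """Pull raw text from each slide, then ask an LLM for the key metrics."""
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return extract_metrics_with_llm(text)  # hypothetical LLM extraction step
```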
Role: Call external APIs (Crunchbase, LinkedIn, Twitter, PubMed).
Integration approach:
Example task:
"Get company profile for startup X from Crunchbase."
Execution:
`get_company(name="Startup X")`

Role: Aggregate findings from specialist agents into a cohesive narrative.
Inputs: Outputs from specialist agents (JSON, text, tables).
Output: Markdown report with citations.
Example:
Inputs:
- Web Agent: Notion pricing tiers.
- Database Agent: Our pricing tiers.
Synthesis output:
# Notion vs Our Product: Pricing Comparison

Notion offers 4 tiers: Free, Plus ($8/user/mo), Business ($15), Enterprise (custom). Our product offers 3 tiers: Starter (free), Pro ($12), Enterprise ($25).

**Key differences:**
- Notion's Plus tier is 33% cheaper than our Pro.
- We offer more integrations at Pro tier (50+ vs Notion's 20).
- Notion targets broader market (individuals + teams); we focus on B2B.

**Recommendation:** Consider lowering Pro tier to $10 to match Notion's positioning.

Sources: [1] Notion pricing page, [2] Internal database
Some tasks depend on others. Example:
"Research competitor pricing, then recommend our pricing changes."
Dependency graph: the pricing-recommendation task depends on the competitor-pricing research completing first.
Solution: Task graph with dependency tracking.
Implementation (simplified):
```python
import asyncio

class TaskGraph:
    def __init__(self):
        self.tasks = {}
        self.dependencies = {}

    def add_task(self, task_id, agent, depends_on=None):
        # depends_on defaults to None to avoid a shared mutable default list
        self.tasks[task_id] = {"agent": agent, "status": "pending", "result": None}
        self.dependencies[task_id] = depends_on or []

    async def execute(self):
        """Execute tasks respecting dependencies."""
        completed = set()
        while len(completed) < len(self.tasks):
            # Find tasks ready to run (all dependencies met)
            ready = [
                tid for tid in self.tasks
                if self.tasks[tid]["status"] == "pending"
                and all(dep in completed for dep in self.dependencies[tid])
            ]
            if not ready:
                raise RuntimeError("Cyclic or unsatisfiable dependencies")
            # Run ready tasks in parallel
            await asyncio.gather(*[self.run_task(tid) for tid in ready])
            completed.update(ready)
        return self.get_final_output()

    async def run_task(self, task_id):
        agent = self.tasks[task_id]["agent"]
        result = await agent.execute()
        self.tasks[task_id]["status"] = "completed"
        self.tasks[task_id]["result"] = result
        return result

    def get_final_output(self):
        # Results of every task, keyed by task id
        return {tid: t["result"] for tid, t in self.tasks.items()}
```
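To make the flow concrete, here is a hypothetical usage sketch for the dependency example above (`StubAgent` stands in for real agents; it is not part of Athenic's codebase):

```python
class StubAgent:
    """Placeholder agent that just returns a canned result."""
    def __init__(self, result):
        self.result = result

    async def execute(self):
        return self.result

async def main():
    graph = TaskGraph()
    graph.add_task("competitor_pricing", StubAgent({"tiers": ["Free", "Plus"]}))
    graph.add_task("recommendation", StubAgent("Lower Pro tier to $10"),
                   depends_on=["competitor_pricing"])
    # competitor_pricing runs first; recommendation unblocks once it completes
    print(await graph.execute())

asyncio.run(main())
```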
Agents need to share information. Example: the Synthesis Agent cannot compare pricing until the Web Agent's and Database Agent's outputs are available.
Solution: Shared context layer (Supabase).
Implementation:
Each agent writes its output to a research_context table:

```sql
CREATE TABLE research_context (
  id UUID PRIMARY KEY,
  research_job_id UUID,
  agent_id TEXT,
  task_id TEXT,
  data JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```
Agents query context:
```python
def get_context(research_job_id, task_id):
    """Retrieve context rows written for a task."""
    return (
        supabase.table("research_context")
        .select("*")
        .eq("research_job_id", research_job_id)
        .eq("task_id", task_id)
        .execute()
    )
```
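The write path isn't shown above; a minimal counterpart, assuming the same supabase-py client (`put_context` is a hypothetical helper name), might look like:

```python
import uuid

def put_context(research_job_id, agent_id, task_id, data):
    """Persist one agent's output so downstream agents can read it."""
    return supabase.table("research_context").insert({
        "id": str(uuid.uuid4()),  # the schema above has no default for id
        "research_job_id": research_job_id,
        "agent_id": agent_id,
        "task_id": task_id,
        "data": data,  # arbitrary JSON payload (JSONB column)
    }).execute()
```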
Problem: Orchestrating 5+ agents adds latency (planning, routing, waiting for dependencies).
Initial approach: Sequential execution → 20+ min per query.
Solution: Parallel execution with dependency tracking → 5–7 min.
Lesson: Optimise for parallelism. Only enforce dependencies where truly necessary.
Problem: If Web Agent fails (rate limit, timeout), entire research job fails.
Initial approach: Hard failures → poor user experience.
Solution: Graceful degradation.
Example output:
"We found competitor pricing on 3 of 5 sites. Unable to access Site X (timeout) and Site Y (rate limit). Recommendations based on available data."
Problem: Agents sometimes hallucinate or return low-confidence answers.
Initial approach: No validation → 78% accuracy.
Solution: Quality Agent validates output.
Result: Accuracy improved to 92%.
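The source doesn't detail the Quality Agent's checks; one plausible shape, with the threshold purely illustrative, is a validation pass that rejects uncited or low-confidence findings:

```python
def validate_finding(finding, min_confidence=0.7):
    """Illustrative checks: every claim needs sources and enough confidence."""
    issues = []
    if not finding.get("sources"):
        issues.append("no citation: claim cannot be verified")
    if finding.get("confidence", 0.0) < min_confidence:
        issues.append("low confidence: route back to specialist for retry")
    return {"valid": not issues, "issues": issues}
```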
Problem: 5+ LLM calls per research job → $0.50–$2 per query.
Solution:
Result: Average cost: $0.30/query (down from $1.20).
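The cost levers aren't itemised in the original; caching repeated sub-queries is one common lever, sketched here (the key scheme, TTL, and in-memory store are illustrative):

```python
import hashlib
import time

_cache = {}  # in production this would live in Redis/Supabase, not memory

def cached_llm_call(prompt, model, call_fn, ttl=3600):
    """Reuse identical LLM calls within a TTL instead of paying twice."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit["at"] < ttl:
        return hit["response"]
    response = call_fn(prompt=prompt, model=model)
    _cache[key] = {"response": response, "at": time.time()}
    return response
```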
- Multimodal support: currently text-only.
- Proactive agents: instead of reactive (user asks → we research), agents initiate research on their own.
- Collaboration: multi-user research projects.
Building Athenic's multi-agent research system taught us that specialisation beats generalisation. By deploying narrow, expert agents that collaborate through a shared context layer, we deliver research quality that matches human analysts, in 1/20th the time. If you're building multi-agent systems, start simple (2–3 agents), optimise for parallelism, and invest in quality validation from day one.
Q: How do AI agents handle errors and edge cases?
Well-designed agent systems include fallback mechanisms, human-in-the-loop escalation, and retry logic. The key is defining clear boundaries for autonomous action versus requiring human approval for sensitive or unusual situations.
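For the retry-logic piece specifically, a minimal exponential-backoff sketch (attempt count and delays illustrative):

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: escalate to a human or fallback path
            time.sleep(base_delay * (2 ** attempt) + random.random())
```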
Q: What skills do I need to build AI agent systems?
You don't need deep AI expertise to implement agent workflows. Basic understanding of APIs, workflow design, and prompt engineering is sufficient for most use cases. More complex systems benefit from software engineering experience, particularly around error handling and monitoring.
Q: What's the typical ROI timeline for AI agent implementations?
Most organisations see positive ROI within 3-6 months of deployment. Initial productivity gains of 20-40% are common, with improvements compounding as teams optimise prompts and workflows based on production experience.