Inside Athenic: How We Built a Multi-Agent Research System
Deep dive into Athenic's multi-agent architecture: how we orchestrate research agents across data sources to deliver comprehensive market intelligence in minutes.
TL;DR
When we started building Athenic, we knew single-agent LLMs couldn't deliver the research quality startups need. A single GPT-4 call can't simultaneously:
- Search the live web for current pricing, funding, and news.
- Query structured internal data sources.
- Parse long documents.
- Synthesise everything into a cited, coherent report.
Multi-agent systems solve this by deploying specialist agents, each optimised for one task, that collaborate to deliver comprehensive results. Think of it like a research team: one person handles web search, another analyses documents, a third synthesises findings, and a coordinator ensures everyone stays aligned.
Here's how we architected Athenic's multi-agent research system, the technical challenges we faced, and the design decisions that let us deliver startup intelligence in minutes instead of hours.
Key takeaways
- Multi-agent systems outperform monolithic LLMs for complex tasks requiring diverse skills (web search, structured data analysis, document parsing).
- Our architecture: Orchestrator routes tasks → Specialists execute in parallel → Synthesiser aggregates → Quality validator ensures accuracy.
- Key challenge: Agent coordination overhead. Solution: Shared context layer + asynchronous execution with dependency tracking.
Traditional approach (single LLM):
User: "Research competitor X's pricing, recent funding, and customer sentiment. Compare to our product."
Single GPT-4 call: Attempts to search the web, hallucinates data, and provides shallow analysis. Accuracy: ~60–70%.
Why it fails:
- No live web access, so the model fabricates plausible-looking numbers.
- One context window has to juggle search, data analysis, and synthesis at once.
- No citations, so nothing can be verified.
Athenic approach:
User: "Research competitor X's pricing, recent funding, and customer sentiment. Compare to our product."
Orchestrator agent: Breaks into sub-tasks:
- Web research agent: Find competitor pricing page, scrape tiers.
- Funding agent: Query Crunchbase API for latest funding round.
- Sentiment agent: Scrape Twitter, Reddit, G2 for customer feedback.
- Internal agent: Pull our pricing from database.
- Synthesis agent: Aggregate findings, generate comparison report.
Result: Comprehensive report with citations, delivered in 12 minutes. Accuracy: 92%.
Why it works:
- Each specialist agent is optimised for a single, narrow task.
- Independent sub-tasks run in parallel instead of sequentially.
- Every claim traces back to a cited source.
- A quality validator checks the output before it reaches the user.
[Diagram: single-agent vs multi-agent execution. Single-agent (sequential): Web → Data → Docs → Synth → Output, 20+ minutes. Multi-agent (parallel): all agents run simultaneously, ~5 minutes.]
1. Orchestrator Agent
2. Specialist Agents: each agent has a narrow domain.
3. Synthesis Agent
4. Quality Agent
5. Shared Context Layer
[Architecture diagram: User Query → Orchestrator Agent → five specialists (Web Agent, DB Agent, Doc Agent, API Agent, Sentiment Agent), all reading and writing through the Shared Context Layer (Supabase).]
Role: Task planner and coordinator.
Inputs: User query, conversation history.
Outputs: Task decomposition plan, agent assignments.
Example:
User query: "Research Notion's pricing strategy and compare to ours."
Orchestrator plan:
Tasks:
1. Web Agent: Scrape Notion pricing page → extract tiers, features, prices.
2. Database Agent: Query our pricing table → get our tiers.
3. Synthesis Agent: Compare Notion vs us → generate markdown table + analysis.
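Rendered as structured output, that plan might look like the sketch below. The field names are illustrative, not Athenic's actual schema; `depends_on` is what drives the task graph described later.

```python
plan = {
    "research_job_id": "job-123",  # illustrative id
    "tasks": [
        {"id": "t1", "agent": "web",
         "goal": "Scrape Notion pricing page; extract tiers, features, prices",
         "depends_on": []},
        {"id": "t2", "agent": "database",
         "goal": "Query our pricing table; get our tiers",
         "depends_on": []},
        {"id": "t3", "agent": "synthesis",
         "goal": "Compare Notion vs us; generate markdown table + analysis",
         "depends_on": ["t1", "t2"]},
    ],
}
```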
Tech stack:
Role: Search the web, scrape pages, extract structured data.
Tools:
Example task:
"Find Stripe's latest funding round amount and date."
Execution:
Output:
```json
{
  "company": "Stripe",
  "funding_round": "Series I",
  "amount_usd": 6500000000,
  "date": "2023-03-14",
  "investors": ["Thrive Capital", "General Catalyst"],
  "sources": ["https://crunchbase.com/...", "https://techcrunch.com/..."]
}
```
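The tool list above is elided in this post, but a minimal sketch of the fetch-and-clean step might look like this (assuming httpx and BeautifulSoup; the extraction prompt and model call are simplified away):

```python
import httpx
from bs4 import BeautifulSoup

def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and strip it down to visible text for the extraction model."""
    response = httpx.get(url, timeout=timeout, follow_redirects=True)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags so only human-visible text reaches the LLM.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

# The cleaned text is then passed to an LLM with a JSON schema
# (like the funding-round output above) to extract structured fields.
```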
Role: Query internal databases (CRM, analytics, knowledge base).
Tools:
Example task:
"How many customers signed up last month?"
Execution:
```sql
SELECT COUNT(*) FROM customers
WHERE created_at >= '2025-07-01' AND created_at < '2025-08-01';
```

Output: `{"count": 127}`.

Safety: Queries are sandboxed (read-only access, row-level security).
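As an illustration of that sandboxing, a guard along these lines can enforce read-only access at the application layer (a sketch assuming psycopg and a hypothetical read-only database role; Athenic's actual enforcement also relies on row-level security):

```python
import psycopg

READ_ONLY_DSN = "postgresql://agent_readonly@db:5432/app"  # hypothetical read-only role

def run_agent_query(sql: str) -> list[tuple]:
    """Execute an agent-generated query under read-only constraints."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Database Agent may only run SELECT statements")
    with psycopg.connect(READ_ONLY_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION READ ONLY")
            cur.execute("SET LOCAL statement_timeout = '5s'")  # bound runaway queries
            cur.execute(sql)
            return cur.fetchall()
```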
Role: Parse and analyse uploaded documents (PDFs, DOCX, spreadsheets).
Tools:
Example task:
"Extract key metrics from this investor deck PDF."
Execution:
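The original execution steps are elided here, but a minimal sketch of the parsing stage might look like this (assuming pypdf for text extraction; the metric-extraction prompt is simplified):

```python
from pypdf import PdfReader

def extract_deck_text(path: str) -> str:
    """Pull raw text out of an investor deck, page by page."""
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return "\n\n".join(pages)

# The extracted text is then sent to an LLM with instructions like
# "return ARR, growth rate, and burn as JSON" to produce structured metrics.
```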
Role: Call external APIs (Crunchbase, LinkedIn, Twitter, PubMed).
Integration approach:
Example task:
"Get company profile for startup X from Crunchbase."
Execution:
`get_company(name="Startup X")`

Role: Aggregate findings from specialist agents into a cohesive narrative.
Inputs: Outputs from specialist agents (JSON, text, tables).
Output: Markdown report with citations.
Example:
Inputs:
- Web Agent: Notion pricing tiers.
- Database Agent: Our pricing tiers.
Synthesis output:
```markdown
# Notion vs Our Product: Pricing Comparison

Notion offers 4 tiers: Free, Plus ($8/user/mo), Business ($15), Enterprise (custom).
Our product offers 3 tiers: Starter (free), Pro ($12), Enterprise ($25).

**Key differences:**
- Notion's Plus tier is 33% cheaper than our Pro.
- We offer more integrations at Pro tier (50+ vs Notion's 20).
- Notion targets broader market (individuals + teams); we focus on B2B.

**Recommendation:** Consider lowering Pro tier to $10 to match Notion's positioning.

Sources: [1] Notion pricing page, [2] Internal database
```
Some tasks depend on others. Example:
"Research competitor pricing, then recommend our pricing changes."
Dependency graph: the pricing research (Task 1) must complete before the recommendation (Task 2) can start, because Task 2 consumes Task 1's output.
Solution: Task graph with dependency tracking.
Implementation (simplified):
```python
import asyncio

class TaskGraph:
    def __init__(self):
        self.tasks = {}          # task_id -> {"agent", "status", "result"}
        self.dependencies = {}   # task_id -> list of prerequisite task_ids

    def add_task(self, task_id, agent, depends_on=None):
        self.tasks[task_id] = {"agent": agent, "status": "pending"}
        self.dependencies[task_id] = depends_on or []  # avoid a shared mutable default

    async def execute(self):
        """Execute tasks respecting dependencies."""
        completed = set()
        while len(completed) < len(self.tasks):
            # Find tasks ready to run (all dependencies met).
            ready = [
                tid for tid in self.tasks
                if self.tasks[tid]["status"] == "pending"
                and all(dep in completed for dep in self.dependencies[tid])
            ]
            if not ready:
                raise RuntimeError("Cycle or unsatisfiable dependency in task graph")
            # Run all ready tasks in parallel.
            await asyncio.gather(*[self.run_task(tid) for tid in ready])
            completed.update(ready)
        # Final output: every task's result, keyed by task id.
        return {tid: task["result"] for tid, task in self.tasks.items()}

    async def run_task(self, task_id):
        agent = self.tasks[task_id]["agent"]
        result = await agent.execute()
        self.tasks[task_id]["status"] = "completed"
        self.tasks[task_id]["result"] = result
        return result
```
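A quick usage sketch with stand-in agents (any object with an async `execute()` coroutine works), mirroring the pricing example above:

```python
class StubAgent:
    """Stand-in for a real specialist agent."""
    def __init__(self, name):
        self.name = name

    async def execute(self):
        return f"{self.name}: done"

async def main():
    graph = TaskGraph()
    graph.add_task("web", StubAgent("web"))
    graph.add_task("db", StubAgent("database"))
    # Synthesis waits for both research tasks.
    graph.add_task("synth", StubAgent("synthesis"), depends_on=["web", "db"])
    print(await graph.execute())  # web + db run in parallel, then synth

asyncio.run(main())
```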
Agents need to share information. Example: the Synthesis Agent can't compare pricing until it can read the Web Agent's scraped tiers and the Database Agent's query results.
Solution: Shared context layer (Supabase).
Implementation: every agent writes its output to a shared research_context table.

```sql
CREATE TABLE research_context (
  id UUID PRIMARY KEY,
  research_job_id UUID,
  agent_id TEXT,
  task_id TEXT,
  data JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```
Agents query context:
```python
import os
from supabase import create_client

# Shared client; project URL and service key come from the environment.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def get_context(research_job_id, task_id):
    """Retrieve context for a task."""
    return (supabase.table("research_context").select("*")
            .eq("research_job_id", research_job_id)
            .eq("task_id", task_id).execute())
```
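The write side is symmetric. Here is a sketch of how a specialist agent might persist its output (`put_context` is our illustrative name, not a documented helper):

```python
import uuid

def put_context(research_job_id, agent_id, task_id, data):
    """Persist an agent's output so downstream agents can read it."""
    return (
        supabase.table("research_context")
        .insert({
            "id": str(uuid.uuid4()),  # the schema above defines no DEFAULT on id
            "research_job_id": research_job_id,
            "agent_id": agent_id,
            "task_id": task_id,
            "data": data,  # stored as JSONB
        })
        .execute()
    )
```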
Problem: Orchestrating 5+ agents adds latency (planning, routing, waiting for dependencies).
Initial approach: Sequential execution → 20+ min per query.
Solution: Parallel execution with dependency tracking → 5–7 min.
Lesson: Optimise for parallelism. Only enforce dependencies where truly necessary.
Problem: If Web Agent fails (rate limit, timeout), entire research job fails.
Initial approach: Hard failures → poor user experience.
Solution: Graceful degradation.
Example output:
"We found competitor pricing on 3 of 5 sites. Unable to access Site X (timeout) and Site Y (rate limit). Recommendations based on available data."
Problem: Agents sometimes hallucinate or return low-confidence answers.
Initial approach: No validation → 78% accuracy.
Solution: Quality Agent validates output.
Result: Accuracy improved to 92%.
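The Quality Agent's review is LLM-based, but cheap structural checks can run first. As a purely illustrative pre-filter, this one insists every report carries citations and flags numeric claims that lack a `[n]`-style source marker:

```python
import re

def validate_report(report: str, min_citations: int = 1) -> dict:
    """Structural pre-checks before the LLM-based review."""
    issues = []
    if len(re.findall(r"\[\d+\]", report)) < min_citations:
        issues.append("no citations found")
    for line in report.splitlines():
        # A number on a line with no citation marker is a candidate hallucination.
        if re.search(r"\$?\d[\d,.]*%?", line) and not re.search(r"\[\d+\]", line):
            issues.append(f"unsourced number: {line.strip()[:60]}")
    return {"passed": not issues, "issues": issues}
```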
Problem: 5+ LLM calls per research job → $0.50–$2 per query.
Solution:
Result: Average cost: $0.30/query (down from $1.20).
Currently text-only. Adding:
Instead of reactive (user asks → we research), build proactive agents:
Multi-user research projects:
Building Athenic's multi-agent research system taught us that specialisation beats generalisation. By deploying narrow, expert agents that collaborate through a shared context layer, we deliver research quality that matches human analysts, in 1/20th the time. If you're building multi-agent systems, start simple (2–3 agents), optimise for parallelism, and invest in quality validation from day one.