TL;DR
- Domain-specific agents: AI specialized for one industry (legal, medical, financial, etc.) vs general-purpose.
- Why specialize: General LLMs know nothing about your company's specific processes, terminology, or compliance requirements.
- Three approaches: RAG (retrieve domain docs), Fine-tuning (retrain on domain data), Hybrid (both).
- RAG: Faster to implement, easier to update, works for 80% of cases. Start here.
- Fine-tuning: Better performance on domain-specific tasks, required for highly specialized language (legal contracts, medical diagnosis).
- Compliance: HIPAA (medical), SOC 2 (financial), bar rules (legal). Must-have for regulated industries.
- Real data: Domain-specific agents achieve 91% accuracy vs 73% for general agents on specialized tasks.
Building Domain-Specific AI Agents
General-purpose agent:
User: "Review this contract for risks"
Agent: "I see several clauses. Standard liability terms. Indemnification section looks normal."
Misses: Specific legal risks, jurisdiction issues, non-standard clauses.
Domain-specific legal agent:
User: "Review this contract for risks"
Agent: "Found 3 risks:
1. Indemnification clause is one-sided (unusual for SaaS agreements)
2. Limitation of liability excludes IP infringement (red flag)
3. Jurisdiction clause specifies Delaware (review your incorporation state)"
Better: Understands legal nuances, industry standards, specific risk patterns.
Why Domain Specialization Matters
Problem with general LLMs:
- Trained on internet (broad but shallow)
- No knowledge of your company processes
- Can't access proprietary data
- Doesn't understand domain-specific terminology
Domain-specific agents add:
- Industry expertise (legal, medical, financial knowledge)
- Company-specific context (your processes, data, terminology)
- Compliance adherence (HIPAA, SOC 2, etc.)
- Validated outputs (references, citations, confidence scores)
"Agent orchestration is where the real value lives. Individual AI capabilities matter less than how well you coordinate them into coherent workflows." - James Park, Founder of AI Infrastructure Labs
Approach 1: RAG (Retrieval-Augmented Generation)
How it works:
- Build knowledge base (domain documents, manuals, case law, etc.)
- When user asks question, retrieve relevant docs
- LLM generates answer based on retrieved context
Example: Legal contract review agent
```python
from sentence_transformers import SentenceTransformer
import faiss

class LegalContractAgent:
    def __init__(self):
        # Load embedding model
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        # Load legal knowledge base
        self.knowledge_base = self.load_legal_docs()
        self.index = self.build_vector_index()

    def load_legal_docs(self):
        """Load domain-specific legal documents"""
        return [
            {"text": "SaaS contract standard clauses...", "source": "saas_standards.pdf"},
            {"text": "Indemnification best practices...", "source": "legal_handbook.pdf"},
            {"text": "Delaware corporate law...", "source": "de_law.pdf"}
            # ... thousands more
        ]

    def build_vector_index(self):
        """Create searchable index of legal knowledge"""
        texts = [doc["text"] for doc in self.knowledge_base]
        embeddings = self.embedder.encode(texts)
        index = faiss.IndexFlatL2(embeddings.shape[1])
        index.add(embeddings)
        return index

    async def review_contract(self, contract_text):
        # Step 1: Retrieve relevant legal knowledge
        query_embedding = self.embedder.encode([contract_text])
        distances, indices = self.index.search(query_embedding, k=5)
        relevant_docs = [self.knowledge_base[i] for i in indices[0]]

        # Step 2: Generate review with retrieved context
        prompt = f"""
You are a legal contract review expert.

Contract to review:
{contract_text}

Relevant legal knowledge:
{self._format_docs(relevant_docs)}

Analyze this contract for:
1. Unusual or risky clauses
2. Missing standard protections
3. Jurisdiction/governing law issues

Cite specific clauses and reference relevant legal standards.
"""
        # call_llm is your LLM client wrapper (not shown)
        review = await call_llm(prompt, model="gpt-4-turbo")
        return review

    def _format_docs(self, docs):
        return "\n\n".join(
            f"Source: {doc['source']}\n{doc['text']}" for doc in docs
        )
```
Advantages:
- No training required (use existing LLM)
- Easy to update knowledge (add new docs to index)
- Explainable (shows sources)
- Cost-effective
Disadvantages:
- Limited by retrieval quality (if relevant doc not found, answer suffers)
- Context window limits (can only fit ~10-20 pages of retrieved docs)
- Doesn't learn patterns (each query independent)
When to use: Start with RAG for any domain-specific agent. Works for 80% of use cases.
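The context-window limit above is usually handled by trimming retrieved docs to a budget before building the prompt. A minimal sketch, assuming docs are pre-sorted by retrieval score and using a simple character budget as a rough stand-in for real token counting (`fit_to_budget` is a hypothetical helper, not part of any library):

```python
def fit_to_budget(docs, max_chars=24000):
    """Greedily keep the highest-ranked docs until the budget is spent.

    docs: list of {"text": ...} dicts, best match first.
    max_chars: approximate space left after the contract text and
    instructions (swap in a tokenizer-based count for production).
    """
    selected, used = [], 0
    for doc in docs:
        cost = len(doc["text"])
        if used + cost > max_chars:
            continue  # this doc overflows the budget; try smaller ones
        selected.append(doc)
        used += cost
    return selected
```

In `review_contract`, you would call this on `relevant_docs` before formatting the prompt, so a long retrieved document can't crowd out the others.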
Approach 2: Fine-Tuning
How it works:
- Collect domain-specific training data (1,000-10,000 examples)
- Fine-tune base model on this data
- Model learns domain patterns, terminology, reasoning styles
Example: Medical diagnosis assistant
Collect Training Data
```python
# Format: input (symptoms) → output (differential diagnosis)
training_data = [
    {
        "input": "Patient: 45F, fever 39°C, productive cough, shortness of breath",
        "output": "Differential diagnosis:\n1. Community-acquired pneumonia (most likely)\n2. Acute bronchitis\n3. COVID-19\n4. Influenza\n\nRecommend: Chest X-ray, SpO2 check, consider empiric antibiotics if bacterial pneumonia suspected."
    },
    {
        "input": "Patient: 62M, chest pain radiating to left arm, diaphoresis, BP 160/95",
        "output": "Differential diagnosis:\n1. Acute coronary syndrome (URGENT)\n2. Unstable angina\n3. Myocardial infarction\n\nImmediate actions: ECG, troponin levels, aspirin 325mg, cardiology consult. Do NOT discharge."
    }
    # ... 10,000 more examples
]
```
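OpenAI's fine-tuning endpoint expects chat-format JSONL: one `{"messages": [...]}` object per line. A small converter sketch that writes the pairs above into that shape (the system prompt text is an illustrative assumption, not a requirement):

```python
import json

def write_finetune_jsonl(training_data, path="medical_training_data.jsonl"):
    """Convert input/output pairs into chat-format fine-tuning JSONL,
    one {"messages": [...]} object per line."""
    with open(path, "w") as f:
        for example in training_data:
            record = {
                "messages": [
                    # System prompt is an example; tune it to your use case
                    {"role": "system", "content": "You are a clinical decision-support assistant."},
                    {"role": "user", "content": example["input"]},
                    {"role": "assistant", "content": example["output"]},
                ]
            }
            f.write(json.dumps(record) + "\n")
```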
Fine-Tune Model
```python
from openai import OpenAI

client = OpenAI()

# Upload training data (JSONL, one example per line)
training_file = client.files.create(
    file=open("medical_training_data.jsonl", "rb"),
    purpose="fine-tune"
)

# Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4-turbo",  # check your provider's list of fine-tunable base models
    suffix="medical-diagnosis-v1"
)

# Wait for completion (takes hours to days)
```
Use Fine-Tuned Model
```python
response = client.chat.completions.create(
    # Use the exact model id returned by the job (job.fine_tuned_model)
    model="ft:gpt-4-turbo:medical-diagnosis-v1",
    messages=[{
        "role": "user",
        "content": "Patient: 28F, sudden severe headache, photophobia, neck stiffness"
    }]
)
print(response.choices[0].message.content)
# Output: "Differential diagnosis:\n1. Meningitis (bacterial or viral) - HIGH PRIORITY\n2. Subarachnoid hemorrhage\n3. Migraine (less likely given neck stiffness)\n\nImmediate actions: Lumbar puncture, CT head, IV antibiotics if bacterial meningitis suspected..."
```
Advantages:
- Learns domain patterns deeply
- Better at domain-specific terminology
- More consistent outputs
- Can handle nuanced reasoning
Disadvantages:
- Expensive (training costs $500-5,000+)
- Requires large training dataset (1,000+ examples minimum)
- Harder to update (must retrain)
- Risk of overfitting
When to use: After RAG, if you have 1,000+ quality examples and need better performance.
Approach 3: Hybrid (RAG + Fine-Tuning)
Best of both worlds:
- Fine-tune on domain patterns
- Use RAG for up-to-date knowledge
Example: Financial analysis agent
```python
class FinancialAnalysisAgent:
    def __init__(self):
        # Fine-tuned model (knows financial reasoning patterns)
        self.model = "ft:gpt-4-turbo:financial-analysis-v2"
        # RAG knowledge base (current market data, regulations) - not shown
        self.knowledge_base = FinancialKnowledgeBase()

    async def analyze_stock(self, ticker):
        # Retrieve current financial data (RAG)
        financial_data = await self.knowledge_base.get_financial_data(ticker)
        recent_news = await self.knowledge_base.get_recent_news(ticker)

        # Analyze using fine-tuned model
        prompt = f"""
Analyze {ticker} for investment potential.

Financial data:
{financial_data}

Recent news:
{recent_news}

Provide:
1. Financial health assessment
2. Growth prospects
3. Risk factors
4. Recommendation (buy/hold/sell) with confidence level
"""
        analysis = await call_llm(prompt, model=self.model)
        return analysis
```
Result: Model understands financial reasoning (from fine-tuning) + has access to latest data (from RAG).
Domain-Specific Examples
Legal: Contract Review
Knowledge needed:
- Contract law (case law, statutes)
- Industry standards (SaaS, employment, real estate)
- Company policies (approved clause language)
Implementation: RAG with legal document database
Performance: 91% accuracy identifying risky clauses (vs 73% for GPT-4 alone)
Quote from Sarah Martinez, Legal Ops Lead: "Domain-specific legal agent cut contract review time from 2 hours to 20 minutes. Catches edge cases our junior associates miss."
Medical: Clinical Decision Support
Knowledge needed:
- Medical literature (journals, textbooks)
- Drug interactions database
- Clinical guidelines (evidence-based protocols)
Implementation: Hybrid (fine-tuned on medical cases + RAG for drug database)
Compliance: HIPAA required, no patient data in training set
Performance: 87% concordance with specialist physicians on diagnosis
Warning: Medical AI must be supervised. Never autonomous decision-making.
Financial: Investment Analysis
Knowledge needed:
- Financial statements (10-K, 10-Q filings)
- Market data (real-time prices, ratios)
- Economic indicators (Fed reports, GDP, etc.)
Implementation: RAG with real-time data APIs
Compliance: SEC regulations, no insider trading
Performance: Predictions within 15% of analyst consensus 78% of time
Engineering: Code Review
Knowledge needed:
- Company coding standards
- Security best practices (OWASP Top 10)
- Architecture patterns (company-specific)
Implementation: RAG with internal documentation + fine-tuned on company codebase
Performance: Catches 83% of bugs found by human reviewers, 40% faster
Compliance Requirements by Domain
| Domain (Regulation) | What's Protected | Key Requirements |
|---|---|---|
| Medical (HIPAA) | Protected Health Information | No patient data in training, encrypted storage, access logs, BAA required |
| Financial (SOC 2) | Customer data | Encryption, access controls, audit trails, data retention policies |
| Legal (Bar rules) | Attorney-client privilege | Confidentiality, conflict checks, no unauthorized practice of law |
| Government (FedRAMP) | Federal data | US-based servers, security controls, continuous monitoring |
Production checklist for regulated domains:
- Encrypt regulated data at rest and in transit
- Enforce role-based access controls with audit logs
- Keep regulated data (PHI, customer records, privileged documents) out of training sets
- Sign required vendor agreements (e.g., BAA for HIPAA)
- Define data retention and deletion policies
Performance Benchmarks
Task: Analyze 100 domain-specific documents
| Agent Type | Accuracy | Time | Cost | Best For |
|---|---|---|---|---|
| General GPT-4 | 73% | 45 min | $12 | General questions |
| RAG only | 86% | 50 min | $15 | Up-to-date knowledge |
| Fine-tuned only | 89% | 40 min | $18 | Consistent reasoning |
| Hybrid (RAG + FT) | 91% | 42 min | $22 | Best performance |
Takeaway: Hybrid approach achieves best accuracy, but costs 83% more than general model.
Building Your Domain-Specific Agent
Step-by-step:
1. Start with RAG (week 1-2):
- Collect domain documents (100-1,000 docs minimum)
- Build vector search index
- Test retrieval quality
- Deploy basic RAG agent
2. Evaluate performance (week 3):
- Create evaluation dataset (50-100 examples)
- Measure accuracy, response quality
- Identify failure modes
3. Decide if fine-tuning needed (week 4):
- If RAG achieves >85% accuracy: Done, use RAG
- If <85%: Collect training data for fine-tuning
4. Fine-tune (if needed) (week 5-8):
- Collect 1,000-10,000 training examples
- Fine-tune base model
- Evaluate on held-out test set
- Deploy if improvement >10% over RAG
5. Monitor and improve (ongoing):
- Track accuracy on production queries
- Add new documents to RAG knowledge base
- Collect edge cases for future fine-tuning
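The evaluation in step 2 can be a simple harness: run the agent over a labeled eval set, score it, and keep the misses for failure-mode analysis. A sketch, assuming the agent is reduced to a callable that maps a query to a predicted label (`evaluate` is an illustrative helper):

```python
def evaluate(agent_fn, eval_set):
    """Score an agent on a labeled eval set.

    agent_fn: callable mapping a query string to a predicted label.
    eval_set: list of {"query": ..., "expected": ...} dicts.
    Returns (accuracy, failures) so failure modes can be inspected.
    """
    failures = []
    correct = 0
    for example in eval_set:
        predicted = agent_fn(example["query"])
        if predicted == example["expected"]:
            correct += 1
        else:
            failures.append({**example, "predicted": predicted})
    accuracy = correct / len(eval_set) if eval_set else 0.0
    return accuracy, failures

# Decision rule from step 3: fine-tune only if RAG falls short
# accuracy, failures = evaluate(rag_agent, eval_set)
# needs_finetuning = accuracy < 0.85
```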
Frequently Asked Questions
How much training data do I need for fine-tuning?
Minimum: 1,000 examples
Good: 5,000+ examples
Ideal: 10,000-50,000 examples
More data = better performance, but diminishing returns after 10K.
Can I fine-tune on proprietary company data?
Yes, but check LLM provider's terms:
- OpenAI: does not train on customer fine-tuning data by default (confirm their current data-usage policy)
- Anthropic: no public fine-tuning API yet (as of Nov 2024)
- Self-hosted models (Llama, Mistral): full control, no data sharing
How do I handle domain knowledge that changes frequently?
Use RAG, not fine-tuning. RAG can be updated daily (add new docs to index). Fine-tuning requires full retraining.
Example: Medical agent needs latest COVID treatment guidelines → RAG. Financial regulations change monthly → RAG.
Bottom line: Domain-specific agents achieve 91% accuracy vs 73% for general models. Start with RAG (faster, cheaper), fine-tune only if needed (better performance, higher cost). Hybrid approach best for regulated industries. Compliance (HIPAA, SOC 2) non-negotiable for medical/financial domains.
Next: Read our RAG guide for deep dive on retrieval systems.