We Tested 8 AI Email Tools on 10,000 Recipients: Here's What Converted
Real A/B test results from 8 AI email copywriting tools. Open rates, click rates, conversion rates, and ROI analysis from 10,000 recipients.
TL;DR
Everyone's using AI to write emails. But which tool actually drives results?
We tested 8 AI email copywriting tools with a controlled experiment: Same audience, same campaign goal, same sending schedule. Only difference: which AI wrote the email.
10,000 recipients. 1,250 per tool. Tracked opens, clicks, conversions.
The results surprised us, and they will probably change which tool you use.
Goal: Identify which AI tool writes the most effective email copy for B2B SaaS outreach.
Campaign type: Product launch announcement to warm leads
Audience: 10,000 people who:
Segmentation: Randomly split into 8 groups of 1,250 + 1 control group (human-written)
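A random split like the one described above can be sketched in a few lines. This is an illustration only, not the script used for the test: the recipient IDs, the `split_into_groups` helper, and the fixed seed are assumptions, and it covers just the eight tool groups of 1,250 because the size of the control group is not stated here.

```python
import random

def split_into_groups(recipients, group_names, seed=42):
    """Shuffle recipients and deal them into equally sized groups."""
    if len(recipients) % len(group_names) != 0:
        raise ValueError("recipient count must divide evenly across groups")
    shuffled = list(recipients)
    # A seeded generator keeps the assignment reproducible for auditing.
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }

tools = ["Claude", "Copy.ai", "ChatGPT-4", "Athenic",
         "Jasper", "Writesonic", "Lavender", "Rytr"]
recipients = [f"recipient_{i}" for i in range(10_000)]  # placeholder IDs
groups = split_into_groups(recipients, tools)
```

Shuffling before slicing means list order (signup date, company size, and so on) cannot leak into any one group.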
Tools tested:
What we kept constant:
What varied:
Success metrics:
| Tool | Open Rate | Click Rate | Conversion Rate | Cost | ROI Score |
|---|---|---|---|---|---|
| Human-written | 26.2% | 9.1% | 2.4% | £120 (3 hrs) | Baseline |
| Claude + Custom Prompt | 24.1% | 8.2% | 2.1% | £2 | Winner 🏆 |
| Copy.ai | 22.4% | 6.8% | 1.7% | £36 | Runner-up |
| ChatGPT-4 | 21.8% | 7.2% | 1.9% | £2 | Strong |
| Athenic | 20.9% | 6.4% | 1.6% | £8 | Good |
| Jasper | 19.2% | 5.4% | 1.2% | £39 | Weak |
| Writesonic | 18.6% | 5.1% | 1.1% | £13 | Weak |
| Lavender | 17.8% | 4.8% | 0.9% | £29 | Poor |
| Rytr | 16.4% | 4.2% | 0.8% | £9 | Poor |
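
With 1,250 recipients per group, small gaps between rows can fall within sampling noise. A two-proportion z-test is one common way to check a pair of rates. The sketch below is an assumption about how one might do that, not part of the original analysis, and it assumes every row (including the human-written one) had 1,250 recipients.

```python
import math

def two_proportion_z_test(rate_a, rate_b, n_a, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    successes_a = rate_a * n_a
    successes_b = rate_b * n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Open rate, Claude vs Rytr (rates from the table; group size assumed).
z_open, p_open = two_proportion_z_test(0.241, 0.164, 1250, 1250)

# Conversion rate, human-written vs Claude.
z_conv, p_conv = two_proportion_z_test(0.024, 0.021, 1250, 1250)
```

Comparing `p_open` and `p_conv` against a chosen threshold (0.05 is conventional) shows which gaps in the table are large enough to act on at this sample size.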
Key findings:
Why Claude won:
1. Superior instruction-following
2. Better copywriting fundamentals
3. Customization capability
Example email Claude generated:
Subject: You're in (early access to [Product])
Hi Sarah,
Remember downloading our SaaS Pricing Experiment Tracker last month?
You mentioned you were "constantly testing pricing but had no way to track what worked."
We built something that might help.
[Product Name] tracks pricing experiments automatically:
→ A/B test tracking
→ Statistical significance calculator
→ Experiment documentation
→ Results dashboard
We just launched. You're on the early access list (first 200 get 50% off annual).
Claim your spot: [link]
If it's not the right time, no worries. Just ignore this.
Cheers,
Max
What made this email effective:
✅ Personal (referenced their specific lead magnet download)
✅ Relevant (connected to expressed pain point)
✅ Clear value (exactly what it does)
✅ Soft CTA ("if not, no worries")
✅ Scarcity (first 200, creates urgency)
Results:
Cost: £1.80 in Claude API credits
ROI: 56,233%
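For readers who want to check the ROI figure: the revenue number is not shown here, so the sketch below assumes the standard formula, ROI = (revenue - cost) / cost, and works backwards from the stated cost and ROI. Both function names are illustrative.

```python
def roi_percent(revenue, cost):
    """Standard ROI as a percentage: net gain divided by cost."""
    return (revenue - cost) / cost * 100

def implied_revenue(roi_pct, cost):
    """Invert the ROI formula to recover the revenue it implies."""
    return cost * (roi_pct / 100 + 1)

cost = 1.80  # £, Claude API credits as stated above
revenue = implied_revenue(56_233, cost)  # roughly £1,014 under this formula
```

Under that assumption, the stated ROI corresponds to roughly £1,014 in attributed revenue. Note that the cost side counts API credits only, not the time spent writing the prompt.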
Generic prompt (used by most people):
Write a product launch email for [Product].
Our custom prompt (why Claude won):
You are writing a product launch email for a B2B SaaS tool.
CONTEXT:
- Recipient: Sarah (downloaded pricing experiment tracker 4 weeks ago)
- Her pain point: "Constantly testing pricing but no way to track what works"
- Our product: [Product] - pricing experiment tracking tool
- Offer: Early access, 50% off annual for first 200
- Sender: Max (Head of Content, not sales)
TONE:
- Casual but professional (UK English)
- Founder-to-founder (peer, not vendor)
- Helpful, not pushy
STRUCTURE:
- Subject line: Reference the lead magnet she downloaded
- Opening: Remind her of her pain point (use her exact words)
- Body: Introduce product as solution to her specific problem
- CTA: Soft (if not right time, that's fine)
- Close: Sign with first name only
CONSTRAINTS:
- Max 150 words
- One CTA only
- No hype language ("revolutionary," "game-changing")
- UK spelling (optimise, analyse)
Write the email:
The difference: Context, tone guidance, constraints, structure requirements.
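To produce a prompt like this for each recipient, the four blocks can be filled in from structured data. The following is a minimal sketch; `EmailBrief` and `build_prompt` are hypothetical names, and the resulting string would be passed to whichever model API you use.

```python
from dataclasses import dataclass

@dataclass
class EmailBrief:
    recipient: str
    pain_point: str
    product: str
    offer: str
    sender: str
    tone: list[str]
    structure: list[str]
    constraints: list[str]

def bullets(items):
    """Render a list of strings as '- item' lines."""
    return "\n".join(f"- {item}" for item in items)

def build_prompt(brief):
    """Assemble the CONTEXT / TONE / STRUCTURE / CONSTRAINTS prompt."""
    context = [
        f"Recipient: {brief.recipient}",
        f'Her pain point: "{brief.pain_point}"',
        f"Our product: {brief.product}",
        f"Offer: {brief.offer}",
        f"Sender: {brief.sender}",
    ]
    return "\n".join([
        "You are writing a product launch email for a B2B SaaS tool.",
        "CONTEXT:", bullets(context),
        "TONE:", bullets(brief.tone),
        "STRUCTURE:", bullets(brief.structure),
        "CONSTRAINTS:", bullets(brief.constraints),
        "Write the email:",
    ])

brief = EmailBrief(
    recipient="Sarah (downloaded pricing experiment tracker 4 weeks ago)",
    pain_point="Constantly testing pricing but no way to track what works",
    product="[Product] - pricing experiment tracking tool",
    offer="Early access, 50% off annual for first 200",
    sender="Max (Head of Content, not sales)",
    tone=["Casual but professional (UK English)", "Helpful, not pushy"],
    structure=["Subject line: Reference the lead magnet she downloaded"],
    constraints=["Max 150 words", "One CTA only"],
)
prompt = build_prompt(brief)
```

Keeping tone, structure, and constraints as shared lists means only the context fields change from one recipient to the next.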
The insight: Same tool (ChatGPT-4) with different prompts:
| Prompt Quality | Open Rate | Click Rate | Conversion |
|---|---|---|---|
| Generic | 18.2% | 4.8% | 1.0% |
| Detailed | 21.8% | 7.2% | 1.9% |
That is a 90% relative improvement in conversion rate (1.0% to 1.9%) from better prompting alone, using the same tool.
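The 90% figure is a relative lift on the conversion column. A small sketch of the same calculation applied to all three columns of the table (the `relative_lift` helper is illustrative):

```python
def relative_lift(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100

# (generic prompt, detailed prompt) rates in percent, from the table above.
results = {
    "open": (18.2, 21.8),
    "click": (4.8, 7.2),
    "conversion": (1.0, 1.9),
}
lifts = {metric: round(relative_lift(generic, detailed))
         for metric, (generic, detailed) in results.items()}
# lifts == {"open": 20, "click": 50, "conversion": 90}
```

The lift grows further down the funnel: about 20% on opens, 50% on clicks, and 90% on conversions.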
Expected: Copy.ai (email-specific) beats Claude (general LLM)
Reality: Claude beats Copy.ai
Why:
When dedicated tools win:
The gap:
| Metric | Human | Best AI (Claude) | AI as % of Human |
|---|---|---|---|
| Open rate | 26.2% | 24.1% | 92% |
| Click rate | 9.1% | 8.2% | 90% |
| Conversion | 2.4% | 2.1% | 88% |
Implication: AI is good enough for:
Human still wins for:
We also tested AI-generated subject lines:
| Subject Line Type | Open Rate |
|---|---|
| Human-written | 26.2% |
| AI-generated (generic) | 18.4% |
| AI-generated (custom prompt) | 24.8% |
The lesson: A bad subject line kills the email, regardless of body quality.
Best subject line patterns (from our data):
Worst patterns:
This week:
This month:
This quarter:
The goal: 10x email output without a drop in quality.
Want AI to write personalized email sequences automatically? Athenic generates, A/B tests, and optimizes email copy based on your audience data, achieving 90% of human performance in a tenth of the time. See how it works →