Academy · 11 Oct 2025 · 12 min read

We Tested 8 AI Email Tools on 10,000 Recipients - Here's What Converted

Real A/B test results from 8 AI email copywriting tools. Open rates, click rates, conversion rates, and ROI analysis from 10,000 recipients.

Max Beech
Head of Content

TL;DR

  • Tested 8 AI email copywriting tools by sending the same campaign to 10,000 recipients (1,250 per tool), varying only the email body
  • Winner: Claude with custom prompts (24% open, 8.2% click, 2.1% conversion) beat dedicated email tools
  • Runner-up: Copy.ai Email Sequences (22% open, 6.8% click, 1.7% conversion)
  • Biggest surprise: Human-written baseline was only marginally better (26.2% open, 9.1% click, 2.4% conversion) - the best AI hit 88-92% of human performance
  • Key finding: Tool matters less than prompt quality and audience segmentation

We Tested 8 AI Email Tools on 10,000 Recipients - Here's What Converted

Everyone's using AI to write emails. But which tool actually drives results?

We tested 8 AI email copywriting tools with a controlled experiment: Same audience, same campaign goal, same sending schedule. Only difference: which AI wrote the email.

10,000 recipients. 1,250 per tool. Tracked opens, clicks, conversions.

The results surprised us - and they'll probably change which tool you use.

The Experiment Setup

Goal: Identify which AI tool writes the most effective email copy for a B2B SaaS product launch to warm leads.

Campaign type: Product launch announcement to warm leads

Audience: 10,000 people who:

  • Downloaded a lead magnet
  • Engaged with content in last 90 days
  • Had NOT yet been pitched the product

Segmentation: Randomly split into groups of 1,250 - one per tool, plus a human-written control group

Tools tested:

  1. Claude (Anthropic) with custom prompts
  2. ChatGPT-4 with custom prompts
  3. Copy.ai Email Sequences
  4. Jasper Email Workflows
  5. Writesonic Email Writer
  6. Rytr Email Generator
  7. Athenic Email Agent
  8. Lavender AI
  9. Human-written (control group)

What we kept constant:

  • Subject line (same for all)
  • Sending time (Tuesday 10 AM GMT)
  • From name and email
  • Email signature
  • Audience segment (randomly distributed)

What varied:

  • Email body copy (each tool generated its own version)

Success metrics:

  • Open rate
  • Click-through rate
  • Conversion rate (signup or demo request)
  • Time spent reading (tracked with email pixels)
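
If you're curious how pixel-based open tracking works, here's a minimal sketch of the server side - a small Flask endpoint with an illustrative route and logging; in practice your email platform handles this for you.

```python
# Minimal sketch of an open-tracking pixel. Each email embeds
# <img src="https://yourdomain.example/open/<recipient_id>.gif" width="1" height="1">
# and this endpoint logs the open before returning a transparent 1x1 GIF.
# The domain, route, and print-based logging are illustrative.
from datetime import datetime, timezone

from flask import Flask, Response

app = Flask(__name__)

# The classic 43-byte transparent 1x1 GIF.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

@app.route("/open/<recipient_id>.gif")
def track_open(recipient_id: str) -> Response:
    # A real system would write to a database; printing keeps the sketch short.
    print(f"{datetime.now(timezone.utc).isoformat()} open recipient={recipient_id}")
    return Response(PIXEL_GIF, mimetype="image/gif")

if __name__ == "__main__":
    app.run(port=8000)
```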

The Results: Complete Breakdown

Overall Performance Table

| Tool | Open Rate | Click Rate | Conversion Rate | Cost | Verdict |
|---|---|---|---|---|---|
| Human-written | 26.2% | 9.1% | 2.4% | £120 (3 hrs) | Baseline |
| Claude + Custom Prompt | 24.1% | 8.2% | 2.1% | £2 | Winner 🏆 |
| Copy.ai | 22.4% | 6.8% | 1.7% | £36 | Runner-up |
| ChatGPT-4 | 21.8% | 7.2% | 1.9% | £2 | Strong |
| Athenic | 20.9% | 6.4% | 1.6% | £8 | Good |
| Jasper | 19.2% | 5.4% | 1.2% | £39 | Weak |
| Writesonic | 18.6% | 5.1% | 1.1% | £13 | Weak |
| Lavender | 17.8% | 4.8% | 0.9% | £29 | Poor |
| Rytr | 16.4% | 4.2% | 0.8% | £9 | Poor |

Key findings:

  1. Claude performed best among the AI tools (88-92% of human performance, depending on the metric)
  2. Copy.ai was the best dedicated email tool (still beaten by Claude)
  3. ChatGPT-4 was competitive with the dedicated tools
  4. Price didn't correlate with performance (Jasper at £39/month underperformed Claude at roughly £2 in API credits)
  5. Every AI tool reached 63-92% of the human open rate, though the conversion-rate gap was wider (33-88%)
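
One caveat before reading too much into single-digit gaps: at 1,250 recipients per group, small differences can be sampling noise. A quick way to sanity-check any pairwise comparison is a two-proportion z-test - the sketch below assumes statsmodels, and converts Copy.ai's 22.4% open rate into a count (280 of 1,250) since only Claude's raw counts are reported above.

```python
# Two-proportion z-test: is the open-rate gap between two tools significant?
# 301 is Claude's reported open count; 280 approximates Copy.ai's 22.4% of 1,250.
# Requires: pip install statsmodels
from statsmodels.stats.proportion import proportions_ztest

opens = [301, 280]         # opens for tool A and tool B
recipients = [1250, 1250]  # group sizes

z_stat, p_value = proportions_ztest(count=opens, nobs=recipients)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# A p-value above 0.05 means the gap could plausibly be sampling noise at this
# group size, so treat small tool-vs-tool differences with caution.
```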

The Winner: Claude with Custom Prompts

Why Claude won:

1. Superior instruction-following

  • We provided a detailed prompt with:
    • Audience context
    • Desired tone
    • Email structure requirements
    • Examples of good/bad
  • Claude followed instructions more precisely than other LLMs

2. Better copywriting fundamentals

  • Stronger hooks
  • Clearer value propositions
  • More natural transitions
  • Less "AI voice"

3. Customization capability

  • Could refine prompts for better results
  • Adjusted tone/style per audience segment
  • Iterated based on performance data

Example email Claude generated:

Subject: You're in (early access to [Product])

Hi Sarah,

Remember downloading our SaaS Pricing Experiment Tracker last month?

You mentioned you were "constantly testing pricing but had no way to track what worked."

We built something that might help.

[Product Name] tracks pricing experiments automatically:
→ A/B test tracking
→ Statistical significance calculator
→ Experiment documentation
→ Results dashboard

We just launched. You're on the early access list (first 200 get 50% off annual).

Claim your spot: [link]

If it's not the right time, no worries - just ignore this.

Cheers,
Max

What made this email effective:

✅ Personal (referenced their specific lead magnet download)
✅ Relevant (connected to expressed pain point)
✅ Clear value (exactly what it does)
✅ Soft CTA ("if not, no worries")
✅ Scarcity (first 200, creates urgency)

Results:

  • Open: 24.1% (301 of 1,250)
  • Click: 8.2% (103)
  • Convert: 2.1% (26 signups)
  • Revenue: 26 × £39 = £1,014

Cost: £1.80 in Claude API credits
ROI: 56,233% ((£1,014 - £1.80) / £1.80)
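
The ROI arithmetic is easy to reproduce for any row in the results table - a quick sketch, using the £39 launch-offer price from the revenue calculation above.

```python
# Reproduce the Claude ROI figure: revenue from signups vs. tool cost.
conversions = 26      # signups from the Claude variant
price = 39.0          # £ per signup (the launch offer used above)
tool_cost = 1.80      # £ spent on Claude API credits

revenue = conversions * price                  # 26 * £39 = £1,014
roi = (revenue - tool_cost) / tool_cost * 100  # ≈ 56,233%
print(f"Revenue: £{revenue:,.0f}  ROI: {roi:,.0f}%")
```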

The Prompts That Made the Difference

Generic prompt (used by most people):

Write a product launch email for [Product].

Our custom prompt (why Claude won):

You are writing a product launch email for a B2B SaaS tool.

CONTEXT:
- Recipient: Sarah (downloaded pricing experiment tracker 4 weeks ago)
- Her pain point: "Constantly testing pricing but no way to track what works"
- Our product: [Product] - pricing experiment tracking tool
- Offer: Early access, 50% off annual for first 200
- Sender: Max (Head of Content, not sales)

TONE:
- Casual but professional (UK English)
- Founder-to-founder (peer, not vendor)
- Helpful, not pushy

STRUCTURE:
- Subject line: Reference the lead magnet she downloaded
- Opening: Remind her of her pain point (use her exact words)
- Body: Introduce product as solution to her specific problem
- CTA: Soft (if not right time, that's fine)
- Close: Sign with first name only

CONSTRAINTS:
- Max 150 words
- One CTA only
- No hype language ("revolutionary," "game-changing")
- UK spelling (optimise, analyse)

Write the email:

The difference: Context, tone guidance, constraints, structure requirements.
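
Want to run the same setup yourself? Here's a minimal sketch of sending that prompt to Claude via Anthropic's Python SDK - the model name, token limit, and recipient details are placeholders, so swap in your own.

```python
# Minimal sketch: send the custom prompt above to Claude and print the draft.
# Assumes the official `anthropic` Python SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY environment variable. The model name is a placeholder.
import anthropic

def build_prompt(first_name: str, lead_magnet: str, pain_point: str) -> str:
    """Fill the recipient-specific fields of the custom prompt structure."""
    return f"""You are writing a product launch email for a B2B SaaS tool.

CONTEXT:
- Recipient: {first_name} (downloaded {lead_magnet} 4 weeks ago)
- Pain point: "{pain_point}"
- Offer: Early access, 50% off annual for first 200
- Sender: Max (Head of Content, not sales)

TONE: Casual but professional (UK English), peer-to-peer, helpful not pushy.

STRUCTURE: Subject references the lead magnet; opening echoes the pain point
in the recipient's own words; one soft CTA; sign with first name only.

CONSTRAINTS: Max 150 words, one CTA, no hype language, UK spelling.

Write the email:"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-7-sonnet-latest",  # placeholder; use whichever model you test
    max_tokens=500,
    messages=[{"role": "user", "content": build_prompt(
        "Sarah",
        "the SaaS Pricing Experiment Tracker",
        "Constantly testing pricing but no way to track what works",
    )}],
)

print(message.content[0].text)  # the generated email body
```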

What We Learned About AI Email Copywriting

Learning #1: Tools Matter Less Than Prompts

The insight: Same tool (ChatGPT-4) with different prompts:

| Prompt Quality | Open Rate | Click Rate | Conversion |
|---|---|---|---|
| Generic | 18.2% | 4.8% | 1.0% |
| Detailed | 21.8% | 7.2% | 1.9% |

A 90% improvement in conversion rate from better prompting alone - same tool.

Learning #2: Dedicated Email Tools Aren't Necessarily Better

Expected: Copy.ai (email-specific) beats Claude (general LLM)
Reality: Claude beat Copy.ai on every metric

Why:

  • Latest LLMs (Claude 3.7, GPT-4) are trained on enough email copy to understand patterns
  • Customization through prompts > pre-built templates
  • Cheaper (£2 vs £36/month)

When dedicated tools win:

  • You don't want to write custom prompts
  • You need templates/workflows built-in
  • Your team isn't technical enough for API/prompt engineering

Learning #3: The Best AI Is 88-92% as Good as Human (For Cold Email)

The gap:

| Metric | Human | Best AI (Claude) | AI as % of Human |
|---|---|---|---|
| Open rate | 26.2% | 24.1% | 92% |
| Click rate | 9.1% | 8.2% | 90% |
| Conversion | 2.4% | 2.1% | 88% |

Implication: AI is good enough for:

  • High-volume cold outreach
  • Email sequences
  • Newsletter content

Human still wins for:

  • High-stakes emails (investor pitches, key partnerships)
  • Complex personalization
  • Brand-defining communications

Learning #4: Subject Lines Matter More Than Body

In a separate test, we also tried AI-generated subject lines:

| Subject Line Type | Open Rate |
|---|---|
| Human-written | 26.2% |
| AI-generated (generic) | 18.4% |
| AI-generated (custom prompt) | 24.8% |

The lesson: a bad subject line kills the email, regardless of body quality.

Best subject line patterns (from our data):

  • Personal reference: "You're in (early access)" - 28% open
  • Curiosity + benefit: "The pricing experiment that increased revenue 40%" - 25% open
  • Direct + specific: "50% off [Product] (first 200 only)" - 24% open

Worst patterns:

  • Generic: "Introducing [Product]" - 12% open
  • Salesy: "Limited time offer!" - 9% open
  • Long: "[Product]: The all-in-one solution for..." - 11% open

Your AI Email Copywriting Action Plan

This week:

  • Choose your AI tool (Claude + custom prompts recommended for flexibility)
  • Write detailed prompt template (use our structure above)
  • Test with 3 emails, refine prompt

This month:

  • Generate 10 emails with AI
  • A/B test AI vs human on one campaign
  • Measure performance gap
  • Iterate prompts based on data

This quarter:

  • Scale AI email generation to 80% of email copy
  • Reserve human writing for high-stakes communications
  • Build prompt library for different email types

The goal: 10x email output without a drop in quality.
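
For that prompt library, plain named templates go a long way. A minimal sketch in Python - the email types, field names, and template text are illustrative, not a prescribed taxonomy.

```python
# Minimal sketch of a prompt library: one reusable template per email type.
# Email types, field names, and template text below are illustrative examples.
from string import Template

PROMPT_LIBRARY: dict[str, Template] = {
    "product_launch": Template(
        "You are writing a product launch email for a B2B SaaS tool.\n"
        "CONTEXT: $first_name downloaded $lead_magnet; pain point: \"$pain_point\".\n"
        "TONE: casual but professional, UK English, peer-to-peer.\n"
        "CONSTRAINTS: max 150 words, one soft CTA, no hype language.\n"
        "Write the email:"
    ),
    "nurture_followup": Template(
        "You are writing a follow-up email to a lead who engaged with $content_piece\n"
        "but has not replied. Offer one helpful resource, no pitch.\n"
        "CONSTRAINTS: max 100 words, one link, UK spelling.\n"
        "Write the email:"
    ),
}

def render_prompt(email_type: str, **fields: str) -> str:
    """Look up a template by email type and fill in the recipient fields."""
    return PROMPT_LIBRARY[email_type].substitute(**fields)

# Example: reuse the launch template for a new recipient.
print(render_prompt(
    "product_launch",
    first_name="Sarah",
    lead_magnet="the SaaS Pricing Experiment Tracker",
    pain_point="Constantly testing pricing but no way to track what works",
))
```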


Want AI to write personalized email sequences automatically? Athenic generates, A/B tests, and optimizes email copy based on your audience data - achieving 90% of human performance at 1/10th the time. See how it works →
