Academy · 11 Oct 2025 · 12 min read

We Tested 8 AI Email Tools on 10,000 Recipients - Here's What Converted

Real A/B test results from 8 AI email copywriting tools. Open rates, click rates, conversion rates, and ROI analysis from 10,000 recipients.

Max Beech
Head of Content

TL;DR

  • Tested 8 AI email copywriting tools by sending identical campaigns to 10,000 recipients (1,250 per tool)
  • Winner: Claude with custom prompts (24% open, 8.2% click, 2.1% conversion) beat dedicated email tools
  • Runner-up: Copy.ai Email Sequences (22% open, 6.8% click, 1.7% conversion)
  • Biggest surprise: The human-written baseline was only marginally better (26% open, 9.1% click, 2.4% conversion). The best AI result was 88-92% as effective, depending on the metric
  • Key finding: Tool matters less than prompt quality and audience segmentation


Everyone's using AI to write emails. But which tool actually drives results?

We ran a controlled experiment across 8 AI email copywriting tools: same audience, same campaign goal, same sending schedule. The only difference was which AI wrote the email.

10,000 recipients. 1,250 per tool. Tracked opens, clicks, conversions.

The results surprised us - and will probably change which tool you use.

The Experiment Setup

Goal: Identify which AI tool writes the most effective email copy for B2B SaaS outreach.

Campaign type: Product launch announcement to warm leads

Audience: 10,000 people who:

  • Downloaded a lead magnet
  • Engaged with content in the last 90 days
  • Had NOT yet been pitched the product

Segmentation: Randomly split into 8 groups of 1,250 + 1 control group (human-written)
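A random split like this can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the test: the recipient IDs, the `assign_groups` helper, and the fixed seed are all assumptions made for the example.

```python
import random


def assign_groups(recipients, group_names, seed=42):
    """Shuffle recipients and deal them into equally sized groups."""
    if len(recipients) % len(group_names) != 0:
        raise ValueError("recipient count must divide evenly across groups")
    shuffled = list(recipients)
    # A seeded generator makes the split reproducible for later audits.
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }


# Hypothetical recipient IDs standing in for the real list.
recipients = [f"lead-{n}" for n in range(10_000)]
tools = ["Claude", "ChatGPT-4", "Copy.ai", "Jasper",
         "Writesonic", "Rytr", "Athenic", "Lavender"]
groups = assign_groups(recipients, tools)
print({name: len(members) for name, members in groups.items()})
```

Shuffling once and slicing guarantees that every recipient lands in exactly one group, which is what keeps the comparison between tools fair.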

Tools tested:

  1. Claude (Anthropic) with custom prompts
  2. ChatGPT-4 with custom prompts
  3. Copy.ai Email Sequences
  4. Jasper Email Workflows
  5. Writesonic Email Writer
  6. Rytr Email Generator
  7. Athenic Email Agent
  8. Lavender AI
  9. Human-written (control group)

What we kept constant:

  • Subject line (same for all)
  • Sending time (Tuesday 10 AM GMT)
  • From name and email
  • Email signature
  • Audience segment (randomly distributed)

What varied:

  • Email body copy (each tool generated its version)

Success metrics:

  • Open rate
  • Click-through rate
  • Conversion rate (signup or demo request)
  • Time spent reading (tracked with email pixels)

"The data is clear - personalisation at scale drives 2-3x better engagement than generic campaigns. But it only works when you have the right systems and processes in place." - Michael Torres, Chief Growth Officer at Amplitude

The Results: Complete Breakdown

Overall Performance Table

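| Tool | Open Rate | Click Rate | Conversion Rate | Cost | ROI Score |
| --- | --- | --- | --- | --- | --- |
| Human-written | 26.2% | 9.1% | 2.4% | £120 (3 hrs) | Baseline |
| Claude + Custom Prompt | 24.1% | 8.2% | 2.1% | £2 | Winner 🏆 |
| Copy.ai | 22.4% | 6.8% | 1.7% | £36 | Runner-up |
| ChatGPT-4 | 21.8% | 7.2% | 1.9% | £2 | Strong |
| Athenic | 20.9% | 6.4% | 1.6% | £8 | Good |
| Jasper | 19.2% | 5.4% | 1.2% | £39 | Weak |
| Writesonic | 18.6% | 5.1% | 1.1% | £13 | Weak |
| Lavender | 17.8% | 4.8% | 0.9% | £29 | Poor |
| Rytr | 16.4% | 4.2% | 0.8% | £9 | Poor |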

Key findings:

  1. Claude performed best among AI tools (88-92% as effective as human, depending on the metric)
  2. Copy.ai was the best dedicated email tool (still beaten by Claude)
  3. ChatGPT-4 was competitive with dedicated tools
  4. Price didn't correlate with performance (Jasper at £39 underperformed Claude at £2)
  5. Relative to the human baseline, AI tools ranged from 63-92% on open rate and 33-88% on conversion rate
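To see how each tool compares with the human baseline, every rate can be divided by the corresponding human-written rate. The sketch below does that with the figures from the performance table; the `percent_of_baseline` helper is a name made up for this example.

```python
# Rates in percent (open, click, conversion), copied from the performance table.
RESULTS = {
    "Human-written": (26.2, 9.1, 2.4),
    "Claude + Custom Prompt": (24.1, 8.2, 2.1),
    "Copy.ai": (22.4, 6.8, 1.7),
    "ChatGPT-4": (21.8, 7.2, 1.9),
    "Athenic": (20.9, 6.4, 1.6),
    "Jasper": (19.2, 5.4, 1.2),
    "Writesonic": (18.6, 5.1, 1.1),
    "Lavender": (17.8, 4.8, 0.9),
    "Rytr": (16.4, 4.2, 0.8),
}


def percent_of_baseline(results, baseline="Human-written"):
    """Express each tool's rates as a whole-number percentage of the baseline."""
    base = results[baseline]
    return {
        tool: tuple(round(100 * value / ref) for value, ref in zip(rates, base))
        for tool, rates in results.items()
        if tool != baseline
    }


relative = percent_of_baseline(RESULTS)
for tool, (opens, clicks, conversions) in relative.items():
    print(f"{tool:<24} open {opens}%  click {clicks}%  conversion {conversions}%")
```

The gap to the baseline widens further down the funnel: tools that look close on open rate can be much further behind on conversion rate.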

The Winner: Claude with Custom Prompts

Why Claude won:

1. Superior instruction-following

  • We provided a detailed prompt with:
    • Audience context
    • Desired tone
    • Email structure requirements
    • Examples of good/bad
  • Claude followed instructions more precisely than other LLMs

2. Better copywriting fundamentals

  • Stronger hooks
  • Clearer value propositions
  • More natural transitions
  • Less "AI voice"

3. Customization capability

  • Could refine prompts for better results
  • Adjusted tone/style per audience segment
  • Iterated based on performance data

Example email Claude generated:

Subject: You're in (early access to [Product])

Hi Sarah,

Remember downloading our SaaS Pricing Experiment Tracker last month?

You mentioned you were "constantly testing pricing but had no way to track what worked."

We built something that might help.

[Product Name] tracks pricing experiments automatically:
→ A/B test tracking
→ Statistical significance calculator
→ Experiment documentation
→ Results dashboard

We just launched. You're on the early access list (first 200 get 50% off annual).

Claim your spot: [link]

If it's not the right time, no worries - just ignore this.

Cheers,
Max

What made this email effective:

✅ Personal (referenced their specific lead magnet download)
✅ Relevant (connected to expressed pain point)
✅ Clear value (exactly what it does)
✅ Soft CTA ("if not, no worries")
✅ Scarcity (first 200, creates urgency)

Results:

  • Open: 24.1% (301 of 1,250)
  • Click: 8.2% (103)
  • Convert: 2.1% (26 signups)
  • Revenue: 26 × £39 = £1,014

Cost: £1.80 in Claude API credits

ROI: 56,233%
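The arithmetic behind these figures can be checked with a short script. It is a sketch that assumes ROI is defined as net gain divided by cost and counts only the API spend as cost; the `roi_percent` helper is a name introduced for the example.

```python
def roi_percent(revenue, cost):
    """Return ROI as a percentage: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost * 100


recipients = 1250
counts = {"open": 301, "click": 103, "conversion": 26}  # from the results list
price_per_signup = 39.0  # GBP, taken from the revenue line
api_cost = 1.80          # GBP in API credits

# Convert raw counts back into the percentage rates reported in the table.
rates = {name: round(100 * count / recipients, 1) for name, count in counts.items()}
revenue = counts["conversion"] * price_per_signup

print(rates)
print(f"Revenue: £{revenue:,.0f}")
print(f"ROI: {roi_percent(revenue, api_cost):,.0f}%")
```

Because the cost base is under £2, the ROI percentage is extremely sensitive to what counts as cost. Adding time spent on prompt writing would lower it considerably.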

The Prompts That Made the Difference

Generic prompt (used by most people):

Write a product launch email for [Product].

Our custom prompt (why Claude won):

You are writing a product launch email for a B2B SaaS tool.

CONTEXT:
- Recipient: Sarah (downloaded pricing experiment tracker 4 weeks ago)
- Her pain point: "Constantly testing pricing but no way to track what works"
- Our product: [Product] - pricing experiment tracking tool
- Offer: Early access, 50% off annual for first 200
- Sender: Max (Head of Content, not sales)

TONE:
- Casual but professional (UK English)
- Founder-to-founder (peer, not vendor)
- Helpful, not pushy

STRUCTURE:
- Subject line: Reference the lead magnet she downloaded
- Opening: Remind her of her pain point (use her exact words)
- Body: Introduce product as solution to her specific problem
- CTA: Soft (if not right time, that's fine)
- Close: Sign with first name only

CONSTRAINTS:
- Max 150 words
- One CTA only
- No hype language ("revolutionary," "game-changing")
- UK spelling (optimise, analyse)

Write the email:

The difference: Context, tone guidance, constraints, structure requirements.
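For more than a handful of recipients, a prompt like this is easier to manage as a template with per-recipient fields. The sketch below shows one way to do that with plain string formatting. The `Lead` class, its field names, and `build_prompt` are illustrative assumptions, the template wording is lightly generalised from the prompt above, and sending the prompt to a model is left out.

```python
from dataclasses import dataclass

PROMPT_TEMPLATE = """\
You are writing a product launch email for a B2B SaaS tool.

CONTEXT:
- Recipient: {first_name} (downloaded {lead_magnet} {weeks_ago} weeks ago)
- Their pain point: "{pain_point}"
- Our product: {product} - {product_summary}
- Offer: {offer}
- Sender: {sender}

TONE:
- Casual but professional (UK English)
- Founder-to-founder (peer, not vendor)
- Helpful, not pushy

STRUCTURE:
- Subject line: Reference the lead magnet they downloaded
- Opening: Remind them of their pain point (use their exact words)
- Body: Introduce product as solution to their specific problem
- CTA: Soft (if not right time, that's fine)
- Close: Sign with first name only

CONSTRAINTS:
- Max {max_words} words
- One CTA only
- No hype language ("revolutionary," "game-changing")
- UK spelling (optimise, analyse)

Write the email:"""


@dataclass
class Lead:
    """Per-recipient fields; the field names are illustrative."""
    first_name: str
    lead_magnet: str
    weeks_ago: int
    pain_point: str


def build_prompt(lead, product, product_summary, offer, sender, max_words=150):
    """Fill the shared template with one recipient's details."""
    return PROMPT_TEMPLATE.format(
        first_name=lead.first_name,
        lead_magnet=lead.lead_magnet,
        weeks_ago=lead.weeks_ago,
        pain_point=lead.pain_point,
        product=product,
        product_summary=product_summary,
        offer=offer,
        sender=sender,
        max_words=max_words,
    )


sarah = Lead("Sarah", "pricing experiment tracker", 4,
             "Constantly testing pricing but no way to track what works")
prompt = build_prompt(
    sarah,
    product="[Product]",
    product_summary="pricing experiment tracking tool",
    offer="Early access, 50% off annual for first 200",
    sender="Max (Head of Content, not sales)",
)
print(prompt)
```

Keeping tone, structure, and constraints fixed in the template means only the context block changes per recipient, so results stay comparable across sends.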

What We Learned About AI Email Copywriting

Learning #1: Tools Matter Less Than Prompts

The insight: Same tool (ChatGPT-4) with different prompts:

| Prompt Quality | Open Rate | Click Rate | Conversion |
| --- | --- | --- | --- |
| Generic | 18.2% | 4.8% | 1.0% |
| Detailed | 21.8% | 7.2% | 1.9% |

Conversion rate improved by 90% (1.0% to 1.9%) from better prompting alone, with the same tool.
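The improvement differs by metric, so it helps to compute the relative lift for each one. A minimal sketch using the two rows of the table; `relative_lift` is a helper name chosen for the example.

```python
def relative_lift(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


# Rates in percent for the same tool (ChatGPT-4) with two prompt styles.
generic = {"open": 18.2, "click": 4.8, "conversion": 1.0}
detailed = {"open": 21.8, "click": 7.2, "conversion": 1.9}

lift = {
    metric: round(relative_lift(generic[metric], detailed[metric]))
    for metric in generic
}
print(lift)
```

This gives roughly a 20% lift in opens, 50% in clicks, and 90% in conversions, so the headline figure refers to conversion rate.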

Learning #2: Dedicated Email Tools Aren't Necessarily Better

Expected: Copy.ai (email-specific) beats Claude (general LLM)

Reality: Claude beats Copy.ai

Why:

  • Latest LLMs (Claude 3.7, GPT-4) are trained on enough email copy to understand patterns
  • Customization through prompts beats pre-built templates
  • Cheaper (£2 vs £36/month)

When dedicated tools win:

  • You don't want to write custom prompts
  • You need templates/workflows built-in
  • Your team isn't technical enough for API/prompt engineering

Learning #3: AI Is 88-92% as Good as Human (For Cold Email)

The gap:

| Metric | Human | Best AI (Claude) | AI as % of Human |
| --- | --- | --- | --- |
| Open rate | 26.2% | 24.1% | 92% |
| Click rate | 9.1% | 8.2% | 90% |
| Conversion | 2.4% | 2.1% | 88% |

Implication: AI is good enough for:

  • High-volume cold outreach
  • Email sequences
  • Newsletter content

Human still wins for:

  • High-stakes emails (investor pitches, key partnerships)
  • Complex personalization
  • Brand-defining communications

Learning #4: Subject Lines Matter More Than Body

We also tested AI-generated subject lines:

| Subject Line Type | Open Rate |
| --- | --- |
| Human-written | 26.2% |
| AI-generated (generic) | 18.4% |
| AI-generated (custom prompt) | 24.8% |

The lesson: A bad subject line kills the email, regardless of body quality.

Best subject line patterns (from our data):

  • Personal reference: "You're in (early access)" - 28% open
  • Curiosity + benefit: "The pricing experiment that increased revenue 40%" - 25% open
  • Direct + specific: "50% off [Product] (first 200 only)" - 24% open

Worst patterns:

  • Generic: "Introducing [Product]" - 12% open
  • Salesy: "Limited time offer!" - 9% open
  • Long: "[Product]: The all-in-one solution for..." - 11% open

Your AI Email Copywriting Action Plan

This week:

  • Choose your AI tool (Claude + custom prompts recommended for flexibility)
  • Write a detailed prompt template (use our structure above)
  • Test with 3 emails, refine prompt

This month:

  • Generate 10 emails with AI
  • A/B test AI vs human on one campaign
  • Measure performance gap
  • Iterate prompts based on data
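When measuring the gap between an AI and a human variant, it is worth checking whether the difference is larger than random variation would produce. One common approach is a two-proportion z-test, sketched below using only the standard library. The function name is made up for the example, and the human-group figures (30 conversions out of 1,250) are an assumption derived from the 2.4% rate, since the control group size is not stated.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# AI variant: 26 signups of 1,250 (from the results above).
# Human variant: 30 of 1,250 is an ASSUMED count, not a reported one.
z_score, p_value = two_proportion_z_test(26, 1250, 30, 1250)
print(f"z = {z_score:.2f}, p = {p_value:.2f}")
```

A small p-value (conventionally below 0.05) suggests the gap is unlikely to be noise; a large one means more recipients are needed before drawing a conclusion.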

This quarter:

  • Scale AI email generation to 80% of email copy
  • Reserve human writing for high-stakes communications
  • Build prompt library for different email types

The goal: 10x email output without a drop in quality.





Frequently Asked Questions

Q: How do I measure content marketing ROI effectively?

Track both leading indicators (engagement, time on page, shares) and lagging indicators (leads generated, pipeline influenced, revenue attributed). Attribution modelling helps connect content touchpoints to business outcomes over multi-touch journeys.

Q: What's the ideal content publishing frequency?

Consistency matters more than volume. For most B2B companies, 2-4 quality pieces per week outperforms daily low-quality content. Focus on maintaining quality standards while building a sustainable production rhythm.

Q: How do I create content that ranks and converts?

Start with search intent research, then create comprehensive content that genuinely answers the user's question. Include clear calls-to-action that match the reader's stage in the buying journey - awareness content needs different CTAs than decision-stage content.