Ad-copy testing at scale: the best headlines earn 3x the click-through of the median, and the worst earn almost none.

Inside a responsive search ad, every headline competes. Some pull three times the clicks of the median; some pull almost nothing. The difference is real money, and the only way to know which is which is to test at scale and read the results. We have measured over four million ad-asset performance records to do exactly that.

[Figure: headline click-through rates by decile, spreading from a high top decile to a near-zero bottom decile. Directional shape only; real figures are withheld for privacy.]
  • 3x top vs median CTR: best-decile headlines vs the median, on click-through
  • ~13% best-decile CTR: what the strongest headlines pull
  • ~0% worst-decile CTR: the weakest headlines barely get clicked
  • 34k+ assets tested: across 4M+ asset performance records
01 · The problem

Most ad copy is written once and never tested.

A responsive search ad is not one ad; it is a pool of headlines and descriptions that Google mixes and matches in real time. That means every headline you add is a small bet, and like any pool of bets, the results are wildly uneven. Yet most accounts treat ad copy as a one-time creative exercise: write fifteen headlines at launch, never look at them again, and assume they all pull their weight. They do not.

When we measured headline performance across our book, the spread was stark. The top-decile headlines pulled a click-through rate around 13%; the median sat near 4%; and the bottom decile earned essentially 0%. In other words, the best headlines drive roughly 3x the clicks of the median headline, and the worst are dead weight occupying a slot that a better line could use.
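The decile comparison can be reproduced on any asset export. What follows is a minimal sketch, assuming each record is a (headline, impressions, clicks) tuple. The function name `decile_spread` and the sample figures are illustrative, chosen only to mirror the rough 13% / 4% / 0% shape described here; they are not the measured data behind this playbook.

```python
from statistics import mean, median


def decile_spread(assets):
    """Summarise per-asset CTR: top decile, median, bottom decile.

    assets: list of (headline, impressions, clicks) tuples (assumed shape).
    """
    rates = sorted(clicks / imps for _, imps, clicks in assets if imps > 0)
    if not rates:
        raise ValueError("no assets with impressions")
    n = max(1, len(rates) // 10)  # assets per decile, at least one
    top = mean(rates[-n:])
    mid = median(rates)
    bottom = mean(rates[:n])
    ratio = top / mid if mid else float("inf")
    return {"top": top, "median": mid, "bottom": bottom, "top_vs_median": ratio}


# Illustrative pool: ten headlines, 1,000 impressions each (made-up numbers).
sample_clicks = [0, 10, 20, 30, 40, 40, 50, 60, 80, 130]
sample = [(f"headline {k}", 1000, c) for k, c in enumerate(sample_clicks)]
spread = decile_spread(sample)
```

With these illustrative inputs the top decile sits at 13%, the median at 4%, and the ratio comes out at 3.25, which is the arithmetic behind "roughly 3x".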

That dead weight is not harmless. A weak headline that the system occasionally serves dilutes the ad, lowers the average click-through, and ultimately costs you impressions and conversions you would have won with a stronger pool. Untested ad copy is a silent tax, and almost nobody is reading the asset-level report to see it.

02 · Our approach

Treat the ad as a portfolio, and prune it.

We treat a responsive search ad like a portfolio of bets that needs active management. New headlines and descriptions are added deliberately, each one a hypothesis (a benefit, an offer, a proof point, an objection handled), and then we read the asset-level performance report to see which hypotheses the market actually rewarded. The winners stay and get imitated; the dead weight is replaced.

Variety is part of the discipline. We make sure the headline pool covers genuinely different angles (price, quality, speed, trust, the specific objection a buyer has) rather than fifteen rephrasings of the same claim. A pool of near-duplicates gives the system nothing to optimize between; a pool of distinct angles lets it find the message that fits each searcher. This is where ad-copy testing meets real Google Ads copywriting: the testing tells you what works, the writing gives it something worth testing.

We also test in context, not in a vacuum. A headline is only as good as the search it answers and the landing page it leads to, so the pool is built around the intent of its ad group and kept consistent with the page the click lands on. A brilliant headline pointed at the wrong query or a mismatched page is still a losing bet.

  • Add headlines as distinct hypotheses
  • Read the asset report, prune the losers
  • Keep angles varied and on-intent
03 · The levers

What separates a 13% headline from a dead one.

With over 34,000 distinct ad assets measured across 4 million-plus performance records, the patterns are not guesswork. The gap between the best and worst headlines is enormous and consistent, and a handful of levers explain most of it.

[Figure: headline click-through rate by decile. Best-decile headlines pull roughly 3x the median click-through; the bottom decile sits near zero (indexed; values withheld).]
Lever A

Match the searcher's words

The headlines in the top decile almost always reflect the language of the search. A query and an ad that share the same words feel relevant, earn a higher click-through, and tend to earn a better Quality Score, which lowers cost. The dead-weight headlines are usually generic brand slogans that could belong to any company in any industry. Specific beats clever, and relevant beats both.

The headlines that win are the ones that sound like the search, not like a tagline.

Lever B

Lead with the specific

Concrete claims outperform vague ones. A specific number, a named guarantee, a clear offer, or a real differentiator pulls clicks; "quality you can trust" does not. The asset report is brutally honest about this: the abstract headlines cluster near the bottom of the click-through distribution, and the concrete ones cluster at the top. We let that data, not opinion, decide which headlines survive.

Specific, concrete headlines earn the clicks; abstract ones occupy a slot and earn nothing.

Lever C

Cover distinct angles

A strong pool is diverse on purpose. Because the system assembles the ad per searcher, it needs raw material to choose from: a price angle for the bargain hunter, a quality angle for the premium buyer, a speed or trust angle for the hesitant one. When we widen a pool from near-duplicates to genuinely different angles, the average click-through rises, because the system can finally match the message to the person.

Lever D

Prune and refresh

Testing is continuous, not a launch event. The bottom-decile assets are replaced with new hypotheses, and even winners fatigue as audiences see them repeatedly, so fresh angles are fed in on a cadence. The point of measuring four million asset records is not to crown a permanent winner; it is to keep the pool improving, retiring the dead weight and testing the next idea, indefinitely.

The asset report is never finished. The pool keeps improving as losers are cut and new angles tested.

04 · The result

Ad copy that pulls its weight, measured not guessed.

Run as a tested portfolio rather than a one-time write, ad copy stops being a guess. The headline pool trends toward the top of the click-through distribution as dead weight is pruned and winners are imitated, which means more clicks at lower cost from the same impressions, and a stronger Quality Score underneath it all.

The numbers behind the method are unambiguous: best-decile headlines around 13% click-through, the median near 4%, the bottom near 0%, a roughly 3x gap between the best and the median, measured across 34,000+ assets. Ad copy is not where you trust your instincts; it is where you test and let the report decide.

The best headline can pull triple the clicks of the median. The only way to know which is which is to test at scale.

05 · How to apply it

Reading your own asset report.

Open the asset details on your responsive search ads and look at the headline performance. Even where Google's own labels are sparse, you can see impressions and click-through per asset. Sort them and look at the spread: if some headlines are doing nothing, they are dead weight, and the slot would be better used testing a new angle.
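As a rough illustration of that sort-and-inspect step, here is a minimal Python sketch. It assumes you have copied each headline's impressions and clicks out of the asset report by hand or by export. The `Asset` class, the 1,000-impression floor, and the "a quarter of the pool median" cutoff are all hypothetical choices for the example, not thresholds from this playbook, and the headlines are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Asset:
    text: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def rank_assets(assets):
    """Sort headlines from highest to lowest click-through rate."""
    return sorted(assets, key=lambda a: a.ctr, reverse=True)


def dead_weight(assets, min_impressions=1000, fraction_of_median=0.25):
    """Flag headlines with enough volume whose CTR is far below the pool median.

    Both thresholds are assumptions for illustration; tune them to the account.
    """
    judged = [a for a in assets if a.impressions >= min_impressions]
    if not judged:
        return []
    cutoff = fraction_of_median * median(a.ctr for a in judged)
    return [a for a in judged if a.ctr < cutoff]


# Made-up pool for illustration only.
pool = [
    Asset("Same-Day Boiler Repair", 5000, 600),
    Asset("Quality You Can Trust", 4000, 8),
    Asset("Fixed Price From $89", 3000, 150),
    Asset("New Angle Just Added", 200, 0),  # too little volume to judge yet
]
flagged = dead_weight(pool)
```

In this made-up pool only the generic slogan is flagged; the newly added headline is left alone because it has not gathered enough impressions to be judged.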

Then audit your pool for variety. Read your fifteen headlines back to back: are they fifteen different angles, or one claim said fifteen ways? If they rhyme, the system has nothing to optimize between, and your average click-through will sit stuck in the middle of the distribution instead of climbing toward the top.
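Reading the pool by eye is usually enough, but the variety check can also be roughed out in code. The sketch below scores word overlap between every pair of headlines using Jaccard similarity, which is one simple stand-in for "one claim said fifteen ways" and not a method described in this playbook. The 0.5 threshold and the sample headlines are assumptions.

```python
from itertools import combinations


def tokens(headline):
    """Lowercased word set; punctuation is left attached for simplicity."""
    return set(headline.lower().split())


def jaccard(a, b):
    """Share of distinct words the two headlines have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def near_duplicates(headlines, threshold=0.5):
    """Return headline pairs whose word overlap meets the (assumed) threshold."""
    return [
        (a, b)
        for a, b in combinations(headlines, 2)
        if jaccard(a, b) >= threshold
    ]


# Made-up headlines for illustration only.
headlines = [
    "Quality Boiler Repair",
    "Boiler Repair Quality Service",
    "Fixed Price From $89",
]
pairs = near_duplicates(headlines)
```

Word overlap misses paraphrases that share no vocabulary, so treat a clean result as a hint rather than proof that the angles are distinct.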

Finally, make it a habit. Replace the weakest asset, add a new hypothesis, and check back. Ad-copy testing is a loop, not a launch task, and the accounts with the strongest creative are the ones that have run the loop the most times.

Healthy sign: Varied headline angles, asset report reviewed, weak assets replaced regularly.
Warning sign: Fifteen near-duplicate headlines, written at launch, never reviewed since.
06 · What we watch for

Where ad-copy testing goes wrong.

Judging too fast is the first trap. Click-through and conversion data on a single asset take time to become reliable, and killing a headline after a few hundred impressions is reading noise. We let assets gather enough volume before ruling on them, and we weight conversion signal, not just clicks, because a high-CTR headline that does not convert is its own kind of dead weight.
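How much volume counts as "enough" is not spelled out here. One common way to make it explicit is a confidence interval on each asset's click-through rate. The sketch below uses the Wilson score interval, which is a technique swapped in for illustration rather than the rule used in this playbook, and the 4% pool median is taken from the figures quoted earlier. The function names are hypothetical.

```python
from math import sqrt


def wilson_interval(clicks, impressions, z=1.96):
    """Approximate 95% Wilson score interval for a click-through rate."""
    if impressions == 0:
        return (0.0, 1.0)
    p = clicks / impressions
    denom = 1 + z * z / impressions
    centre = (p + z * z / (2 * impressions)) / denom
    half = z * sqrt(p * (1 - p) / impressions
                    + z * z / (4 * impressions ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def verdict(clicks, impressions, pool_median_ctr):
    """Rule on a headline only once its interval clears the pool median."""
    low, high = wilson_interval(clicks, impressions)
    if high < pool_median_ctr:
        return "below median"
    if low > pool_median_ctr:
        return "above median"
    return "keep testing"


# Same observed 2% CTR, very different certainty (made-up counts).
early = verdict(6, 300, 0.04)
later = verdict(60, 3000, 0.04)
```

In this example a 2% headline after 300 impressions still has an interval that includes the 4% median, so the call is to keep testing; after 3,000 impressions at the same rate the interval sits clearly below it.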

Optimizing for clicks alone is the next. A sensational headline can win the click and lose the sale, or attract the wrong searcher entirely. We watch click-through and downstream conversions together, because the goal is profitable traffic, not a vanity click-through rate. The best headline is the one that brings the right person, not just the most people.
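One way to read clicks and conversions together is to put both on the same denominator: conversions per thousand impressions. The sketch below is illustrative only; the function name and the two made-up headlines are assumptions, and real conversion attribution per asset is noisier than this.

```python
def asset_rates(impressions, clicks, conversions):
    """Return CTR, click-to-conversion rate, and conversions per 1,000 impressions."""
    if impressions == 0:
        return {"ctr": 0.0, "cvr": 0.0, "conv_per_1000": 0.0}
    return {
        "ctr": clicks / impressions,
        "cvr": conversions / clicks if clicks else 0.0,
        "conv_per_1000": 1000 * conversions / impressions,
    }


# Made-up numbers: a sensational headline vs a specific one.
sensational = asset_rates(impressions=10_000, clicks=1_200, conversions=12)
specific = asset_rates(impressions=10_000, clicks=600, conversions=30)
```

Here the sensational headline wins on click-through (12% against 6%) but loses on conversions per thousand impressions (1.2 against 3.0), which is the pattern this section warns about.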

Then pool collapse. Pinning too many headlines, or letting the pool shrink to a few near-duplicates, takes away the system's ability to optimize and caps performance. We keep the pool wide and mostly unpinned so the algorithm has real choices, intervening with pins only when a compliance or brand reason demands it.

Last, creative fatigue. A winning headline is not winning forever; as the same audience sees it again and again, its click-through decays. We watch for that decay and feed in fresh angles on a cadence, so the pool keeps improving rather than slowly aging into the dead weight it was built to replace.
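A simple way to watch for that decay is to compare a headline's recent click-through with its own earlier baseline. The sketch below assumes weekly (impressions, clicks) totals per headline; the four-week window and the 20% relative drop are hypothetical thresholds, not figures from this playbook.

```python
def pooled_ctr(weeks):
    """CTR over several weeks, weighting each week by its impressions."""
    impressions = sum(i for i, _ in weeks)
    clicks = sum(c for _, c in weeks)
    return clicks / impressions if impressions else 0.0


def is_fatiguing(weekly, recent_weeks=4, max_drop=0.2):
    """True if recent CTR fell more than max_drop relative to the earlier baseline.

    weekly: list of (impressions, clicks) per week, oldest first (assumed shape).
    """
    if len(weekly) < 2 * recent_weeks:
        return False  # not enough history to compare
    baseline = pooled_ctr(weekly[:-recent_weeks])
    recent = pooled_ctr(weekly[-recent_weeks:])
    return baseline > 0 and (baseline - recent) / baseline > max_drop


# Made-up histories for illustration.
tiring = [(1000, 120)] * 4 + [(1000, 80)] * 4
steady = [(1000, 120)] * 8
```

A drop can also come from seasonality or a change in search mix, so a flag here is a prompt to look, not a verdict.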

07 · In depth

On pinning, and letting it run.

The most consequential setting in a responsive search ad is one most people misuse: pinning. Pinning forces a headline into a fixed position, and every pin you add removes a degree of freedom the system uses to optimize. Pin everything and you have rebuilt a rigid expanded text ad, throwing away the whole advantage of the format.

So we keep the pool mostly unpinned and let the algorithm assemble the ad per searcher, intervening only when there is a real reason: a compliance requirement, a legal disclaimer, or a brand rule that a specific line must always appear or always lead. Even then we pin the minimum, often to a position group rather than a single slot, so the system keeps room to optimize the rest.

The instinct to pin usually comes from not trusting the testing. Once the asset report is steering the pool, that distrust fades: the data, not a pin, decides which headlines win, and the ad keeps improving on its own.

It helps to be concrete about what we actually test, because "test your ad copy" is useless advice without a structure. We test by theme: a price or value angle, a quality or premium angle, a speed or convenience angle, a trust or guarantee angle, and an angle that names the specific objection a buyer has. Each headline is tagged in our own notes to a theme, so when the data comes in we are not just learning that one line beat another, we are learning which message the market in this niche rewards.
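The theme tagging can be kept in something as simple as a spreadsheet, but the roll-up is easy to sketch in code. The example below assumes each row carries a hand-assigned theme label plus impressions and clicks; the theme names follow the angles listed above, while the function name and the figures are made up for illustration.

```python
from collections import defaultdict


def ctr_by_theme(records):
    """Pool impressions and clicks per theme and return each theme's CTR.

    records: iterable of (theme, impressions, clicks) tuples (assumed shape).
    """
    totals = defaultdict(lambda: [0, 0])  # theme -> [impressions, clicks]
    for theme, impressions, clicks in records:
        totals[theme][0] += impressions
        totals[theme][1] += clicks
    return {
        theme: (clicks / impressions if impressions else 0.0)
        for theme, (impressions, clicks) in totals.items()
    }


# Made-up rows for illustration only.
rows = [
    ("price", 1000, 50),
    ("price", 1000, 70),
    ("trust", 2000, 40),
    ("speed", 1500, 45),
]
by_theme = ctr_by_theme(rows)
best_theme = max(by_theme, key=by_theme.get)
```

Pooling by theme answers the question the paragraph above raises: not which single line won, but which message the market in this niche rewards.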

Descriptions get the same treatment, with their own job. Where headlines win the click, descriptions earn the trust between the click and the landing page: the proof points, the specifics, the reassurance. We test them as deliberately as headlines, because a strong headline followed by a vague description leaks the very attention it just bought.

We treat Google's Ad Strength meter with healthy skepticism. It rewards quantity and variety of assets, which usually correlates with good practice, but "Excellent" is not the goal; conversions are. We have seen high-Ad-Strength ads underperform a leaner, sharper pool, so we use the meter as a checklist nudge, not a scoreboard, and let the asset-level click-through and conversion data make the real calls.

Reading that asset-level data is its own skill. Google's own asset labels are sparse, so we work from impressions, click-through and conversions per asset rather than trusting the label, and we read click and conversion together, since a headline that wins attention but no sales is attracting the wrong searcher.

The whole thing is a loop, not a launch. Replace the weakest asset, add a fresh hypothesis, let it gather volume, read the result, repeat, with a deliberate cadence so winners do not fatigue as the same audience sees them again and again. And it only works when the copy and the destination agree: the promise in the winning headline has to be the promise the landing page keeps, or the best-tested ad in the world just buys a bounce.

That alignment is where ad-copy testing meets real copywriting. The testing tells you which message wins; the writing gives the system messages worth winning with. One without the other stalls: test a pool of weak lines and the ceiling is low; write brilliant copy and never test it and you never learn which brilliance the market actually pays for. The 34,000 assets are not the point. The loop that keeps improving them is.

Headlines and descriptions are only part of the ad. The extension assets (sitelinks, callouts, structured snippets, promotions) get tested with the same discipline, because they take up real estate on the page and pull clicks of their own. A strong sitelink can lift the whole ad's click-through; a stale or generic one wastes prominent space. We treat the full ad unit as the thing under test, not just the headline slots.

There is a cost angle to all of this that is easy to miss. More relevant copy earns a higher click-through, and a higher click-through feeds Quality Score, and a better Quality Score lowers what you pay per click. So ad-copy testing is not only a click-volume lever; it lowers cost too, which means a well-tested pool wins twice, more clicks and cheaper ones, from the same budget.

Copy also has a calendar. A headline that wins in a normal week is not the one that wins during a sale, a holiday, or a product launch, so we rotate promotional and seasonal angles in and out on schedule rather than leaving an evergreen pool to run through a peak it was not written for. The test never really ends; it just changes what it is testing as the year moves.

And copy is not one-size-fits-all across the account. A headline that wins on a high-intent product term is rarely the one that wins on a broad category search, so the pools are built per ad group, matched to the intent of the keywords that trigger them. Testing in context, against the right query and the right landing page, is what keeps the asset report honest rather than crowning a headline that only looked good in the wrong place.

The reason we can read this at all is infrastructure. Asset-level performance across the whole book flows into our own data warehouse, which is how a spread of millions of records becomes a clear signal about which angles pay. Most accounts cannot see past a single ad's asset panel; reading the pattern across tens of thousands of assets is what turns ad-copy testing from a hunch into a method.

08 · Takeaways

What to remember.

Inside a responsive search ad, headlines compete, and the spread is enormous: the best pull around 3x the click-through of the median, and the worst pull almost nothing. Untested ad copy leaves that gap on the table.

Treat the ad as a portfolio: add distinct angles as hypotheses, read the asset report, prune the dead weight, and keep refreshing. Across 34,000-plus assets the lesson is the same: specific and relevant beats clever and vague, and the only way to know which headline is which is to test at scale and let the data decide.

Key improvements
  • Headline pools steered toward the top of the click-through distribution by pruning dead weight
  • Best-decile headlines pulling roughly 3x the median click-through
  • Distinct angles tested instead of near-duplicate rephrasings, so the system can optimize
  • Continuous testing across 34,000+ assets, winners kept and losers replaced

If your ad copy was written at launch and never tested, some of your headlines are almost certainly dead weight. We can show you which.

Find out which of your headlines are dead weight.

Our free Due Diligence Audit reviews your responsive search ad assets and copy testing across 50+ dimensions, so you know which headlines are pulling clicks and which are wasting a slot.
