Split Testing at Scale: Avoiding the Phantom Lift

Winston Churchill famously said, "To improve is to change; to be perfect is to change often."

If you are an advertising agency, an e-commerce site or any website that relies heavily on clicks, purchases and form completions, split-testing is not only valuable, it's crucial. If you are not constantly split-testing, you are behind the curve and paying opportunity costs every day.

To scale properly, however, websites must be aware of the most common pitfalls in a growing split-testing operation. The risk is greatest when interpreting the results of a test through a Key Performance Indicator (KPI).

One of the most common misinterpretations of split-testing KPIs is the Phantom Lift. Understanding how to adjust KPIs to make more insightful decisions is the key to extracting the full value from split-testing.

What Is Phantom Lift ...

And why does it occur?

In simplest terms, the Phantom Lift is what we see when we run many apparently successful split-tests and our overall results still fall short of our expectations. There are three types of lift: the Expected Lift, the Actual Lift and their difference, the Phantom Lift.

[Expected Lift] – [Actual Lift] = [Phantom Lift]

For example, suppose I use simple A/B split-testing with the goal of improving the click-through rate on a site. In this example, the KPI will be relative percentage increase. That is, if a test variation performs better than a control variation, we say the test performs "x percent" better than the control.

Let's say over the course of a year we run 100 split tests, one right after the other, with the goal of improving our meager 20 percent click-through rate. After each test, we compare the control variation to the test variation. If our test variation performed better than the control, it becomes the new control variation and we record the lift the test provided.

If not, we do nothing and proceed with the next test. At the end of this process, we find we had 50 winners, which produced an average lift of 4 percent each.

Fantastic! Compounding 50 lifts of 4 percent each (1.04^50 ≈ 7.1), we should have a click-through rate at least seven times better than before, right?

Not exactly. A click-through rate seven times better than 20 percent would be 140 percent, which is impossible. Moreover, our click-through rate after all of our testing is only 25 percent. This is an improvement, to be sure, but not nearly as large as we had hoped.
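The arithmetic behind this example can be checked in a few lines. The sketch below uses only the figures quoted in the text (a 20 percent baseline, 50 winners averaging 4 percent, a final rate of 25 percent). The variable names are illustrative, and lift is expressed here as a relative increase over the baseline.

```python
# Figures taken from the example in the text.
baseline_ctr = 0.20   # starting click-through rate
winners = 50          # tests declared winners
avg_lift = 0.04       # average relative lift recorded per winner
final_ctr = 0.25      # click-through rate actually observed at the end

# Expected lift: compound every recorded lift as if each one were real.
expected_multiplier = (1 + avg_lift) ** winners
expected_ctr = baseline_ctr * expected_multiplier
expected_lift = expected_multiplier - 1

# Actual lift: what the site really gained over the baseline.
actual_multiplier = final_ctr / baseline_ctr
actual_lift = actual_multiplier - 1

# [Expected Lift] - [Actual Lift] = [Phantom Lift]
phantom_lift = expected_lift - actual_lift

print(f"Expected multiplier: {expected_multiplier:.2f}x")
print(f"Implied click-through rate: {expected_ctr:.0%}")
print(f"Actual multiplier: {actual_multiplier:.2f}x")
print(f"Phantom lift: {phantom_lift:.0%} of the baseline")
```

The implied click-through rate comes out above 100 percent, which is the clearest sign that the recorded lifts cannot all be real.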

Why the Hell Should You Care?

The example above isn’t just mathematical sleight-of-hand or an elaborate computational ruse.

The problem starts with variance in the data. Variance can be thought of as the volatility of information. The less volatility, the easier it is to learn from data and drive decision making.

What so often happens is that this volatility is grossly unaccounted for, and we let too many false positives drive decision-making.

At this point, the phrases “small sample size” and “statistical significance” tend to creep into the discourse. But a third buzzword is equally important: publication bias. While usually found in the context of academic research, its relevance to split-testing is undeniable.

In our split-testing example earlier, we cherry-picked winning tests and ignored the information losing tests provided. The results we published led us to a false conclusion, in the end.
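A small simulation can make this selection effect concrete. The sketch below is not the data behind the example. It assumes hypothetical settings (3,000 visitors per variation and a true click-through rate of 20 percent for both control and test, so no test has any real effect) and applies the same naive rule: whenever the test variation beats the control, count it as a winner and record its observed lift.

```python
import math
import random

def simulate_null_tests(num_tests=100, visitors_per_arm=3000,
                        true_rate=0.20, seed=7):
    """Run A/B tests in which control and test share the same true rate.

    Returns the observed relative lifts of the tests that "won".
    All parameter values are illustrative assumptions, not measured data.
    """
    rng = random.Random(seed)
    recorded_lifts = []
    for _ in range(num_tests):
        # Simulate each visitor as an independent click / no-click trial.
        control_clicks = sum(rng.random() < true_rate
                             for _ in range(visitors_per_arm))
        test_clicks = sum(rng.random() < true_rate
                          for _ in range(visitors_per_arm))
        # Naive rule: any observed improvement counts as a win.
        if control_clicks > 0 and test_clicks > control_clicks:
            # Equal traffic per arm, so the click ratio is the rate ratio.
            recorded_lifts.append(test_clicks / control_clicks - 1)
    return recorded_lifts

lifts = simulate_null_tests()
num_winners = len(lifts)
average_lift = sum(lifts) / num_winners
# Compound the recorded lifts the way the example in the text does.
expected_multiplier = math.prod(1 + lift for lift in lifts)

print(f"Winners: {num_winners} of 100")
print(f"Average recorded lift: {average_lift:.1%}")
print(f"Expected multiplier from recorded lifts: {expected_multiplier:.1f}x")
print("True multiplier: 1.0x (no variation was actually better)")
```

Under these assumptions, roughly half of the tests should come out as winners with a recorded lift of a few percent each, even though the true lift is zero. Everything the recorded lifts promise is phantom.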

How Do We Address It?

Choose your KPI wisely. The above method was clearly insufficient, but we can augment how we interpret our results to better protect ourselves against variance and selection bias.

Popular strategies include leveraging statistical significance, confidence intervals, and Bayesian inference. No worthwhile strategy will be entirely immune to variance or false positives, but awareness of the problem is a huge step in the right direction.
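As a rough illustration of these three strategies, the sketch below checks a single hypothetical test: 600 clicks from 3,000 visitors on the control and 624 from 3,000 on the test variation, an observed lift of 4 percent. The counts, the function names, the 95 percent confidence level and the uniform Beta(1, 1) prior are all assumptions made for this example, and the normal approximation is only one of several reasonable ways to compare two rates.

```python
import math
import random

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b, z_crit=1.96):
    """Two-sided z-test and approximate 95% interval for rate_b - rate_a."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    diff = p_b - p_a
    # A pooled standard error is used for the significance test.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    # An unpooled standard error is used for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return z, p_value, (diff - z_crit * se, diff + z_crit * se)

def prob_b_beats_a(clicks_a, n_a, clicks_b, n_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_b > rate_a) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        rate_b = rng.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical counts: 20.0% control vs. 20.8% test, a 4% relative lift.
z, p_value, interval = two_proportion_z_test(600, 3000, 624, 3000)
posterior = prob_b_beats_a(600, 3000, 624, 3000)

print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
print(f"95% interval for the difference: "
      f"({interval[0]:+.3f}, {interval[1]:+.3f})")
print(f"Estimated probability that test beats control: {posterior:.2f}")
```

With these assumed counts, the p-value is well above 0.05 and the interval for the difference includes zero, so a 4 percent "winner" at this sample size is not distinguishable from noise. The posterior probability is a softer signal that could be used to discount a recorded lift rather than accept it at face value.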

Lies, Damn Lies and Statistics?

So what is our takeaway here? That there are lies, damn lies and statistics?

More realistically, false positives, if left unnoticed, compound into large disparities between expected and actual performance. Take care with your KPIs and temper expectations with well-reasoned pessimism.

Title image "Made of Birds" (CC BY-ND 2.0) by Caden Crawford
