Last month, we dove into why cost per action (CPA) is generally not the best measure of campaign effectiveness, and how it risks condemning an otherwise well-structured campaign to advertising limbo, devoid of any real insights as to how the campaign impacted the marketer’s business.
Instead, marketers should optimize to lift in order to fully understand the real-world effects that the campaign caused. But not all lift measurement is created equal!
Different, competing methodologies abound, and this month we’ll dive deeply into several of them.
DMA Split Testing
When you split the US, for instance, into two groups of designated market areas (DMAs) that have produced roughly similar consumer spend in the past, you can create an exposed group and a control group for your campaign. This is really simple to comprehend and execute, and in the byzantine world of research methodologies, that simplicity shouldn’t be undervalued.
Yet it’s probably not the best approach for most marketers: while measurement is straightforward and doesn’t require user-identifier-based attribution, we can’t ignore the statistical noise inherent in the approach.
In order to be sound, the test design must adjust for every DMA-specific variable, such as weather, exposure to ads in other channels, retail footprint, local-market competitors, local coupons, and so forth.
Of course, that’s nearly impossible to do while also maintaining a high degree of precision.
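The mechanics of the split itself are simple, though. Here’s a minimal sketch, assuming you already have historical spend per DMA (all of the DMA names and dollar figures below are hypothetical): rank DMAs by spend and greedily assign each to whichever group currently has the smaller running total, so the two halves end up roughly matched.

```python
def split_dmas(spend_by_dma):
    """Split DMAs into two groups with roughly equal historical spend."""
    exposed, control = [], []
    exposed_total = control_total = 0.0
    # Greedy balancing: place each DMA (largest spend first) into
    # whichever group currently has the smaller spend total.
    for dma in sorted(spend_by_dma, key=spend_by_dma.get, reverse=True):
        if exposed_total <= control_total:
            exposed.append(dma)
            exposed_total += spend_by_dma[dma]
        else:
            control.append(dma)
            control_total += spend_by_dma[dma]
    return exposed, control

# Hypothetical past consumer spend (in $MM) by DMA.
spend = {"New York": 9.1, "Los Angeles": 7.4, "Chicago": 3.2,
         "Dallas": 2.9, "Boston": 2.4, "Atlanta": 2.2}
exposed, control = split_dmas(spend)
```

Balancing on spend alone is exactly the limitation described above: nothing in this split accounts for weather, retail footprint, or any other DMA-specific variable.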
Lookalike Modeling
Identify a test group that you’d like to target. Then, model an audience that shares attributes with your target audience. Run the campaign and measure the delta. Seems straightforward enough! And it is. Even better, it’s not necessary to know specifics about the audience attributes prior to running a study, so looping in a third-party measurement provider is easy and efficient.
That simplicity comes with a tradeoff, though: the injection of bias into the experiment.
Since you’re generally building your target based upon behavioral actions (such as a visit to a retail or online store), your two groups consist of people who share many attributes with each other, except that the test group typically consists of people identified as being “in market” for your particular product category, while the modeled control group is not. This tends to inflate performance metrics: the two groups may look alike, but not in the most important way, which is their stage in the consideration funnel.
Intent to Treat
Using hashed (i.e., anonymized) identifiers, randomly segment your target audience into test and control groups. Measure the incrementality between all members of both groups, regardless of whether they saw the ad or not. This randomization helps avoid much of the bias we see in other methodologies.
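As a sketch, a hashed-identifier split might look like the following (the salt and the 50/50 ratio are illustrative assumptions, and SHA-256 stands in for whatever hashing scheme your identity pipeline actually uses):

```python
import hashlib

def assign_group(user_id: str, salt: str = "campaign-42") -> str:
    """Deterministically assign a user to test or control via a hash."""
    # Hashing the salted identifier anonymizes it and spreads users
    # uniformly, so the parity check behaves like random assignment
    # while remaining reproducible for later measurement.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

groups = [assign_group(f"user-{i}") for i in range(10_000)]
print(groups.count("test"), groups.count("control"))
```

Determinism is the point: anyone holding the same salt can reconstruct each user’s assignment at measurement time without ever handling raw identifiers.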
Unfortunately, there are limitations to this approach. Because your campaign will never reach everyone in your targeting pool (due to variances in their behavior online, competitive media buying among advertisers, and so forth), the users who are actually exposed can differ substantially, in both number and makeup, from those who were not.
You now have the problem of the control group not being much of a control group anymore, because we don’t know whether its members would have been served the ad had they been in the exposed group.
The result is that your experiment can pick up statistical noise from non-ad factors (such as the aforementioned variances inherent in programmatic media), and overcoming that noise requires running wildly more impressions as part of the campaign.
Public Service Announcements (PSAs)
Remember Smokey Bear, the anthropomorphized ursine spokesanimal urging us to prevent forest fires since his debut in 1944? Yes, this measurement approach uses for its control group the same type of PSAs that we remember fondly from our childhoods.
It’s really simple: the test group consists of those who were exposed to your ad, and the control group consists of those who saw a PSA instead. While running public-interest campaigns is a noble endeavor, those PSAs generally cost just as much to run as the actual ads. Moreover, there’s a lot of opportunity for bias here: the performance-optimization algorithms that nearly all buy- and sell-side platforms use will show the PSA more often to people who are more likely to interact with it. Because those likely to engage with a PSA can differ (behaviorally, demographically, etc.) from those with a propensity to engage with the actual ad, your test and control groups end up unbalanced.
There’s another factor, too: when an advertiser runs a PSA, another advertiser can’t use that impression to run a potentially competitive ad.
When the target population is small, this can make an appreciable difference, because the media exposure profile of those in each group is also imbalanced.
Ghost Bids
Once again, segment the target audience based upon hashed identifiers. Serve the ad to the exposed group, and log an equal number of instances in which an impression opportunity met all of the targeting criteria for the campaign but the ad was deliberately not served. Call those the “Ghost Bids.” Then, run a model trained on actual bids and delivered impressions from the same campaign to determine which Ghost Bids would have resulted in a served ad. Call those the “Ghost Impressions.”
Compare the actual impressions to the Ghost Impressions, and you’ve just measured incremental lift with minimal noise and bias.
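As a sketch with made-up numbers, that final comparison reduces to a conversion-rate delta between the users behind delivered impressions and the users behind Ghost Impressions:

```python
def incremental_lift(test_convs: int, test_imps: int,
                     ctrl_convs: int, ctrl_imps: int) -> float:
    """Relative lift of delivered impressions over Ghost Impressions."""
    test_rate = test_convs / test_imps   # conversion rate, actual impressions
    ctrl_rate = ctrl_convs / ctrl_imps   # conversion rate, Ghost Impressions
    return (test_rate - ctrl_rate) / ctrl_rate

# Hypothetical results: 1,200 conversions per 100,000 delivered
# impressions vs. 1,000 per 100,000 Ghost Impressions.
print(f"{incremental_lift(1_200, 100_000, 1_000, 100_000):.0%}")  # 20%
```

Because the control group here consists of opportunities the campaign genuinely would have won, the baseline rate reflects what would have happened absent the ad, which is what keeps the noise and bias minimal.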
It’s important to note that this methodology requires that the demand-side platform (or, in the case of non-programmatic buys, the buy-side ad server) be technically capable of conducting such an experiment.
None of these approaches is “all good” or “all bad”: each has its pros and cons, and some will be more appropriate for certain types of campaigns and KPIs. You will likely find that Ghost Bids provide a good balance between ease of execution and accuracy.
Next month, we’ll take a look at the measurement techniques that are on the horizon, and that are designed to be durable in the post-cookie reality.