Friday, May 18, 2012

The A/B Test: Inside the Technology That’s Changing the Rules of Business

http://www.wired.com/epicenter/2012/04/ff_abtesting/
Photos: Spencer Higgins; Illustrations: Si Scott
Dan Siroker helps companies discover tiny truths, but his story begins with a lie. It was November 2007 and Barack Obama, then a Democratic candidate for president, was at Google’s headquarters in Mountain View, California, to speak. Siroker—who today is CEO of the web-testing firm Optimizely, but then was a product manager on Google’s browser team—tried to cut the enormous line by sneaking in a back entrance. “I walked up to the security guard and said, ‘I have to get to a meeting in there,’” Siroker recalls. There was no meeting, but his bluff got him in.
At the talk, Obama fielded a facetious question from then-CEO Eric Schmidt: “What is the most efficient way to sort a million 32-bit integers?” Schmidt was having a bit of fun, but before he could move on to a real question, Obama stopped him. “Well, I think the bubble sort would be the wrong way to go,” he said—correctly. Schmidt put his hand to his forehead in disbelief, and the room erupted in raucous applause. Siroker was instantly smitten. “He had me at ‘bubble sort,’” he says. Two weeks later he had taken a leave of absence from Google, moved to Chicago, and joined up with Obama’s campaign as a digital adviser.
At first he wasn’t sure how he could help. But he recalled something else Obama had said to the Googlers: “I am a big believer in reason and facts and evidence and science and feedback—everything that allows you to do what you do. That’s what we should be doing in our government.” And so Siroker decided he would introduce Obama’s campaign to a crucial technique—almost a governing ethos—that Google relies on in developing and refining its products. He showed them how to A/B test.
Over the past decade, the power of A/B testing has become an open secret of high-stakes web development. It’s now the standard (but seldom advertised) means through which Silicon Valley improves its online products. Using A/B, new ideas can be essentially focus-group tested in real time: Without being told, a fraction of users are diverted to a slightly different version of a given web page and their behavior compared against the mass of users on the standard site. If the new version proves superior—gaining more clicks, longer visits, more purchases—it will displace the original; if the new version is inferior, it’s quietly phased out without most users ever seeing it. A/B allows seemingly subjective questions of design—color, layout, image selection, text—to become incontrovertible matters of data-driven social science.
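The diversion the paragraph describes can be sketched in a few lines of Python. This is an illustrative sketch, not any particular vendor's implementation; the `assign_variant` function and its parameters are invented for the example. Hashing the user ID keeps the split deterministic, so a returning visitor always lands on the same version of the page:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, b_fraction: float = 0.1) -> str:
    """Deterministically bucket a user: the same visitor always sees the same page."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1]
    return "B" if bucket < b_fraction else "A"

# A given user is diverted consistently across visits:
print(assign_variant("user-42", "signup-button"))
```

Because assignment depends only on the user and experiment names, no per-user state needs to be stored to keep the experience stable between visits.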
After joining the Obama campaign, Siroker used A/B to rethink the basic elements of the campaign website. The new-media team already knew that their greatest challenge was turning the site’s visitors into subscribers—scoring an email address so that a drumbeat of campaign emails might eventually convert them into donors. Their visit would start with a splash page—a luminous turquoise photo of Obama and a bright red “Sign Up” button. But too few people clicked the button. Under Siroker’s tutelage, the team approached the problem with a new precision. They broke the page into its component parts and prepared a handful of alternatives for each. For the button, an A/B test of three new word choices—“Learn More,” “Join Us Now,” and “Sign Up Now”—revealed that “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.” Similarly, a black-and-white photo of the Obama family outperformed the default turquoise image by 13.1 percent. Using both the family image and “Learn More,” signups increased by a thundering 40 percent.
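The percentages here are relative, not absolute. A short Python sketch makes the arithmetic concrete; the conversion rates in it are invented for illustration, since the article reports only the relative gains:

```python
def relative_uplift(control_rate: float, variant_rate: float) -> float:
    """Percentage improvement of a variant's conversion rate over the control's."""
    return (variant_rate - control_rate) / control_rate * 100.0

# Hypothetical rates: a jump from 10% to 14% of visitors signing up
# is the kind of 40 percent relative gain described above.
print(round(relative_uplift(0.10, 0.14), 1))  # 40.0
```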
Most shocking of all to Obama’s team was just how poorly their instincts served them during the test. Almost unanimously, staffers expected that a video of Obama speaking at a rally would handily outperform any still photo. But in fact the video fared 30.3 percent worse than even the turquoise image. Had the team listened to instinct—if it had kept “Sign Up” as the button text and swapped out the photo for the video—the sign-up rate would have slipped to 70 percent of the baseline. (“Assumptions tend to be wrong,” as Siroker succinctly puts it.) And without the rigorous data collection and controls of A/B testing, the team might not even have known why their numbers had fallen, chalking it up perhaps to some decline in enthusiasm for the candidate rather than to the inferior site revamp. Instead, when the rate jumped to 140 percent of baseline, the team knew exactly what, and whom, to thank. By the end of the campaign, it was estimated that a full 4 million of the 13 million addresses in the campaign’s email list, and some $75 million in money raised, resulted from Siroker’s careful experiments.
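The article doesn't say what statistical machinery the campaign used to decide that a variant's lead was real rather than noise. A standard two-proportion z-test, sketched below with made-up visitor counts, is one common way to make that call:

```python
from math import erf, sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided p-value for the difference between two observed conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal-tail probability
    return z, p_value

# Made-up counts: 800 of 10,000 control visitors sign up vs. 950 of 10,000 on the variant.
z, p = two_proportion_z(800, 10_000, 950, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With a p-value this small, a team could attribute the change in signups to the page itself rather than to shifting enthusiasm for the candidate.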
A/B testing was a new insight in the realm of politics, but its use on the web dates back at least to the turn of the millennium. At Google—whose rise as a Silicon Valley powerhouse has done more than anything else to spread the A/B gospel over the past decade—engineers ran their first A/B test on February 27, 2000. They had often wondered whether the number of results the search engine displayed per page, which then (as now) defaulted to 10, was optimal for users. So they ran an experiment. To 0.1 percent of the search engine’s traffic, they presented 20 results per page; another 0.1 percent saw 25 results, and another, 30.
Due to a technical glitch, the experiment was a disaster. The pages viewed by the experimental groups loaded significantly slower than the control did, causing the relevant metrics to tank. But that in itself yielded a critical insight—tenths of a second could make or break user satisfaction in a precisely quantifiable way. Soon Google tweaked its response times and allowed real A/B testing to blossom. In 2011 the company ran more than 7,000 A/B tests on its search algorithm. Amazon.com, Netflix, and eBay are also A/B addicts, constantly testing potential site changes on live (and unsuspecting) users.
Today, A/B is ubiquitous, and one of the strange consequences of that ubiquity is that the way we think about the web has become increasingly outdated. We talk about the Google homepage or the Amazon checkout screen, but it’s now more accurate to say that you visited a Google homepage, an Amazon checkout screen. What percentage of Google users are getting some kind of “experimental” page or results when they initiate a search? Google employees I spoke with wouldn’t give a precise answer—“decent,” chuckles Scott Huffman, who oversees testing on Google Search. Use of a technique called multivariate testing, in which myriad A/B tests essentially run simultaneously in as many combinations as possible, means that the percentage of users getting some kind of tweak may well approach 100 percent, making “the Google search experience” a sort of Platonic ideal: never encountered directly but glimpsed only through imperfect derivations and variations.
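A full-factorial multivariate test is easy to picture in code. The factors below are hypothetical, modeled loosely on the campaign's page elements; the point is only that every combination becomes its own experimental cell, so the cell count multiplies quickly:

```python
from itertools import product

# Hypothetical factors: each combination of button text and media is one cell.
buttons = ["Sign Up", "Learn More", "Join Us Now", "Sign Up Now"]
media = ["turquoise photo", "family photo", "rally video"]

cells = list(product(buttons, media))
print(len(cells))  # 12 cells: 4 button texts x 3 media choices
```

Add a third factor with, say, five variants and the count jumps to 60 cells, which is why high-traffic sites can blanket nearly every user with some tweak or other.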
Still, despite its widening prevalence, the technique is not simple. It takes some fancy technological footwork to divert user traffic and rearrange a site on the fly; segmenting users and making sense of the results requires deep knowledge of statistics. This is a barrier for any firm that lacks the resources to create and adjudicate its own tests. In 2006 Google released its Website Optimizer, which provided a free tool for anyone who wanted to run A/B tests. But the tool required site designers to create full sets of code for both A and B—meaning that nonprogrammers (marketing, editorial, or product people) couldn’t run tests without first taxing their engineers to write multiple versions of everything. Consequently there was a huge delay in getting results as companies waited for the code to be written and go live.
In 2009 this remained a problem in need of a solution. After the Obama campaign ended, Siroker was left amazed at the efficacy of A/B testing but also at the paucity of tools that would make it easily accessible. “The thought of using the tools we used then made me grimace,” he says. By the end of the year, Siroker joined forces with another ex-Googler, named Pete Koomen, and they launched a startup with the goal of bringing A/B tools to the corporate masses, dubbing it Optimizely. They signed up their first customer by accident. “Before we even spent much time working on the product,” Siroker explains, “I called up one of the guys from the Obama campaign, who had started up a digital marketing firm. I told him what I was up to, and about 20 minutes in, he suddenly said, ‘Well, that sounds great. Send me an invoice.’ He thought it was a sales call.”
The pair had made a sale, but they still didn’t have a product. So Siroker and Koomen started coding. Unlike the earlier A/B tools, they designed Optimizely to be usable by nonprogrammers, with a powerful graphical interface that lets clients drag, resize, retype, replace, insert, and delete on the fly. Then it tracks user behavior and delivers results. It’s an intuitive platform that offers the A/B experience, previously the sole province of web giants like Google and Amazon, to small and midsize companies—even ones without a hardcore engineering or testing team.
What this means goes way beyond just a nimbler approach to site design. By subjecting all these decisions to the rule of data, A/B tends to shift the whole operating philosophy—even the power structure—of companies that adopt it. A/B is revolutionizing the way that firms develop websites and, in the process, rewriting some of the fundamental rules of business.
Here are some of these new principles.