Monday, April 5, 2010

Evaluating Reddit Ads With Bayesian Probability: Messy First Attempt

Not long ago I did an experiment with Reddit ads, promoting my first information product, and an extremely naive analysis led me to believe the Reddit ads gave me an astounding 200% ROI. Yow!

However, I don't yet have proof of the ROI, and I need to be sure. The big ironic secret about Internet marketing is that the cartoon-stereotype Internet marketing business, the kind that sells ridiculous ebooks on how to lose weight, bang hot chicks, and/or teach your parrot to sing the blues, sells them in a much more scientific way than any company I've ever seen in the tech industry - and I've seen a lot. It's all about precise math and clean data. If you can't prove an Internet marketing ad's usefulness mathematically, you don't buy.

Me, I've been in tech long enough to be tainted with its superstition, but I'm warming up to the Internet marketing way of thinking, so here's what I did. I spent $100 on Reddit ads, and $150 on books about statistics, probability, and web analytics. Then, since I had my data in CSV format and I was comfortable surfing around inside it with Utility Belt's unique combination of IRB and vim, I decided to have a little fun with it. By "fun" I mean "math."

By the way, at MountainWest RubyConf, I showed pictures of adorable or scary animals to counteract the inherent boredom of reading math.

Like, for instance, the hippopotamus.

I can't believe I found this.

The math to use: Bayes' Theorem. Here's Bayes' Theorem in a nutshell, first in English, then in Math, and finally in Ruby:

To find the probability that it rained, if you know that the ground is wet, multiply the probability of the ground being wet when it rains by the probability of it having rained, and then divide the result by the probability of the ground being wet.

P(R, if W) = (P(W, if R) * P(R)) / P(W)
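Here is a minimal Ruby sketch of that formula. The method name `bayes` and the rain numbers are made up for illustration; treat it as one plausible way to write the formula down, not as the original listing.

```ruby
# Bayes' Theorem: P(A, if B) = (P(B, if A) * P(A)) / P(B)
# The method name and every number below are assumptions for illustration.
def bayes(p_b_if_a, p_a, p_b)
  raise ArgumentError, "P(B) must be greater than zero" unless p_b.positive?

  (p_b_if_a * p_a) / p_b
end

p_wet_if_rain = 0.9 # assumed: the ground is wet on 90% of rainy days
p_rain        = 0.2 # assumed: it rains on 20% of days
p_wet         = 0.3 # assumed: the ground is wet on 30% of days

# P(R, if W): probability that it rained, if the ground is wet
puts format("P(R, if W) = %.2f", bayes(p_wet_if_rain, p_rain, p_wet))
```

With these invented numbers, wet ground means roughly a 60% chance that it rained.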

I stole the code from a Python program I wrote 2005ish called rock_paper_scissors, which I had copied from an O'Reilly book which presented it in C, not Python (and used the more violent metaphor of kung fu). rock_paper_scissors learns your rock-paper-scissors strategy; if you play against it over and over again, the same way every time, it'll figure out how to beat you. (Given one of the dumbest possible interpretations of "figure out".)

I used Python because I'd never coded anything in C. However, if I'd known C, I would have used C. If I had learned Lisp first, and gotten incredible at it, I'm sure this code would be amazing. However, here's a hippo crossing a river.

So let's find the probability that I'll make money, if I buy Reddit ads, by multiplying the probability that I buy Reddit ads, when I make money, by the probability that I'll make money, and then dividing it all by the probability that I'll buy Reddit ads.

P($, if R) = (P(R, if $) * P($)) / P(R)

I don't want to go into the numbers here; I'm shy about sharing private business data, and these numbers fail the science test. There's traffic unaccounted for, from both Hacker News and Twitter; I didn't do any click-tracking on my Reddit ads; Reddit and Google Analytics of course give me conflicting numbers of pageviews; and I hadn't at this point discovered the CSV export in Google Checkout, so I only did my math against the sales from PayPal.

However, to get these values, you're just looking at the frequency of these events. The O'Reilly book equated frequencies with probabilities, and until I get all the way through that $150 worth of books on probability and statistics that I ordered from Amazon, that's going to have to be good enough for me. Probability of making money, for example, was the number of days on which I saw sales above the daily average, divided by the number of days for which I was looking at sales figures. Probability I'd run Reddit ads was calculated in a similar way. You get the idea.
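The frequency counting described above can be sketched in Ruby roughly like this. The daily log is invented for illustration (these are not my real numbers), and the method names `frequency` and `p_money_if_ads` and the hash keys `:ads` and `:sales` are hypothetical.

```ruby
# Treat the frequency of an event as its probability:
# (days on which the block is true) / (days looked at).
def frequency(days)
  return 0.0 if days.empty?

  days.count { |day| yield(day) } / days.size.to_f
end

# P($, if R) = (P(R, if $) * P($)) / P(R), with every term counted from the log.
# A "money" day is a day with sales above the daily average.
def p_money_if_ads(days)
  average   = days.sum { |day| day[:sales] } / days.size.to_f
  good_days = days.select { |day| day[:sales] > average }

  p_money        = frequency(days) { |day| day[:sales] > average } # P($)
  p_ads          = frequency(days) { |day| day[:ads] }             # P(R)
  p_ads_if_money = frequency(good_days) { |day| day[:ads] }        # P(R, if $)

  raise ArgumentError, "no ad days in the log" if p_ads.zero?

  (p_ads_if_money * p_money) / p_ads
end

# Hypothetical log, invented for illustration only.
log = [
  { ads: true,  sales: 5 },
  { ads: false, sales: 1 },
  { ads: false, sales: 2 },
  { ads: true,  sales: 1 },
  { ads: false, sales: 4 },
  { ads: false, sales: 1 },
  { ads: false, sales: 2 },
  { ads: false, sales: 0 }
]

puts format("P($, if R) = %.2f", p_money_if_ads(log))
```

In this made-up log, one of the two ad days beat the daily average, so the result is 0.5, which is the same number you get by counting ad days directly. With only a handful of ad days, one more good or bad day moves that number a long way.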

This initial run through Bayes' Theorem indicated a 45% probability that running Reddit ads would give me a nice boost in sales. I looked at that 45% number and decided it was wrong, because, comparing it to my sales/traffic spreadsheet, I saw that both times I had run Reddit ads so far, I had seen a sales boost. So I disregarded my Bayesian analysis and went and bought more Reddit ads. The ads didn't sell.

Two failed Reddit promos followed a pair of successful ones, which very roughly validates the 45% rate my Bayesian analysis had predicted. You have to take that with a massive grain of salt, given all the shortcomings in my "science" here, but even so, the next time my numbers tell me I only have a 45% chance of seeing a return on investment, and I see a string of positive ROI, I might just recognize a lucky streak for what it is and avoid throwing good money away.
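To put a rough number on "lucky streak": a minimal sketch, assuming each ad run is an independent event with the same 45% chance of a boost. That independence is an assumption, not something my data shows, and the method name `streak_probability` is hypothetical.

```ruby
# Probability of `streak` boosts in a row, assuming each ad run is
# independent and has the same `chance` of producing a boost.
def streak_probability(chance, streak)
  chance**streak
end

# Two boosts in a row at a 45% chance per run
puts format("%.2f%%", streak_probability(0.45, 2) * 100)
```

Under those assumptions, two boosts in a row happen about one time in five, so a pair of wins is weak evidence against a 45% estimate.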