Tuesday, September 30, 2008

Blogger Claims They Can Prove I Am A Bot

Each graph has a unique distribution of ups and downs, but most contemporary writers define a fairly narrow subset of word frequencies in any given language. What caught our attention in the above data set was the total word count: over 150K words for Giles. A closer look showed that Giles posted 577 blog posts in 2007, and 365 so far in 2008 as of September 30th. Giles averages well over one post per day, and finds time to include video and images in most posts. We hypothesized that Giles' profuse writing could be a symptom of hypergraphia, and so we took a closer look.

We analyzed the data from Giles' blog in a few ways, but the following chart shows the most interesting result. This is a comparison of simple literary fingerprints, like the chart above; however, here we are comparing Giles to himself in three different years: 2008, 2007, and 2006.

Giles' fingerprints for 2007 and 2006 are typical: two graphs that rise and fall from word to word in unison, with little statistical deviation over time. 2008 is different. In 2008, the area under his fingerprint graph is much smaller, indicating a significant increase in his vocabulary. This is not ordinary human behavior, and would require a significant conscious exertion on Giles' part if he were human.
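The quoted post does not spell out how its "literary fingerprint" is computed, so the following is only a rough sketch of one plausible reading: the relative frequency of a fixed list of common words, with the "area under the graph" taken as the sum of those frequencies. The word list and the function names `fingerprint`, `area`, and `distance` are all assumptions made for illustration, not the original analysis.

```python
import re
from collections import Counter

# Hypothetical reference list; the quoted post does not say which words it tracked.
COMMON_WORDS = ["the", "of", "and", "to", "a", "in", "is", "that"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(text, words=COMMON_WORDS):
    """Return the share of all tokens taken up by each reference word."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return [0.0 for _ in words]
    counts = Counter(tokens)
    return [counts[w] / total for w in words]


def area(fp):
    """Assumed stand-in for 'area under the graph': the summed frequencies."""
    return sum(fp)


def distance(fp_a, fp_b):
    """Sum of absolute per-word differences between two fingerprints."""
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))
```

Under this reading, a smaller area for one year means the common words make up a smaller share of that year's text. That can happen because the writer's vocabulary widened, but also because the text itself changed in kind, so the number alone says little about who or what wrote it.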

There's a lengthy, detailed blog post and a GitHub repo.

Most obvious methodology flaw: my posts often quote other people's posts at length.
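One way to control for that flaw would be to drop quoted material before counting words. The sketch below assumes the posts are available as HTML and that quotes are wrapped in `<blockquote>` tags, neither of which is confirmed here; `OwnWordsExtractor` and `own_words` are hypothetical names, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class OwnWordsExtractor(HTMLParser):
    """Collect text that is not nested inside any <blockquote> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many blockquotes we are currently inside
        self.chunks = []  # text fragments found outside blockquotes

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def own_words(html):
    """Return the post text with blockquoted passages removed."""
    parser = OwnWordsExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.chunks).split())
```

Quotes pasted inline without a `<blockquote>` would still slip through, so this narrows the problem rather than eliminating it.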

I'll admit I have investigated auto-blogging technologies, even looking into creating an army of Twitter bots that argue with each other, but I haven't had the time to build anything serious.