Tuesday, October 7, 2008

Python: Stupid-Simple YouTube Comment Scraper

For work, I've needed to grab a ton of comments from YouTube videos.

YouTube has an API with Python samples. It's a mixed blessing. If you've ever had to develop against a Microsoft API, I have bad news for you: you remember that never-ending hellhole of searching for documentation to accomplish a trivial task? With Google, it's worse. The information is every bit as badly organized, but the documentation is less comprehensive, so you can spend just as much time searching and still find nothing because nothing's there.

In my frustration I tweeted about Python being a pain in the ass, but in fairness I have to say I've written stuff in Python that I really enjoyed. The problems I encountered came from Google's documentation, plus the rustiness of my Python skills, not from Python itself. Sorry to harsh on Google. I know some things they do are brilliant, but making it easy to search their documentation is not one of those things.

What I needed was a corpus of a thousand YouTube comments. What I settled for: probably fewer than 100. There's a way to get much larger numbers of comments, and if you can figure it out, please blog it. I didn't have the time. I settled for a cheap, crappy script that takes hardcoded video ids and gives you a bunch of comments.

#!/usr/bin/python

import gdata.youtube
import gdata.youtube.service

youtube_service = gdata.youtube.service.YouTubeService()

# randomly selected, mostly funny, some just stupid and lame
video_ids = ["6d26GGXkzR0", "21OH0wlkfbc", "PbeMwl_PA6A", "aSk0KDAc1gs"]
video_ids += ["JfXDAUIc9Xc", "hPPmLZ9dcRA", "frO6AC6wAc0", "0_fPV13lKm4"]
video_ids += ["nTasT5h0LEg", "A2syxXPR7xY", "MUaSxZf35O8", "wqXW-OMvpP0"]
video_ids += ["puHITWjTc_Q", "bHovtS_QWI4", "ao-9B8IV9_E", "xZlDb6vVsPw"]

# print every comment on the first page of each video's comment feed
for video_id in video_ids:
    for comment in youtube_service.GetYouTubeVideoCommentFeed(video_id=video_id).entry:
        print "..."
        print comment.content.text
        print "..."
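
For anyone chasing the larger numbers of comments mentioned above, here is a rough, untested-against-YouTube sketch of one likely approach: follow each feed's "next" link until the feed runs out. It assumes the comment feed comes back in pages, that the feed object has a `GetNextLink()` method returning a link with an `href`, and that `GetYouTubeVideoCommentFeed` accepts a `uri` argument. The `collect_comments` function and its `limit` parameter are hypothetical names of mine, not part of the gdata library.

```python
def collect_comments(service, video_id, limit=1000):
    """Gather up to `limit` comment texts for one video, one feed page at a time.

    Assumption: `service` behaves like gdata's YouTubeService, and each feed
    page exposes `entry` plus a `GetNextLink()` method (None on the last page).
    """
    comments = []
    feed = service.GetYouTubeVideoCommentFeed(video_id=video_id)
    while feed is not None:
        for entry in feed.entry:
            comments.append(entry.content.text)
            if len(comments) >= limit:
                return comments
        next_link = feed.GetNextLink()
        if next_link is None:
            break  # no more pages
        # Assumption: the next page can be fetched by passing its URI back in.
        feed = service.GetYouTubeVideoCommentFeed(uri=next_link.href)
    return comments
```

With the `youtube_service` object from the script, the call would look something like `collect_comments(youtube_service, "6d26GGXkzR0")`. Taking the service as an argument also means the loop can be tried out with a fake service before pointing it at YouTube.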