Erik's blog

Code, notes, recipes, general musings

hadoop summit 09 > applications track > Case Studies on EC2

ref: http://developer.yahoo.com/events/hadoopsummit09/

– eHarmony

— matching people is an N^2 process

— run hadoop jobs on EC2 and S3

— results downloaded from S3 and imported into BerkeleyDB

— S3 is a great place to store huge files for a long time because it’s so cheap

— switched from bash to ruby because ruby has better exception handling

— elastic map reduce has replaced 150 lines of ec2 management script
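To make the N^2 point above concrete, here is a small sketch of why pairwise matching grows so quickly. It is only an illustration, not eHarmony's matching code; `candidate_pairs` and `pair_count` are hypothetical names made up for this example.

```python
from itertools import combinations


def candidate_pairs(users):
    """Return an iterator over every unordered pair of users."""
    return combinations(users, 2)


def pair_count(n):
    """Number of unordered pairs among n users: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if pair_count(4) != len(list(candidate_pairs(["a", "b", "c", "d"]))):
    raise AssertionError("pair_count disagrees with the enumeration")
```

Doubling the number of users roughly quadruples the number of pairs to score, which is the kind of workload that is natural to split across a hadoop cluster.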

 

– sharethis

— simplifies sharing online content: delicious + ping.fm + bit.ly

— they’re a small company, but they need to keep pace w/ the volume of the large publishers they support

— they’re 100% based on AWS

— aster + lamp stack + cascading running on hadoop (to clean logs before pushing data into db) + s3 + sqs

— sharded search mostly used for business intel

— cascading allows efficient hadoop coding, more so than pig

— in the hadoop book, the author of cascading wrote a case study on sharethis

 

– lookery

— started as an ad network on facebook

— built completely on aws

— use a javascript-based tracker like google analytics to gather data

— data acquisition + data serving + reporting + billing –> all done in hadoop

— they use voldemort, a distributed key/val store, instead of memcache

— heavy use of hadoop streaming w/ python
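For anyone who hasn't seen hadoop streaming with python: the mapper and reducer are plain programs that read lines on stdin and write tab-separated key/value lines on stdout. Below is a minimal sketch of that pattern, not lookery's actual code. The input format (tab-separated lines with a publisher id in the first field) and the names `map_lines` and `reduce_lines` are assumptions made up for the example.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit "<publisher_id>\t1" for each well-formed log line.

    Assumed (hypothetical) input format: publisher_id<TAB>event
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        yield f"{fields[0]}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts per key.

    Relies on the input being sorted by key, which hadoop streaming
    does between the map and reduce phases.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"
```

In a real job each function would live in its own script that loops over `sys.stdin` and prints each result, and the two scripts would be passed to the streaming jar via its `-mapper` and `-reducer` options. Locally, the same flow can be simulated with `reduce_lines(sorted(map_lines(lines)))`.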

 

– deepdyve

— a search engine

— having an elastic infrastructure allows for innovation

— using hadoop, they went from 1 wk to 1 hr for indexing

— start spinning up new clusters and discarding old ones

— ec2 + katta + zookeeper + hadoop + lucene –> most of the software they run, they didn’t have to write

— query times are lower, user satisfaction is higher

— problems: unstable aws, session timeouts on zookeeper, slow provisioning on aws

— with aws, they can run load tests to prepare for spikes

Written by Erik

June 10, 2009 at 1:51 pm

Posted in notes
