Erik's blog

Code, notes, recipes, general musings

barcamp san diego 5: “hbase, cassandra, bigtable, simpledb discussion”


– Amazon Dynamo
– Cassandra
— conflicting writes are resolved by timestamp: the value with the latest timestamp wins
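A minimal sketch of "latest timestamp wins" reconciliation. It assumes each column value carries the timestamp of the write that produced it. The names `Versioned`, `resolve` and `merge_replicas` are hypothetical, and this is not Cassandra's code. The tie-break on equal timestamps is also an assumption; it exists only so every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A column value tagged with the timestamp of the write that produced it."""
    value: str
    timestamp: int  # hypothetical: e.g. microseconds since epoch, set by the writer


def resolve(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep whichever copy has the newer timestamp."""
    # Assumed tie-break on the value, so the result never depends on argument order.
    return max(a, b, key=lambda v: (v.timestamp, v.value))


def merge_replicas(replicas: list[dict[str, Versioned]]) -> dict[str, Versioned]:
    """Reconcile several replicas' copies of one row, column by column."""
    merged: dict[str, Versioned] = {}
    for replica in replicas:
        for column, versioned in replica.items():
            if column in merged:
                merged[column] = resolve(merged[column], versioned)
            else:
                merged[column] = versioned
    return merged


replica_a = {"email": Versioned("old@example.com", 100)}
replica_b = {"email": Versioned("new@example.com", 200)}
print(merge_replicas([replica_a, replica_b])["email"].value)  # new@example.com
```

Because `resolve` picks the same winner regardless of argument order, replicas can merge in any order and still converge.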

– managing distributed records
— use checksums to verify data integrity

– why use HBase?
— random reads on disk are slow because each one pays a seek; reading sequentially laid-out data from disk is the only way to get good throughput
— a simple fetch query (a lookup by key) costs roughly the same as an HBase lookup
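A back-of-the-envelope cost model for the random-vs-sequential point. The disk parameters below (10 ms per seek, 100 MB/s transfer, 4 KB records) are assumed round numbers for a spinning disk, not measurements. The sketch shows only why paying one seek per record dominates the cost.

```python
# Illustrative spinning-disk parameters (assumptions, not measurements).
SEEK_SECONDS = 0.010             # ~10 ms to move the head to a new position
THROUGHPUT_BYTES_PER_S = 100e6   # ~100 MB/s once the head is positioned
RECORD_BYTES = 4096
N_RECORDS = 1000


def random_read_time(n: int) -> float:
    """Each record lives somewhere else on disk: pay a seek for every read."""
    return n * (SEEK_SECONDS + RECORD_BYTES / THROUGHPUT_BYTES_PER_S)


def sequential_read_time(n: int) -> float:
    """Records are stored contiguously: one seek, then stream them all."""
    return SEEK_SECONDS + n * RECORD_BYTES / THROUGHPUT_BYTES_PER_S


r = random_read_time(N_RECORDS)
s = sequential_read_time(N_RECORDS)
print(f"random: {r:.2f}s  sequential: {s:.3f}s  ratio: {r / s:.0f}x")
# random: 10.04s  sequential: 0.051s  ratio: 197x
```

Under these assumptions, almost all of the random-read time is seeking. That is why a store that keeps rows sorted and contiguous on disk can serve scans so much faster.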

– HDFS / HBase: how are responsibilities divided?

– how to update a record?
— HBase generally doesn't replace relational DBs; the two are used in conjunction.
— HBase can replace a relational DB if the data is normalized by nature, e.g. when it only stores user records.
— If the data is stored normalized in HBase, updates are straightforward. If it is stored denormalized, it's better to keep the normalized copy in a relational DB, apply the update there, and then propagate it to HBase later in a batch process.
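A toy sketch of the "update the normalized copy, rebuild the denormalized copy in batch" pattern. Plain dicts stand in for both stores. The table names, the sample data and `build_denormalized` are all hypothetical. The point is that an update touches one normalized row, while the denormalized copy stays stale until the batch job runs again.

```python
# Normalized "relational" tables: each fact is stored exactly once (hypothetical data).
users = {1: {"name": "Ann"}}
posts = {
    10: {"author_id": 1, "title": "barcamp notes"},
    11: {"author_id": 1, "title": "recipes"},
}


def build_denormalized(users: dict, posts: dict) -> dict:
    """Batch job: rebuild wide, read-optimized rows from the normalized tables.

    The author's name is copied into every post row, so reads need no join.
    """
    return {
        post_id: {"title": p["title"], "author_name": users[p["author_id"]]["name"]}
        for post_id, p in posts.items()
    }


view = build_denormalized(users, posts)

# The update touches a single row in the normalized store...
users[1]["name"] = "Anne"
# ...but the denormalized copy keeps the old value until the next batch run.
assert view[10]["author_name"] == "Ann"

view = build_denormalized(users, posts)
print(view[11]["author_name"])  # Anne
```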

– memcached vs HBase

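– DB sharding
— painful with relational DBs because the sharding logic lives in the application, and relational DBs are optimized for joins, which don't work across shards.
— HBase is designed for sharding: tables are split into regions (contiguous row-key ranges) that are spread across servers.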
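A rough contrast between application-level hash sharding and range partitioning. The range half is only a simplified model of HBase-style regions: rows sorted by key and split at boundary keys. It is not HBase's actual lookup code. The split points, key format and helper names are hypothetical.

```python
import bisect
import zlib


def hash_shard(key: str, n_shards: int) -> int:
    """Application-level routing; crc32 is used because str hash() varies per process."""
    return zlib.crc32(key.encode()) % n_shards


keys = [f"user{i:04d}" for i in range(1000)]
# With modulo hashing, adding one shard reassigns most keys, which the app must migrate.
moved = sum(hash_shard(k, 4) != hash_shard(k, 5) for k in keys)
print(f"{moved} of {len(keys)} keys move when going from 4 to 5 shards")

# Simplified range partitioning: each region owns keys from its start key up to the next.
region_starts = ["", "user0250", "user0500", "user0750"]  # hypothetical split points


def region_for(key: str) -> int:
    """Find the last region whose start key is <= key."""
    return bisect.bisect_right(region_starts, key) - 1


# Splitting a busy region adds one boundary; keys in other regions stay where they are.
bisect.insort(region_starts, "user0375")
print(region_for("user0400"))  # 2
```

Under this model, a range split only affects the keys of the region being divided. Changing the modulus of a hash scheme reshuffles keys across every shard.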


Written by Erik

May 31, 2009 at 2:48 pm
