Erik's blog

Code, notes, recipes, general musings

silicon valley hadoop user group 5-20-09: cloudera on automatic database import w/ sqoop

motivation
- hadoop is great for unstructured data
- hadoop is not great for structured data
- the question: how to join structured data from mysql with the unstructured data already in hadoop

DBInputFormat
- uses jdbc to connect to db

DBWritable
- a bridge from jdbc result set to mapper value
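To make the DBInputFormat / DBWritable idea concrete, here is a rough sketch of the pattern. The real interfaces are Java (a DBWritable implementation has a readFields(ResultSet) method); this sketch uses Python with sqlite3 as a stand-in for JDBC so it runs anywhere. The employees table, EmployeeRecord, read_fields, read_table, and demo are all made-up names for illustration, not part of Hadoop.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """Hypothetical record type: plays the role a DBWritable class plays in Hadoop."""
    id: int = 0
    name: str = ""
    salary: float = 0.0

    def read_fields(self, row):
        # Analogue of readFields(ResultSet): copy columns from one result row
        # into typed fields so a mapper can use them as its value.
        self.id, self.name, self.salary = row[0], row[1], row[2]
        return self


def read_table(conn):
    # Analogue of the input format's record reader: run a query over the
    # connection and hand out (record number, record) pairs.
    cursor = conn.execute("SELECT id, name, salary FROM employees ORDER BY id")
    for record_number, row in enumerate(cursor):
        yield record_number, EmployeeRecord().read_fields(row)


def demo():
    # In-memory database with made-up rows, standing in for a MySQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "ada", 120.0), (2, "grace", 130.5)],
    )
    return list(read_table(conn))
```

Calling demo() returns a list of (record number, EmployeeRecord) pairs, which is roughly the shape of key/value input a mapper would see.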

Sqoop
- SQL-to-Hadoop
- jdbc-based interface
- auto datatype generation
- uses mapreduce to read tables from db
- imports into hdfs and creates a java file
- easy to import into hive
- serialized output is comma-separated
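The Sqoop bullets above (read the table schema, generate a typed record, write comma-separated output) can be sketched in a few lines. This is only an illustration of the idea, not how Sqoop itself is implemented: Sqoop generates a Java source file and runs a MapReduce job, while this sketch builds a class at runtime and loops over rows in one process. SQL_TO_PY, generate_record_class, and import_table are hypothetical names, and sqlite3 again stands in for a JDBC source.

```python
import sqlite3
from dataclasses import astuple, make_dataclass

# Assumed mapping from declared SQL column types to Python types (illustrative only).
SQL_TO_PY = {"INTEGER": int, "TEXT": str, "REAL": float}


def generate_record_class(conn, table):
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    fields = [
        (name, SQL_TO_PY.get(declared_type.upper(), str))
        for _, name, declared_type, *_ in columns
    ]
    # Build a record class whose fields mirror the table's columns.
    return make_dataclass(table.capitalize(), fields)


def import_table(conn, table):
    record_cls = generate_record_class(conn, table)
    lines = []
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY rowid"):
        record = record_cls(*row)
        # Serialize each record as one comma-separated line.
        lines.append(",".join(str(value) for value in astuple(record)))
    return record_cls, lines
```

The naive join does no quoting or escaping, so a value containing a comma would break the format; a real import has to pick a delimiter or escaping scheme that the downstream consumer (Hive, for example) agrees on.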

Written by Erik

May 20, 2009 at 6:16 pm

Posted in notes
