Erik's blog

Code, notes, recipes, general musings

notes from Cloud Expo 2009: Christophe Bisciglia on “Working with Big Data and Hadoop”

leave a comment »

falicies
– machines are reliable
– machines are unique or identifiable
– a data set should fit on one machine

hadoop
– it’s not a database
— it doesn’t serve data in real-time
— it augments existing DBs
— it does enable deeper analysis that would normally slow a relational DB
– leverages commodity hardware for big data & analytics
– cloudera does for hadoop what redhat does for linux

examples
– fox
— hat ppl are watching on set-top obxes
– autodesk
– D.E.Shaw
— analyze financial data
– mailtrust
— use hadoop to process mail logs and generate indexes that suport staff can use to make adhoc queries

data
– scientific and experimental data
– storage
— multiple machines are req’d to store the amount of data we’re interested in
— replication protects data from failure
— data is also 3 times as available

map-reduce
– allows for processing data locally
– allows for jobs to fail and be restarted

hadoop’s fault tolerance
– handled at software level

using hadoop
– map-reduce
— natively written in java
— map-reduce can be written in any language
– hive
— provides sql interface
– pig
— high level lang for ad-hoc analysis
— imperative lang
— great for researchers and techinical prod. managers

high performance DB and analytics.  when is it time for hadoop
– in general
— generation rate exceeds load capacity
— performance/cost considerations
— workloads that impede performance
– db
— 1000s of transactions per second
— many concurrent queries
— read/write
— many tables
— structured data
— high-end machines
— annual fees
– hadoop
— append only update pattern
— arbitrary keys
— unstructured or structured data
— commodity hardware
— free, open source

arch
– traditional: web server –> db –> oracle –> biz analytics
– hadoop: web server –> db –> hadop –> oracle –> biz analytics

cost
– data storage costs drops every year
– hadoop removes bottlenecks; use the right tool for the job
– makes biz intel apps smarter

tools
– cloudera’s distro for hadoop
– cloudera desktop

Advertisements

Written by Erik

November 3, 2009 at 6:26 pm

Posted in notes

Tagged with , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: