notes from cloudera basic training > installing the cloudera hadoop distro locally and on ec2
ref: http://www.cloudera.com/hadoop
installing cloudera distro on a cluster
- motivation: hadoop is complocated to install
- cloudera uses Alternatives to manage a|b testing
- cloudera has created a “configurator” that will generate an rpm customized to your cluster
– generates the configuration files and a custom installer. Each can optionally be used together or separately
- alternatively, you can install an unconfigured distro
- for large scale deployment, useĀ puppet, bcfg2, cfengine, etc. to manage the cluster
– cloudera’s tool can still be used to generate config scripts
- storing data in ebs takes advantage of locality and is much faster than s3
– ebs is more performant than normal hard drives




