Erik's blog

Code, notes, recipes, general musings

Archive for May 2009

barcamp san diego 5: “monitoring websites and iphone apps”

– levels of monitoring
— cust experience
— synthetic api
— synthetic ui
— server health
— jboss/jvm
— db

– strategy
— build a mesh of monitoring across these levels, eg so you can tell the customer experience is still great even though a jboss server failed
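
A minimal sketch of the "mesh" idea, assuming each monitoring level is just a probe function that returns True when healthy. The level names mirror the list above; the probes, `run_checks`, and `summarize` are hypothetical, not anything the speaker showed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckResult:
    level: str
    ok: bool


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run one probe per monitoring level (probes are hypothetical callables)."""
    results = []
    for level, probe in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # a probe that blows up counts as a failure
        results.append(CheckResult(level, ok))
    return results


def summarize(results: List[CheckResult]) -> Tuple[bool, List[str]]:
    """Return (is the customer experience ok?, which levels failed)."""
    customer_ok = all(r.ok for r in results if r.level == "customer experience")
    failed = [r.level for r in results if not r.ok]
    return customer_ok, failed


def jboss_down() -> bool:
    raise ConnectionError("jboss node unreachable")


results = run_checks({
    "customer experience": lambda: True,
    "synthetic api": lambda: True,
    "jboss/jvm": jboss_down,
})
print(summarize(results))  # (True, ['jboss/jvm'])
```

Reading the levels together is the point: the failed jboss node still raises an alert, while the customer-level check confirms users are unaffected.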

– quicken iphone experience
— built on top of an existing service
— mobile app services need to be decoupled from any underlying product
— transactions need to be recorded, logged, and monitored in a way that captures new info, eg device type
— building and testing on an iphone is very different from the traditional web model
— put app and provision file on a server so all testers can install the app remotely
— apple approval process is a black box

– operation and design considerations
— apps must be designed to handle variables beyond backend’s control
— network failure
— failure of required external services, eg google maps
— crash logs sent back to server
— accepting a call in the middle of a transaction will close the app

@dmkanter on twitter

– tools
— nagios
— used for monitoring databases
— bigbrother
— gomez
— used for synth ui testing


Written by Erik

May 31, 2009 at 3:31 pm

barcamp san diego 5: “hotruby + jquery”

– how to use ruby in the browser w/ jquery
– history of js
— created by brendan eich in 1995
— based on scheme and lisp
— netscape got partnership w/ sun, so the language had to look like java
— IE duplicates it
— netscape tries to standardize it
— prototype -> implemented -> deployed -> standardized very quickly

– js is not dead yet
— most common platform (now on server w/ rhino)
— the most common programming lang
— the lang of the internet
— most problems aren’t so bad

– hotruby
— may be more performant than interpreted ruby
— write in ruby, compile it, and exec it in js vm
— not much has changed since 2007
— original hotruby on github
— takes ruby bytecode as an input and runs it via js in the browser(!)
— no plugin req’d

– speaker’s work
— speaker has been updating it to ruby 1.9.1
— run this and look at response in firebug to see ruby bytecode
— sends the text box content to the server, compiles it to ruby bytecode, sends it back, and runs it using the hotruby interpreter
— will run in all browsers
— look for “hotruby” in

Written by Erik

May 31, 2009 at 3:00 pm

Posted in notes

barcamp san diego 5: “cloud computing on EC2”

– rightscale alternatives
— chef
— sponsored by att
— an opensource ruby project
— puppet
— cfengine
— way cheaper than rightscale

– ec2 alternative
— eucalyptus
— opensource project for running a private cloud
— akamai now hosts applications on their edge servers

– load balancer
— haproxy (
— software based
— allows us to route all traffic to a new cluster once it’s launched and running
— f5
— hardware based
— algorithms
— round robin
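
Round robin is simple enough to show in a few lines. This is a toy, in-process sketch of the algorithm (not how haproxy or a hardware balancer is actually configured), with hypothetical backend names; `switch_to` mimics routing all traffic to a new cluster once it's running.

```python
from itertools import cycle
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation (backend names are hypothetical)."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.switch_to(backends)

    def switch_to(self, backends: Iterable[str]) -> None:
        # Point all new traffic at a (possibly freshly launched) cluster.
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(self._backends)

    def next_backend(self) -> str:
        # Each call returns the next server, wrapping around at the end.
        return next(self._rotation)


lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.next_backend() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
lb.switch_to(["new1", "new2"])
print(lb.next_backend())  # new1
```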

Written by Erik

May 31, 2009 at 2:56 pm

barcamp san diego 5: “hbase, cassandra, bigtable, simpledb discussion”

– amazon dynamo
– cassandra
— latest time stamp wins
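
"Latest timestamp wins" is last-write-wins conflict resolution. A minimal sketch, assuming each replica returns a value tagged with the timestamp of the write that produced it (the `Version` type is hypothetical):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # assumed to be a client-supplied write timestamp


def resolve(replicas: Sequence[Version]) -> Version:
    """Last-write-wins: keep whichever replica's copy has the latest timestamp."""
    # max() returns the first of several equal timestamps, so ties are broken
    # by replica order here; a real system needs a deterministic tie-breaker.
    return max(replicas, key=lambda v: v.timestamp)


replicas = [Version("blue", 1.0), Version("green", 3.0), Version("red", 2.0)]
print(resolve(replicas).value)  # green
```

The catch is that clocks on different writers can disagree, so a write stamped by a fast clock can silently beat a genuinely newer one.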

– managing distributed records
— use checksum to verify data health
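
A sketch of checksum-based health checks, assuming a checksum is stored alongside each record at write time. SHA-256 and the replica names are assumptions for illustration, not details from the discussion.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_bad_replicas(replicas: Dict[str, bytes], expected: str) -> List[str]:
    """Names of replicas whose stored bytes no longer match the recorded checksum."""
    return [name for name, blob in replicas.items() if checksum(blob) != expected]


expected = checksum(b"row-42")  # recorded when the record was written
replicas = {"node-a": b"row-42", "node-b": b"row-4?", "node-c": b"row-42"}
print(find_bad_replicas(replicas, expected))  # ['node-b']
```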

– why use an hbase
— random reads on disks are slow; reading from sequential data on disk is the only way to go
— simple fetch queries are roughly equivalent to an hbase lookup

– hdfs / hbase division?

– how to update record?
— hbase is not replacing relational dbs; they are used in conjunction.
— they can replace relational dbs, if the data we’re storing is normalized by nature, eg we’re just using it for user records
— if the data is actually normalized in the hbase, the update is straightforward.  If the data is denormalized in the hbase, we’re better off having the data normalized in a relational db, updating the normal db, and then updating the hbase in a batch process later.
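
The "update the normalized db, then update hbase in a batch" approach, sketched with plain dicts standing in for both stores. The table shapes and function names are hypothetical.

```python
from typing import Dict, List


def update_username(users: Dict[int, str], user_id: int, name: str) -> None:
    # The normalized update touches exactly one row in the relational db.
    users[user_id] = name


def rebuild_feed_rows(users: Dict[int, str], posts: List[dict]) -> Dict[int, dict]:
    # Batch job: regenerate the denormalized rows (author name copied into
    # every post row, as they might be laid out in hbase) from the source of truth.
    return {
        post["id"]: {"text": post["text"], "author": users[post["author_id"]]}
        for post in posts
    }


users = {1: "alice"}
posts = [
    {"id": 10, "author_id": 1, "text": "hi"},
    {"id": 11, "author_id": 1, "text": "yo"},
]
update_username(users, 1, "alice2")
rows = rebuild_feed_rows(users, posts)
print(rows[11])  # {'text': 'yo', 'author': 'alice2'}
```

The write itself stays a one-row update; the cost of fanning the new name out to every denormalized row is paid later, off the request path.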

– memcache vs hbase

– db sharding
— painful because the sharding lives in application logic, and relational dbs are optimized for joins
— hbase is optimized for sharding
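
To see why sharding a relational db is "application logic", here is a minimal hash-sharding sketch, with in-memory dicts standing in for the separate databases (the shard count and key format are hypothetical):

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash; Python's built-in hash() for str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Application-level sharding: the app, not the db, decides where a key lives."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: Any) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
store.put("user:42", {"name": "alice"})
print(store.get("user:42"))  # {'name': 'alice'}
```

A join across users that land on different shards now has to happen in application code, which is the pain described above.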

Written by Erik

May 31, 2009 at 2:48 pm

Google I/O notes: “Transactions Across Datacenters (and Other Weekend Projects)”

Transactions Across Datacenters (and Other Weekend Projects)
– master/slave replication
— usually asynchronous
— weak/eventual consistency: granularity matters
— datastore: current
— this is how app engine “multi-home”s the datastore
– multi-master replication
— one of the most fascinating areas of computer science
— eventual consistency is the best we can do
— need serialization protocol
— a global timestamp
– 2 phase commit
— semi-distributed protocol: there is always a master coordinator
— ah! got an emergency call – gotta go – but this was the best talk ever…… 😦
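
These notes cut off at 2 phase commit, so here is a textbook sketch of it rather than anything from the talk: a single coordinator collects votes and commits only if every participant votes yes. The `Participant` class is hypothetical and ignores timeouts and coordinator crashes, which are where 2PC actually gets hard.

```python
from typing import List


class Participant:
    """Toy resource manager; can_commit simulates its vote (hypothetical)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # A real participant would durably log its vote before replying.
        self.state = "prepared" if self.can_commit else "voted-no"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1: the coordinator asks every participant to prepare and vote.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all votes are yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


dcs = [Participant("dc-east"), Participant("dc-west", can_commit=False)]
print(two_phase_commit(dcs), [p.state for p in dcs])  # False ['aborted', 'aborted']
```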

Written by Erik

May 28, 2009 at 9:44 am

Posted in notes

Google I/O notes: “A Design for a Distributed Transaction Layer for Google App Engine”

A Design for a Distributed Transaction Layer for Google App Engine
– distributed algorithms are difficult to impossible to debug – they must be proved correct
– correctness and performance are at the heart of engineering
– what is your goal? start there and work backwards, but keep focused on the goal
//check out codecon next year
– invariance
— correctness requires invariance
— an invariant is a statement that stays true while everything else is changing
— initialize invariants during construction
— isolation and atomicity
– scalability
— deconstruct what you’re doing and figure out how to spread it out everywhere
— distributed machines are unreliable, non-serial, non-synchronized
– transactions
— a “good” state is one in which all invariants are satisfied
— invariants sometimes must be temporarily violated (within a transaction)
— a “transaction” is a set of operations that take us from one good state to another
— durable: state persists
— atomic and isolated: no in-between states
— consistent: only jump from one good state to the next
— in app engine, “entity groups” partition data
— you can’t run queries inside an app engine transaction
– algorithm (read this whitepaper)
— very similar to two-phase commit (‘there are only so many good ideas’)
— 1) run client
— records version numbers
— 2) get write locks
— 3) check version
— 4) copy shadows
— details
— deadlock prevention
—- get locks in a certain order
— ongoing progress
—- web apps do 10-100x more reads than writes
— concurrent roll-forward
— proof of isolation
— light switches are idempotent
– eventual vs. strong vs. causal consistency
— app engine uses strong consistency
– local vs distributed transactions
— local transactions are cheaper
— no read-after-write, and no write-after-write, because writes are buffered – enforce hard rules for scalability
– be able to tell if a transaction has or has not happened; provide ids for each transaction
– questions
— will this be released as a library or built-in?
— it’ll be released as an opensource library called “tapioca”
— roll-forward vs. roll-back?
— when a write takes place, a “diff” is generated against the db as a shadow object. at the correct time, this shadow object is incorporated as a “roll-forward” of the db.
– use transactions anytime you are going to violate an invariant to ensure we return to a good state
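
The algorithm steps above (run client and record versions, get write locks, check versions, copy shadows, roll forward) can be sketched in one process. This is a hedged toy, assuming an in-memory store, per-key `threading.Lock`s, and made-up names (`Store`, `Transaction`, `roll_forward`); it is not tapioca's API or the paper's code, and for simplicity it locks every key the transaction touched, not only the written ones.

```python
import threading
from typing import Any, Dict, Set


class Store:
    """In-memory stand-in for the datastore (hypothetical, single process)."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.versions: Dict[str, int] = {}
        self.shadows: Dict[str, Dict[str, Any]] = {}  # txn id -> pending writes
        self.committed: Set[str] = set()  # ids of applied txns
        self._locks: Dict[str, threading.Lock] = {}
        self._meta = threading.Lock()

    def lock_for(self, key: str) -> threading.Lock:
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def roll_forward(self, txn_id: str) -> None:
        # Idempotent like a light switch: repeating it after success is a no-op,
        # because the transaction id records that it already happened.
        if txn_id in self.committed:
            return
        for key, value in self.shadows.pop(txn_id, {}).items():
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        self.committed.add(txn_id)


class Transaction:
    def __init__(self, store: Store, txn_id: str) -> None:
        self.store = store
        self.txn_id = txn_id
        self.read_versions: Dict[str, int] = {}
        self.writes: Dict[str, Any] = {}  # buffered: reads never see them

    def read(self, key: str) -> Any:
        # Step 1 (run client): remember the version number of everything read.
        self.read_versions.setdefault(key, self.store.versions.get(key, 0))
        return self.store.data.get(key)

    def write(self, key: str, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # Step 2: take the locks in sorted key order so two commits can't deadlock.
        keys = sorted(set(self.writes) | set(self.read_versions))
        locks = [self.store.lock_for(k) for k in keys]
        for lock in locks:
            lock.acquire()
        try:
            # Step 3: abort if anything we read has changed since we read it.
            for key, seen in self.read_versions.items():
                if self.store.versions.get(key, 0) != seen:
                    return False
            # Step 4: copy the buffered writes to a shadow, then roll forward.
            self.store.shadows[self.txn_id] = dict(self.writes)
            self.store.roll_forward(self.txn_id)
            return True
        finally:
            for lock in reversed(locks):
                lock.release()


store = Store()
t1, t2 = Transaction(store, "t1"), Transaction(store, "t2")
t1.read("x")
t2.read("x")  # both transactions see version 0 of x
t1.write("x", "on")
t2.write("x", "off")
print(t1.commit(), t2.commit())  # True False
```

Sorting keys before locking is the deadlock-prevention rule from the notes, and because `roll_forward` checks the transaction id first, a client that finds a leftover shadow can safely finish (or repeat) the roll-forward.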

Written by Erik

May 28, 2009 at 9:37 am

Posted in notes

Google I/O notes: “Designing OpenSocial Apps for Speed and Scale”

web dev best practices
– concat js and css files
– compress js and css
– quartermile api improvement after using yui compressor

– use firebug for local testing
– measuring latency internationally
— use js: create an img element w/ an onload fn and report network latency back
— do the same for data calls

image spriting
– use yslow to measure network requests

– can be an apache config FilesMatch
– it looks like we can do this in php as well by just setting a header
– use a rand var in query string to bust cache
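
A sketch of the cache-busting trick: append a random value to the query string so the browser treats the URL as new. The parameter name `cb` is an arbitrary assumption.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bust_cache(url: str, token: Optional[str] = None) -> str:
    """Return url with a cache-busting `cb` query parameter (name is arbitrary)."""
    parts = urlsplit(url)
    # Keep existing query params, dropping any old cache buster.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "cb"]
    query.append(("cb", token or secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(bust_cache("http://example.com/app.js?v=1", token="abc123"))
# http://example.com/app.js?v=1&cb=abc123
```

Passing a fixed token (say, a release version) instead of a random one keeps long-lived caching working between releases; a random value defeats the cache on every request.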

server-assisted optimizations
– for social networks, enforcing performance tweaks at the container level produces performance gains across the whole network
– social networks usually have better infrastructure than app devs
– opensocial
— puts the url passed in on the cdn of the container
— content-rewrite feature in gadget spec controls which features are optimized
— use batch operations for network requests

opensocial best practices
– use OS 0.9 data pipelining and proxied content
– look on wiki.opensocial to learn more about data pipelining
– invalidation pattern
— invalidates cache for app associated w/ oauth key

database scaling
– joins are very expensive
– use master/slave architecture to scale horizontally
– use database partitioning to overcome replication costs
– bigtable already handles this
– use memcache to cache everything you can, and filter db results in the software layer
– store frequently used data in a json blob
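
The "cache everything you can" advice is usually implemented as cache-aside: check memcache, fall back to the db on a miss, then populate the cache. A sketch with a plain dict standing in for memcache and a hypothetical `load_from_db` callable; values are stored as json blobs, per the last point above.

```python
import json
from typing import Any, Callable, Dict


class CacheAside:
    """Cache-aside lookup; the dict is a stand-in for memcache (assumption)."""

    def __init__(self, load_from_db: Callable[[str], Any]) -> None:
        self._cache: Dict[str, str] = {}
        self._load = load_from_db
        self.db_hits = 0

    def get(self, key: str) -> Any:
        blob = self._cache.get(key)
        if blob is None:  # miss: go to the db once, then remember the result
            self.db_hits += 1
            blob = json.dumps(self._load(key))
            self._cache[key] = blob
        return json.loads(blob)

    def invalidate(self, key: str) -> None:
        # Call this after writing the key to the db so readers don't see stale data.
        self._cache.pop(key, None)


db = {"user:1": {"name": "alice", "friends": [2, 3]}}
cache = CacheAside(db.__getitem__)
cache.get("user:1")
cache.get("user:1")
print(cache.db_hits)  # 1
```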

background processing
– as activities are generated, a note is made in a queue; on a separate schedule, a bg process runs ops on the items in the queue
– users see a slight delay (minutes)
– in app engine, use cron.yaml
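
A sketch of the queue-plus-background-processing pattern. The request handler only records the activity; a separate job drains the queue in bounded batches. On app engine that job would be a handler hit on the schedule set in cron.yaml; here `process_batch` is just called directly, and all names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ActivityQueue:
    def __init__(self) -> None:
        self._pending: Deque[Dict[str, str]] = deque()

    def record(self, activity: Dict[str, str]) -> None:
        # Cheap enough to do inside the user's request.
        self._pending.append(activity)

    def process_batch(self, handler: Callable[[Dict[str, str]], None], limit: int = 100) -> int:
        # Run by the scheduled job; the limit keeps each run short.
        processed = 0
        while self._pending and processed < limit:
            handler(self._pending.popleft())
            processed += 1
        return processed


q = ActivityQueue()
for user in ("ann", "bo", "cy"):
    q.record({"user": user, "verb": "ran 5k"})
done: List[Dict[str, str]] = []
print(q.process_batch(done.append, limit=2), q.process_batch(done.append, limit=2))  # 2 1
```

This is where the "slight delay (minutes)" comes from: an activity waits until the next scheduled run picks it up.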

quartermile’s (qm) data model
– enforce hard limits up front
– orkut limits friend lists to 1K, but myspace allows 100k +
– qm creates an artificial team concept, and updates are limited to team members
– app engine limits results to 1000 rows
– use cron to process and store expensive queries in the background
– store data in OS app data
— it’s all public, so don’t put secrets in here
— user-writable via js
— super fast
— perfect for storing pre-rendered bulk data
— in bg, render data and push it to app data. when the app loads, retrieve the data from app data and insert it into a div!
– goals
— keep gadgets fast! we’re competing for a user’s attention
//use screencasts in presos

template-only profiles
– a way to declare the data an app needs w/o js
– produces fast profile rendering
– “process-on-server” OS feature
– all apps on orkut profiles must use templates

– invalidation, cache headers, bg processes, app data, limited profiles have benefits in multiple ways
– caching static content can reduce traffic by 90%

//”office hours” supplement talk subjects

Written by Erik

May 28, 2009 at 9:03 am

Posted in notes