Erik's blog

Code, notes, recipes, general musings

Archive for January 2011

A dive into HTTP 1.1 message formatting

leave a comment »

It’s time to take a moment and think about Hypertext Transfer Protocol (HTTP) message formatting, specifically HTTP 1.1.  To quote Wikipedia’s article on HTTP, “HTTP functions as a request-response protocol in the client-server computing model”. The article also provides an example I find helpful:

In HTTP, a web browser, for example, acts as a client, while an application running on a computer hosting a web site functions as a server. The client submits an HTTP request message to the server. The server, which stores content, or provides resources, such as HTML files, or performs other functions on behalf of the client, returns a response message to the client. A response contains completion status information about the request and may contain any content requested by the client in its message body.


In general, what does an HTTP request look like? We can see one by requesting via cURL on the command line:

$ curl -v

> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host:
> Accept: */*
< HTTP/1.1 301 Moved Permanently
< Date: Sat, 22 Jan 2011 20:36:32 GMT
< Location:
< Vary: Accept-Encoding
< Content-Type: text/html; charset=utf-8
< Cache-Control: private
< Age: 0
< Transfer-Encoding: chunked
< Connection: keep-alive
< Server: YTS/1.18.5

In the output above, the request follows the request message format defined in the HTTP 1.1 specification (hereafter referred to as “the spec”):

Request = Request-Line
          *(( general-header
            | request-header
            | entity-header ) CRLF)
            [ message-body ]


As an aside, I have to draw attention to the usage of carriage return line feed (CRLF) in there. Douglas Crockford puts this in perspective in volume 1 of his lecture series Crockford on JavaScript:

One thing that is odd about ASCII is that it has a carriage return character and a line feed character. This was to model the way that Teletypes actually worked, where the carriage return character would take the print element and push it over to the left. The line feed character would take the platen and spin it one line. So most lines are going to end with going back and rolling the paper, and it took two separate codes to do that. Most timesharing systems didn’t require people to type in both codes — generally they would allow people to hit the return key, and then they would echo the line space key, just because there’s no reason to make people type both characters. Also, other devices don’t work that way. Most other printers of the time would just take a line of text and print it and advance; there was no way to separate the carriage return from the line feed function. So this was a pretty device specific thing.

Most systems who adopted ASCII as their character set chose one or the other. The systems that tended to be more hardware focused in their orientation tended to pick line feed, and the systems that tended to be more human focused tended to pick carriage return, and that was fine until they needed to interoperate. Then you’d have a committee of people, some using line feed, some using carriage return — how do you resolve that? You could just pick one. You could even flip a coin, because it really doesn’t matter. But these committees could not decide. Nobody wanted to be the guy who got it wrong, and nobody wanted to be the guy who had to change, so they came up with a mutually disagreeable compromise, which is: We will always require both. So that’s the way the internet protocols work. We haven’t been using Teletype machines in I don’t know how many years — they’re decades obsolete — but we’re still forcing both sets of control codes to be transmitted in HTTP because of this Teletype heritage.

Back to the example, the Request-Line is “GET / HTTP/1.1”. The spec defines the components of this line as:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The Method is “GET”, the Request-URI is “/” (relative to the domain being called, i.e., we’re requesting the root of, the HTTP-Version is “HTTP/1.1”.

We also have a few headers in there: User-Agent, Host, and Accept.  The spec defines headers as follows:

The request-header fields allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers, with semantics equivalent to the parameters on a programming language method invocation.

All of the headers in our request are request-headers, as opposed to general- or entity-headers.  The User-Agent header describes client making the request.  It is optional, but helpful for the service receiving the request, and so “User agents SHOULD include this field with requests.”  The Host header is required (a “client MUST include a Host header field in all HTTP/1.1 request messages”).The Accept header tells the server what type of media is acceptable for the response.  In the request, we’re saying all media types are acceptable, i.e., we’ll accept PDFs, HTML, RSS, etc.

We don’t have a message-body, so there’s not much more to say about the request message.




The format for responses is very similar to that for requests:

Response = Status-Line
           *(( general-header
            | response-header
            | entity-header ) CRLF)
            [ message-body ]


In Yahoo!’s response, the Status-Line is “HTTP/1.1 301 Moved Permanently”, which, when broken into its constituents, tells us that the HTTP-Version of the message format is “HTTP/1.1”, the Status-Code is “301”, and the Reason-Phrase is “Moved Permanently”. The spec says “The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.”

The first digit of the Status-Code communicates the general type of the response:

– 1xx: Informational – Request received, continuing process

– 2xx: Success – The action was successfully received, understood, and accepted

– 3xx: Redirection – Further action must be taken in order to complete the request

– 4xx: Client Error – The request contains bad syntax or cannot be fulfilled

– 5xx: Server Error – The server failed to fulfill an apparently valid request

Specific, pre-defined Status-Codes are described in detail by the spec, but the spec is extensible, so services can define their own codes. For example, Yahoo! and Twitter will return 999 and 420, respectively, for requests exceeding rate limits.

When a service returns a custom Status-Code unknown to the client, the Reason-Phrase can help a user determine the status of the response. The spec doesn’t explicitly state this, but it seems like the Reason-Phrase is arbitrary. Twitter’s Reason-Phrase for 420 made me laugh out loud: Enhance Your Calm. I love web services with a sense of humor.

Yahoo!’s response contained several headers: Date, Location, Vary, Content-Type, Cache-Control, Age, Transfer-Encoding, Connection, and Server.

The Date general-header communicates the time at which the response was generated. I can see how this would be helpful for debugging clock-skew issues in signed requests.

The Location response-header “is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource”. I most often see Location used with 3xx responses, i.e., redirect to this location, but I recently learned of another use, one that’s actually called out in the spec:

For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server’s preferred URI for automatic redirection to the resource.

This seems intuitive to me. Suppose we make a request to create a new object, e.g.,
curl -X POST, then it makes sense that would return 201 with a Location header pointing to the new resource.

The spec states that the Vary response-header “indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation”, but its use is still a bit unclear to me. Fortunately, Subbu Allamaraju, author of O’Reilly’s RESTful Web Services Cookbook, posted an informative analysis of the Vary header on his blog. According to his write-up a “server can use this response header to indicate the client of the list of request headers it uses to resolve a given URI to a representation”.

In the example, Yahoo! is telling us that it uses the Accept-Encoding request-header to determine which representation of the resource to return. In other words, requesting gzipped via Accept-Encoding: gzip will result in a different representation of the resource than requesting uncompressed. If a user agent knows this, it can cache the returned resource accordingly.

The Content-Type entity-header “indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.” In our case, Yahoo! sent us back text/html, using the utf-8 charset.

The Cache-Control general-header “is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain.” In the response, Cache-Control is set to “private” meaning “all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache”. This makes sense because Yahoo! displays private data on its home page for logged-in users, and we wouldn’t want this content cached and displayed to other users.

The Age response-header “conveys the sender’s estimate of the amount of time since the response (or its revalidation) was generated at the origin server”, literally, the age of the resource. The value is given in seconds. So our response was 0 seconds of age.

The Transfer-Encoding general-header “indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient”. Transfer-Encoding differs from Content-Encoding in that the former refers to the transmission while the latter refers to the entity being transmitted. Yahoo!’s response was “chunked“, meaning the message body is transmitted in a series of pieces, as defined by the spec.

Connection is a general-header that defines the TCP connection behavior for communication between the client and server. In Yahoo!’s response, the Connection header is set to keep-alive. Maintaining a persistent connection is more efficient than opening and closing connections for each request/response, and HTTP 1.1 made persistence the default behavior, but for backwards compatibility, servers can also send Connection: keep-alive to maintain a connection that would otherwise be closed.

The Server response-header simply communicates the server that handled the request. In Yahoo!’s case, its Yahoo! Traffic Server (YTS), a.k.a. Apache Traffic Server.


In short, curl -v and the spec are our friends.  HTTP is a standard for transmitting hypertext and defines things such as request methods and response codes.  HTTP interactions consist of requests and responses.  Requests look something like this:

GET / HTTP/1.1
headers \r\n


and responses look something like this:

HTTP/1.1 200 OK
headers \r\n



Written by Erik

January 23, 2011 at 12:26 am

Posted in specification

Tagged with

children of the node

leave a comment »

Suppose you’d like to traverse through a family tree in JavaScript, printing each generation of children on a single line. Why? Who knows, but lets suppose you’re so possessed by the idea that you’re losing sleep over it.

The tree looks like this:



     1       -       2

     |                 |

3    -    4            -    6


The correct output would look like this:


1 2

3 4 6

This sounds a lot like a breadth-first search to me, but let’s forget for a moment that Wikipedia exists and think through this.

A verbal walk-through of the problem might sound like this:

  1. Visit root node
  2. Print root node and <br> tag
  3. Print all the children of the root node and another <br> tag
  4. Print all the children of each child node and another <br> tag …

Ok, that’s a mess.  Seems like recursion might help simplify things, but then I’d end up with a stack-based traversal due to the call stack, an idea I find amazing.  But what I want is something more like a queue; first in, first out; root in, root out, children in children out, children’s children in, children’s children out … breath in, breath out.  I feel like I’m in yoga class.  So soothing.  Here’s a Buddha by a koi pond:

"goldie the fish is blessed by the garden buddha"

golden buddha by Paul Moody

Verbal walk through part deux:

  1. Enqueue root node
  2. Dequeue node, print, and enqueue each child of the node
  3. Repeat from step 2

Supposing we have a queue, Q.  We can depict the tree, T, in code like this:

var T = [
    { left: 1, right: 2 },
    { left: 3, right: 4 },
    { left: null, right: 6 },
    { left: null, right: null },
    { left: null, right: null },
    { left: null, right: null },
    { left: null, right: null }

Following the second approach, we’d get

  1. Q = [node 0]
  2. “node 0”, Q = [node 1, node 2]
  3. “node 1”, Q = [node 2, node 3, node 4]
  4. “node 2”, Q = [node 3, node 4, node 6]
  5. “node 3”, Q = [node 4, node 6]
  6. “node 4”, Q = [node 6]
  7. “node 6”

which would be correct, but the line breaks are off.  We need to print all the children of a generation before printing a line break.

Verbal walk through take three:

  1. Enqueue root node
  2. While there are nodes in the queue, dequeue node, print node, and enqueue children of the node
  3. Print a line break
  4. Repeat from step 2

Following the third approach, we’d get

  1. Q = [node 0]
  2. “node 0”, Q = [node 1, node 2]
  3. “node 1 node 2”, Q = [node 3, node 4, node 6]
  4. “node 3, node 4, node 6”, Q = []

That’s it!  Nice.  Here’s some code:

function printTree(tree){

    var queue = [];

    // enqueue root
    queue.push( 0 );

    do {

        var len = queue.length;

        // for each node in the queue
        for( var i = 0; i < len; i++ ){
            // dequeue
            var index = queue.shift();

            // print node
            document.writeln( index );

            var node = tree[ index ];

            // enqueue children of the node
            if( node.left ) {
                queue.push( node.left );
            if( node.right ) {
                queue.push( node.right );

        // print a line break
    // repeat
    } while( 0 !== queue.length );

// run it

Written by Erik

January 18, 2011 at 11:22 pm

YUI 2in3 Modal Panel Example

leave a comment »

Suppose you’re using YUI 3.2 and you’d like a modal dialog.  YUI 3 Overlay provides an easy way to position an element above the others, but it doesn’t provide modality.  The Overlay Extras gallery module sounds perfect, but it seems to work best with YUI 3.1.0.

YUI 2 has exactly what we’re looking for in its Panel widget.  Fortunately, the YUI 2in3 project makes the Panel available in YUI 3.2.  

The sample code below demonstrates usage, and you can see a demo on

    <body class="yui-skin-sam">
        <p><button id="show">show modal</button></p>
        <!-- modal dialog content markup -->
        <div id="content" style="visibility:hidden">
            <div class="hd">Header</div>
            <div class="bd">
                <p><button id="hide">hide modal</button></p>
            <div class="ft">Footer</div>
        <script src=""></script>
        YUI().use('yui2-container', 'yui2-dragdrop', 'event', function(Y) {

            var YAHOO = Y.YUI2;
            var modal = new YAHOO.widget.Panel("content", {
                width: "240px",
                fixedcenter: true,
                close: true,
                draggable: true,
                zindex: 4,
                modal: true,
                visible: false

  '#show').on('click', function() {
  '#hide').on('click', function() {



Written by Erik

January 10, 2011 at 6:04 pm

Posted in Uncategorized

re-scoping javascript callback functions using call and apply

leave a comment »

jQuery assigns the this object in an event handler to the element that the event handler is bound to. For example, the following will log the element with the id arbitrary-id:

$("#arbitrary-id").click(function (e) { console.log(this); });

But what if we don’t want this to be bound the element we just clicked on? We can get the element anyways by referring to e.currentTarget. What if we want to assign this to an arbitrary element?

Develop a function, accessible as a method on any function, that will allow us to set the element this refers to inside the function. For example, this code will log the element with the id my-foo-div:

$("#arbitrary-id").click(function (e) { console.log(this); }.bind( $("#my-foo-div") ));

We need this method to be available on any function. We can add it to the Function prototype to accomplish this.

As an aside, why do we all cringe whenever anyone mentions doing anything to an object’s prototype? Are there really no safe use-cases? It seems like too powerful a tool to ignore completely.

But anyway, here’s the start of a method definition:

Function.prototype.bind = function (element) { console.log(element); };

We can call it like this:

( function () {} ).bind( $("#arbitrary-id") );

How to define the this object inside the callback function called by the jQuery click handler?  Hmm … Well, the callback function is just a function and we did just add a bind method to all functions, so we can start there. What if we just assign this to the element passed in, like so:

Function.prototype.bind = function (element) { this = element; console.log(this); };

Okay, that threw “ReferenceError: Invalid left-hand side in assignment”, so let’s try the call method, which exists to set the scope of a function at the time it’s called.

Function.prototype.bind = function (element) { console.log(this); };
( function () { console.log( this ); } ).call( $("#my-foo-div") ).bind( $("#my-foo-div") );

Things are starting to get seriously weird, and it looks like call doesn’t return a function anyway, which means we can’t chain bind to it. What are we doing again? Oh, yeah, basically, we want to be able to call call on a function, but have the function be runnable later as a callback, something like a deferred call.

We can at least get bind to return a function, so the function’s callable later:

Function.prototype.bind = function (element) { return function () { console.log(element); } };

Now if I run $("#arbitrary-id").click(function (e) { console.log(this); }.bind( $("#my-foo-div") ) );, the callback logs the element with id my-foo-div. So, I have a reference to my-foo-div inside the callback for the event handler attached to arbitrary-id. I think this is progress. I just need to get the callback function to run in the scope of my-foo-div and I think I can do this with call.

Function.prototype.bind = function (element) { return function () { console.log( element ) ); } };

I’m getting an error “Uncaught TypeError: Object # has no method ‘call'”, which makes sense because jQuery is still setting this to the element that the event handler is attached to, which is the whole point of this exercise, but we’re close! I need to get this to refer to the callback function itself, not a dom element. Inside the bind function definition, this refers to the caller, which is the callback function. Let’s cache this in the scope of the callback function, and then refer to the cached object inside the returned function:

Function.prototype.bind = function (element) { var that = this; return function () { console.log( that ); } };

Nice! When I click on the arbitrary-id element, I see “function (e) { console.log(this); }”, which is my stringified callback function, in the log. So now I just need to call that function via the call method, passing in the element I want to bind the scope to:

Function.prototype.bind = function (element) { var that = this; return function () { element ); } };

Yay! It works. Before calling it quits, I’d like to be able to pass arguments to the callback. Fortunately, call‘s cousin apply and the native arguments object makes this easy:

Function.prototype.bind = function (element) { var that = this; return function () { that.apply( element, arguments ); } };

This calls for an image to chill out to. Here’s a picture of a toucan doing his thing:

Toucan Three by Rhea Monique

Toucan Three by Rhea Monique

Written by Erik

January 6, 2011 at 6:00 pm

Posted in code

Tagged with , , ,

Thoughts on a simple search suggest implementation

leave a comment »

Most search engines will suggest search terms as you’re typing. For example, if I type “appleipod” on, it suggests “apple ipod”, “apple new ipod phone”, “apple ipod nano”, etc.

Screen shot of Yahoo! search suggest results

You are given an input string “appleipod” and a hash table of search terms { … “apple ipod”, “apple new ipod phone”, “apple ipod nano” … }. The table could contain 50,000 terms. How would you write a function to resolve the input string to least one suggestion?

Approach 1:
A couple approaches came to mind immediately: 1) search for “search suggestion algorithm”, as this is a solved problem; 2) iterate through the table performing regular expression tests on each entry, which would be slow, but simple.

Since the point is to come up with a solution from scratch that simply produces a suggestion, I started thinking about option two and came up with the following:

function suggest ( input, dictionary ) {
   var suggestions = [];
   for ( var term in dictionary ) {
      var regexp = new RegExp( "(.*)" + term + "(.*)" );
      if ( regexp.test( input ) ) {
         suggestions.push( term );
   return suggestions;

Running suggest( "appleipod", {"apple":true, "apple ipod":true, "a":true, "app":true} ); would match “a”, “app”, and “apple” in “appleipod”. It wouldn’t match “appleipod” to “apple ipod”, but the goal was just to suggest something, so it passes.

Another person suggested a different approach, Approach 2:

We can take advantage of the random access features of the hash table by iterating through the input string, character by character, checking to see if there’s an entry in the hash table:

function suggest ( input, dictionary ) {
   var suggestions = [];
   var suggestion = '';
   for ( var i = 0; i < input.length; i++ ) {
      suggestion += input[i];
      if ( dictionary[ suggestion ] ) {
         suggestions.push( suggestion );
   return suggestions;

That this would suggest “a”, “app”, and “apple” for “appleipod”, so it meets the requirements and is a much faster option.

As an Approach 3, I wonder is there’s a way to do hash/regex look up, i.e., dictionary[ regex ], so we could match any dictionary term containing the string “apple”. Sounds like another problem to think about 😉

Update (1/6/11)

Hold on. Neither Approach 1 or 2 satisfy the requirement of returning at least one suggestion given a table such as { … “apple ipod”, “apple new ipod phone”, “apple ipod nano” … }. What we should be testing is suggest( "appleipod", {"noise":true, "apple ipod":true, "apple new ipod phone": true, "apple ipod nano": true, "more noise": true} ); and verifying that it returns at least one of the target terms, e.g., “apple ipod”. So, let’s modify Approach 2 to sequentially introduce a space in between segments of the input, i.e., “appleipod” –> “a ppleipod”, “ap pleipod”, etc., instead of testing character by character.

Let’s leave Approach 3 unsolved for now, and call this Approach 4:

function suggest ( input, dictionary ) {
   var suggestions = [];

   //the 1st suggestion would have a leading space, which is a bit strange so skip ahead
   var suggestion = input;
   for ( var i = 1; i < input.length; i++ ) {

      if ( dictionary[ suggestion ] ) {
         suggestions.push( suggestion );
      suggestion = input.substring(0, i) + ' ' + input.substr(i);
   return suggestions;

Written by Erik

January 6, 2011 at 12:44 pm

Posted in algorithm