the broken internet

March 10, 2009

Here’s what I really want to be blogging about, what I really want to be addressing, and what niggles at me every day of my life.

The internet is broken, or perhaps it’s more accurate to say that the technologies that form the major part of the internet are massively misused. In fact, by design and spec they don’t support the internet as we use it, the internet we are used to.

By internet I mean “the web”, and by the web I mean the same web that’s in website, web design and web development; you know, everything you ever see in a web browser.

Let’s start at the core of the internet, the main protocol HTTP/1.1 and see how it’s misused.

HTTP/1.1
It’s described perfectly in the Abstract of RFC 2616 (the Hypertext Transfer Protocol specification).

“It is a generic, stateless, protocol”

Straight away we have a major design flaw, or rather feature, that points to how we misuse this protocol. HTTP/1.1 is a generic (that’s good), stateless (virtually every web application and website you see is stateful, so how’s that supposed to work over a stateless protocol?) protocol.

So how does it work? Section 1.4 of the RFC, which describes the Overall Operation of HTTP/1.1, gives us this:

The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. The server responds with a status line, including the message’s protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible entity-body content.

Simple.. a web browser sends a request message to a web server (normally for a resource) and the web server responds with a response message containing said resource.
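To make that exchange concrete, here is a minimal Python sketch of what the two messages look like on the wire. The helper names (build_request, parse_response), the host example.com and the sample response are all made up for illustration; a real client sends many more headers.

```python
def build_request(method, path, host):
    """Return the raw text of a minimal HTTP/1.1 request message."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, URI, protocol version
        f"Host: {host}",              # the one header HTTP/1.1 requires
        "",                           # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (version, status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), headers, body


# Hypothetical request and a hand-written sample response.
request = build_request("GET", "/index.html", "example.com")
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>hello</p>\n"
)
version, status, headers, body = parse_response(sample_response)
```

Nothing in either message refers to an earlier exchange, which is the "stateless" part of the Abstract in practice.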

Let’s run a quick example.. say you visit Facebook in your web browser, here’s what happens:

  1. Browser sends a request to the web server hosting the domain facebook.com
  2. Facebook.com web server responds with the markup for the default page in its present state.
  3. connection is closed
  4. web browser parses the markup in order to display it and to find any related resources it may need to request.
  5. for each resource needed (image, script, advert, stylesheet, icon..) the web browser..
    1. sends a request for the resource
    2. receives a response hopefully containing the resource
    3. closes the connection
  6. for each request (from above) received by the web server it
    1. allocates or creates a thread/process for the request
    2. thread/process resolves the response, locates the correct resource and parses it
    3. thread/process returns a response including the resource
    4. thread closes or web server deallocates the process

Even on a light page with only 10 images, 2 stylesheets and 2 javascripts, that’s 15 requests and responses (the page itself plus 14 resources), each one a separate connection to a separate process on the server. Slow, time consuming and a lot of overhead for something so simple.
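The count of 15 can be checked with a rough Python sketch that scans markup the way a browser might and tallies what it would have to fetch. The ResourceCounter class, the requests_needed helper and the generated test page are invented for this illustration, and it only looks at img, script and stylesheet link tags, ignoring duplicates and caching.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collect the URLs of sub-resources a browser would have to request."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])


def requests_needed(markup):
    """One request for the page itself, plus one per resource found in it."""
    counter = ResourceCounter()
    counter.feed(markup)
    return 1 + len(counter.resources)


# A made-up light page: 2 stylesheets, 2 scripts, 10 images.
page = (
    "<html><head>"
    + '<link rel="stylesheet" href="a.css"><link rel="stylesheet" href="b.css">'
    + '<script src="a.js"></script><script src="b.js"></script>'
    + "</head><body>"
    + "".join(f'<img src="{i}.png">' for i in range(10))
    + "</body></html>"
)
total = requests_needed(page)
print(total)  # 15
```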

HTTP/1.1 brought some new features to address these things, such as the Connection header, primarily used to say when a connection should be closed (Connection: close). Without it the connection stays open, which allows a client to send in say 20 requests and receive 20 responses in the order they were requested, thus only using one connection and one process. Is this used.. no; in fact most web servers simply send back Connection: close and close the connection even if you don’t want them to.
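As a rough illustration of one connection carrying several requests, here is a Python sketch using only the standard library. It starts a throwaway local server and fetches three made-up paths over a single connection. The handler name, the paths and the connection counter are assumptions for the demo, and the requests are sent one after another rather than truly pipelined, since http.client doesn’t pipeline.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections_opened = 0  # incremented once per TCP connection the server accepts


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def setup(self):
        global connections_opened
        connections_opened += 1  # a new handler object is created per connection
        super().setup()

    def do_GET(self):
        body = self.path.encode("ascii")  # just echo the requested path back
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# One client connection, three requests sent over it in sequence.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for path in ("/page.html", "/style.css", "/logo.png"):
    conn.request("GET", path)
    bodies.append(conn.getresponse().read().decode("ascii"))
conn.close()

server.shutdown()
server.server_close()
```

If the handler sent Connection: close instead, the client would have to reconnect for every request and the counter would climb accordingly.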

Further, the Connection header is a bit useless in the scenario above. To be implemented correctly the web server would need to parse the local resource, identify all the related resources it held itself, then send down responses for those as well. This however would break the protocol as no request was sent, and it also raises the question “what if the client doesn’t want the images or scripts, or has them cached?”.

Another vital limitation of the request/response protocol: there is no allowance for a server push or on-the-fly update. There’s no way for a server to send anything to a client without the client first requesting it, a massive shortcoming that leads to some of the worst misuse of HTTP/1.1.

Consider a simple shoutbox on a small website, 10 people chatting in it concurrently. There are two ways to misuse the protocol to enable this.

  1. Polling
    A small JavaScript or meta refresh runs in the web browser; every second it asks the server for the entire contents of the shoutbox page again, or, through further misuse of another technology (JavaScript), asks the web server “any updates?”, continually, like a nagging child, once per second, for every user. 10 users = 10 requests per second to the web server just for the shoutbox.. that’s a lot: 10 requests per second on a normal website = 864,000 page views a day / circa half a million visits.
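The polling arithmetic is easy to check with a few lines of Python. The function name and its parameters are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def polling_requests_per_day(users, polls_per_second=1):
    """Requests per day the server sees if every user polls at a fixed rate."""
    return users * polls_per_second * SECONDS_PER_DAY


daily = polling_requests_per_day(10)
print(daily)  # 864000
```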

I’ll finish this post if warranted :) tbc?


