HTTP: It's good for a lot more than you may think

Technology

Did you know that RFC 2616 (HTTP/1.1) defines the HTTP protocol to operate on any type of URI? This opens up a lot of possibilities for how different identifier schemes can be used.

In my role as co-chair of the XRI effort, I do a lot of research into existing specifications. I want to reuse them. Specifically, I want to reuse HTTP since it's probably the most widely implemented specification out there (except maybe TCP/IP).


The Realization

According to my reading of RFC 2616, the on-the-wire protocol can use any URI. For example, this is legal according to 2616:

POST mailto:gwachob@wachob.com HTTP/1.1
Host: somehost.com

...some email...
...

That's pretty surprising to me. I don't hear people talking about this facet of the HTTP specification. The HTTP specification is general purpose, not only with respect to the applications one can build upon it, but also with respect to the types of identifiers it can operate on.
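To make the wire format concrete, here is a minimal sketch that serializes a request like the one above. The helper name build_request is hypothetical, and the code only builds the bytes; it does not claim that any existing server will accept a mailto: request target.

```python
def build_request(method, uri, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request whose request target is an arbitrary URI.

    Hypothetical helper for illustration only; nothing is sent on the network.
    """
    # The request line is space-delimited, so the URI itself must not
    # contain spaces or line breaks.
    if any(ch in uri for ch in " \r\n"):
        raise ValueError("request target must not contain spaces or line breaks")
    lines = [f"{method} {uri} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request(
    "POST", "mailto:gwachob@wachob.com", "somehost.com", body=b"...some email..."
)
print(request.decode("ascii"))
```

The same helper works unchanged for other schemes, since the request target is treated as an opaque string.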

I want to use HTTP for the "local access" phase of XRI resolution. Maybe something like this:

GET xri:=Gabe/foo HTTP/1.1
Host: somehost.com
Accept: application/xml+rddl

Now, it gets even better. HTTP/1.1 allows ANY URI in Location fields as well. So the response to my XRI resolution request could be:

HTTP/1.1 303 XRI Alias
Location: xri:@GabeAlias/foo

Or, it could be (in the case of the first request):

HTTP/1.1 307 Email Filter Redirect
Location: http://mailfiltering.example/gwachob@wachob.com
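A client receiving such a response has to look at the scheme of the Location value to decide what to do next. The following is a rough sketch of that step, assuming the Status-Line order that RFC 2616 section 6.1 defines (version, then status code, then reason phrase). The function name parse_redirect is hypothetical.

```python
from urllib.parse import urlsplit


def parse_redirect(raw):
    """Return (status code, reason phrase, Location scheme, Location value).

    Hypothetical helper: parses only the header section of a raw response.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    location = headers.get("location")
    # urlsplit accepts any syntactically valid scheme, not only http.
    scheme = urlsplit(location).scheme if location else None
    return int(code), reason, scheme, location


alias = parse_redirect(b"HTTP/1.1 303 XRI Alias\r\nLocation: xri:@GabeAlias/foo\r\n\r\n")
print(alias)
```

A client could then dispatch on the scheme: keep resolving an xri: target through the directory, or open an ordinary connection for an http: target.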

Implications

HTTP is a well understood protocol and has relatively rich semantics. The REST architecture (which is not negatively impacted by use of other URI schemes, I would point out) demonstrates that HTTP is a generic application protocol. Using HTTP with a wide number of identifier schemes seems like a big win to me.

Now, to be clear, there are some pieces missing. Foremost among them: how does one know which HTTP server to send a request to if the identifier doesn't name a host (or IP address) and port? There are a bunch of answers to this question. In the XRI world, there is a protocol for discovering a "local access" server (a directory) which could expose this HTTP interface. DNS SRV records, or DDDS with NAPTR records, can also be used for mapping identifiers into network endpoints. In short, HTTP can be used directly as a directory interface protocol for almost any identifier scheme. And it's a protocol that comes with all sorts of nice features including caching, content negotiation, extensible metadata, extensible authentication, layering on TLS, etc.

There is also some language in section 5.1.2 of RFC 2616 which may cause some issues with using arbitrary URIs willy-nilly. This requires more research.

Routable HTTP requests and responses

HTTP is currently implemented and deployed primarily as a form of RPC -- or at least as a "request-response" protocol. This is because of its heritage as a simple method of retrieving a document from a server. When the identifier being used to interact with the server is "abstract" (i.e., doesn't directly specify a hostname and port), the concept of "routing" becomes possible.

Routing can be done with HTTP requests in (at least) two ways. Routing can be done "transparently" (the server acts as an invisible proxy) or through a series of redirects (where the HTTP client makes new HTTP connections). This can be done today with HTTP requests for plain http: URLs. But it's more compelling in a world where the identifiers are logical and the bits that the client may want to access may be distributed across the network.
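The redirect style of routing amounts to a small client-side loop. Here is a sketch under stated assumptions: send is a hypothetical callable that issues one request and returns a (status, location) pair, and the in-memory TABLE (including the data.example URL) is invented for illustration and stands in for real servers.

```python
REDIRECT_CODES = (301, 302, 303, 307)


def resolve(uri, send, max_hops=5):
    """Follow redirects across URI schemes; return (final_uri, status, hops)."""
    hops = [uri]
    for _ in range(max_hops):
        status, location = send(uri)
        if status not in REDIRECT_CODES or location is None:
            return uri, status, hops
        uri = location
        hops.append(uri)
    # Guard against redirect loops between directories.
    raise RuntimeError(f"gave up after {max_hops} redirects")


# Invented routing table standing in for real directory servers.
TABLE = {
    "xri:=Gabe/foo": (303, "xri:@GabeAlias/foo"),
    "xri:@GabeAlias/foo": (307, "http://data.example/gabe/foo"),
    "http://data.example/gabe/foo": (200, None),
}

final_uri, final_status, hops = resolve("xri:=Gabe/foo", TABLE.__getitem__)
print(final_uri, final_status, len(hops))
```

The transparent style would move the same loop into the server, which would then return only the final response to the client.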

Digital identity folks talk about this a lot, as do web services folks. Digital identity problems tend to be of the sort "I have an identifier and I want to find data or exchange messages with something on the network representing it". That usually means looking up the identifier in some directory (or hierarchy/federation of directories) and discovering an endpoint. WS-Routing addresses the issue of "source routing" a message through several intermediaries on its way to an ultimate endpoint.

I'm not suggesting that the use of non-HTTP URIs in HTTP obviates SOAP or WS-Routing or anything of the like. I merely want to bring up the fact that HTTP's full capabilities are not necessarily being leveraged by the various architectures. I'm not sure if this is a good thing or not - but I think it's due in part to a lack of awareness about HTTP's flexibility.

Example: HTTP Mail Transport Protocol (HMTP)

In closing, let me give a more fleshed-out description of how HTTP could be used for delivering RFC 2822 email.

  • Let's assume someone wants to deliver an email to gwachob@wachob.com.
  • I have a server which accepts requests of the following form:

    POST mailto:gwachob@wachob.com HTTP/1.1
    Host: mailhost.wachob.com

  • Let's assume I have an SRV record of the following form in the wachob.com domain:
    _hmtp._http._tcp 0 1 80 mailhost.wachob.com.
  • Requests to deliver mail would generate the following HTTP request to mailhost.wachob.com port 80:
    POST mailto:gwachob@wachob.com HTTP/1.1
    host: mailhost.wachob.com
    content-type: text/plain
    content-length: 300
    rfc-2822-header: foo
    rfc-from: foo

    From: spammer@spam.example
    To: gwachob@wachob.com
    Subject: free credit card

    Get a free credit card!

  • The response could be a redirect to an outsource filter based on the headers or content:
    HTTP/1.1 307 Email filtered
    Location: http://filterco.example/gwachob@wachob.com
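The sender's side of the steps above can be sketched as follows. This assumes the SRV naming from the example record; the helper names srv_name and delivery_request are hypothetical. The actual DNS lookup is left out, and the placeholder rfc-2822-header and rfc-from fields are omitted. Content-Length is computed from the serialized message rather than hard-coded.

```python
from email.message import EmailMessage
from email.policy import SMTP
from urllib.parse import urlsplit


def srv_name(mailto_uri):
    """Derive the SRV owner name to query for a mailto: URI (assumed convention)."""
    parts = urlsplit(mailto_uri)
    if parts.scheme != "mailto" or "@" not in parts.path:
        raise ValueError("expected a mailto: URI with a domain part")
    domain = parts.path.rsplit("@", 1)[1]
    return f"_hmtp._http._tcp.{domain}"


def delivery_request(mailto_uri, host, message):
    """Build the POST that would carry an RFC 2822 message to the mail host."""
    body = message.as_bytes(policy=SMTP)  # SMTP policy gives CRLF line endings
    head = (
        f"POST {mailto_uri} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


msg = EmailMessage()
msg["From"] = "spammer@spam.example"
msg["To"] = "gwachob@wachob.com"
msg["Subject"] = "free credit card"
msg.set_content("Get a free credit card!")

query = srv_name("mailto:gwachob@wachob.com")
wire = delivery_request("mailto:gwachob@wachob.com", "mailhost.wachob.com", msg)
print(query)
print(wire.decode("ascii"))
```

A real sender would resolve the query name through DNS, pick a target and port from the SRV answer, and send the bytes there.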

Conclusion

I'm not proposing anything earth-shattering. I'm not even sure that I'm proposing anything new. Please send me pointers if this has been discussed elsewhere. Feedback and more discussion welcome as usual! Email me or find me (GabeW) on #joiito if you want to talk!