Sometime soon, this internet thing is going to be really big, ya know.

But sometimes I wonder if it’s getting TOO big.

I don’t normally try to do two rants in quick succession, but a recent email discussion on an internal mailing list sparked this one.

There’s a trend I’ve been seeing in recent years that makes me believe that people somehow think that there’s something special about the technologies that make up the WWW.  People keep trying to use WWW technologies in places that they don’t really fit.

Ever since Bill Gates sent out the “Internet Tidal Wave” memo 9 years ago, people seem to believe that every technology should be framed in terms of the WWW.

Take RPC, for example.  Why does RPC have to run over HTTP (or rather SOAP layered over HTTP)?  It turns out that HTTP isn’t a particularly good protocol for RPC, especially for connection oriented RPC like Exchange uses.  About the only thing that RPC-over-HTTP-over-TCP brings beyond the capabilities of RPC-over-TCP is that HTTP is often opened through firewalls.  But the downside is that HTTP is typically not connection oriented.  Which means that you either have to re-authenticate the user on every RPC, or the server has to cache the client’s IP address and verify the client that way (or require a unique cookie of some kind).
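Just to make that concrete, here’s a quick Python sketch – nothing to do with Exchange’s actual code; the credential store, token scheme, and function names are all invented – showing the difference between having to re-establish identity on every request and authenticating a connection once:

```python
# A minimal sketch, assuming a toy credential store - this is NOT how Exchange
# or RPC-over-HTTP actually work, it just shows the shape of the problem.
USERS = {"larry": "secret"}          # hypothetical credential store
SESSIONS = {}                        # token -> user, for the cookie approach

def authenticate(user, password):
    return password is not None and USERS.get(user) == password

def stateless_rpc(request):
    """Each request stands alone, so it must carry credentials or a token."""
    token = request.get("token")
    if token in SESSIONS:
        user = SESSIONS[token]
    elif authenticate(request.get("user"), request.get("password")):
        user = request["user"]
        token = f"tok-{len(SESSIONS)}"
        SESSIONS[token] = user
    else:
        return {"status": 401}
    return {"status": 200, "token": token, "result": request["call"].upper()}

class Connection:
    """Stands in for a long-lived connection: authenticate once, then call freely."""
    def __init__(self, user, password):
        if not authenticate(user, password):
            raise PermissionError("authentication failed")
        self.user = user

    def call(self, name):
        return name.upper()          # stand-in for dispatching the real RPC

if __name__ == "__main__":
    first = stateless_rpc({"user": "larry", "password": "secret", "call": "ping"})
    second = stateless_rpc({"token": first["token"], "call": "ping"})  # token on every call
    conn = Connection("larry", "secret")                               # authenticated once
    print(first["status"], second["status"], conn.call("ping"))        # 200 200 PING
```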

Why does .Net remoting even support an HTTP protocol?  Why not just a UDP and a TCP protocol (and I have serious questions about the wisdom of supporting a UDP protocol)?  Again, what does HTTP bring to .Net remoting?  Firewall pass-through?  .Net remoting doesn’t support security at all natively; do you really want unsecured data going through your firewall?  At least HTTP/RPC provides authentication.  And it turns out that supporting connection-less protocols like HTTP caused some rather interesting design decisions in .Net remoting – for instance, it’s not possible to determine if a .Net remoting client has gone away without providing your own ping mechanism.  At least with a connection oriented transport, you can have deterministic connection rundown.
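Here’s a rough illustration of the difference – this isn’t .Net remoting’s code, and the timeout and class names are invented.  With a real connection the transport tells you when the peer goes away; with request/response you can only guess:

```python
import time

def serve_connection(conn):
    # Connection-oriented transport (think a plain TCP socket): the transport
    # itself tells us when the client goes away.
    while True:
        data = conn.recv(4096)
        if not data:               # recv() returns b"" when the peer closes or dies
            print("client gone - deterministic connection rundown")
            conn.close()
            return
        conn.sendall(data)         # stand-in for dispatching the remoted call

class RequestBasedServer:
    # Stateless request/response: the server never finds out that a client
    # vanished; the best it can do is require pings and guess after a timeout.
    def __init__(self, ping_timeout=30.0):
        self.ping_timeout = ping_timeout
        self.last_seen = {}

    def handle_request(self, client_id, payload):
        self.last_seen[client_id] = time.monotonic()   # every call doubles as a ping
        return payload                                 # stand-in for the real call

    def probably_dead_clients(self):
        cutoff = time.monotonic() - self.ping_timeout
        return [c for c, t in self.last_seen.items() if t < cutoff]
```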

Why does every identifier in the world need to be a URI?  As a case in point, one of our multimedia libraries needed a string to represent the source and destination of media content – the source was typically a file on the disk (but it could be a resource on the net).  The destination was almost always a local device (think of it as the PnP identifier of the dev interface for the rendering pin – it’s not, but close enough).  Well, the multimedia library decided that the format of the strings it was using was to be a URI.  For both the source and destination.  So, when the destinations didn’t fit the IETF syntax for URIs (they had % characters in them, I believe, and our destination strings didn’t have a URI scheme prefix), they started filing bugs against our component to get the names changed to fit the URI syntax.  But why were they URIs in the first place?  The strings were never parsed; they were never cracked into prefix and object.

Now here’s the thing.  URIs are great for referencing networked resources.  They really are, especially if you’re using HTTP as your transport mechanism.  But they’re not the solution for every problem.  The guys writing this library didn’t really want URIs; they really wanted opaque strings to represent locations.  It wasn’t critical that their identifiers meet the URI format, and they were never going to install a URI handler for them – all they needed was strings.
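To show what I mean, here’s a small Python sketch – the device string below is made up, it’s not a real PnP identifier, but it has the properties that caused the bugs (embedded % characters, no scheme prefix):

```python
from urllib.parse import quote, unquote, urlparse

device_id = r"\\?\display#render%pin&4{0000}"   # invented identifier

# As an opaque string, the only operation anyone ever performs on it is comparison.
assert device_id == device_id

# Declare it a URI and the rules kick in: it needs a scheme, and a bare '%' is
# only legal as the start of a percent-escape, so the name has to be rewritten.
as_uri = "device:" + quote(device_id, safe="")
print(as_uri)                                        # device:%5C%5C%3F%5Cdisplay%23render%25pin...
assert unquote(urlparse(as_uri).path) == device_id   # round-trips, but only after renaming
```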

But since URIs are used on the internet, and the internet by definition is a “good thing”, they wanted to use URIs.

Another example of an over-used internet technology: XML.  For some reason, XML is considered to be the be-all and end-all solution to every problem.  People seem to have decided that the data that’s represented by the XML isn’t important; what matters is the fact that it’s represented in XML.  But XML is all ABOUT the data.  It’s a data representation format, for crying out loud.  Now, XML is a very, very nice data representation.  It has some truly awesome features that make representing structured data a snap, and it’s brilliantly extensible.  But if you’re rolling out a new structured document, why is XML the default choice?  Is XML always the best choice?  I don’t think so.  Actually, Dare Obasanjo proposed a fascinating XML litmus test here; it makes sense to me.
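Here’s a trivial illustration (the settings are made up): the same flat data as XML and as plain key=value lines.  The XML version isn’t wrong, it just isn’t buying anything:

```python
import xml.etree.ElementTree as ET

xml_doc = """<settings>
  <setting name="volume">11</setting>
  <setting name="device">speakers</setting>
  <setting name="mute">false</setting>
</settings>"""

flat_doc = "volume=11\ndevice=speakers\nmute=false"

from_xml = {s.get("name"): s.text for s in ET.fromstring(xml_doc)}
from_flat = dict(line.split("=", 1) for line in flat_doc.splitlines())
assert from_xml == from_flat   # identical data; only the wrapper differs
```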

When the Exchange team decided to turn Exchange from an email system into a document storage platform that also did email, they decided that the premier mechanism for accessing documents in the store was to be HTTP/DAV.  Why?  Because it was an internet technology.  Not because it was the right solution for Exchange.  Not because it was the right solution for our customers.  But because it was an internet technology.  Btw, Exchange also supported OLEDB access to the store, which, in my opinion, made a lot more sense as an access technology for our data store.

At every turn, I see people deploying solutions built on internet technologies, even when they’re not appropriate.

There ARE times when it’s appropriate to use an internet technology.  If you’re writing an email store that’s going to interoperate with 3rd party clients, then your best bet is to use IMAP (or if you have to, POP3).  This makes sense.  But it doesn’t have to be your only solution.  There’s nothing WRONG with providing a higher-capability non-internet solution if the internet solution doesn’t provide enough functionality.  But if you go the high-fidelity client route without going the standards-based route, then you’d better be prepared to write those clients for LOTS of platforms.
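That’s the payoff of the standards-based route: any stock client library on any platform can talk to your store.  A minimal sketch using Python’s imaplib – the host and credentials are placeholders:

```python
import imaplib

HOST, USER, PASSWORD = "mail.example.com", "user@example.com", "hunter2"  # placeholders

with imaplib.IMAP4_SSL(HOST) as imap:
    imap.login(USER, PASSWORD)
    imap.select("INBOX", readonly=True)
    status, data = imap.search(None, "UNSEEN")   # standard IMAP SEARCH verb
    print(status, data[0].split())               # sequence numbers of unread mail
```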

It makes sense to use HTTP when you’re retrieving web pages.  You want to use a standardized internet protocol in that case, because you want to ensure that 3rd party applications can play with your servers (just like having IMAP and POP3 support in your email server is a good idea as mentioned above). 

URLs make perfect sense when describing resource location over the network.  They even make sense when determining if you want to compose email (mailto:foo@bar.com) or if you want to describe how to access a particular email in an IMAP message store (imap://mymailserver/public%20folders/mail%20from%20me).  But do they make sense when identifying the rendering destination for multimedia content? 
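For what it’s worth, this is where the URL machinery actually earns its keep – the scheme tells you which protocol handler gets the rest of the string.  Parsing the two examples above:

```python
from urllib.parse import urlparse

for url in ("mailto:foo@bar.com",
            "imap://mymailserver/public%20folders/mail%20from%20me"):
    parts = urlparse(url)
    print(parts.scheme, parts.netloc or parts.path)
# mailto foo@bar.com
# imap mymailserver
```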

So internet technologies DO make sense when describing resources on the internet.  But they aren’t always the right solution to every problem.