ETech 2005 Trip Report: Building a New Web Service at Google

These are my notes on the Building a New Web Service at Google session by Nelson Minar

Nelson Minar gave a talk about the issues he encountered while shipping the Adwords API. The slides for the talk are available online. I found this talk the most useful of the conference given that within the next year or so I'll be going through the same thing at MSN.

The purpose of the Adwords API was to enable Google customers manage their adwords campaigns. In cases where users have large numbers of keywords, it begins to be difficult to manage an ad campaign using the Web interface that Google provides. The API exposes endpoints for campaign management, traffic estimation and reporting. Users have a quota on how many API calls they can make a month which is related to the size of their campaign. There are also some complex authentication requirements since certain customers give permission to third parties to manage their ad campaigns for them. Although the API has only been available for a couple of weeks there are already developers selling tools that built on the API.

The technologies used are SOAP, WSDL and SSL. The reason for using SOAP+WSDL was so that XML Web Service toolkits which perform object<->XML data binding could be used by client developers. Ideally developers would write code like

adwords = adwordsSvc.MakeProxy(...)adwords.setMaxKeywordCPC(53843, "flowers", "$8.43")

without needing to know or understand a lick of XML. Another benefit of SOAP were that it has a standard mechanism for sending metadata (SOAP headers) and errors (SOAP faults).

The two primary ways of using SOAP are rpc/encoded and document/literal. The former treats SOAP as a protocol for transporting programming language method calls just like XML-RPC while the latter treats SOAP as a protocol for sending typed XML documents. According to Nelson, the industry had finally gotten around to figuring out how to interop using rpc/encoded due to the efforts of the SOAP Builders list only for rpc/encoded to fall out of favor and document/literal to become the new hotness.

The problem with document/literal uses of SOAP is that it encourages using the full expressivity of W3C XML Schema Definition (XSD) language. This is in direct contradiction with trying to use SOAP+WSDL for object<->XML mapping since XSD has a large number of concepts that have no analog in object oriented programming.

Languages like Python and PHP have poor to non-existent support for either WSDL or document/literal encoding in SOAP. His scorecard for various toolkits in this regard was

  • Good: .NET, Java (Axis)
  • OK: C++ (gSOAP), PERL (SOAP::Lite)
  • Poor: Python (SOAPpy, ZSI), PHP (many options)

He also gave an example that showed how even things that seemed fundamentally simple such as specifying that an integer element had no value could cause interoperability problems in various SOAP toolkits. Given the following choices

<foo xsi:nil="true"/>
<foo/>
nothing
<foo>-1</foo>

The first fails in current version of the .NET framework since it maps ints to System.Int32 which is value type meaning it can't be null. The second is invalid according to the rules of XSD since an integer cannot be empty string. The third works in general. The fourth is ghetto but is the least likely to cause problems if your application is coded to treat -1 as meaning the value is non-existent.

There are a number of other issues Nelson encountered with SOAP toolkits including

  • Nested complex types cause problems
  • Polymorphic objects cause problems
  • Optional fields cause problems
  • Overloaded methods are forbidden.
  • xsi:type can cause breakage. Favor sending untyped documents instead.
  • WS-* is all over the map.
  • Document/literal support is weak in many languages.

Then came the REST vs. SOAP part of the discussion. To begin he defined what he called 'low REST' (i.e. Plain Old XML over HTTP or POX) and 'high REST'. Low REST implies using HTTP GETs for all API accesses but remembering that GET requests should not have side effects. High REST involves using the four main HTTP verbs (GET, POST, PUT, and DELETE) to manipulate resource representations, using XML documents as message payloads, putting metadata in HTTP headers and using URLs meaningfully.

Nelson also pointed out some limitations of REST from his perspective.

  • Development becomes burdensome if lots of interactivity in application (No one wants to write lots of XML parsing code)
  • PUT and DELETE are not implemented uniformly on all clients/web servers.
  • No standard application error mechanism (most REST and POX apps cook up their own XML error document format)
  • URLs have practical length limitations so one can't pass too muh data in a GET
  • No WSDL, hence no data binding tools

He noted that for complex data, the XML is what really matters which is the same if you are using REST or SOAP’s document/literal model. In addition he felt that for read-only APIs, REST was a good choice. After the talk I asked if he thought the Google search APIs should have been REST instead of SOAP and he responded that in hindsight that would have been a better decision. However he doesn't think there have been many complaints about the SOAP API for Google search. He definitely felt that there was a need for more REST tools as well as best practices.

He also mentioned things that went right including :

  • Switch to document/literal
  • Stateless design
  • Having a developer reference guide
  • Developer tokens
  • Thorough interop testing
  • Beta period
  • Batch methods (every method worked with a single item or an array which lead to 25x speed up for some customers in certain cases). Dealing with errors in the middle of a batch operation become problematic though.

There was also a list of things that went wrong

  • The switch to document/literal cost a lot of time
  • Lack of a common data model
  • Dates and timezones (they allowed users to specify a date to perform operations but since dates don't have time zones depending on when the user sent the request the results may look like they came from the previous or next day)
  • No gzip encoding
  • Having quotas caused customer confusion and anxiety
  • No developer sandbox which meant developers had to test against real data
  • Using SSL meant that debugging SOAP is difficult since XML on the wire is encrypted (perhaps WS-Security is the answer but so far implementations are spotty)
  • HTTP+SSL is much slower than just HTTP.
  • Using plain text passwords in methods meant that users couldn't just cut & paste SOAP traces to the forum which led to inadvertent password leaks.

This was an awesome talk and I definitely took home some lessons which I plan to share with others at work.