Developing features in order to develop features.


Someone asked me not too long ago for a list of Windows-supported languages that don’t rely on white space for word breaking. I gave him a quick answer just because I happened to know, but then Yaniv Feinberg and I spent some time trying to figure out how he might have used our APIs to derive the answer himself. That was an interesting exercise, but in the end I was struck by the same thing that always strikes me when people ask questions like that: how inefficient it is that application developers, who may or may not know anything about word breaking, writing systems, or computational linguistics, are stuck trying to work out the answers to questions like this one.


This hit really close to home not too long ago when I was talking with my husband, who used to develop the help integration for Visual Studio. He knows more linguistics than your average developer, and even he found himself pretty overwhelmed by the fact that he somehow had to provide word breaking support in order to display help content in multiple languages. Multiply his story by a whole bunch of linguistic services, and then again by a whole bunch of application developers, and you quickly get a complicated ecosystem in which developers can’t focus on the real meat of the applications they’re trying to build, because they’re so bogged down in the peripheral features required to make their products appeal to wide audiences.


We could broaden this even further: it isn’t just linguistic functionality that developers find themselves having to create, but a whole range of other things too, every small piece for which their various target audiences turn out to require personalization. If we’re asking individual application developers to reinvent the wheel all across the development space, then we as an industry have a pretty broken model, especially when it comes to creating truly globalized applications.


If you’re a developer, I’m interested in hearing about cases like this where you’ve found yourself having to create peripheral features in order to provide a personalized experience for your customers. If you were successful, I’d like to hear why. If you weren’t successful, I’d like to hear why. Because the more I talk to people, the more I’m convinced this happens all the time.

Comments (13)

  1. RK says:

    If you’re developing for an audience that takes you outside your linguistic capabilities, then either…

    A) You’re big enough that you need to be sensitive to the needs of your customers across many cultures,

    or

    B) You’re thinking too BIG and wasting your time…

    With 350+ million in North America, equal in Europe and I can’t even guess Asia….what’s your point?

    Are we to be sensitive to Microsoft because of their linguistic requirements of their customers? Or to any other business that has needs or sees an opportunity in other "linguistic" markets?

    Just when I was starting to wear the MS flag again something like this comes along and makes me wonder. Why?

    Again, what’s your point?

    Microsoft develops "solutions" for many cultures… so linguistics is a requirement, not a problem.

    regards

    RK

  2. KieranS says:

    Wow, it’s interesting that you interpreted the post that way. I wasn’t really speaking about Microsoft’s applications so much as I was speaking about the challenges of developing globalized or personalized applications more generally, for any developer on any platform.

    Actually, what I meant to communicate is that any platform (Microsoft’s or anyone’s) has an interest in making it easy for developers to create globalized or otherwise personalized applications. If we’re expecting developers to build the supporting infrastructure for the features that are really core to their applications, then we’re expecting too much. That kind of software ecosystem just can’t sustain itself.

    I asked the question because I’m interested in hearing from developers (1) whether this is as big a problem as I perceive it to be, and (2) whether people have particular examples that are noteworthy.

  3. Mihai says:

    <<If your developing for an audience that takes you outside your linguistic capabilities then either … With 350+ million in North America, equal in Europe and I can’t even guess Asia….what’s your point?>>

    For many companies even Europe is outside their linguistic capabilities.

    It is good if you are not in that class, but many are.

    And Asia already has languages that don’t use spaces for word breaking and that are important markets (e.g., Japanese, Chinese, Thai).

  4. rk says:

    Sorry, didn’t mean to come off like a jerk (which, after re-reading my comments, I sound like)… but…

    I think the multicultural/multilingual challenges of software weren’t always there back in the days of the 255-character set… now that we have the tools to address those needs with Unicode, it opens up new avenues of challenges, roadblocks, and opportunities…

    My point was (and maybe I was a bit off base, had a couple :-)) that those challenges are what you make of them… if you decide to go from the 255-character set to a 16-bit character set, then the road is far from easy, and all the power to you…

    I don’t pretend to have blinders on, but what did everyone do in 1981 when IBM unveiled its new darling of engineering?

    Engineering progress is one measure of humanity as a whole… going from 8 bits to 16 bits is not an easy road, but we’ll manage…

  5. Mihai says:

    I did not take it as a jerk reaction 🙂

    My explanation for "255 was OK" and "what did all the people do in 1981": the world changes. What was good enough yesterday is not good enough today.

    We want a car, a microwave or gas oven, a color TV with 100 channels, a GUI and a mouse.

    Yes, at various points in time we made do with a horse and chariot, a stove, a command prompt, a B&W TV with 2 channels.

    So, I think the time of 255 is gone. The transition might be costly, but it has started, and there is no way back 🙂

  6. MSDN Archive says:

    Very interesting post, Kieran (I’ve just discovered your blog).

    There are definitely languages which use white space for word breaking, but which also use other characters in some contexts. French is a case in point (as well as Italian or Romansh, for instance). The apostrophe is definitely a word-breaking character in strings such as l’école (the school), d’hier (of yesterday), l’enfant (the child)… A while ago, I wrote about how difficult it can be to decide on the status of such a character (because the apostrophe is not always a word-breaking character in French, as in aujourd’hui (today), which is only one token). See http://blogs.msdn.com/correcteurorthographiqueoffice/archive/2005/12/07/500807.aspx for more details about this discussion… We definitely have to consider all these aspects for the word breakers we develop for all these languages…

    Thierry

  7. KieranS says:

    These comments are launching some good discussion.

    The way I see it, there are some globalization best practices that all internationally minded developers need to be aware of (using Unicode, etc.). The stuff I was talking about is a bit different, although I don’t think I explained it quite right. Word breakers are just one example: if we have 300 or 100 or even just 10 developers all independently creating word breaker support for Windows applications, something is horribly wrong. That’s more than a best practice; that’s the development of an entire feature set. We need to make it easier for developers to pick up broad international support that just works, so that they can focus on their core features, the stuff they really care about.

    Unrelated: Thierry, I’m working on an interesting word breaker puzzle these days that I’ll have to come bug you guys about soon.

  8. Dewi Morgan says:

    I spent months trying to get wrapping, cut-and-paste, image embedding, HTML, etc. working nicely in a lightweight, non-Swing Java 1.1 rich text object.

    Eventually, I gave up and went with Swing.

    Reinventing the wheel is bad. If there is any way to reuse components, it is almost always better.