Marketing are Bad


Todd Bishop’s story in yesterday’s Seattle Post-Intelligencer about Professor Sandeep Krishnamurthy’s fisking of Word’s grammar checker is making the rounds. It even got a blurb on NPR’s Morning Edition today.

Word’s grammar checker is a by-product of the Natural Language Processing group at Microsoft Research. I’m not an expert in natural language processing. My job is taking the work they do and figuring out how to interface that work with Word. So, you can take what I say with an appropriately-sized grain of salt.

I can say, however, that natural language processing is “hard.” In computer science the word “hard” has a very technical meaning in terms of computational complexity. Problems that are “hard” don’t lend themselves to being broken down into a finite set of well-defined steps using a handful of basic operations.

Computers are stupid. They really only know how to do a very limited set of things. They can add, subtract, multiply and divide. They can move a value from one location to another. And they can compare two numbers. Everything that your computer does is built up from these basic operations. Everything.

Well, some pedant might pipe up and say that the granularity is even smaller; that even these operations are constructs built up from basic logical operations (“and,” “or,” “exclusive or,” etc.), but that’s a digression not worth exploring for now. Let’s just content ourselves with the operations that are provided by the processor’s machine language.
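To make the "built up from basic operations" point a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Word or any real processor does it: the function name `words_equal` is made up for this example, and it deliberately restricts itself to moving values, adding, and comparing numbers to decide whether two words are the same.

```python
def words_equal(a: str, b: str) -> bool:
    """Decide whether two words match using only moves, adds, and compares."""
    # To the machine, characters are just numbers.
    codes_a = [ord(ch) for ch in a]
    codes_b = [ord(ch) for ch in b]

    # Compare two numbers: the lengths.
    if len(codes_a) != len(codes_b):
        return False

    i = 0                              # move a value into a location
    while i < len(codes_a):            # compare two numbers
        if codes_a[i] != codes_b[i]:   # compare two numbers
            return False
        i = i + 1                      # add
    return True


print(words_equal("are", "are"))
print(words_equal("Gates", "gates"))
```

Even something as trivial as "are these the same word?" turns into a loop of compares and adds. Deciding whether a sentence is grammatical has to be expressed in the same vocabulary.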

The job of the folks in our NLP research group is to take the general problem of parsing natural language sentences, like “Marketing are bad,” and break that problem down into these six basic operations. What makes the problem difficult is ambiguity.

Take the subject of this post. The word “Marketing” is obviously a noun, but is it a gerund? Or, am I referring to the marketing group collectively as individuals? If the former, then the sentence I wrote is grammatically incorrect. If the latter, then it’s grammatically correct. How might the computer know? How would you know without some further context for that sentence?

By the way, American sports writers are notorious for getting this one wrong. They’ll often write something like, “Seattle is on a pace to win the American League West.” As written, the sentence seems correct, but substitute “The Mariners” for “Seattle.” That substitution shouldn’t change the plurality of the verb, because, semantically, the subject hasn’t changed. The correct verb for either wording of the subject is, “are.”

Prof. Krishnamurthy’s fisking of Word’s grammar checker consists of a cobbling-together of a number of these ambiguous sentences. Even in context, it’s difficult to tell if, say, “Gates” is a singular proper noun or a plural noun that simply has incorrect capitalization. If you can’t figure out the plurality of a noun in a sentence, how can you decide that the plurality of the associated verb matches its subject?

Which brings me to an ancillary facet to the overall problem. What should software do in response to this kind of ambiguity? If the grammar checker is unable to figure out whether a sentence is correct or incorrect, should Word err on the side of accepting the sentence as correct, or should Word err on the side of flagging it as an error? No matter how we answer that question, a non-negligible group of users won’t be happy.
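The trade-off can be sketched with a toy example. Everything here is hypothetical and vastly simpler than what a real grammar checker does: the tiny lexicon, the `Number` enum, and the `check_agreement` function are invented for illustration, and the `flag_when_unsure` parameter stands in for the "which way should we err?" decision.

```python
from enum import Enum


class Number(Enum):
    SINGULAR = "singular"
    PLURAL = "plural"
    AMBIGUOUS = "ambiguous"


# Hypothetical toy lexicon. A real checker has nothing this simple.
SUBJECT_NUMBER = {
    "marketing": Number.AMBIGUOUS,    # gerund, or the marketing folks?
    "seattle": Number.AMBIGUOUS,      # the city, or the players?
    "gates": Number.AMBIGUOUS,        # proper noun, or miscapitalized plural?
    "the mariners": Number.PLURAL,
    "the feature": Number.SINGULAR,
}
VERB_NUMBER = {"is": Number.SINGULAR, "are": Number.PLURAL}


def check_agreement(subject: str, verb: str, flag_when_unsure: bool) -> bool:
    """Return True if the subject/verb pair should be flagged as an error."""
    # Subjects missing from the lexicon are treated as ambiguous.
    subject_number = SUBJECT_NUMBER.get(subject.lower(), Number.AMBIGUOUS)
    verb_number = VERB_NUMBER[verb.lower()]
    if subject_number is Number.AMBIGUOUS:
        # The checker cannot decide, so policy decides instead.
        return flag_when_unsure
    return subject_number is not verb_number


for policy in (True, False):
    flagged = check_agreement("Marketing", "are", flag_when_unsure=policy)
    print(f"flag_when_unsure={policy}: flagged={flagged}")
```

With `flag_when_unsure=True`, the title of this post gets a squiggle whether or not I meant the marketing folks; with `False`, it never does, even when I meant the gerund. Neither setting is right for everyone.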

“Well, so add a preference,” you say. At first, this seems like a simple answer, but it doesn’t always work that way. Word’s auto-formatter is probably the classic example. The primary reason people curse it is ambiguity, and we already have preference-related issues with the auto-formatter. And, yes, we’ve heard the complaints. We’re working on a solution.

But, ambiguity is our bane. It’s the heart of a number of problems we’re trying to solve, and its very existence means that we aren’t going to find complete solutions to those problems (at least not short of figuring out how to get a computer to mimic the human brain using just six basic operations). Given this limitation on solving problems like effective grammar checking, should we, as Prof. Krishnamurthy suggests (demands?), scrap the feature entirely? Or, is it better to offer a feature that, while less than perfect, still retains some utility?

If a feature has potential for solving real user problems, then I tend to shade toward adding it even if the feature is limited in its ability to solve the problem. One very significant benefit from putting even a partial solution into users’ hands is the feedback you get about the feature’s limitations. That’s why dialogues are important. As the cluetrain manifesto says, markets are conversations. It’s hard to have a conversation about a feature that isn’t there.

 

Rick

Currently playing in iTunes: From Now On by Supertramp

Comments (13)

  1. JMayeur says:

    — I take a slightly different tack to what the issue is with the MS Grammar Checker.

    First let me say that I think the "test" is too far on the absurd side of reality to have any value. If a writer’s grammar is that bad, they certainly have a much bigger issue to confront than any grammar checker could help solve. After all, how would that writer know what to make of any suggestions a grammar checker made?

    For me, the GC fails because it doesn’t account for the one thing it will never [ never say never right? ] be able to handle, and that is individualism. One thing you learn about writing, after all those years of DO-IT-THIS-WAY!!!, is that, well, frankly, much of the best writing out there breaks rules when the meat of the words/paragraph/phrase is better carried by an unconventional format.

    [ By default I always turn SpellCheck/GrammarCheck off, but Windows does not seem to be the best at persisting settings. That’s why I don’t like GC: it’s a ghost that keeps popping up. ]

    I think the problem with MS is a slightly larger issue with a hobbled ability to evaulate the value proposition of features.

    I had the good fortune to work for a vendor that provided software for part of Microsoft. The one thing that consistently happened is that the end users at MS wanted to pack as many features as possible into the smallest visual space. Literally [yes literally] there were webforms with 40+ fields… Of course each user had a valid reason that their feature should be available, but in the end, when they got what they asked for, they were sorely disappointed by its usability.

    I suspect this is a common theme at MS, where PMs/DEVs are faced with huge feature sets and plenty of talent to implement each feature, but with the planning deficit of not really allowing for the possibility that less can be much, much more… [ the duplicate word warning from the MS GC is a super-peeve of mine ]

    .just a thought.

  2. James Hunter says:

    I am apparently an outlier, but I don’t mind ignoring extraneous squiggly lines. That is–a false positive I can evaluate and ignore, but if I am unaware of a misspelling I don’t know to fix it. The latter is a common problem (see "evaulate" [sic] in the previous post).

    It does seem as if it would be a difficult task to present only relevant grammar/spelling information to the user. I always assumed the point of the grammar- and spell-checkers was to catch occasional mistakes by the writer–not to correct systematic spelling or grammar problems. When I choose to break rules, I expect the rule-checker will flag the violations! And then I ignore them–which I guess is an odd response.

    James

  3. Rick Schaut says:

    James, I wouldn’t say that you’re unusual. No matter how we do this, a non-negligible group of users won’t be happy. That’s what makes the conversation both necessary and interesting.

  4. Fred says:

    Rick, I don’t think Prof. Krishnamurthy is suggesting or demanding you scrap the feature. True, he’s surprised you offer it in its current state, but he goes on to say:

    "I believe Microsoft has the ability to improve this feature and I hope it exercises it."

  5. Rick Schaut says:

    Fred,

    I suppose I should have said "implied" rather than "suggests". Wonder what that "suggests" about Prof. Krishnamurthy’s belief that we can significantly improve this feature :-).

  6. Eric Albert says:

    I don’t think your example regarding "Seattle" vs. "The Mariners" is accurate for American English speakers. "Seattle" in that context is a collective noun, and in American English collective nouns are paired with singular verbs. This is different in British English, where collective nouns are paired with plural verbs. If you Google for "british american grammar collective" you’ll see a number of discussions of this.

    One way to look at it is if you substitute "the team" for "Seattle" and "the players" for "Mariners". In American English you wouldn’t say "the team are on a pace", but you would say "the players are on a pace". In British English you’d use "are" in both cases.

  7. Rick Schaut says:

    If I remember grade-school grammar correctly, and, no, we won’t say how long ago that was, the rule for collective nouns depends on whether you’re speaking of the collective as a singular entity or as a group of individuals.

    For example, both, "Apple is a corporation headquartered in Cupertino, CA," and "Apple are busy putting the finishing touches on the next version of OS X," represent correct usage in either America or Britain.

    But, we’re quibbling. We can’t agree on this issue, yet Prof. Krishnamurthy expects us to be able to write a piece of software that gets it right. I think his expectations are just a tad bit out of line.

  8. And it sure is a fun problem to tackle.

  9. Eric Albert says:

    Oh, yes, definitely quibbling. If anyone insisted I write a perfect grammar checker, I’d go find another job. Even if I had all of the research groups in the world at my disposal. It isn’t a solvable problem.

  10. Rich says:

    Maybe part of the problem is determining whether Grammar is descriptive or prescriptive. Being more than a half century since I started school, I find that prescriptive Grammar guides me in making my choices. Prescriptive Grammar tends to be conservative and even outdated in some contemporary usage. But any living language is fluid, and therefore Grammar by necessity must be descriptive.

    It seems as if the Grammar Checker reflects a prescriptive approach which also tries to incorporate the descriptive advances of the recent past.

    And the ultimate question is: "Who’s on first?"