Microsoft has been working on handwriting recognition for over 15 years going back to the Pen extensions for Windows 3.0. With the increased integration and broad availability of the handwriting components present in Windows Vista we continue to see increased use of handwriting with Windows PCs. We see many customers using handwriting across a wide variety of applications including schools, hospitals, banking, insurance, government, and more. It is exciting to see this natural form of interaction used in new scenarios. Of course one thing we need to continue to do is improve the quality of recognition as well as the availability of recognizers in more languages around the world. In this post, Yvonne, a Program Manager on our User Interface Platform team, provides a perspective on engineering new recognizers and recognition improvements in Windows 7. –Steven
Hi, my name is Yvonne and I’m a Program Manager on the Tablet PC and Handwriting Recognition team. This post is about the work we’ve done to improve recognition in handwriting for Windows 7.
Microsoft has invested in pen based computing since the early 1990s and with the release of Windows Vista handwriting recognizers are available for 12 languages, including USA, UK, German, French, Spanish, Italian, Dutch, Brazilian Portuguese, and Chinese (Simplified and Traditional), Japanese and Korean. Customers frequently ask us when we plan to ship more languages and why a specific language is not yet supported. We are planning to ship new and improved languages for Windows 7, including Norwegian, Swedish, Finnish, Danish, Russian, and Polish, and the list continues to grow. Let’s explore what it takes to develop new handwriting recognizers.
Windows has true cursive handwriting recognition, you don’t need to learn to write in a special way – in-fact, we’ve taught (or “trained” as we say) Windows the handwriting styles of thousands of people and Windows learns more about your style as you use it. Over the last 16 years we’ve developed powerful engines for recognizing handwriting, we continue to tune these to make them more accurate, faster and to add new capabilities, such as the ability to learn from you in Vista. Supporting a new language is much more than adding new dictionaries – each new language is a major investment. It starts with collecting native handwriting, next we analyze the data and go through iterations of training and tuning, and finally the system gets to you and continues to improve as you use it.
The development of a new handwriting recognizer starts with a huge data collection effort. We collect millions of words and characters of written text from tens of thousands of writers from all around the world.
Before I describe our collection efforts, I would like to answer a question we are frequently asked: “Why can’t you just use an existing recognizer with a new dictionary?” One reason is that some languages have special characters or accents. But the overriding reason is because people in different regions of the world learn to write in different ways, even between countries with the same language like the UK and US. Characters that may look visually very similar to you can actually be quite different to the computer. This is why we need to collect real world data that captures exactly how characters, punctuation marks and other shapes are written.
Setting up a data collection effort is challenging and time consuming because we want to ensure that we collect the “right kind of data”. We carefully choose our collection labs in the respective countries for which we develop recognizers.
Before we start our data collection in the labs, we configure our collection tools, prepare documentation, and compile language scripts that will guide our volunteers through the collection process. Our scripts are carefully prepared by native speakers in the respective language to ensure that we collect only orthographically correct data, data from different writing styles, and data that covers all characters, numbers, symbols and signs that are relevant to a specific language. All of our scripts are proofread and edited before they are blessed to be used at the collection labs.
Once our tools and scripts are ready, we open our labs and start to recruit volunteers to donate their handwriting samples. Our recruitment efforts ensure that we have balanced demographics such as gender, age, left handiness, and educational background that represent the majority of the population for that country.
A supervisor at the lab instructs the volunteers to copy the text as it is displayed in the collection tool in their own writing style. What is important to note is that we want to collect writing samples that accurately represent the person’s natural way of writing. We therefore encourage volunteers to treat “pen and tablet” like “pen and paper”. If one of the volunteers tends to writes in big, curvy strokes, then we want to collect his/her big, curvy strokes during the collection session. High quality data in this context refers to data that was naturally written.
Here is a snapshot of what our collection tool looks like:
Figure 1: Collection Tool
A collection session lasts between 60-90 minutes at which point our volunteer has donated a significant amount of handwritten data without feeling fatigued. The donated data is then uploaded and stored in our database at Microsoft ready for future use. The written samples contain important information like stroke orders, start- and end points, spacing, and other characteristics that are essential to train our new recognizer.
Let’s take a look at some of our samples in our database to illustrate the great variation among ink samples:
Figure 2: Ink samples illustrating different stroke orders.
The screenshot shows how three different volunteers inked the word “black”. The different colors are used to illustrate the exact stroke orders in which the word was written. Our first two volunteers used five strokes to write the word “black”; our third volunteer used four strokes. Please also note how our third volunteer used one stroke only to ink the letters “ck”, while our first volunteer used three strokes for the same combination of letters. All of this information is used to train our recognizers.
Neural Network and Language Model
Once we have collected a sufficient amount of inked data, we split our data into a training set, used by our development team, and a “blind” set, used by our test team. The training set is then employed to train the Neural Network, which is largely responsible for the magic that is taking place during the recognition process. Good, naturally written data is essential in developing a high quality recognizer; the recognizer can’t be any better than its training set. The more high quality data we feed into our Neural Network, the more equipped we are to handle sloppy cursive handwriting.
Our Neural Network is a Time-Delay Neural Network (TDNN) that can handle connected letters of cursive scripts. A TDNN takes ink segments of preceding and following stroke segments into consideration when computing the probabilities of letters, digits and characters for each segment of ink. The output of the TDNN is powerful but not good enough when handwriting is sloppy. In order to come within reach of human recognition accuracy, we have to employ information that goes beyond the shape of the letter: we call this the Language Model context. The majority of this Language Model context comes in form of the lexicon, which is a wordlist of valid spellings for a given language. For many languages, this is the same lexicon that the spellchecker uses. The TDNN and the lexicon work closely together to compute word probabilities and output the top suggestions for the given input.
Training the Neural Network is an involved process that takes time. We often experiment with borrowing data from other languages to increase the size of the training data with the ultimate goal to boost recognition accuracy. Borrowing characters from other languages does not always lead to success. As I mentioned above, stroke order, letter shape, writing styles and letter size can differ significantly from country to country and can have a negative impact on the performance of the TDNN. It often takes us several rounds of training, re-training and tuning before we find “the right formula” that will lead to high recognition accuracy.
How do we know if we are headed in the right direction when we build a new recognizer? This is an important question that the test team and native speakers answer for us. The test team is responsible for generating our recognition accuracy metrics that reflect how good our recognizer is. These accuracy metrics are based on our blind test set which is the collected data that development could not use for training. In addition to our accuracy metrics, we work with native speakers in house and at our world-wide subsidiaries to get feedback and further input.
Improving the recognizers through personalization
In the previous paragraphs I have outlined how we develop high quality recognizers that can handle a wide variety of different writing styles. But there is more as each person can also train the recognizer his/her unique writing style. The training that is done to teach the recognizer a personal writing style is the same training that happens before Microsoft ships the product. The only difference is that we are now collecting unique training data from a specific person (and not that of thousands of people). We call this process “Personalization”.
Figure 3: Personalization Wizard (Sentence module).
As the screenshots of our Personalization wizard illustrates, a person is asked to write the requested sentence to provide his/her ink samples. The more data a person donates during the personalization process, the better the recognizer will become. In addition to providing writing samples based on specified sentences, a person can target specific recognition errors, shapes, and characters that will all be used for training. Our Personalization feature is complex and offers a variety of different modules that enable a person to optimally tune the recognizer. We are proud to announce that Personalization will be available for all Vista languages and all new Windows 7 languages. We encourage you to use this feature to improve your recognition accuracy.
We continue to work on improving our recognizers which also means that we are incorporating our customers feedback through online telemetry (anonymously, privately, voluntary, and opt-in). In Windows Vista we released a new feature called “Report Handwriting Recognition Errors”, which gives people the opportunity to submit those ink samples that the recognizer did not recognize correctly. After the person has corrected a word in the Tablet Input Panel (TIP), we enable a menu that allows a person to send the misrecognized ink together with its corrected version to our team.
Here is a screenshot of what our error reporting tool looks like:
Figure 4: With “Report Handwriting Recognition Errors” people can choose which of the misrecognized ink samples they want to submit.
We receive approximately 2000 error reports per week. Each error report is stored in our database before we analyze it and use it to improve our next generation of recognizers. As you can imagine, real world data is extremely helpful because it is only this type of data that can reveal shortcomings of our recognizers.
We value and appreciate every single error report. Keep sending us your feedback, so that we can use it to improve the magic of our present and future recognizers.
– Yvonne representing the handwriting recognition efforts