Converting from traditional to simplified Chinese, part 2: Using the dictionary

Now that we have our traditional-to-simplified pseudo-dictionary, we can use it to generate simplified Chinese words in our Chinese/English dictionary.

class StringPool {
public:
 // ... existing members unchanged ...
 LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);
 LPWSTR DupString(const WCHAR* pszBegin) {
  return AllocString(pszBegin, pszBegin + lstrlen(pszBegin));
 }
};

The DupString method is a convenience we will use below.
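To make the relationship between the two methods concrete, here is a minimal, portable sketch of a pool with the same shape. It is not the StringPool from this series: the class name SimpleStringPool and its storage strategy are assumptions made for illustration, and wchar_t and std::wcslen stand in for WCHAR and lstrlen.

```cpp
#include <cstddef>
#include <cwchar>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's StringPool (illustration only).
// Each string gets its own heap block; the pool owns every block and
// frees them all when it is destroyed.
class SimpleStringPool {
public:
    // Copy the range [pszBegin, pszEnd) into pool-owned storage and
    // null-terminate it.
    wchar_t* AllocString(const wchar_t* pszBegin, const wchar_t* pszEnd) {
        std::size_t cch = static_cast<std::size_t>(pszEnd - pszBegin);
        auto buf = std::make_unique<wchar_t[]>(cch + 1);
        std::wmemcpy(buf.get(), pszBegin, cch);
        buf[cch] = L'\0';
        m_blocks.push_back(std::move(buf));
        return m_blocks.back().get();
    }

    // Convenience wrapper: duplicate an entire null-terminated string.
    wchar_t* DupString(const wchar_t* pszBegin) {
        return AllocString(pszBegin, pszBegin + std::wcslen(pszBegin));
    }

private:
    std::vector<std::unique_ptr<wchar_t[]>> m_blocks;
};
```

The copy returned by DupString is writable and independent of the source, which is what lets the caller overwrite individual characters afterward.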

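    if (de.Parse(buf, buf + cchResult, m_pool)) {
     // Does any character in the traditional string have a simplified form?
     bool fSimp = false;
     for (int i = 0; de.m_pszTrad[i]; i++) {
      if (pmap->Map(de.m_pszTrad[i])) {
       fSimp = true;
       break; // one hit is enough
      }
     }
     if (fSimp) {
      // Copy the traditional string, then overwrite the characters that differ.
      de.m_pszSimp = m_pool.DupString(de.m_pszTrad);
      for (int i = 0; de.m_pszTrad[i]; i++) {
       if (pmap->Map(de.m_pszTrad[i])) {
        de.m_pszSimp[i] = pmap->Map(de.m_pszTrad[i]);
       }
      }
     } else {
      de.m_pszSimp = NULL;
     }
     // ... rest of the block unchanged ...
    }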

After we parse each entry from the dictionary, we scan the traditional Chinese characters to see if any of them have been simplified. If so, then we copy the traditional Chinese string and use the Trad2Simp object to convert it to simplified Chinese.
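The scan-then-copy logic can be tried outside the dictionary program with a small standalone sketch. Everything named here is hypothetical: TinyTrad2Simp is a toy replacement for the Trad2Simp object (all it keeps is the convention that Map returns 0 for a character with no simplified form), and ToSimplified uses std::wstring instead of pool-allocated strings.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical toy mapping (illustration only). Map returns the
// simplified form of a character, or 0 if the character is unchanged.
class TinyTrad2Simp {
public:
    void Add(wchar_t chTrad, wchar_t chSimp) { m_map[chTrad] = chSimp; }
    wchar_t Map(wchar_t chTrad) const {
        auto it = m_map.find(chTrad);
        return it == m_map.end() ? L'\0' : it->second;
    }
private:
    std::unordered_map<wchar_t, wchar_t> m_map;
};

// Returns true and stores the converted string in simp if at least one
// character has a simplified form. Otherwise returns false and leaves
// simp empty, the counterpart of setting m_pszSimp to NULL.
bool ToSimplified(const TinyTrad2Simp& map, const std::wstring& trad,
                  std::wstring& simp)
{
    simp.clear();

    // Pass 1: is there anything to convert?
    bool fSimp = false;
    for (wchar_t ch : trad) {
        if (map.Map(ch)) { fSimp = true; break; }
    }
    if (!fSimp) return false;

    // Pass 2: copy the string and overwrite the characters that differ.
    simp = trad;
    for (std::size_t i = 0; i < trad.size(); i++) {
        if (wchar_t chSimp = map.Map(trad[i])) simp[i] = chSimp;
    }
    return true;
}
```

The first pass exists only to avoid making a copy for the entries that would come out identical.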

If the string is the same in both simplified and traditional Chinese, then we set m_pszSimp to NULL. This may seem a bit odd, but it'll come in handy later. Yes, it makes the m_pszSimp member difficult to use. I could have created an accessor function for it (so that it falls back to traditional Chinese if the simplified Chinese is NULL), but I'm feeling lazy right now, and this is just a one-shot program.
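For the record, the accessor I was too lazy to write might look something like the sketch below. The struct is a hypothetical, trimmed-down stand-in for the dictionary entry, and the method name Simp is an assumption.

```cpp
#include <cstddef>

// Hypothetical, trimmed-down dictionary entry (illustration only).
struct EntrySketch {
    const wchar_t* m_pszTrad; // traditional Chinese, never null
    const wchar_t* m_pszSimp; // nullptr when identical to m_pszTrad

    // Fall back to the traditional string when no separate simplified
    // string was stored.
    const wchar_t* Simp() const {
        return m_pszSimp ? m_pszSimp : m_pszTrad;
    }
};
```

Callers that only want text to display would go through Simp(), while code that cares whether the two forms differ can still test m_pszSimp directly.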

void RootWindow::OnGetDispInfo(NMLVDISPINFO* pnmv)
{
 // ... existing code unchanged ...
  switch (pnmv->item.iSubItem) {
   case COL_TRAD:    pszResult = de.m_pszTrad;    break;
   case COL_SIMP:    pszResult =
      de.m_pszSimp ? de.m_pszSimp : de.m_pszTrad; break;
   case COL_PINYIN:  pszResult = de.m_pszPinyin;  break;
   case COL_ENGLISH: pszResult = de.m_pszEnglish; break;
  }
 // ... existing code unchanged ...
}

Finally, we tell our OnGetDispInfo handler what to return when the listview asks for the text that goes into the simplified Chinese column. With these changes, we can display both the traditional and simplified Chinese for each entry in our dictionary.

Next time, a minor tweak to our display code, which happens to illustrate custom-draw as a nice side-effect.

Comments (8)
  1. hmmm says:

    All well and good, but will this help get Longhorn shipped (with some features, please) any quicker? Or is this whole blogging thing (not Raymnod Chan specifically, but M$-wide) just a way to increase "visibility" and play a little CYA for the stack-rank game?

  2. Kris says:

    I just happened to come across this dictionary design. Very interesting. Just wondering if you would take this all the way through and finally expose it as a COM component.

    I am also interested in how MS folks design their UI apps(like Office) with automation in their mind. Would you please blog on this sometime in future? Thanks for the wonderful insights your blogs bring.

  3. Ben says:

    hmmm: The #1 priority at all times at Microsoft is helping existing customers. The #2 priority varies between fixing security issues (when there are some assigned to you), and working on your project.

    This isn’t about ranking (god knows Raymond don’t need more reputation) — it’s about helping people deal with the strange world of Win32 programming.

  4. ryanmy says:

    Ben makes excellent points… and in any case, Raymond is known to write posts for this blog far, far in advance — sometimes months ahead — in order to ensure that they keep coming even when all of us are hunkered down for Beta 1. (That’s why I haven’t updated lately :P)

    By the way, you might want to spew your drivel over at some of the Google guys — they’re actually required to spend part of their day working on something other than their product. (But then, if it spends years in public beta, can it really be said to ship?) It’s funny how double standards work…

  5. Craig Ringer says:

    Personally, I find this weblog very interesting and useful. That’s despite the fact that I don’t even *use* win32, let alone program for it, unless I really can’t avoid it.

    Also, consider the public discussion and feedback that comes of things like this. I can’t help but see that being useful. It might not "help get longhorn shipped" any faster, but I imagine it’ll help it be better designed. Personally, I’d prefer that.

    You might also do well to get the name of the person whose weblog you are criticising correct in future.

  6. mattd says:


    Why are you in a hurry to get Longhorn? What is so big about it? I thought basically everything a dev would care about is being backported anyway. With WinFS pulled I just don’t see much to it. Even the newly dropped screenshots were a bit *yawn*. I will say that the new driver model with WDF looks cool but…

  7. Nathan Moore says:

    Is there some reasoning behind the choice to use LPWSTR (or LPCWSTR) over WCHAR* (or const WCHAR*)? At first it seemed that LPWSTR was only being used for a null terminated array, however the StringPool::DupString method eliminates that idea.

    I guess that I have never really understood the point of the LPTYPE vs TYPE*. Or the point of the CHAR typedef for that matter.

  8. Nathan: You’re right, I should’ve used LPCWSTR since the string is null-terminated.
