Locale Data in Windows 10 & CLDR

I've blogged about Locale Data Churn a couple times in the past, and mentioned how to use the Locale Builder to modify Finnish for Windows 10, but I'd like to talk a bit more about Locale Data in Windows 10.

A Data Collection Problem

As I've mentioned before, Locale Data changes over time for various reasons.  Managing the data can be pretty interesting (and time consuming).  Typically it starts something like this (hypothetical example):  "Hey, Shawn, how come the XXX field in my YYY language is wrong?"  Then I'm usually like "Hmm, it's been that way for 10 years."  And get the response "Yea, well, our government just passed regulation 1.2.3 fifteen minutes ago, so now it's wrong."

Now I'm in a pickle.  I have to try to figure out if it's "just this guy", and if this regulation is real, and, if it's real, if anyone actually cares. (A surprising number of locales don't do in practice what their government would have them do on paper.)  So then we try to find someone else to corroborate the change. Often the second answer is along the lines of "yea, AAA was wrong, but that other guy read the regulation wrong, it's not BBB like they said, it's actually supposed to be CCC".  Like that helps at all.

And oftentimes, both guys are right.  Business does it one way, the government requires it a different way, but consumers might do it a third way.  All very confusing.

So, we're pretty cautious about changing this stuff because it's easy to get wrong, but eventually we have to make the change, and usually that leads to other people saying it's wrong.  Or saying it's right but they don't like it.

Locale Data in Windows 10

In Windows 10 we decided to take advantage of CLDR, the Common Locale Data Repository.  CLDR is a body of locale data gathered cooperatively by the industry, with the aim of being consistent and addressing some of the above issues by leveraging different experts.  Many of the people who contribute are from outside the software companies that actually consume the CLDR data.

CLDR has a system of voting for the various data fields before the data is accepted.  For more information, visit the CLDR.Unicode.org site.  If you're interested in improving the locale data for your area, or if your locale isn't covered by CLDR, you might want to consider participating in the CLDR process.

By getting data from CLDR, our goal is to leverage the power of the community to have the best data possible, and to reduce unnecessary churn over time.  The downside is that our private data didn't exactly match a lot of the CLDR data, so aligning the locales to CLDR introduced more changes than we would typically expect.  We made this change early in the Windows 10 process so that the Windows Insiders could provide feedback on the preview builds.
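
To give a feel for what "aligning" means in practice, here is a minimal Python sketch of comparing two sets of locale fields and listing what would change.  Everything in it is an assumption made up for illustration: the field names, the values, and the diff_locale_fields helper are hypothetical, and this is not how Windows or CLDR actually store or compare their data.

```python
def diff_locale_fields(private, cldr):
    """Return {field: (private_value, cldr_value)} for every field that differs.

    A field missing from one side shows up with None on that side.
    """
    changes = {}
    for field in sorted(set(private) | set(cldr)):
        old = private.get(field)
        new = cldr.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


# Hypothetical values for an imaginary locale; not real Windows or CLDR data.
private_data = {
    "shortDate": "d.M.yyyy",
    "timeSeparator": ":",
    "firstDayOfWeek": "Monday",
}
cldr_data = {
    "shortDate": "d.M.yyyy",
    "timeSeparator": ".",
    "firstDayOfWeek": "Monday",
    "listSeparator": ";",
}

changes = diff_locale_fields(private_data, cldr_data)
for field, (old, new) in changes.items():
    print(f"{field}: {old!r} -> {new!r}")
```

Even in this toy case, two of the four fields change, and every one of those changes is something a user of that locale might notice.  Multiply that by every field and every locale and you can see why we wanted the feedback early.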

The advantages of our CLDR dependency are support for additional locales we did not have data for, reduced churn over time, values more consistent with the rest of the industry, and hopefully better quality data overall due to the cooperative collection effort.