Code Pages versus Locales/Locale Names

The question recently came up on why someone's index range on a text column wasn't working properly. It turns out they forgot to set the Code Page on the column, but they were still storing Unicode data in the column.

So the question came up about just what the heck is a Code Page. I think it comes out of a long history of Windows support for globalization. It goes back to probably the 1980s, since I remember pages in the appendix of the MS-DOS 5 manual on different code pages.

ASCII is technically 7-bits only. The values 0-127.

ANSI characters are 8-bits, but the meaning of characters 128-255 change depending on which 'code page' you choose. ESE mostly cares about UTF-16 (1200) and Latin (1252). See Wikipaedia for more details.

Code pages are defined on the column itself.

 typedef struct
 {
 // ...
 unsigned short cp;
 unsigned short wCollate; /* Must be 0 */
 // ...
 } JET_COLUMNDEF;
 

But then you could say 'Code Page 1200', which meant Unicode (I think UCS-2 was the only form of Unicode initially, and UTF-7, UTF-8, UTF-16, and UTF-32 came later? But that's a different topic...).

And it was realized that you need to know the locale in order to sort Unicode text. First we used LCIDs (locale identifiers), and used them for many years. Then BCP-47 and Locale Names came along. Now, LCIDs are deprecated (starting in Windows Vista), and locale names are preferred. Example of an LCID: 1033 (0x409). The equivalent locale name is "en-US".

So we need to know the *locale* in order to know how to sort a Unicode column. Different locales sort characters in different ways. The LCID in in the JET_UNICODEINDEX structure, which is part of JET_INDEXCREATE/JET_INDEXCREATE2and the Locale Name is in JET_UNICODEINDEX2, which is referenced by JET_INDEXCREATE3. You need Windows 8 in order to use Locale Names. If you need to support Windows 7, you're stuck with LCIDs.

 typedef struct tagJET_UNICODEINDEX
 {
 unsigned long lcid;
 unsigned long dwMapFlags;
 } JET_UNICODEINDEX;
 
 #if ( JET_VERSION >= 0x0602 )
 typedef struct tagJET_UNICODEINDEX2
 {
 WCHAR *szLocaleName;
 unsigned long dwMapFlags;
 } JET_UNICODEINDEX2;
 #endif //JET_VERSION >= 0x0602
 

The 'dwMapFlags' member is passed as 'dwMapFlags' to LCMapStringEx.

Given what we know today, I'm not sure where we'd stick it, I doubt we'd stick with the current scheme of having two different places. I suspect we'd choose it at index creation time.

Note that the code below thinks that it's using Unicode data, because it stores and retrieves it successfully. But the index is completely confused.

  /// <summary>
 /// Repro of Fleischman index range
 /// </summary>
 [TestMethod]
 [Priority(2)]
 [Description("Repro of Fleischman index range.")]
 public void TestFleischmanIndexRange()
 {
 JET_COLUMNID columnidName;
 var columndefName = new JET_COLUMNDEF()
 {
 //// Uncomment this line to make it work properly.
 //// cp = JET_CP.Unicode,
 coltyp = JET_coltyp.LongText,
 grbit = ColumndefGrbit.ColumnNotNULL
 };
 Api.JetAddColumn(this.sesid, this.tableid, "Name", columndefName, null, 0, out columnidName);
 
 JET_COLUMNID columnidStatus;
 var columndefStatus = new JET_COLUMNDEF() { coltyp = JET_coltyp.UnsignedByte, };
 Api.JetAddColumn(this.sesid, this.tableid, "Status", columndefStatus, null, 0, out columnidStatus);
 
 JET_COLUMNID columnidDifferentiator;
 var columndefDifferentiator = new JET_COLUMNDEF()
 {
 cp = JET_CP.Unicode,
 coltyp = JET_coltyp.LongText,
 grbit = ColumndefGrbit.ColumnNotNULL
 };
 Api.JetAddColumn(this.sesid, this.tableid, "Differentiator", columndefDifferentiator, null, 0, out columnidDifferentiator);
 
 string indexDef = "+Name\0+Status\0\0";
 var indexcreate = new JET_INDEXCREATE[]
 {
 new JET_INDEXCREATE()
 {
 szIndexName = "fleischman",
 szKey = indexDef,
 cbKey = indexDef.Length,
 grbit = CreateIndexGrbit.None,
 ulDensity = 100,
 cbKeyMost = 1000,
 cbVarSegMac = 1000,
 pidxUnicode =
 new JET_UNICODEINDEX() { szLocaleName = "en-us", }
 },
 };
 
 Windows8Api.JetCreateIndex4(this.sesid, this.tableid, indexcreate, indexcreate.Length);
 
 var records = new[]
 {
 new { name = "a1", status = 0x2, differentiator = "a1-2" },
 new { name = "a2", status = 0x2, differentiator = "a2-2" },
 new { name = "n1", status = 0x2, differentiator = "n1-2" },
 new { name = "n1", status = 0x2, differentiator = "n1-2_dupe!" },
 new { name = "n2", status = 0x2, differentiator = "n2-2" },
 new { name = "q1", status = 0x2, differentiator = "q1-2" },
 new { name = "q2", status = 0x2, differentiator = "q2-2" },
 };
 
 using (Transaction trx = new Transaction(this.sesid))
 {
 foreach (var record in records)
 {
 using (var update = new Update(this.sesid, this.tableid, JET_prep.Insert))
 {
 Console.WriteLine("Inserting name = '{0}', status = '{1}'", record.name, record.status, record.differentiator);
 Api.SetColumn(this.sesid, this.tableid, columnidName, record.name, Encoding.Unicode);
 Api.SetColumn(this.sesid, this.tableid, columnidStatus, (byte)record.status);
 Api.SetColumn(this.sesid, this.tableid, columnidDifferentiator, record.differentiator, Encoding.Unicode);
 update.Save();
 }
 }
 trx.Commit(CommitTransactionGrbit.LazyFlush);
 }
 
 if (this.NamedKeyTrySeek(this.sesid, this.tableid, "fleischman", "n1", 2))
 {
 do
 {
 string retrievedName = Api.RetrieveColumnAsString(
 this.sesid,
 this.tableid,
 columnidName,
 Encoding.Unicode);
 byte retrievedStatus= Api.RetrieveColumnAsByte(
 this.sesid,
 this.tableid,
 columnidStatus).GetValueOrDefault();
 string retrievedDifferentiator= Api.RetrieveColumnAsString(
 this.sesid,
 this.tableid,
 columnidName,
 Encoding.Unicode);
 
 Console.WriteLine(
 "Retrieved name = '{0}', status = '{1}', differentiator = '{2}'",
 retrievedName,
 retrievedStatus,
 retrievedDifferentiator);
 }
 while (Api.TryMoveNext(this.sesid, this.tableid));
 }
 }
 
 /// <summary>
 /// Courtesy of Fleischman.
 /// </summary>
 /// <param name="sesid"></param>
 /// <param name="tableid"></param>
 /// <param name="indexName"></param>
 /// <param name="name">The first column to seek for.</param>
 /// <param name="status">The second column to seek for.</param>
 /// <returns></returns>
 public bool NamedKeyTrySeek(
 JET_SESID sesid,
 JET_TABLEID tableid,
 string indexName,
 string name,
 byte status)
 {
 Api.JetSetCurrentIndex(sesid, tableid, indexName);
 Api.MakeKey(sesid, tableid, name, Encoding.Unicode, MakeKeyGrbit.NewKey);
 Api.MakeKey(sesid, tableid, status, MakeKeyGrbit.None);
 bool ret = Api.TrySeek(sesid, tableid, SeekGrbit.SeekEQ);
 if (ret)
 {
 Api.MakeKey(sesid, tableid, name, Encoding.Unicode, MakeKeyGrbit.NewKey);
 Api.MakeKey(sesid, tableid, status, MakeKeyGrbit.None);
 ret = Api.TrySetIndexRange(sesid, tableid, SetIndexRangeGrbit.RangeInclusive | SetIndexRangeGrbit.RangeUpperLimit);
 }
 
 return ret;
 }