The Legacy Hashing Algorithm in Open XML

In Open XML, there is a feature whereby you can restrict editing, and allow only users who have a password to modify the file.  Now, understand, this isn’t a password that really protects the file.  It is easy to write an XML program that opens a file that has had its editing restricted, and modify the file.  It is easy to use an XML API to remove the hashing password.  In other words, this hashed password isn’t really about security of data, or elevation of privilege; it is about restricting editing for an average information worker.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCLet’s not confuse what this hashing function is used for.  It’s used to control UI behavior for modifying the document.  An application that needs to implement restricted editing for legacy files will need to be able to hash a password using this algorithm.  I would recommend that applications implement the legacy hash for backwards compatibility but when re-saving the document they use one of the secure algorithms recommended by the standard.  Note that this particular feature is optional.  If an application has not implemented the hashing algorithm, the default behavior would be to open the file and allow editing.

But in the interests of furthering computing in general, a friend of mine (Bob McClellan) put together a C implementation of the obsolete hashing algorithm.  He used the Open XML specification.

As an aside, Bob McClellan is one of the best developers that I’ve ever known.  He is super competent.  You will be hearing more about Bob on my blog in the next few days.

Anyway, here is the hashing algorithm in C.  It is just sample code that we have put together.  However, we’re reasonably confident about the quality of the code.  If you find anything wrong with this code, please let us know.  We’ll fix it ASAP.

Note that this code only deals with utf-16; if you need to hash a password using any other encoding, you would need to convert the string to utf-16 first.  This exercise is left to you.

#include <stdio.h>

// Replace this typedef with a more appropriate type for UTF-16, if necessary
typedef unsigned short char16_t;

// These tables are used to generate the high word of the hash.
static unsigned short lookup_length[15] = {
0xE1F0, 0x1D0F, 0xCC9C, 0x84C0, 0x110C, 0x0E10, 0xF1CE, 0x313E,
0x1872, 0xE139, 0xD40F, 0x84F9, 0x280C, 0xA96A, 0x4EC3
};

static unsigned short lookup_bits[15][7] = {
{ 0xAEFC, 0x4DD9, 0x9BB2, 0x2745, 0x4E8A, 0x9D14, 0x2A09},
{ 0x7B61, 0xF6C2, 0xFDA5, 0xEB6B, 0xC6F7, 0x9DCF, 0x2BBF},
{ 0x4563, 0x8AC6, 0x05AD, 0x0B5A, 0x16B4, 0x2D68, 0x5AD0},
{ 0x0375, 0x06EA, 0x0DD4, 0x1BA8, 0x3750, 0x6EA0, 0xDD40},
{ 0xD849, 0xA0B3, 0x5147, 0xA28E, 0x553D, 0xAA7A, 0x44D5},
{ 0x6F45, 0xDE8A, 0xAD35, 0x4A4B, 0x9496, 0x390D, 0x721A},
{ 0xEB23, 0xC667, 0x9CEF, 0x29FF, 0x53FE, 0xA7FC, 0x5FD9},
{ 0x47D3, 0x8FA6, 0x0F6D, 0x1EDA, 0x3DB4, 0x7B68, 0xF6D0},
{ 0xB861, 0x60E3, 0xC1C6, 0x93AD, 0x377B, 0x6EF6, 0xDDEC},
{ 0x45A0, 0x8B40, 0x06A1, 0x0D42, 0x1A84, 0x3508, 0x6A10},
{ 0xAA51, 0x4483, 0x8906, 0x022D, 0x045A, 0x08B4, 0x1168},
{ 0x76B4, 0xED68, 0xCAF1, 0x85C3, 0x1BA7, 0x374E, 0x6E9C},
{ 0x3730, 0x6E60, 0xDCC0, 0xA9A1, 0x4363, 0x86C6, 0x1DAD},
{ 0x3331, 0x6662, 0xCCC4, 0x89A9, 0x0373, 0x06E6, 0x0DCC},
{ 0x1021, 0x2042, 0x4084, 0x8108, 0x1231, 0x2462, 0x48C4}
};

// Returns the password hash for the specified UTF-16 password string
// Returns 0 if the password is empty
unsigned long create_password_hash( char16_t* password )
{
unsigned short low_hash = 0;
unsigned short high_hash;
unsigned short temp;
int pos;
int len = 0;
int bit;
int high_pos = 14;

    // Determine the number of characters in the password, up to a maximum of 15
for (len = 0; len < 15 && password[len] != 0; len++)
;
if ( len == 0 )
return 0;

    pos = len;

    // Initialize the high part of the hash using the length lookup table
high_hash = lookup_length[len - 1];

    // Process each character in the password, starting with the last one
while (pos-- > 0)
{
// Use only the low byte of the character, unless it is zero, then use only the high byte
temp = ((password[pos] & 0xFF) == 0) ?
(unsigned short)((password[pos] >> 8) & 0xFF) :
(unsigned short)(password[pos] & 0xFF);

        // Circular left-shift of the lower 15 bits of the current hash value
low_hash = ((low_hash >> 14) & 0x01) | ((low_hash << 1) & 0x7FFF);

        // Exclusive-or the current character
low_hash ^= temp;

        // For each of the lower 7 bits, if that bit is set, then
// exclusive-or the value from the lookup table for that
// bit and character position.
for ( bit = 0; bit < 7; bit++ )
{
if ( (temp & 1) != 0 )
high_hash ^= lookup_bits[high_pos][bit];
temp >>= 1;
}
high_pos--;
}

    // Circular left-shift of the lower 15 bits of the current hash value
low_hash = ((low_hash >> 14) & 0x01) | ((low_hash << 1) & 0x7FFF);

    // Exclusive-or the length of the password
low_hash ^= len;

    // Exclusive-or a constant
low_hash ^= 0xCE4B;

    // Combine the low and high parts of the hash
return high_hash << 16 | low_hash;
}

// Converts 7-bit characters to the UTF-16 format
// This just involves a simple type cast of each character
// and storage into the 2-byte array.
// Conversion of 8-bit values depends on the code page
// and is beyond the scope of this example.
static char16_t utf_buffer[16];

char16_t* convert_char_to_UTF16( char* password )
{
int pos;

    for ( pos = 0; *password != ' password++ )
utf_buffer[pos++] = (char16_t)*password;
utf_buffer[pos] = 0;
return utf_buffer;
}

void main()
{
// Generic specification of the UTF-16 string L"фис"
char16_t password1[] = { 0x444, 0x438, 0x441, 0 };

    // Contains 8-bit values, equivalent to Japanese string L"サインイン"
char16_t password2[] = { 0x30B5, 0x30A4, 0x30F3, 0x30A4, 0x30F3, 0 };

    // String with low-byte of zero (string is L"Ābc");
char16_t password3[] = { 0x100, 0x62, 0x63, 0 };

    fprintf( stdout, "'Example' hash is: %lXn",
create_password_hash( convert_char_to_UTF16( "Example" ) ) );
fprintf( stdout, "Extended code hash is: %lXn",
create_password_hash( password1 ) );
fprintf( stdout, "Eight-bit hash is: %lXn",
create_password_hash( password2 ) );
fprintf( stdout, "Zero-byte hash is: %lXn",
create_password_hash( password3 ) );
fprintf( stdout, "'VeryVeryLongPassword' hash is: %lXn",
create_password_hash( convert_char_to_UTF16( "VeryVeryLongPassword" ) ) );
}

This program produces the following output:

'Example' hash is: 64CEED7E
Extended code hash is: D928CC28
Eight-bit hash is: C684DE0C
Zero-byte hash is: CA21CCDA
'VeryVeryLongPassword' hash is: B5CCFCA3

 

As a post script, I can’t think of the word “hash” without thinking of corned-beef hash, and how my dad loved it. And I think he would really have enjoyed seeing me at Microsoft. And he would have enjoyed meeting his grandson (who looks quite a bit like his grandfather). Here’s to hashing and Dad…