# Efficient multiplication and division in GF(2⁸)

I talked about Rijndael in a couple of previous posts: Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(2²)

I'm going to talk some more about it today.

Rijndael does a lot of arithmetic operations (addition and multiplication) on elements of GF(28)4.

These are represented as polynomials b0 + b1 x + b2 x2 + b3 x3.

The individual coefficients bi are elements of GF(28).

These are themselves polynomials a0 + a1 x + ... + a7 x7 where the individual coefficients ai are bits (0 or 1).

Multiplication bi bj is made into a closed operation by taking the modulus of the product under the reduction polynomial m = x8 + x4 + x3 + x + 1.

The reduction polynomial is prime, which is important in establishing that this representation of the Galois field really is a field.

For convenience we will represent the elements of GF(28) as hexadecimal numbers 0x00, 0x01, ... 0xfe, 0xff.

By contrast, we will represent the elements of GF(28)4 as vectors (b0 b1 b2 b3); for example, (0x00 0x01 0x02 0x03).

Multiplying vectors in GF(28)4 induces a bunch of multiplications in GF(28). It would be good to come up with a way to make this fast. Doing a polynomial reduction every time is not fast.

Of course, one way to do this is to create a multiplication table with 256 rows and 256 columns, do each multiplication the slow way once, and compile in the results. This would require on the order of 64 kB. But there's a better way.

The trick that is used in Daemen and Rijmen's implementation is to find a primitive element e in GF(28) so that the orbit of e (that is to say, the set {e0, e1, ..., e254}) spans all of GF(28) - {0x00}.

If we find such an e, then we can save a cheat sheet which is almost as fast as the multiplication table, but requires storing only on the order of 256 bytes.

Well, let's calculate the orbits of all the elements in order until we find a primitive element.

Orbit(0x00) = { 0x000, 0x001, 0x002, ... } = { 0x01, 0x00, 0x00, ... } = { 0x00, 0x01 }. Nope.
Orbit(0x01) = { 0x010, 0x011, 0x012, ... } = { 0x01, 0x01, 0x01, ... } = { 0x01 }. Even worse.
Orbit(0x02) = { 0x020, 0x021, 0x022, ... } = { 0x01, 0x02, 0x04, ..., 0x8d }. This looks promising at first but we end up with an orbit of only 51 out of the necessary 255 elements (note that 51 divides 255:)

0x020 = 0x01
0x021 = 0x02
0x022 = 0x04
0x023 = 0x08
0x024 = 0x10
0x025 = 0x20
0x026 = 0x40
0x027 = 0x80
0x028 = 0x1b
0x029 = 0x36
0x0210 = 0x6c
0x0211 = 0xd8
0x0212 = 0xab
0x0213 = 0x4d
0x0214 = 0x9a
0x0215 = 0x2f
0x0216 = 0x5e
0x0217 = 0xbc
0x0218 = 0x63
0x0219 = 0xc6
0x0220 = 0x97
0x0221 = 0x35
0x0222 = 0x6a
0x0223 = 0xd4
0x0224 = 0xb3
0x0225 = 0x7d
0x0226 = 0xfa
0x0227 = 0xef
0x0228 = 0xc5
0x0229 = 0x91
0x0230 = 0x39
0x0231 = 0x72
0x0232 = 0xe4
0x0233 = 0xd3
0x0234 = 0xbd
0x0235 = 0x61
0x0236 = 0xc2
0x0237 = 0x9f
0x0238 = 0x25
0x0239 = 0x4a
0x0240 = 0x94
0x0241 = 0x33
0x0242 = 0x66
0x0243 = 0xcc
0x0244 = 0x83
0x0245 = 0x1d
0x0246 = 0x3a
0x0247 = 0x74
0x0248 = 0xe8
0x0249 = 0xcb
0x0250 = 0x8d
And then we circle back around to...
0x0251 = 0x01

Well, let's keep looking. What about 0x03?

Orbit(0x03) = { 0x030, 0x031, 0x032, ... } = { 0x01, 0x03, 0x05, ..., 0xf6 }. Bingo, we hit all 255 non-0x00 elements!

0x030 = 0x01
0x031 = 0x03
0x032 = 0x05
0x033 = 0x0f
0x034 = 0x11
0x035 = 0x33
0x036 = 0x55
0x037 = 0xff
0x038 = 0x1a
0x039 = 0x2e
0x0310 = 0x72
0x0311 = 0x96
0x0312 = 0xa1
0x0313 = 0xf8
0x0314 = 0x13
0x0315 = 0x35
0x0316 = 0x5f
0x0317 = 0xe1
0x0318 = 0x38
0x0319 = 0x48
0x0320 = 0xd8
0x0321 = 0x73
0x0322 = 0x95
0x0323 = 0xa4
0x0324 = 0xf7
0x0325 = 0x02
0x0326 = 0x06
0x0327 = 0x0a
0x0328 = 0x1e
0x0329 = 0x22
0x0330 = 0x66
0x0331 = 0xaa
0x0332 = 0xe5
0x0333 = 0x34
0x0334 = 0x5c
0x0335 = 0xe4
0x0336 = 0x37
0x0337 = 0x59
0x0338 = 0xeb
0x0339 = 0x26
0x0340 = 0x6a
0x0341 = 0xbe
0x0342 = 0xd9
0x0343 = 0x70
0x0344 = 0x90
0x0345 = 0xab
0x0346 = 0xe6
0x0347 = 0x31
0x0348 = 0x53
0x0349 = 0xf5
0x0350 = 0x04
0x0351 = 0x0c
0x0352 = 0x14
0x0353 = 0x3c
0x0354 = 0x44
0x0355 = 0xcc
0x0356 = 0x4f
0x0357 = 0xd1
0x0358 = 0x68
0x0359 = 0xb8
0x0360 = 0xd3
0x0361 = 0x6e
0x0362 = 0xb2
0x0363 = 0xcd
0x0364 = 0x4c
0x0365 = 0xd4
0x0366 = 0x67
0x0367 = 0xa9
0x0368 = 0xe0
0x0369 = 0x3b
0x0370 = 0x4d
0x0371 = 0xd7
0x0372 = 0x62
0x0373 = 0xa6
0x0374 = 0xf1
0x0375 = 0x08
0x0376 = 0x18
0x0377 = 0x28
0x0378 = 0x78
0x0379 = 0x88
0x0380 = 0x83
0x0381 = 0x9e
0x0382 = 0xb9
0x0383 = 0xd0
0x0384 = 0x6b
0x0385 = 0xbd
0x0386 = 0xdc
0x0387 = 0x7f
0x0388 = 0x81
0x0389 = 0x98
0x0390 = 0xb3
0x0391 = 0xce
0x0392 = 0x49
0x0393 = 0xdb
0x0394 = 0x76
0x0395 = 0x9a
0x0396 = 0xb5
0x0397 = 0xc4
0x0398 = 0x57
0x0399 = 0xf9
0x03100 = 0x10
0x03101 = 0x30
0x03102 = 0x50
0x03103 = 0xf0
0x03104 = 0x0b
0x03105 = 0x1d
0x03106 = 0x27
0x03107 = 0x69
0x03108 = 0xbb
0x03109 = 0xd6
0x03110 = 0x61
0x03111 = 0xa3
0x03112 = 0xfe
0x03113 = 0x19
0x03114 = 0x2b
0x03115 = 0x7d
0x03116 = 0x87
0x03117 = 0x92
0x03119 = 0xec
0x03120 = 0x2f
0x03121 = 0x71
0x03122 = 0x93
0x03123 = 0xae
0x03124 = 0xe9
0x03125 = 0x20
0x03126 = 0x60
0x03127 = 0xa0
0x03128 = 0xfb
0x03129 = 0x16
0x03130 = 0x3a
0x03131 = 0x4e
0x03132 = 0xd2
0x03133 = 0x6d
0x03134 = 0xb7
0x03135 = 0xc2
0x03136 = 0x5d
0x03137 = 0xe7
0x03138 = 0x32
0x03139 = 0x56
0x03140 = 0xfa
0x03141 = 0x15
0x03142 = 0x3f
0x03143 = 0x41
0x03144 = 0xc3
0x03145 = 0x5e
0x03146 = 0xe2
0x03147 = 0x3d
0x03148 = 0x47
0x03149 = 0xc9
0x03150 = 0x40
0x03151 = 0xc0
0x03152 = 0x5b
0x03153 = 0xed
0x03154 = 0x2c
0x03155 = 0x74
0x03156 = 0x9c
0x03157 = 0xbf
0x03158 = 0xda
0x03159 = 0x75
0x03160 = 0x9f
0x03161 = 0xba
0x03162 = 0xd5
0x03163 = 0x64
0x03164 = 0xac
0x03165 = 0xef
0x03166 = 0x2a
0x03167 = 0x7e
0x03168 = 0x82
0x03169 = 0x9d
0x03170 = 0xbc
0x03171 = 0xdf
0x03172 = 0x7a
0x03173 = 0x8e
0x03174 = 0x89
0x03175 = 0x80
0x03176 = 0x9b
0x03177 = 0xb6
0x03178 = 0xc1
0x03179 = 0x58
0x03180 = 0xe8
0x03181 = 0x23
0x03182 = 0x65
0x03183 = 0xaf
0x03184 = 0xea
0x03185 = 0x25
0x03186 = 0x6f
0x03187 = 0xb1
0x03188 = 0xc8
0x03189 = 0x43
0x03190 = 0xc5
0x03191 = 0x54
0x03192 = 0xfc
0x03193 = 0x1f
0x03194 = 0x21
0x03195 = 0x63
0x03196 = 0xa5
0x03197 = 0xf4
0x03198 = 0x07
0x03199 = 0x09
0x03200 = 0x1b
0x03201 = 0x2d
0x03202 = 0x77
0x03203 = 0x99
0x03204 = 0xb0
0x03205 = 0xcb
0x03206 = 0x46
0x03207 = 0xca
0x03208 = 0x45
0x03209 = 0xcf
0x03210 = 0x4a
0x03211 = 0xde
0x03212 = 0x79
0x03213 = 0x8b
0x03214 = 0x86
0x03215 = 0x91
0x03216 = 0xa8
0x03217 = 0xe3
0x03218 = 0x3e
0x03219 = 0x42
0x03220 = 0xc6
0x03221 = 0x51
0x03222 = 0xf3
0x03223 = 0x0e
0x03224 = 0x12
0x03225 = 0x36
0x03226 = 0x5a
0x03227 = 0xee
0x03228 = 0x29
0x03229 = 0x7b
0x03230 = 0x8d
0x03231 = 0x8c
0x03232 = 0x8f
0x03233 = 0x8a
0x03234 = 0x85
0x03235 = 0x94
0x03236 = 0xa7
0x03237 = 0xf2
0x03238 = 0x0d
0x03239 = 0x17
0x03240 = 0x39
0x03241 = 0x4b
0x03242 = 0xdd
0x03243 = 0x7c
0x03244 = 0x84
0x03245 = 0x97
0x03246 = 0xa2
0x03247 = 0xfd
0x03248 = 0x1c
0x03249 = 0x24
0x03250 = 0x6c
0x03251 = 0xb4
0x03252 = 0xc7
0x03253 = 0x52
0x03254 = 0xf6
... and only now do we circle around to...
0x03255 = 0x01

This gives us a one-to-one mapping between the 255 powers 0 through 254 and the 255 non-zero bytes 0x01 through 0xff.

Here are the perl scripts I used to generate these:

use strict;

my \$m = 0x11b;

my \$a = 1;
for (my \$i = 0; ; \$i++) {
printf("0x02^%2d = 0x%02x\n", \$i, \$a);
if (\$i > 0 and \$a == 1) { last; }
\$a <<= 1; # * 0x02
if (\$a > 0xff) { \$a ^= \$m; }
}

print "\n";

\$a = 1;
for (my \$i = 0; ; \$i++) {
printf("0x03^%3d = 0x%02x\n", \$i, \$a);
if (\$i > 0 and \$a == 1) { last; }
\$a = (\$a << 1) ^ \$a; # * 0x03
if (\$a > 0xff) { \$a ^= \$m; }
}

Now that we have this orbit, we use it to construct two lookup tables:

// 255 entries xPlusOne_ToThe[0] = 0x01 to xPlusOne_ToThe[254] = 0xf6
xPlusOne_ToThe[] = { 0x01, 0x03, 0x05, ..., 0x52, 0xf6 };

And the inverse mapping:

// 255 entries logXPlusOne_Of[0x01] = 0 to logXPlusOne_Of[0xff] = 7
logXPlusOne_Of[] = { UNDEFINED, 0, 25, 1, 50, ..., 112, 7 };

Now that we have paid for these 512-ish bytes, we can do multiplication very cheaply:

multiply(b1, b2) {
if (0x00 == b1 || 0x00 == b2) { return 0x00; }

logAns = logXPlusOne_Of[b1] + logXPlusOne_Of[b2];
if (logAns >= 255) { logAns -= 255; }
return xPlusOne_ToThe[logAns];
}

Tags