For some reason, people on StackOverflow calculate factorials a lot. (Nevermind that it's not necessarily the best way to evaluate a formula.) And you will see factorial functions like this:
int factorial(int n)
{
if (n < 0) return -1; // EDOM
int result = 1;
for (int i = 2; i <= n; i++) result *= i;
return result;
}
But you can do better than this by taking advantage of undefined behavior.
Since signed integer overflow results in undefined behavior
in C/C++,
you can assume that the result of the factorial does not
exceed INT_MAX,
which is 2147483647 for 32-bit signed integers.
This means that n cannot be greater than 12.
So use a lookup table.
int factorial(int n)
{
static const int results[] = {
1,
1,
1 * 2,
1 * 2 * 3,
1 * 2 * 3 * 4,
1 * 2 * 3 * 4 * 5,
1 * 2 * 3 * 4 * 5 * 6,
1 * 2 * 3 * 4 * 5 * 6 * 7,
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8,
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9,
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10,
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11,
1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11 * 12,
};
if (n < 0) return -1; // EDOM
return results[n]; // undefined behavior if n > 12
}
(If you have 64-bit signed integers, then the table needs to go up to 20.)
The fact that you have undefined behavior if n > 12 is hardly notable, because the original code also had undefined behavior if n > 12. You just replaced one undefined behavior with another.
If you want to simulate two's complement overflow,
for example, to preserve bug-for-bug compatibility,
or because the function was defined to compute unsigned
instead of signed factorial,¹
then you can do that by extending the table just a little bit more.
You will need only entries up to 33! because
because 34! is
an exact multiple of 2³².
The result of factorial(n) is zero
for n ≥ 34,
assuming 32-bit integers.
int factorial(int n)
{
static const int results[] = {
1, // 0!
1, // 1!
2, // 2!
6, // 3!
24, // 4!
120, // 5!
720, // 6!
5040, // 7!
40320, // 8!
362880, // 9!
3628800, // 10!
39916800, // 11!
479001600, // 12!
1932053504, // 13!
1278945280, // 14!
2004310016, // 15!
2004189184, // 16!
4006445056, // 17!
3396534272, // 18!
109641728, // 19!
2192834560, // 20!
3099852800, // 21!
3772252160, // 22!
862453760, // 23!
3519021056, // 24!
2076180480, // 25!
2441084928, // 26!
1484783616, // 27!
2919235584, // 28!
3053453312, // 29!
1409286144, // 30!
738197504, // 31!
2147483648, // 32!
2147483648, // 33!
};
if (n < 0) return -1; // EDOM
if (n > 33) return 0; // overflowed to zero
return results[n];
I didn't calculate all those numbers myself. I wrote a program to do it.
class Program
{
public static void Main() {
uint result = 1;
uint n = 0;
while (result != 0) {
System.Console.WriteLine(" {0,10}, // {1}!", result, n);
result *= ++n;
}
}
}
Extending the above algorithm to 64-bit integers is left as an exercise.
¹ Unsigned arithmetic is defined by the C/C++ standards to be modulo 2ⁿ for some n.
Ahhh the grotty corners of C++. I personally love to delve into undefined, implementation defines and unspecified behavior. Infact, one of my favorite small interview questions is getting the candidate to think about these things – given the definitions of them give or analyse examples of each of them and think about the dangers they may pose to your program.
It even gets better when you realize undefined behavior goes “back in time”… things that should’ve happened before the ‘undefined’ behavior hit may never actually happen… it logically follows on if you think about compiler optimizations.
And remember Vines Law of Undefined Behavior: Any non-trivial C++ program has undefined behavior. Seriously though, having been in the industry for many years I’ve never seen any reasonably sized project not include some undefined behavior if you look hard enough and are pedantic enough. Scary if you thiink about it too hard.
For those scratching their heads, Raymond showed how undefined behavior can go back in time last year: https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/ .
I wrote that code 14 years ago in class to prove an apparently preposterous statement about Big-O. There’s no way you were in that class that day. How did you get it?
What was the apparently preposterous statement?
I’m not quite sure but I think I outright claimed factorial was constant time.
That’s where Raymond inadvertently lets us in on his time machine.
I wonder if any existing sufficiently-optimising compilers already perform transformation #1, by unrolling sufficiently many initial iterations (i.e. more than 12) to spot the undefined behaviour and just exit there.
Your table contains values that exceed the range of “int”. It’d be better to have the table be of unsigned numbers, then return static_cast<int>(results[n]).
Undefined behavior is undefined. He put in a value that exceeds Int.MaxValue specifically to get the same undefined behavior that the factorial() function had doing the math itself.
No, it’s a genuine bug. I was hoping nobody would notice.
You could get really pedantic and say that you can do this for any kind of algorithm.
A real computer is not a Turing machine but always finite, more like a linear bounded automaton.
The halting problem is decidable for that. For a fixed input size you can say the algorithm runs in constant time, with the constant being the maximum required time for any input. Of course, in general that “constant” will be ridiculously large.
True, but for factorial, the number of inputs for which a defined output exists is extremely low, which makes the problem interesting.
Yup. One can write a program to output value of Pi up to 300 decimal places instantaneously in similar way too.
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066
Oh, look. We have a much wider screen space here. :)
Maybe we need word-break: break-all for .comment-body class.
Unfortunately, word-break: break-all breaks too aggressively. It will happily split a medium-length word if it happens to come close to the end of a line. Maybe break-word will work.
For such problems with fixed input size you can use memoization in most cases.
I may be missing something obvious, but what does “EDOM” mean?
Domain error: the argument is out of the domain of the function (modulo machine precision, in this case)
Domain error: http://en.cppreference.com/w/c/error/errno_macros
Standard library functions in math.h set errno to EDOM if you call them with input parameters outside of the function’s domains (but only if you’ve enabled it by setting the flag MATH_ERRNO in the math_errhandling global).
The overflow case is not really undefined behavior. I don’t think anyone sane would take advantage of overflowing factorial calculations, but the overflow of multiplications is actually defined behavior. The assembly instruction MUL with 32-bit operands, actually does the multiplication and stores the result in the 64-bit register pair eax:edx, but C++’s * operator takes only the lower 32-bits from eax and discards edx. The MulDiv function uses this (defined) functionality.
It may be defined by the CPU, but it is undefined by the C/C++ language specification. Other CPUs may behave differently. (For example, the Alpha AXP does not have an edx register.)
I find it easier to notice that 2ⁿ! ends in 2ⁿ-1 zero bits (in other prime bases pⁿ! would end in (pⁿ-1)/(p-1) zeros).
Here’s what gcc -O3 makes of your factorial function. Not quite constant time but it’s got to be pretty fast with all that SIMD stuff:
factorial(int):
cmpl $1, %edi
jle .L9
leal -5(%rdi), %eax
leal -2(%rdi), %edx
leal -1(%rdi), %esi
shrl $2, %eax
addl $1, %eax
cmpl $8, %edx
leal 0(,%rax,4), %ecx
jbe .L10
movdqa .LC0(%rip), %xmm0
xorl %edx, %edx
movdqa .LC1(%rip), %xmm2
movdqa .LC2(%rip), %xmm4
.L5:
movdqa %xmm2, %xmm3
movdqa %xmm2, %xmm1
paddd %xmm4, %xmm2
addl $1, %edx
pmuludq %xmm0, %xmm3
psrlq $32, %xmm1
psrlq $32, %xmm0
pmuludq %xmm0, %xmm1
pshufd $8, %xmm3, %xmm0
cmpl %edx, %eax
pshufd $8, %xmm1, %xmm1
punpckldq %xmm1, %xmm0
ja .L5
movdqa %xmm0, %xmm1
cmpl %ecx, %esi
movdqa %xmm0, %xmm2
psrlq $32, %xmm0
leal 2(%rcx), %edx
psrldq $8, %xmm1
pmuludq %xmm1, %xmm2
psrlq $32, %xmm1
pshufd $8, %xmm2, %xmm2
pmuludq %xmm1, %xmm0
pshufd $8, %xmm0, %xmm1
punpckldq %xmm1, %xmm2
movdqa %xmm2, %xmm0
movdqa %xmm2, %xmm1
psrldq $4, %xmm1
pmuludq %xmm1, %xmm0
movd %xmm0, %eax
je .L12
.L3:
leal 1(%rdx), %ecx
imull %edx, %eax
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 2(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 3(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 4(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 5(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 6(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
leal 7(%rdx), %ecx
cmpl %ecx, %edi
jl .L2
imull %ecx, %eax
addl $8, %edx
movl %eax, %ecx
imull %edx, %ecx
cmpl %edx, %edi
cmovge %ecx, %eax
ret
.L9:
movl $1, %eax
.L2:
rep ret
.L12:
rep ret
.L10:
movl $2, %edx
movl $1, %eax
jmp .L3
Here is a better way to do it with C++14. This works in Visual Studio 2015. This will work with any integer type.
#include
template
constexpr T cfact(unsigned f) {
return f?cfact(f – 1)*f : 1;
}
template
constexpr T factorial_max_helper(T value_so_far, unsigned f) {
return std::numeric_limits::max() / value_so_far < f ? f – 1 : factorial_max_helper(value_so_far*f, f + 1);
}
template
T factorial_helper(unsigned f, std::index_sequence indices) {
static constexpr T ar[] = { cfact(Indices)… };
return ar[f];
}
template
T factorial(unsigned f) {
return factorial_helper(f, std::make_index_sequence<factorial_max_helper(1, 1) + 1>{});
}
You use it like this
#include
int main() {
unsigned i = 1;
std::cout <> i;
volatile auto r = factorial(i);
std::cout << r << "\n";
Sorry, formatting got messed up. You can see the code at https://gist.github.com/jbandela/7fab41efe7cc37d7aef4
It would be better to have explicit overflow behavior by need of users, since many people are still not aware of this …
https://groups.google.com/a/isocpp.org/forum/?fromgroups#!topic/std-proposals/oQUOtYX4R3o
If this code was part of a network service, or say, exposed to content in a web browser with some API, it would let an attacker read memory past the end of the array. Which, depending on what’s there, could very well be used to defeat ASLR.
So, bad practice.