Making Division Faster

My friend Cort pointed me at this interesting Developing for Developers article on how to make integer division faster when the denominator is a fixed constant. As Cort says, if your processor lacks a dedicated integer divide op, and

…you want to divide an arbitrary X by an arbitrary Y, long division is pretty much the only way to go. However, if the divisor is known at compile time, there is very often a much more efficient way to compute the quotient. The simplest example, familiar to most students of computer science, is the good old x / 2n = x >> n .

For those who are curious, here’s an article that’s chock full of similar tricks for efficient division by other small integers (3, 5, 7, 9, 10, 11, etc.). Even the non-assembly programmers in the audience might benefit from these, as they’re not as commonly implemented by compiler authors.

These tricks are especially useful on the in-order PowerPC CPUs found in most modern game consoles, where the integer divide operation is microcoded: that is to say, even though it appears to be only a single opcode in the data stream, it actually causes the processor to execute a little subroutine stored in ROM. This means it stops the CPU pipeline dead in its tracks until the microcode is done executing (about twenty cycles): until the divide is finished, the processor can do nothing else.

In cases where a divide by a small constant can be replaced by five or six pipelined instructions, this can be a big speed benefit: first, because those five instructions will probably execute faster than the long-division stored in microcode; and second, because the compiler can interleave other useful work in between the dependent operations of the divide. (If that other work is on, say, the load or vector pipeline, it may even dual-dispatch and get you two operations per clock cycle.)

Derrick Coetzee hasn’t updated Developing for Developers recently, but it looks like there’s quite a few interesting articles in the archives worth checking out.

Leave a Reply