Explorations on Random Numbers, Part 3
filed in Code on Jul.25, 2011
The last few posts, we’ve explored various random number generators, and ended up with a very simple and solid implementation of George Marsaglia’s Multiply-With-Carry RNG. At this point, we’re done talking about random number generators. We’re going to switch gears and talk about how the compiler actually takes the RNG code and turns it into machine code.
Warning: we will be delving into assembly language, so be prepared!
Let’s start with some very simple code that uses our MWC RNG:
int main()
{
RAND tRand;
RandInit(tRand, 0);
unsigned int nRandom = RandGetNum(tRand);
printf("%u\n", nRandom);
return 0;
}
Now, we’re interested in performance, so the question is: what sort of code does the compiler actually generate for this? Obviously, debug mode is not going to be very interesting for performance, so let’s compile in x86 Release mode and see what the compiler comes up with:
int main()
{
RAND tRand;
RandInit(tRand, 0);
unsigned int nRandom = RandGetNum(tRand);
printf("%u\n", nRandom);
00D11000 push 15AAA322h
00D11005 push offset string "%u\n" (0D120F4h)
00D1100A call dword ptr [__imp__printf (0D120A0h)]
00D11010 add esp,8
return 0;
00D11013 xor eax,eax
}
Wait a minute. What? Let’s translate that assembly language back into C code:
printf("%u\n", 0x15AAA322);
Wow, the compiler has completely optimized away the call to RandGetNum(). It has calculated what the output of the formula would be, and just replaced the result. What happens if we try it twice:
int main()
{
RAND tRand;
RandInit(tRand, 0);
unsigned int nRandom = RandGetNum(tRand);
nRandom = RandGetNum(tRand);
printf("%u\n", nRandom);
return 0;
}
The compiler generates:
01261000 push 0DEC01A79h 01261005 push offset string "%u\n" (12620F4h) 0126100A call dword ptr [__imp__printf (12620A0h)] 01261010 add esp,8 01261013 xor eax,eax
Practically the same thing, just with a different constant. Let’s try a loop:
int main()
{
RAND tRand;
RandInit(tRand, 0);
unsigned int nRandom;
for(int i=0; i<5; i++)
{
nRandom = RandGetNum(tRand);
}
printf("%u\n", nRandom);
return 0;
}
The compiler generates:
011B1006 push 9722BEA6h 011B100B push offset string "%u\n" (11B20F4h) 011B1010 call dword ptr [__imp__printf (11B20A0h)] 011B1016 add esp,8 011B1019 xor eax,eax
The same thing! Wow. In fact, the Visual Studio compiler will follow the rabbit hole down as far as 10 times. After that, it gives up and starts to generate code. So, if we take i up to 11, then the compiler generates the following for the inside of the loop:
; Comments added by me
; For reference: v = 36969 * (v & 0xFFFFFFFF) + (v >> 32);
; ecx:eax contains v
01271017 mov edx,9069h ; Move 36969 into edx
0127101C mul eax,edx ; eax contains the low part of V, so do the multiply
; and store the result in edx:eax
0127101E xor edi,edi ; Clear edi
01271020 add eax,ecx ; Add the high part of V to the multiplied low part
; of V
01271022 adc edx,edi ; edi is 0, so this adds the carry flag from the
; eax+ecx operation to edx
01271024 dec esi ; decrement loop counter
01271025 mov ecx,edx ; Move edx (our temp register) into ecx (the high
; part of V)
01271027 jne main+17h (1271017h) ; loop
Neat! Excluding the loop instructions (dec/jne), our RNG boils down to six instructions in x86. It’s worth noting that, although 64-bit operations are nontrivial in 32-bit assembly code, this algorithm is particularly well-suited for it, because it only operates on 32-bit pieces of the 64-bit number.
Of course, that’s really just a pleasant side-effect. At its core, this really is a 64-bit operation, so let’s compile the same code for x64 and see what happens:
; Comments added by me
; For reference: v = 36969 * (v & 0xFFFFFFFF) + (v >> 32);
; rax contains v
000000013FB91020 mov edx,eax ; Move the low part of V into edx (and
; thus into rdx)
000000013FB91022 shr rax,20h ; Shift rax right 32, making rax contain
; the high part of V
000000013FB91026 imul rdx,rdx,9069h ; Multiply the low part of V by 36969
000000013FB9102D add rax,rdx ; Add the multiplied low part to the
; high part
000000013FB91030 dec r8 ; decrement loop counter
000000013FB91033 jne main+20h (13FB91020h) ; loop
The 64-bit version is even cooler: just four instructions, each of which maps exactly to an operation in the formula.
If we ever had any doubt as to whether the MWC RNG is efficient, this definitely puts those doubts to rest. We’ve also seen how the compiler can track down constant operations even across looping constructs.
Next time: It gets better
Leave a Reply