An anonymous commenter asks:
I thought that only compatable types could alias? So for example, an int array can’t alias a float array, and a CFoo* can’t alias it’s own member?
Something to remember about the __restrict keyword is that it doesn’t affect how the compiler itself aliases pointers: instead, it’s about what the compiler can assume you, the progammer, may or may not have aliased.
If there is any chance two pointers alias each other, the compiler is forced to reload the contents of each from memory after every store operation.
In the case of class member functions, the compiler of course knows that different data members of this can’t alias one another. The issue with MemberFunc in CFoo below isn’t whether &m_iBar and &m_iBaz alias one another, but whether a == &m_iBar.
class CFoo { public: int MemberFunc( int *a, int x ); int MemberFuncWithRestrict( int * __restrict a, int x ); int m_iBar; int m_iBaz; }; int CFoo::MemberFunc( int *a, int x ) { m_iBar = x; *a = 7; // if a == &m_iBar, then m_iBar might be 7 now. // the processor must load it back from memory. return m_iBar + *a; // load-hit-store }
MemberFunc compiles to something like:
; int CFoo::MemberFunc( int *a, int b ); ; by convention, CFoo * this is on register 3, ; int *a is on register 4, and ; x is on register 5 li r10,7 ; set r10 to 7 stw r5,0(r3) ; store r11 to this->m_iBar stw r10,0(r4) ; store r10 through pointer a ; now, because a might or might not point to m_iBar, ; we need to load back the contents of m_iBar just in case. ; the compiler *cannot* assume that a != &m_iBar, and ; the load below will stall until the previous store to ; m_iBar is completely finished -- at least forty cycles! lwz r11,0(r3) ; load this->m_iBar from memory to r11 add r3,r11,r10 ; r3 = r11 + r10 blr ; return r3
Now, if we use the __restrict keyword to promise the compiler that a doesn’t alias anything under this — in particular, that a != &m_iBar, it can make some assumptions and avoid loading m_iBar back from memory after it has been stored:
int CFoo::MemberFuncWithRestrict( int * __restrict a, int x ) __restrict { m_iBar = x; *a = 7; // __restrict THIS promises that a != &m_iBar return m_iBar + *a; // no load-hit-store }
compiles to
li r10,7 ; set r10 to 7 stw r5,0(r3) ; store r5 to this->m_iBar stw r10,0(r4) ; store r10 through pointer a ;; because the compiler knows that the write to a ;; cannot affect the contents of m_iBar, there is ;; no need for it to load m_iBar back into memory; ;; the store above cannot have changed it add r3, r5, r10 ; r3 = r5 + r10 blr ; return r3
This function runs much faster, avoiding the load-hit store on reading back m_iBar, but it is only correct so long as the promise is true. If you call the function like this:
CFoo foo; out = foo.MemberFuncWithRestrict( &foo.m_iBar, 5 );
you will get incorrect results, in this case a return value of 12 instead of 14.
♠
Even if two pointers are different types, the compiler still has no way of knowing whether they actually point to different locations in memory.
For example, consider the function
float slow( int *a, float *b ) { *a = 5; *b = 7.0; return *a+*b; }
This function could be called from a different cpp in a way that aliased the pointers to the same location:
int foo[8]; float bar[8]; bar[0] = slow( foo + 0, bar + 0 ); // pointers are not aliased foo[0] = slow( foo + 1, (float *)(foo + 1) ); // pointers are aliased
Granted, this is a contrived example, but also one that’s completely legal. It underscores how the compiler can’t make any assumptions about pointer aliasing unless you hold its hand. Even though a and b are different types, the programmer can still deliberately alias them. The function slow above compiles to something like this on the PowerPC:
; assembly for float slow( int *a, float *b ) ; by convention, parameter a is stored on r3 and parameter b on r4. lis r11,__real@40e00000 ; load address of constant 7.0f li r10,5 ; load 5 onto r10 stw r10,0(r3) ; save r10 through pointer a lfs fr0,__real@40e00000(r11) ; load 7.0f onto fr0 stfs fr0,0(r4) ; save 7.0f through pointer b ;; the following line causes the load-hit store: ;; because a and b might be aliased, the compiler must ;; now load back through pointer a in case the write to ;; b overwrote its contents! ;; the following instruction will cause a pipeline stall. lwz r9,0(r3) ; load integer pointed to by a std r9,-10h(r1) ; convert it to a float and save to memory ;; here is another load-hit-store: most processors have no way ;; to move data between integer and floating point registers ;; other than by saving to memory and loading back again. ;; this load will stall the pipeline: lfd fr13,-10h(r1) ; load from that address again into fr13 fadds fr1,fr13,fr0 ; add a and b and blr ; return on fr1
This simple function has two load hit store stalls, each of which stalls the processor for the entire length of the pipeline. By contrast, if we write our function like this:
float slowWithRestrict( int * __restrict a, float * __restrict b ) { *a = 5; *b = 7.0; return *a+*b; }
… then we will get incorrect results if a and b happen to point to the same address, but in every other case it will avoid stalls completely. This is because we have promised the compiler that the write to b cannot possibly overwrite the contents of *a, something it could not know otherwise. It compiles to something like this:
; assembly for float slowWithRestrict( ) ; pointer a is on r3. pointer b is on r4. lis r10,__real@40e00000 ; load address of constant "7.0f" to r10 lis r11,__real@41400000 ; load address of constant "12.0f" to r11 li r9,5 ; set r9 to "5" stw r9,0(r3) ; store r9 through pointer a lfs fr0,__real@40e00000(r10) ; load constant "7.0" from memory onto fr0 lfs fr1,__real@41400000(r11) ; load constant "12.0" from memory onto fr1 stfs fr0,0(r4) ; store fr0 through pointer b blr ; return fr1
No load-hit-stores at all.