More on __restrict

An anonymous commenter asks:

I thought that only compatable types could alias? So for example, an int array can’t alias a float array, and a CFoo* can’t alias it’s own member?

Something to remember about the __restrict keyword is that it doesn’t affect how the compiler itself aliases pointers: instead, it’s about what the compiler can assume you, the progammer, may or may not have aliased.

If there is any chance two pointers alias each other, the compiler is forced to reload the contents of each from memory after every store operation.

In the case of class member functions, the compiler of course knows that different data members of this can’t alias one another. The issue with MemberFunc in CFoo below isn’t whether &m_iBar and &m_iBaz alias one another, but whether a == &m_iBar.

class CFoo
{
public:
	int MemberFunc( int *a, int x );
	int MemberFuncWithRestrict( int * __restrict a, int x );
	int m_iBar;
	int m_iBaz;
};
 
int CFoo::MemberFunc( int *a, int x )
{
	m_iBar = x;
	*a = 7; // if a == &m_iBar, then m_iBar might be 7 now.
	        // the processor must load it back from memory.
	return m_iBar + *a; // load-hit-store
}

MemberFunc compiles to something like:

; int CFoo::MemberFunc( int *a, int b );
; by convention, CFoo * this is on register 3,
; int *a is on register 4, and
; x is on register 5
li r10,7       ; set r10 to 7
stw  r5,0(r3)  ; store r11 to this->m_iBar
stw  r10,0(r4) ; store r10 through pointer a
; now, because a might or might not point to m_iBar,
; we need to load back the contents of m_iBar just in case.
; the compiler *cannot* assume that a != &m_iBar, and
; the load below will stall until the previous store to
; m_iBar is completely finished -- at least forty cycles!
lwz r11,0(r3)  ; load this->m_iBar from memory to r11
add r3,r11,r10 ; r3 = r11 + r10
blr            ; return r3

Now, if we use the __restrict keyword to promise the compiler that a doesn’t alias anything under this — in particular, that a != &m_iBar, it can make some assumptions and avoid loading m_iBar back from memory after it has been stored:

int CFoo::MemberFuncWithRestrict( int * __restrict a, int x ) __restrict
{
	m_iBar = x;
	*a = 7; // __restrict THIS promises that a != &m_iBar
	return m_iBar + *a; // no load-hit-store
}

compiles to

li r10,7        ; set r10 to 7
stw r5,0(r3)    ; store r5 to this->m_iBar
stw r10,0(r4)   ; store r10 through pointer a
;; because the compiler knows that the write to a
;; cannot affect the contents of m_iBar, there is
;; no need for it to load m_iBar back into memory;
;; the store above cannot have changed it
add r3, r5, r10 ; r3 = r5 + r10
blr             ; return r3

This function runs much faster, avoiding the load-hit store on reading back m_iBar, but it is only correct so long as the promise is true. If you call the function like this:

CFoo foo;
out = foo.MemberFuncWithRestrict( &foo.m_iBar, 5 );

you will get incorrect results, in this case a return value of 12 instead of 14.

Even if two pointers are different types, the compiler still has no way of knowing whether they actually point to different locations in memory.

EDIT: the following section is only true if your compiler does not have strict aliasing turned on. In GCC ≥4 it is on by default unless you specify -f-no-strict-aliasing. In MSVC it is off by default and I could not find how to turn it on. Mike Acton discusses strict aliasing in great detail here. Thanks, Arseny!

For example, consider the function

float slow( int *a, float *b )
{
	*a = 5;
	*b = 7.0;
	return *a+*b;
}

This function could be called from a different cpp in a way that aliased the pointers to the same location:

int foo[8]; float bar[8];
bar[0] = slow( foo + 0, bar + 0 ); // pointers are not aliased
foo[0] = slow( foo + 1, (float *)(foo + 1) ); // pointers are aliased

Granted, this is a contrived example, but also one that’s completely legal. It underscores how the compiler can’t make any assumptions about pointer aliasing unless you hold its hand. Even though a and b are different types, the programmer can still deliberately alias them. The function slow above compiles to something like this on the PowerPC:

; assembly for float slow( int *a, float *b )
; by convention, parameter a is stored on r3 and parameter b on r4.
lis r11,__real@40e00000 ; load address of constant 7.0f
li           r10,5     ; load 5 onto r10
stw          r10,0(r3) ; save r10 through pointer a
lfs          fr0,__real@40e00000(r11) ; load 7.0f onto fr0
stfs         fr0,0(r4) ; save 7.0f through pointer b
;; the following line causes the load-hit store:
;; because a and b might be aliased, the compiler must
;; now load back through pointer a in case the write to
;; b overwrote its contents!
;; the following instruction will cause a pipeline stall.
lwz          r9,0(r3)    ; load integer pointed to by a
std          r9,-10h(r1) ; convert it to a float and save to memory
;; here is another load-hit-store: most processors have no way
;; to move data between integer and floating point registers
;; other than by saving to memory and loading back again.
;; this load will stall the pipeline:
lfd          fr13,-10h(r1) ; load from that address again into fr13
fadds        fr1,fr13,fr0  ; add a and b and
blr                        ; return on fr1

This simple function has two load hit store stalls, each of which stalls the processor for the entire length of the pipeline. By contrast, if we write our function like this:

float slowWithRestrict( int * __restrict a, float * __restrict b )
{
	*a = 5;
	*b = 7.0;
	return *a+*b;
}

… then we will get incorrect results if a and b happen to point to the same address, but in every other case it will avoid stalls completely. This is because we have promised the compiler that the write to b cannot possibly overwrite the contents of *a, something it could not know otherwise. It compiles to something like this:

; assembly for float slowWithRestrict( )
; pointer a is on r3. pointer b is on r4.
lis r10,__real@40e00000 ; load address of constant "7.0f" to r10
lis r11,__real@41400000 ; load address of constant "12.0f" to r11
li  r9,5                ; set r9 to "5"
stw r9,0(r3)            ; store r9 through pointer a
lfs fr0,__real@40e00000(r10) ; load constant "7.0" from memory onto fr0
lfs fr1,__real@41400000(r11) ; load constant "12.0" from memory onto fr1
stfs fr0,0(r4)          ; store fr0 through pointer b
blr                     ; return fr1

No load-hit-stores at all.

10 Comments

  1. As far as I understand, the example ‘slow’ function is kinda wrong.

    Compiler can assume that a and b don’t alias, because if they do alias, it’s a violation of strict aliasing rules (the calling code with (float*)(foo + 1) contains this violation). Whether the compiler actually assumes it of course depends on implementation and optimization settings.

  2. Ruskin says:

    That’s what I thought, too, until I actually compiled the function and looked at the output, which is what you see here. Apparently the Microsoft compiler doesn’t enable strict aliasing. Another example of the vast chasm between what the compiler should do and what it does do.

    In any case having ints and floats aliased like this is a contrived example and rather silly. The more likely case is two pointers of the same type.

  3. Yeah, there seems to be no strict aliasing in MSVC. It was both a good and bad thing for us – good, because of additional optimizations when switching to GCC (PS3), bad, because of additional PS3-only bugs…

  4. B says:

    you will get incorrect results, in this case a return value of 12 instead of 14.

    Did you mean

    you will get incorrect results, in this case a return value of 14 instead of 12.

  5. Ruskin says:

    No, although that underscores the confusing nature of __restrict aliasing bugs. If you take a close look at MemberFunc:

    int CFoo::MemberFuncWithRestrict( int * a, int x )
    {
    m_iBar = x;
    *a = 7;
    return m_iBar + *a;
    }

    What’s supposed to happen is that it sets m_iBar to a number. then the contents of pointer a to 7, and then adds m_iBar to whatever’s through pointer a. With parameters of (&m_iBar, 5), what should happen is that *a and m_iBar are the same thing, so m_iBar should get set to 7 at line two, and then return 7+7=14. The compiler does this by reloading m_iBar from memory after *a = 7.

    With __restrict, you have promised the compiler that a&m_iBar, so it does not reload m_iBar. Instead, it assumes that m_iBar is still 5 (and not 7) after the assignment to *a, and returns 5+7 = 12.

  6. Ruskin says:

    Thanks RC! I was hoping they’d be clear, it’s a confusing subject. I’m finally getting around to writing up all the stuff I couldn’t squeeze into my GDC talk. =)

  7. Acetone. says:

    Thats a pretty good post. Arseny, actually points out an interesting point, its good and bad too. I ran into exactly this issue.

    Anyways, What I was wondering whether its possible to tell ( or modify ) the compiler to _assume_ pointers are unaliased and insert code containing LHS inside the a branch ( which tests whether the pointers are aliased or not during runtime ).

    why whould we need this ?

    Obviously the profiling tools can point out the bits of code which are actually causing the pipeline stalls, but in some cases its not possible to refactor or know whether we can _actually_ use the restrict keyword.

    With failsafe code within a branch ( which can be predicted to be false most of the time …) to care of the rare condition of aliasing, I think it will make the job of profiling/re-coding/profiling..

    Ofcourse, this also depends on the assumption that a branch would be less expensive than a stall…

  8. Elan says:

    Well in general I think you’re better off making an algorithmic decision up front about whether or not the pointers are allowed to be aliased at all. In many cases having aliased pointers to structures is a sign that something has gone wrong — consider what would happen inside (contrived example)

    void CrossProductNoStackCopy( Vector *out, Vector *in1, Vector *in2 )
    {
      out->x = in1->y * in2->z - in1->z * in2->y ;
      out->y = in1->z * in2->x - in1->x * in2->z ;
      out->z = in1->x * in2->y - in1->y * in2->x ;
    }

    if out and in1 were the same pointer.

    But if you really really needed to make the determination at runtime I guess you could do something like

    void func(int *a, int *b) 
    {
       if ( a == b )
       {
          // aliased code
       }
       else
       {  // we know they are not aliased
          int * __restrict pA = a; 
          int * __restrict pB = b; 
          // nonaliased code runs on pA and pB
       }
    }

    which trades the 40-cycle LHS for a 5-20 cycle branch stall.

  9. Gregory says:

    Here is the working URL for Mike Acton’s article on understanding strict aliasing:

    http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

Leave a Reply