Protected: How One Bug Cost Android a Customer

This content is password protected. To view it please enter your password below:

Clever flash game teaches embedded programming

Here’s a clever take on the program-the-robot game that does a nice job of teaching of how to think in terms of subroutines and perform instruction-space optimization — in this case the speed-space tradeoff definitely favors space. It feels a lot like programming shaders!

More on __restrict

An anonymous commenter asks:

I thought that only compatable types could alias? So for example, an int array can’t alias a float array, and a CFoo* can’t alias it’s own member?

Something to remember about the __restrict keyword is that it doesn’t affect how the compiler itself aliases pointers: instead, it’s about what the compiler can assume you, the progammer, may or may not have aliased.

If there is any chance two pointers alias each other, the compiler is forced to reload the contents of each from memory after every store operation.

In the case of class member functions, the compiler of course knows that different data members of this can’t alias one another. The issue with MemberFunc in CFoo below isn’t whether &m_iBar and &m_iBaz alias one another, but whether a == &m_iBar.

class CFoo
{
public:
	int MemberFunc( int *a, int x );
	int MemberFuncWithRestrict( int * __restrict a, int x );
	int m_iBar;
	int m_iBaz;
};
 
int CFoo::MemberFunc( int *a, int x )
{
	m_iBar = x;
	*a = 7; // if a == &m_iBar, then m_iBar might be 7 now.
	        // the processor must load it back from memory.
	return m_iBar + *a; // load-hit-store
}

MemberFunc compiles to something like:

; int CFoo::MemberFunc( int *a, int b );
; by convention, CFoo * this is on register 3,
; int *a is on register 4, and
; x is on register 5
li r10,7       ; set r10 to 7
stw  r5,0(r3)  ; store r11 to this->m_iBar
stw  r10,0(r4) ; store r10 through pointer a
; now, because a might or might not point to m_iBar,
; we need to load back the contents of m_iBar just in case.
; the compiler *cannot* assume that a != &m_iBar, and
; the load below will stall until the previous store to
; m_iBar is completely finished -- at least forty cycles!
lwz r11,0(r3)  ; load this->m_iBar from memory to r11
add r3,r11,r10 ; r3 = r11 + r10
blr            ; return r3

Now, if we use the __restrict keyword to promise the compiler that a doesn’t alias anything under this — in particular, that a != &m_iBar, it can make some assumptions and avoid loading m_iBar back from memory after it has been stored:

int CFoo::MemberFuncWithRestrict( int * __restrict a, int x ) __restrict
{
	m_iBar = x;
	*a = 7; // __restrict THIS promises that a != &m_iBar
	return m_iBar + *a; // no load-hit-store
}

compiles to

li r10,7        ; set r10 to 7
stw r5,0(r3)    ; store r5 to this->m_iBar
stw r10,0(r4)   ; store r10 through pointer a
;; because the compiler knows that the write to a
;; cannot affect the contents of m_iBar, there is
;; no need for it to load m_iBar back into memory;
;; the store above cannot have changed it
add r3, r5, r10 ; r3 = r5 + r10
blr             ; return r3

This function runs much faster, avoiding the load-hit store on reading back m_iBar, but it is only correct so long as the promise is true. If you call the function like this:

CFoo foo;
out = foo.MemberFuncWithRestrict( &foo.m_iBar, 5 );

you will get incorrect results, in this case a return value of 12 instead of 14.

Even if two pointers are different types, the compiler still has no way of knowing whether they actually point to different locations in memory.

EDIT: the following section is only true if your compiler does not have strict aliasing turned on. In GCC ≥4 it is on by default unless you specify -f-no-strict-aliasing. In MSVC it is off by default and I could not find how to turn it on. Mike Acton discusses strict aliasing in great detail here. Thanks, Arseny!

For example, consider the function

float slow( int *a, float *b )
{
	*a = 5;
	*b = 7.0;
	return *a+*b;
}

This function could be called from a different cpp in a way that aliased the pointers to the same location:

int foo[8]; float bar[8];
bar[0] = slow( foo + 0, bar + 0 ); // pointers are not aliased
foo[0] = slow( foo + 1, (float *)(foo + 1) ); // pointers are aliased

Granted, this is a contrived example, but also one that’s completely legal. It underscores how the compiler can’t make any assumptions about pointer aliasing unless you hold its hand. Even though a and b are different types, the programmer can still deliberately alias them. The function slow above compiles to something like this on the PowerPC:

; assembly for float slow( int *a, float *b )
; by convention, parameter a is stored on r3 and parameter b on r4.
lis r11,__real@40e00000 ; load address of constant 7.0f
li           r10,5     ; load 5 onto r10
stw          r10,0(r3) ; save r10 through pointer a
lfs          fr0,__real@40e00000(r11) ; load 7.0f onto fr0
stfs         fr0,0(r4) ; save 7.0f through pointer b
;; the following line causes the load-hit store:
;; because a and b might be aliased, the compiler must
;; now load back through pointer a in case the write to
;; b overwrote its contents!
;; the following instruction will cause a pipeline stall.
lwz          r9,0(r3)    ; load integer pointed to by a
std          r9,-10h(r1) ; convert it to a float and save to memory
;; here is another load-hit-store: most processors have no way
;; to move data between integer and floating point registers
;; other than by saving to memory and loading back again.
;; this load will stall the pipeline:
lfd          fr13,-10h(r1) ; load from that address again into fr13
fadds        fr1,fr13,fr0  ; add a and b and
blr                        ; return on fr1

This simple function has two load hit store stalls, each of which stalls the processor for the entire length of the pipeline. By contrast, if we write our function like this:

float slowWithRestrict( int * __restrict a, float * __restrict b )
{
	*a = 5;
	*b = 7.0;
	return *a+*b;
}

… then we will get incorrect results if a and b happen to point to the same address, but in every other case it will avoid stalls completely. This is because we have promised the compiler that the write to b cannot possibly overwrite the contents of *a, something it could not know otherwise. It compiles to something like this:

; assembly for float slowWithRestrict( )
; pointer a is on r3. pointer b is on r4.
lis r10,__real@40e00000 ; load address of constant "7.0f" to r10
lis r11,__real@41400000 ; load address of constant "12.0f" to r11
li  r9,5                ; set r9 to "5"
stw r9,0(r3)            ; store r9 through pointer a
lfs fr0,__real@40e00000(r10) ; load constant "7.0" from memory onto fr0
lfs fr1,__real@41400000(r11) ; load constant "12.0" from memory onto fr1
stfs fr0,0(r4)          ; store fr0 through pointer b
blr                     ; return fr1

No load-hit-stores at all.

A really specific Facebook phishing virus?

I just received a message through Facebook that points at how malware authors can be really, really specific with their attack vectors, and how they exploit social networks to make their messages appear to come from a legitimate, trusted source.

We’ve all received links to malware websites in email, of course, and usually we can reject them out of hand because the sender is obviously fake. If the sender’s name is someone you actually know and trust, you’re more likely to open the email, but knowing how easily email headers can be forged you might still be a little suspicious. But Facebook messages require someone to have logged in and authenticated — if I get a message from my friend Tim, it means that Tim has actually gone to facebook.com and opened up his little message thingy and typed something out to me.

Or, at least, Tim’s browser has.

I got this Wall post on Facebook earlier today from an old friend I’ve had in my list since forever.

I got a free iPhone! Go to (URL) to get yours!

At first I was a little confused, because I couldn’t see how a phisher could forge a Facebook message, but it was still suspicious enough that I accessed the linked website with kid gloves — and indeed, it’s a webpage that hosts the Javascript malware enclosed below the jump (posted as an image for your safety). I didn’t bother to decode exactly what it does, but it clearly decrypts some kind of encrypted exploit into the browser and executes it.

There’s a couple of interesting things about this: first, the incredible specificity of this virus to maximize the chances that it would appear to come to me from a trusted source. Unless the virus author has hacked into Facebook’s back end, the only way this could work is if the virus snagged my friend Tim’s Facebook password, then logged into his account on its own, accessed his friends list, then mechanically transmitted that message to all those friends. This is a lot of specific code to write, stuff that reads out of Facebook’s HTML and knows how to find the friends list there, and then and knows how to navigate the website to send out messages. It’s a lot of work to take advantage of one social networking site, which goes to show how valuable it is to take advantage of our own assumptions of trust in our friends (or how little virus writers value their time).

Second: even though the actual malware is hosted on the Google-owned Blogspot service, Google’s own malware-detecting tools don’t list it as malicious. In fact, when I tested the site with Google Safe Browsing, it told me “This site is not listed as suspicious” and “Google has not visited this site within the past 90 days”, which is to say that Google can’t even patrol its own webhosting service for exploits. Maybe that explains why Google hosts 2% of the world’s malware.

Here’s the source code for the exploiting Blogspot page that the URL goes to, in case you feel like figuring out what it does. I’m posting it as an image to prevent any chance of that Javascript from actually being parsed on your browser.

Read the rest of this entry »

Load-hit-stores and the __restrict keyword

The load-hit-store is one of those quirky CPU implementation details that can cause significant performance problems in high-level code without it really being clear why, especially on in-order cores such as the PowerPCs inside the 360 and PS3. It’s especially insidious because it’s exactly the sort of thing we expect our compilers to transparently handle for us, and yet the compiler can’t handle correctly without explicit hinting from the programmer. Fortunately, once you know what’s going on underneath the hood, there is a C++ keyword that helps avoid this problem without having to resort to inline assembly and darker arts.

Basically, a load-hit-store is a large stall that occurs when the CPU writes data to an address x and then tries to load that data from x again too soon afterwards. The reason for this problem has to do with the deep instruction pipeline of a modern processor, but the consequence is clear: the processor’s execution comes to a complete halt for between 40 and 80 cycles. The simplest way to think of it is is that the data has to “bounce off of the L1 cache”.  That is, when you write data from a CPU register to memory, the “store” operation has to complete, writing the data all the way out to main memory, before the data can be read back into a register again. Here’s a trivial example in C:

int CauseLHS(int *ptrA)
{
   int a, b;
   int *ptrB = ptrA; // B and A point to the same address
   *ptrA = 5; // write data to address ptrA
   b = *ptrB; // read that data back in again (won't be
                  // available for 80 cycles)
   a = b + 10; // stalls! the data isn't available yet.
}

This seems like the sort of thing the compiler should notice and fix by simply keeping the contents of *ptrA in a register, but in most real-world cases the compiler won’t be able to tell that ptrA and ptrB point to the same address. Thus, it’s obliged to read memory back from a pointer every time you dereference it, because any other pointer in the function might have aliased and modified that data. There is, however, a way to help out the compiler a little: the __restrict keyword.

__restrict is a compiler directive that helps avoid load-hit-store stalls.

__restrict on a pointer promises the compiler that it has no aliases: nothing else in the function points to that same data. Thus the compiler knows that if it writes data to a pointer, it doesn’t need to read it back into a register later on because nothing else could have written to that address. Without __restrict, the compiler is forced to read data from every pointer every time it is used, because another pointer may have aliased x.

For example, this code will run slowly:

int slow(int *a, int *b)
{
   *a = 5;
   *b = 7;
   return *a + *b; // LHS stall: the compiler doesn't
                  // know whether a == b, so it has to
                  //  reload both before the add
}

Whereas this code will run quickly:

int fast(int * __restrict a, int * __restrict b)
{
   *a = 5;
   *b = 7; // RESTRICT promises that a != b
   return *a + *b; // no stall; the compiler hangs onto
                         // 5 and 7 in the registers.
}

It bears repeating that __restrict is a promise you make to your compiler. If you break your promise, you can get incorrect results. If pointers pA and pB are __restrict, and pA == pB, that will cause mysterious bugs.

For example, in the fast() function above, if a does equal b, then the correct return value would be 14. However, the compiler won’t know that *b = 7 has changed the value of *a, and so you might actually get a return value of 12.

When working with C++ member functions, you can tell the compiler that the implicit this pointer is __restrict like so:

struct CFoo
{
  int MemberFunc();
  int m_iBar;
}
int CFoo::MemberFunc( int *a ) __restrict
{
  m_iBar = 5;
  *a = 7; // __restrict THIS promises that a != &m_iBar
  return m_iBar + 12; // no load-hit-store
}

There is unfortunately no way to mark a C++ reference as __restrict, so function parameters declared like int foo(int &a, int &b) cannot benefit from __restrict. In those cases, either copy a and b to local variables inside your function, then write the final values back out again at the end; or change your function signature to use pointers instead.

In summary, __restrict makes code involving pointers faster, so long as the pointers never alias one another. It is the usual first step when your profiler complains that a function has too many load-hit-stores, but it is not magic: it depends on the programmer’s attention to where all the data is going, and on that no-alias promise being kept.

See also the followup to this article here.

Making Division Faster

My friend Cort pointed me at this interesting Developing for Developers article on how to make integer division faster when the denominator is a fixed constant. As Cort says, if your processor lacks a dedicated integer divide op, and

…you want to divide an arbitrary X by an arbitrary Y, long division is pretty much the only way to go. However, if the divisor is known at compile time, there is very often a much more efficient way to compute the quotient. The simplest example, familiar to most students of computer science, is the good old x / 2n = x >> n .

For those who are curious, here’s an article that’s chock full of similar tricks for efficient division by other small integers (3, 5, 7, 9, 10, 11, etc.). Even the non-assembly programmers in the audience might benefit from these, as they’re not as commonly implemented by compiler authors.

These tricks are especially useful on the in-order PowerPC CPUs found in most modern game consoles, where the integer divide operation is microcoded: that is to say, even though it appears to be only a single opcode in the data stream, it actually causes the processor to execute a little subroutine stored in ROM. This means it stops the CPU pipeline dead in its tracks until the microcode is done executing (about twenty cycles): until the divide is finished, the processor can do nothing else.

In cases where a divide by a small constant can be replaced by five or six pipelined instructions, this can be a big speed benefit: first, because those five instructions will probably execute faster than the long-division stored in microcode; and second, because the compiler can interleave other useful work in between the dependent operations of the divide. (If that other work is on, say, the load or vector pipeline, it may even dual-dispatch and get you two operations per clock cycle.)

Derrick Coetzee hasn’t updated Developing for Developers recently, but it looks like there’s quite a few interesting articles in the archives worth checking out.

How To Go From PC To Crossplatform Development (Without Killing Your Studio)

From Half-Life 2 to The Orange Box 

The slides for my talk at the 2008 Game Developers’ Conference, “How To Go From PC To Console Development Without Killing Your Studio,” are now available online at the Valve website. 

The talk describes some of the many pitfalls that face developers who are used to developing their games exclusively for the home personal computer, but are now about to embark upon their first crossplatform title, shipping on both PC and console at the same time. These pitfalls have brought some studios to their doom, but fear not: I also provide techniques for how to deal with each of these issues, without having to rewrite most of your game from scratch.  Some of the techniques are production issues, like dealing with certification. Others address programming problems like improving framerate on the PowerPC — some assembly is required.

My speaker’s notes are embedded in the slides: hover over or click the small dialog bubble that Adobe Acrobat superimposes at the top left.

I’ve also started a Q&A page to address questions people have asked me about this talk, either in person at the conference, or through email since then. If you have questions of your own, please feel free to leave a comment.