In the realm of “things I really wish I had paid more attention to while we were crunching on Left 4 Dead“, I’ve recently been going through the talks from last June’s Gamefest. There is a lot of really great stuff in there, and I’ve only skimmed through a few of them so far, but among those few the standouts are:
- Xbox 360 Compiler and PgoLite Update: where Bruce Dawson describes the pure awesome that has been poured into the XDK’s CPU performance-analysis tools, such as the Continuous Capture mode for PIX that lets you determine what caused a perf spike after it happens. (Yes, a perf tool can be said to contain pure awesome, for definitions of awesome encompassing branch prediction.) He also presents a lock-free pipe implementation added to the XDK libraries for interthread communication.
- New CPU Features in the XDK: where Dawson continues his reign of awesome in a discussion of improvements to the 360 compiler. He covers faster asserts in array bounds-checking, an improvement to the __fsel intrinsic (the ?: operator isn’t so useless after all), caveats on certain VMX branching operations, and a better way to do SIMD constants. There’s also a bunch of improvements to the 360 compiler itself including better load-hit-store avoidance, better inline assembly, and a more useful /QXSTALLS pipeline simulator (which makes the compiler emit a nerd-readable assembly file for each .CPP with all your performance issues highlighted).
- At Least We Aren’t Doing That: Real Life Performance Pitfalls: Allan Murphy lays out the fifteen most common performance screwups that the Microsoft support team sees. Each of them has been something I’ve gnashed my teeth over in the past ten years myself, and more than one is something game programmers been dealing with since the days of the PS2 (so really we should know better by now).
I’m still working my way through these docs and planning to put together a précis on them for the devs at Valve, so hopefully I’ll be able to post that here along with some of my own thoughts on the topics as well. Half the things in “At Least We Aren’t Doing That” are certainly issues I’ve been known to get into long lunchtime rants over!
Some other interesting documents on my vast and teetering to-read pile are Gustavo Duarte’s articles on PC internals like x86 memory translation, Christer Ericson’s old GDC2003 talk on memory optimization, and Ulrich Drepper’s definitive but dense What Every Programmer Should Know About Memory. The last one in particular I’m reviewing as a background citation to some notes I’m writing myself on console data caches (because I’d rather have something online and free to link to rather than send everyone to the bookstore to for a copy of Computer Organization And Design): it has a lot of good stuff in it, though much of it is specific to Linux running on x86 CPUs and very different from the way game consoles work. In particular, passages such as this one may bring a chuckle to PS3/Cell developers:
A computer can have a small amount of high-speed SRAM in addition to the large amount of DRAM. One possible implementation would be to dedicate a certain area of the address space of the processor as containing the SRAM and the rest the DRAM. The task of the operating system would then be to optimally distribute data to make use of the SRAM…. While this is a possible implementation it is not viable … this approach would require each process to administer in software the allocation of this memory region…. So instead of putting the SRAM under the control of the OS or user, it becomes a resource which is transparently used and administered by the processors.
The transparent, all-knowing processor indeed.