Archive for the ‘Uncategorized’ Category

Following my earlier article on timing various square-root functions on the x86, commenter LeeN suggested that it would be useful to also test their impact on a more realistic scenario than square-rooting long arrays of independent numbers. In real gameplay code the most common use for sqrts is in finding the length of a vector or normalizing it, like when you need to perform a distance check between two characters to determine whether they can see/shoot/etc each other. So, I wrote up a group of normalize functions, each using a different sqrt technique, and timed them.

The testbed was, as last time, an array of 2048 single-precision floating point numbers, this time interpreted as a packed list of 682 three-dimensional vectors. This number was chosen so that both it and the output array were sure to fit in the L1 cache; however, because three floats add up to twelve bytes, this means that three out of four vectors were not aligned to a 16-byte boundary, which is significant for the SIMD test case as I had to use the movups unaligned load op. Each timing case consisted of looping over this array of vectors 2048 times, normalizing each and writing the result to memory.

Each normalize function computed the length of the vector 1/√(x2 + y2 + z2), multiplied each component by the reciprocal, and then wrote it back through an output pointer. The main difference was in how the reciprocal square root was computed:

  • via the x87 FPU, by simply compiling 1.0f/sqrt( x*x + y*y + z*z )
  • via the SSE scalar unit, by compiling 1.0f/sqrt( x*x + y*y + z*z ) with the /arch:SSE2 option set; this causes the compiler to issue a sqrtss followed by an fdivie, it computes the square root and then divides one by it
  • via the SSE scalar unit, by using the estimated reciprocal square root intrinsic and then performing one step of Newton-Raphson iteration
  • via the SSE SIMD unit, working on the whole vector at once

In all cases the results were accurate to 22 bits of precision. The results for 1,396,736 vector normalizations were:

Method Total time Time per vector
Compiler 1.0/sqrt(x)
x87 FPU FSQRT
52.469ms 37.6ns
Compiler 1.0/sqrt(x)
SSE scalar sqrtss
27.233ms 19.5ns
SSE scalar ops
rsqrtss with one NR step
21.631ms 15.5ns
SSE SIMD ops
rsqrtss with one NR step
20.034ms 14.3ns

Two things jump out here. First, even when the square root op is surrounded by lots of other math — multiplies, adds, loads, stores — optimizations such as this can make a huge difference. It’s not just the cost of the sqrt itself, but also that it’s unpipelined, which means it ties up an execution unit and prevents any other work from being done until it’s entirely completed.

Second, in this case, SIMD is only a very modest benefit. That’s because the input vectors are unaligned, and the two key steps of this operation, the dot product and the square root, are scalar in nature. (This is what’s meant by “horizontal” SIMD computation — operations between the components of one vector, rather than between the corresponding words of two vectors. Given a vector V ∋ <x,y,z>, the sum x + y + z is horizontal, but with two vectors V1 and V2, V3 = <x1+x2, y1+y2, z1+z2> is vertical.) So it really doesn’t play to SIMD’s strengths at all.

On the other hand, if I were to normalize four vectors at a time, so that four dot products and four rsqrts could be performed in parallel in the four words of a vector register, then the speed advantage of SIMD would be much greater. But, again, my goal wasn’t to test performance in tight loops over packed data — it was to figure out the best way to do something like an angle check in the middle of a character’s AI, where you usually deal with one vector at a time.

Source code for my testing functions below the jump. Note that each function writes the normalized vector through an out pointer, but also returns the original vector’s length. The hand-written intrinsic versions probably aren’t totally optimal, but they ought to be good enough to make the point.
Continue reading ‘Square Roots in vivo: normalizing vectors’ »

This is the week when all of Santa’s Elves who work in game-making-places all over the world take a hopefully well deserved break, so no game development post today. Instead, here’s some nice audio books and podcasts that I’ve enjoyed on my commutes in the past year. Happy Holidays!

  • Escape Pod’s short-fiction podcast has not one but two Christmas specials.
  • X Minus One, The Lifeboat Mutiny. 50s-era science fiction radio play: two prospectors find themselves in possession of a lifeboat that’s more than it seems. Worth listening to for the Drone National Anthem alone!
  • G.K. Chesterton’s  The Wisdom Of Father Brown is available as a public-domain audio book from LibriVox, read by Martin Clifton. I found this collection of short stories about the eponymous priest-turned-private-detective to be fine treadmill listening.
  • WNYC’s science documentary special Radio Lab is always a singular pleasure.
  • And Subterranean Press was kind enough to produce a free audio book of Charlie Stross’ Trunk and Disorderly, a humorous adventure in a style that defies description (the closest anyone’s come is “Heinlein meets Wodehouse in space with a woolly mammoth”).

I’ll be back next week with some thoughts on replacing floating-point conditionals with branchless math.

I just received a message through Facebook that points at how malware authors can be really, really specific with their attack vectors, and how they exploit social networks to make their messages appear to come from a legitimate, trusted source.

We’ve all received links to malware websites in email, of course, and usually we can reject them out of hand because the sender is obviously fake. If the sender’s name is someone you actually know and trust, you’re more likely to open the email, but knowing how easily email headers can be forged you might still be a little suspicious. But Facebook messages require someone to have logged in and authenticated — if I get a message from my friend Tim, it means that Tim has actually gone to facebook.com and opened up his little message thingy and typed something out to me.

Or, at least, Tim’s browser has.

I got this Wall post on Facebook earlier today from an old friend I’ve had in my list since forever.

I got a free iPhone! Go to (URL) to get yours!

At first I was a little confused, because I couldn’t see how a phisher could forge a Facebook message, but it was still suspicious enough that I accessed the linked website with kid gloves — and indeed, it’s a webpage that hosts the Javascript malware enclosed below the jump (posted as an image for your safety). I didn’t bother to decode exactly what it does, but it clearly decrypts some kind of encrypted exploit into the browser and executes it.

There’s a couple of interesting things about this: first, the incredible specificity of this virus to maximize the chances that it would appear to come to me from a trusted source. Unless the virus author has hacked into Facebook’s back end, the only way this could work is if the virus snagged my friend Tim’s Facebook password, then logged into his account on its own, accessed his friends list, then mechanically transmitted that message to all those friends. This is a lot of specific code to write, stuff that reads out of Facebook’s HTML and knows how to find the friends list there, and then and knows how to navigate the website to send out messages. It’s a lot of work to take advantage of one social networking site, which goes to show how valuable it is to take advantage of our own assumptions of trust in our friends (or how little virus writers value their time).

Second: even though the actual malware is hosted on the Google-owned Blogspot service, Google’s own malware-detecting tools don’t list it as malicious. In fact, when I tested the site with Google Safe Browsing, it told me “This site is not listed as suspicious” and “Google has not visited this site within the past 90 days”, which is to say that Google can’t even patrol its own webhosting service for exploits. Maybe that explains why Google hosts 2% of the world’s malware.

Here’s the source code for the exploiting Blogspot page that the URL goes to, in case you feel like figuring out what it does. I’m posting it as an image to prevent any chance of that Javascript from actually being parsed on your browser.

Continue reading ‘A really specific Facebook phishing virus?’ »