<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Some Assembly Required</title>
	<atom:link href="http://assemblyrequired.crashworks.org/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://assemblyrequired.crashworks.org</link>
	<description>Technical Notes On Game Development</description>
	<lastBuildDate>Wed, 21 Jul 2010 18:22:16 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on How Slow Are Virtual Functions Really? by Mark&#8217;s Testblog &#187; Blog Archive &#187; Data oriented design links - &#8230;for these are testing times, indeed.</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/comment-page-1/#comment-5224</link>
		<dc:creator>Mark&#8217;s Testblog &#187; Blog Archive &#187; Data oriented design links - &#8230;for these are testing times, indeed.</dc:creator>
		<pubDate>Wed, 21 Jul 2010 18:22:16 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=181#comment-5224</guid>
		<description>[...] catch-all declarations of &#8220;virtual functions are slow!&#8221;  For the general case, this can be proved as a nonsense as virtual function calls are blatantly not slow!  They&#8217;re very, very fast.  However, if [...]</description>
		<content:encoded><![CDATA[<p>[...] catch-all declarations of &#8220;virtual functions are slow!&#8221;  For the general case, this can be proved as a nonsense as virtual function calls are blatantly not slow!  They&#8217;re very, very fast.  However, if [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Load-hit-stores and the __restrict keyword by Elan</title>
		<link>http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/comment-page-1/#comment-5223</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Sun, 11 Jul 2010 22:43:55 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.wordpress.com/?p=8#comment-5223</guid>
		<description>&quot;x86&quot; is a very large family of processors with very different implementations. You can make a prediction about how an Intel Core Duo might behave, but it wouldn&#039;t necessarily be true about an i7 or an AMD chip, because their internal pipelines are dissimilar.  The upshot is you need to run your own timings on the processor you&#039;re targeting. 

One general prediction I can make is that every x86 processor I&#039;ve timed seems to have a huge latency when shuffling data between x87 float and general-purpose registers.</description>
		<content:encoded><![CDATA[<p>&#8220;x86&#8221; is a very large family of processors with very different implementations. You can make a prediction about how an Intel Core Duo might behave, but it wouldn&#8217;t necessarily be true about an i7 or an AMD chip, because their internal pipelines are dissimilar.  The upshot is you need to run your own timings on the processor you&#8217;re targeting.</p>
<p>One general prediction I can make is that every x86 processor I&#8217;ve timed seems to have a huge latency when shuffling data between x87 float and general-purpose registers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Load-hit-stores and the __restrict keyword by Foo Bar</title>
		<link>http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/comment-page-1/#comment-5222</link>
		<dc:creator>Foo Bar</dc:creator>
		<pubDate>Sun, 11 Jul 2010 21:59:36 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.wordpress.com/?p=8#comment-5222</guid>
		<description>Uhm, what about x86?</description>
		<content:encoded><![CDATA[<p>Uhm, what about x86?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why You Should Never Cast Floats To Ints by GameCoder.it &#8722; Il cast floatint</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/12/why-you-should-never-cast-floats-to-ints/comment-page-1/#comment-5220</link>
		<dc:creator>GameCoder.it &#8722; Il cast floatint</dc:creator>
		<pubDate>Thu, 01 Jul 2010 09:20:28 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=124#comment-5220</guid>
		<description>[...] why-you-should-never-cast-floats-to-ints [1] fast-floating-point-to-integer-conversions [...]</description>
		<content:encoded><![CDATA[<p>[...] why-you-should-never-cast-floats-to-ints [1] fast-floating-point-to-integer-conversions [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Sentences That Should Be Carved Into Foreheads by On _purecall and the Overhead(s) of Virtual Functions &#171; Ofek&#8217;s Visual C++ stuff</title>
		<link>http://assemblyrequired.crashworks.org/2008/12/22/ea-stl-prevents-memory-leaks/comment-page-1/#comment-5214</link>
		<dc:creator>On _purecall and the Overhead(s) of Virtual Functions &#171; Ofek&#8217;s Visual C++ stuff</dc:creator>
		<pubDate>Thu, 03 Jun 2010 20:05:27 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=92#comment-5214</guid>
		<description>[...] to optimize – you can get tangible results by eliminating virtual calls. It&#8217;s widely considered a good practice to use the added flexibility of virtual functions when you have a concrete reason and not just for [...]</description>
		<content:encoded><![CDATA[<p>[...] to optimize – you can get tangible results by eliminating virtual calls. It&#8217;s widely considered a good practice to use the added flexibility of virtual functions when you have a concrete reason and not just for [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How Slow Are Virtual Functions Really? by On _purecall and the Overhead(s) of Virtual Functions &#171; Ofek&#8217;s Visual C++ stuff</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/comment-page-1/#comment-5213</link>
		<dc:creator>On _purecall and the Overhead(s) of Virtual Functions &#171; Ofek&#8217;s Visual C++ stuff</dc:creator>
		<pubDate>Thu, 03 Jun 2010 20:03:40 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=181#comment-5213</guid>
		<description>[...] calls are known to be more costly than calls that are resolved at compile time.&#160; Elan Ruskin measured ~50% difference &#8211; I measured a bit less, but the difference is certainly there. For functions [...]</description>
		<content:encoded><![CDATA[<p>[...] calls are known to be more costly than calls that are resolved at compile time.&#160; Elan Ruskin measured ~50% difference &#8211; I measured a bit less, but the difference is certainly there. For functions [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A better Windows environment variable editor by Mikhael</title>
		<link>http://assemblyrequired.crashworks.org/2008/12/17/rapid-environment-editor-better-than-windows-dialog-for-editing-path/comment-page-1/#comment-5183</link>
		<dc:creator>Mikhael</dc:creator>
		<pubDate>Tue, 16 Mar 2010 12:06:38 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=89#comment-5183</guid>
		<description>Bloody hell. My devs have this awe-inspiring lust -- nay, full-on demented obsession -- for environment variables. With new users coming in over the last couple of weeks, *THIS* tool would have made my life a hell of a lot easier.

&lt;!--Obviously, I need to read you more, Elan. :D--&gt;</description>
		<content:encoded><![CDATA[<p>Bloody hell. My devs have this awe-inspiring lust &#8212; nay, full-on demented obsession &#8212; for environment variables. With new users coming in over the last couple of weeks, *THIS* tool would have made my life a hell of a lot easier.</p>
<p><!--Obviously, I need to read you more, Elan. :D--></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Down With fcmp: Conditional Moves For Branchless Math by &#187; Stupid C++ vs C# performance comparison Florent Clairambault</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/comment-page-1/#comment-5174</link>
		<dc:creator>&#187; Stupid C++ vs C# performance comparison Florent Clairambault</dc:creator>
		<pubDate>Thu, 04 Mar 2010 18:34:17 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=50#comment-5174</guid>
		<description>[...] Conditional moves [...]</description>
		<content:encoded><![CDATA[<p>[...] Conditional moves [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Square Roots in vivo: normalizing vectors by Soylent</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/comment-page-1/#comment-5171</link>
		<dc:creator>Soylent</dc:creator>
		<pubDate>Wed, 03 Mar 2010 08:24:27 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=329#comment-5171</guid>
		<description>I see. 

Assume the following simplified scenario:

You have a base class for objects that exist in the game world called CEntity that is maybe a couple of hundred bytes or so of data members that describe all its necessary state data, the position, a pair of angles or maybe a transformation matrix, the velocity, whether it has a collision mesh, whether it is static and so on.

From this class you derive a bunch of CRockets, CPlayers, CPhysicsObject etc. To improve locality of reference you store all your CEntities in a simple array rather than spreading them out in memory. You might just allocate a big enough array that it&#039;s not going to reach capacity under any but pathological cases or you might use some dynamic array that you reallocate to double the size if it&#039;s filled to capacity or reallocate to half the size if emptied to one quarter or something like that.

In order to give the processor the best possible chance to precache necessary data your update loop calls the Update() method on the CEntity at index zero, then index one and so on in the order they are stored in memory. Each Update() does one normalization among various other stuff.

Under these conditions, which are about as predictable as possible, will the processor look at the memory access pattern and correctly identify that there&#039;s a 224-byte (or whatever) stride and that it is supposed to cache this data ahead of time? If the data is not precached, the processor is going to sit idle and wait on RAM for at least several hundred clock cycles, in which case the stall swamps the cost of your arithmetic to the point that the arithmetic barely matters.</description>
		<content:encoded><![CDATA[<p>I see. </p>
<p>Assume the following simplified scenario:</p>
<p>You have a base class for objects that exist in the game world called CEntity that is maybe a couple of hundred bytes or so of data members that describe all its necessary state data, the position, a pair of angles or maybe a transformation matrix, the velocity, whether it has a collision mesh, whether it is static and so on.</p>
<p>From this class you derive a bunch of CRockets, CPlayers, CPhysicsObject etc. To improve locality of reference you store all your CEntities in a simple array rather than spreading them out in memory. You might just allocate a big enough array that it&#8217;s not going to reach capacity under any but pathological cases or you might use some dynamic array that you reallocate to double the size if it&#8217;s filled to capacity or reallocate to half the size if emptied to one quarter or something like that.</p>
<p>In order to give the processor the best possible chance to precache necessary data your update loop calls the Update() method on the CEntity at index zero, then index one and so on in the order they are stored in memory. Each Update() does one normalization among various other stuff.</p>
<p>Under these conditions, which are about as predictable as possible, will the processor look at the memory access pattern and correctly identify that there&#8217;s a 224-byte (or whatever) stride and that it is supposed to cache this data ahead of time? If the data is not precached, the processor is going to sit idle and wait on RAM for at least several hundred clock cycles, in which case the stall swamps the cost of your arithmetic to the point that the arithmetic barely matters.</p>
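<p>A minimal sketch of the scenario above, under some loud assumptions: the field names, the padding size, and the helpers UpdateAll and StrideBytes are all hypothetical, and Update() is left non-virtual so the struct comes out at exactly 224 bytes. It only shows that walking the array in index order produces a constant stride; whether a given processor's prefetcher picks that up still has to be measured on the target hardware.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CEntity described above. Field names and the
// padding size are assumptions, chosen only to give a fixed 224-byte stride.
struct CEntity {
    float position[3];
    float velocity[3];
    float facing[3];
    char  otherState[188];  // placeholder for the rest of the state data

    // One normalization per update, as in the scenario.
    void Update() {
        const float lenSq = facing[0] * facing[0] +
                            facing[1] * facing[1] +
                            facing[2] * facing[2];
        if (lenSq > 0.0f) {
            const float invLen = 1.0f / std::sqrt(lenSq);
            for (float &c : facing) c *= invLen;
        }
    }
};

static_assert(sizeof(CEntity) == 224, "assumes 4-byte float and no extra padding");

// Visits entities in the order they sit in memory, so each access lands
// exactly sizeof(CEntity) bytes after the previous one.
inline void UpdateAll(std::vector<CEntity> &entities) {
    for (std::size_t i = 0; i < entities.size(); ++i) {
        entities[i].Update();
    }
}

// Byte distance between two neighbouring entities in the array.
inline std::ptrdiff_t StrideBytes(const std::vector<CEntity> &entities) {
    assert(entities.size() >= 2);
    return reinterpret_cast<const char *>(&entities[1]) -
           reinterpret_cast<const char *>(&entities[0]);
}
```

<p>Storing derived classes such as CRocket by value in an array of CEntity would slice them, so a real version of this layout would need per-type arrays or some other scheme; the sketch sidesteps that by using a single concrete type.</p>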
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Square Roots in vivo: normalizing vectors by Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/20/square-roots-in-vivo-normalizing-vectors/comment-page-1/#comment-5166</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Thu, 25 Feb 2010 01:22:32 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=329#comment-5166</guid>
		<description>Gregory: Yes, the magic number sqrt wasn&#039;t worth trying given its performance previously. It also induces a certain pathological behavior on the Xenon core that makes it a nonstarter.

Soylent: What you&#039;re suggesting indeed makes sense when one has long lists of packed vectors to transform, but that&#039;s not what I&#039;m timing here. I&#039;m looking at cases where you have to perform a single vector normalization as part of the logic in some larger function -- like performing an angle comparison when an AI selects which enemy to target next. &lt;a href=&quot;http://cellperformance.beyond3d.com/articles/2008/03/three-big-lies.html&quot; rel=&quot;nofollow&quot;&gt;Mike Acton might say that even game logic should be packed structures of arrays&lt;/a&gt; so that instead of having one entity per AI, you have a big structure of all the positions for all AIs, all velocities for all AIs, all nav state for all AIs, and so on; but games just aren&#039;t built like that yet. We&#039;re still in a world where each rocket is represented by an instance of a CRocket class, and each CRocket has its own Update() function, and each CRocket does its own logic one at a time.</description>
		<content:encoded><![CDATA[<p>Gregory: Yes, the magic number sqrt wasn&#8217;t worth trying given its performance previously. It also induces a certain pathological behavior on the Xenon core that makes it a nonstarter.</p>
<p>Soylent: What you&#8217;re suggesting indeed makes sense when one has long lists of packed vectors to transform, but that&#8217;s not what I&#8217;m timing here. I&#8217;m looking at cases where you have to perform a single vector normalization as part of the logic in some larger function &#8212; like performing an angle comparison when an AI selects which enemy to target next. <a href="http://cellperformance.beyond3d.com/articles/2008/03/three-big-lies.html" rel="nofollow">Mike Acton might say that even game logic should be packed structures of arrays</a> so that instead of having one entity per AI, you have a big structure of all the positions for all AIs, all velocities for all AIs, all nav state for all AIs, and so on; but games just aren&#8217;t built like that yet. We&#8217;re still in a world where each rocket is represented by an instance of a CRocket class, and each CRocket has its own Update() function, and each CRocket does its own logic one at a time.</p>
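<p>A rough sketch of the two layouts contrasted above. The names Vec3, AIEntity, AIWorld and Integrate are hypothetical and come from neither the post nor the linked article; the point is only the shape of the data, not any particular engine's design.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Array-of-structures: one object per AI, each carrying all of its own state
// and updated one at a time (the "each CRocket has its own Update()" style).
struct AIEntity {
    Vec3 position;
    Vec3 velocity;
    int  navState;
};

// Structure-of-arrays: one container per field, indexed by AI number.
struct AIWorld {
    std::vector<Vec3> positions;
    std::vector<Vec3> velocities;
    std::vector<int>  navStates;
};

// Advances every position in one pass. Only the two arrays the loop needs
// are touched; nav state never enters the cache for this step.
inline void Integrate(AIWorld &world, float dt) {
    assert(world.positions.size() == world.velocities.size());
    for (std::size_t i = 0; i < world.positions.size(); ++i) {
        world.positions[i].x += world.velocities[i].x * dt;
        world.positions[i].y += world.velocities[i].y * dt;
        world.positions[i].z += world.velocities[i].z * dt;
    }
}
```

<p>In the packed form, a batch operation such as normalizing many vectors becomes a tight loop over one contiguous array, which is the case where the suggestion about long lists of packed vectors applies.</p>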
]]></content:encoded>
	</item>
</channel>
</rss>
