<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Timing square root</title>
	<atom:link href="http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/feed/" rel="self" type="application/rss+xml" />
	<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/</link>
	<description>Technical Notes On Game Development</description>
	<lastBuildDate>Wed, 21 Jul 2010 18:22:16 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Gregory</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5165</link>
		<dc:creator>Gregory</dc:creator>
		<pubDate>Wed, 24 Feb 2010 13:38:22 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5165</guid>
		<description>Hello Elan,

Thank you for the answer.

Since I posted my comment, I came across http://msdn.microsoft.com/en-us/library/bb173458(VS.85).aspx so I think I&#039;m going to implement my high precision timer using QPC on Windows and gettimeofday on Linux and Mac.

Or the easy way seems to be Boost.PTime</description>
		<content:encoded><![CDATA[<p>Hello Elan,</p>
<p>Thank you for the answer.</p>
<p>Since I posted my comment, I came across <a href="http://msdn.microsoft.com/en-us/library/bb173458(VS.85).aspx" rel="nofollow">http://msdn.microsoft.com/en-us/library/bb173458(VS.85).aspx</a> so I think I&#8217;m going to implement my high precision timer using QPC on Windows and gettimeofday on Linux and Mac.</p>
<p>Or the easy way seems to be Boost.PTime</p>
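<p>A minimal sketch of the timer described above, assuming <code>QueryPerformanceCounter</code> on Windows and <code>gettimeofday</code> elsewhere. The name <code>now_seconds</code> is hypothetical and only for illustration.</p>

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/time.h>
#endif

// Hypothetical helper: current timer reading in seconds.
// On Windows this is the performance counter divided by its frequency;
// elsewhere it is wall-clock time with microsecond resolution.
inline double now_seconds() {
#if defined(_WIN32)
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return static_cast<double>(count.QuadPart) /
           static_cast<double>(freq.QuadPart);
#else
    timeval tv;
    const int rc = gettimeofday(&tv, nullptr);
    assert(rc == 0);
    (void)rc;  // silence the unused-variable warning when NDEBUG is set
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) * 1e-6;
#endif
}
```

<p>Subtracting two readings gives an elapsed time. Note that <code>gettimeofday</code> reports wall-clock time, so the difference can be skewed if the system clock is adjusted during the measurement.</p>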
]]></content:encoded>
	</item>
	<item>
		<title>By: Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5163</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Tue, 23 Feb 2010 04:28:21 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5163</guid>
		<description>We use rdtsc to implement our cycle counter, since it seems to have the best resolution of any of the timers available.

It&#039;s best to put target code in a loop for the same reason you would weigh rice by the thousand rather than one grain at a time on your kitchen scale: systematic and random error.

First, there is a certain &lt;a href=&quot;http://en.wikipedia.org/wiki/Systematic_error&quot; rel=&quot;nofollow&quot;&gt;systematic error&lt;/a&gt; in querying the cycle counter: the StartCycleCounter() inline function call and the rdtsc op itself have some latency, and the timer is probably only accurate to within a couple of nanoseconds, so measuring a single iteration of an 86ns operation would have a large relative error. On the other hand, an error of 10&lt;sup&gt;-8&lt;/sup&gt; seconds in a measurement of 10&lt;sup&gt;-3&lt;/sup&gt; seconds is far smaller in relative terms, and so the result is more accurate. 

Also, timings &lt;em&gt;in vivo&lt;/em&gt; can be &lt;a href=&quot;http://en.wikipedia.org/wiki/Random_error&quot; rel=&quot;nofollow&quot;&gt;messy&lt;/a&gt;: any single iteration of the loop might take a little longer than expected because of other threads, memory bus contention, clock variability, operating system intervention, even CPU temperature. Taking multiple measurements, or a single measurement of multiple iterations, improves statistical significance and narrows the &lt;a href=&quot;http://en.wikipedia.org/wiki/Confidence_interval&quot; rel=&quot;nofollow&quot;&gt;error bars&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>We use rdtsc to implement our cycle counter, since it seems to have the best resolution of any of the timers available.</p>
<p>It&#8217;s best to put target code in a loop for the same reason you would weigh rice by the thousand rather than one grain at a time on your kitchen scale: systematic and random error.</p>
<p>First, there is a certain <a href="http://en.wikipedia.org/wiki/Systematic_error" rel="nofollow">systematic error</a> in querying the cycle counter: the StartCycleCounter() inline function call and the rdtsc op itself have some latency, and the timer is probably only accurate to within a couple of nanoseconds, so measuring a single iteration of an 86ns operation would have a large relative error. On the other hand, an error of 10<sup>-8</sup> seconds in a measurement of 10<sup>-3</sup> seconds is far smaller in relative terms, and so the result is more accurate. </p>
<p>Also, timings <em>in vivo</em> can be <a href="http://en.wikipedia.org/wiki/Random_error" rel="nofollow">messy</a>: any single iteration of the loop might take a little longer than expected because of other threads, memory bus contention, clock variability, operating system intervention, even CPU temperature. Taking multiple measurements, or a single measurement of multiple iterations, improves statistical significance and narrows the <a href="http://en.wikipedia.org/wiki/Confidence_interval" rel="nofollow">error bars</a>.</p>
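<p>The two points above can be sketched as follows. This is only an illustration, not the cycle counter discussed here: it assumes <code>std::chrono::steady_clock</code> in place of rdtsc for portability, and the helper names <code>ns_per_call</code> and <code>median_ns_per_call</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper: times `iterations` calls of `fn` with a single pair of
// timer reads, so the fixed cost of querying the timer is spread over the
// whole loop. Returns the average cost per call in nanoseconds.
template <typename Fn>
double ns_per_call(Fn&& fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Hypothetical helper: repeats the measurement `runs` times and returns the
// median, which is less sensitive to a run disturbed by other threads or
// the operating system than a single sample would be.
template <typename Fn>
double median_ns_per_call(Fn&& fn, int iterations, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (int r = 0; r < runs; ++r) {
        samples.push_back(ns_per_call(fn, iterations));
    }
    std::nth_element(samples.begin(), samples.begin() + runs / 2,
                     samples.end());
    return samples[runs / 2];
}
```

<p>The code under test should write its result somewhere the optimizer cannot discard, for example a <code>volatile</code> variable; otherwise the loop body may be removed and the timing becomes meaningless.</p>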
]]></content:encoded>
	</item>
	<item>
		<title>By: Gregory</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5155</link>
		<dc:creator>Gregory</dc:creator>
		<pubDate>Wed, 17 Feb 2010 19:14:26 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5155</guid>
		<description>I&#039;m curious about what&#039;s behind StartClockCycleCounter(); and StopClockCycleCounter();

What&#039;s the best way to benchmark a portion of code on PC? QueryPerformanceCounter (Windows only), RDTSC? Something else?

And is there a point putting the target code inside a loop?</description>
		<content:encoded><![CDATA[<p>I&#8217;m curious about what&#8217;s behind StartClockCycleCounter(); and StopClockCycleCounter();</p>
<p>What&#8217;s the best way to benchmark a portion of code on PC? QueryPerformanceCounter (Windows only), RDTSC? Something else?</p>
<p>And is there a point putting the target code inside a loop?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5081</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Thu, 29 Oct 2009 22:25:49 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5081</guid>
		<description>Elias: 4096 loops over 4096 floats with &lt;code&gt;x = powf(x, 0.5f)&lt;/code&gt; took 1449.538ms, or 86.4ns/float. This is 3.6 times worse than even the compiler&#039;s naive x87 &lt;code&gt;sqrt(x)&lt;/code&gt;, and twenty-seven times worse than rsqrtss with one step of Newton-Raphson iteration (which is equally accurate). Taking an exponent is a function call, and a very slow function call at that.</description>
		<content:encoded><![CDATA[<p>Elias: 4096 loops over 4096 floats with <code>x = powf(x, 0.5f)</code> took 1449.538ms, or 86.4ns/float. This is 3.6 times worse than even the compiler&#8217;s naive x87 <code>sqrt(x)</code>, and twenty-seven times worse than rsqrtss with one step of Newton-Raphson iteration (which is equally accurate). Taking an exponent is a function call, and a very slow function call at that.</p>
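<p>For reference, a rough sketch of the rsqrtss approach mentioned above. It is not the code that was timed. It assumes the SSE intrinsic <code>_mm_rsqrt_ss</code> where available and falls back to <code>1.0f / std::sqrt(x)</code> elsewhere. The function names and the <code>SQRT_SKETCH_HAVE_SSE</code> macro are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SQRT_SKETCH_HAVE_SSE 1  // hypothetical macro, used only in this sketch
#endif

// Approximate 1/sqrt(x). With SSE this maps to the rsqrtss instruction,
// which yields roughly 12 bits of precision.
inline float rsqrt_estimate(float x) {
#ifdef SQRT_SKETCH_HAVE_SSE
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / std::sqrt(x);  // portable fallback, already accurate
#endif
}

// One Newton-Raphson step for f(y) = 1/y^2 - x, which refines an estimate
// y of 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y).
inline float refine_rsqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// sqrt(x) computed as x * (1/sqrt(x)). Requires x > 0, since x == 0 would
// produce 0 * infinity, which is NaN.
inline float fast_sqrt(float x) {
    assert(x > 0.0f);
    return x * refine_rsqrt(x, rsqrt_estimate(x));
}
```

<p>Each Newton-Raphson step roughly doubles the number of correct bits, which is why a single step after the hardware estimate gets close to full single precision.</p>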
]]></content:encoded>
	</item>
	<item>
		<title>By: Elias</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5079</link>
		<dc:creator>Elias</dc:creator>
		<pubDate>Thu, 29 Oct 2009 20:43:50 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5079</guid>
		<description>Here is one technique you apparently haven&#039;t tried: taking the number to the 0.5 power. How does that compare?</description>
		<content:encoded><![CDATA[<p>Here is one technique you apparently haven&#8217;t tried: taking the number to the 0.5 power. How does that compare?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5077</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Thu, 29 Oct 2009 18:48:56 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5077</guid>
		<description>Is someone shipping x64-only games?</description>
		<content:encoded><![CDATA[<p>Is someone shipping x64-only games?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fioj</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5076</link>
		<dc:creator>fioj</dc:creator>
		<pubDate>Thu, 29 Oct 2009 18:48:44 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5076</guid>
		<description>&quot;a cryptic obscure command line option (GCC)&quot;

What&#039;s obscure about that command line option?  It is well known to all GCC users, as well as the fact that GCC uses SSE floating point maths on amd64, as Alex mentioned.</description>
		<content:encoded><![CDATA[<p>&#8220;a cryptic obscure command line option (GCC)&#8221;</p>
<p>What&#8217;s obscure about that command line option?  It is well known to all GCC users, as well as the fact that GCC uses SSE floating point maths on amd64, as Alex mentioned.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5075</link>
		<dc:creator>Jason</dc:creator>
		<pubDate>Thu, 29 Oct 2009 18:27:14 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5075</guid>
		<description>“no compiler in the business defaults to using the new, faster SSE scalar opcodes”

AFAIK all compilers generating x64 code use SSE by default (since all 64 bit x86 CPUs have SSE, this makes sense)</description>
		<content:encoded><![CDATA[<p>“no compiler in the business defaults to using the new, faster SSE scalar opcodes”</p>
<p>AFAIK all compilers generating x64 code use SSE by default (since all 64 bit x86 CPUs have SSE, this makes sense)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: S.M</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5074</link>
		<dc:creator>S.M</dc:creator>
		<pubDate>Thu, 29 Oct 2009 18:18:17 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5074</guid>
		<description>Before you bash gcc, did you set the &quot;-mcpu= &quot; correctly for gcc ??</description>
		<content:encoded><![CDATA[<p>Before you bash gcc, did you set the &#8220;-mcpu= &#8221; correctly for gcc ??</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/comment-page-1/#comment-5073</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Thu, 29 Oct 2009 17:17:34 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=234#comment-5073</guid>
		<description>Mean absolute error, of course. Otherwise I&#039;d just be computing a random walk.</description>
		<content:encoded><![CDATA[<p>Mean absolute error, of course. Otherwise I&#8217;d just be computing a random walk.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
