<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Down With fcmp: Conditional Moves For Branchless Math</title>
	<atom:link href="http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/feed/" rel="self" type="application/rss+xml" />
	<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/</link>
	<description>Technical Notes On Game Development</description>
	<lastBuildDate>Wed, 21 Jul 2010 18:22:16 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: &#187; Stupid C++ vs C# performance comparison Florent Clairambault</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/comment-page-1/#comment-5174</link>
		<dc:creator>&#187; Stupid C++ vs C# performance comparison Florent Clairambault</dc:creator>
		<pubDate>Thu, 04 Mar 2010 18:34:17 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=50#comment-5174</guid>
		<description>[...] Conditional moves [...]</description>
		<content:encoded><![CDATA[<p>[...] Conditional moves [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adisak</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/comment-page-1/#comment-5026</link>
		<dc:creator>Adisak</dc:creator>
		<pubDate>Wed, 21 Oct 2009 22:08:31 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=50#comment-5026</guid>
		<description>FWIW, if you code your isel to do a mask and mask complement, it will be faster on PowerPC since the compiler is smart enough to generate an &#039;andc&#039; opcode.  It&#039;s the same number of opcodes but there is one fewer result-to-input-register dependency in the opcodes.  The two mask operations can also be issued in parallel on a superscalar processor.  It can be 2-3 cycles faster if everything is lined up correctly.

return (x &amp; (~mask)) + (y &amp; mask);</description>
		<content:encoded><![CDATA[<p>FWIW, if you code your isel to do a mask and mask complement, it will be faster on PowerPC since the compiler is smart enough to generate an &#8216;andc&#8217; opcode.  It&#8217;s the same number of opcodes but there is one fewer result-to-input-register dependency in the opcodes.  The two mask operations can also be issued in parallel on a superscalar processor.  It can be 2-3 cycles faster if everything is lined up correctly.</p>
<p>return (x &amp; (~mask)) + (y &amp; mask);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elan</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/comment-page-1/#comment-4921</link>
		<dc:creator>Elan</dc:creator>
		<pubDate>Wed, 20 May 2009 00:13:51 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=50#comment-4921</guid>
		<description>In theory, perhaps, but as game developers we have exactly four configurations to care about: MSVC for Windows; MSVC for Xbox 360; Sony-gcc for PS3; CodeWarrior for Wii. Each of these configurations sign-extends on a right shift of a signed value. In fact, the whole discussion above is specific to the pipeline of the PPC inside the 360 and the PS3; on other platforms the situation might be entirely different.

Optimization at this level really means taking advantage of the hardware and its particular implementation; if not, we wouldn&#039;t be able to use SSE or VMX or the SPU either, because they&#039;re not perfectly portable.</description>
		<content:encoded><![CDATA[<p>In theory, perhaps, but as game developers we have exactly four configurations to care about: MSVC for Windows; MSVC for Xbox 360; Sony-gcc for PS3; CodeWarrior for Wii. Each of these configurations sign-extends on a right shift of a signed value. In fact, the whole discussion above is specific to the pipeline of the PPC inside the 360 and the PS3; on other platforms the situation might be entirely different.</p>
<p>Optimization at this level really means taking advantage of the hardware and its particular implementation; if not, we wouldn&#8217;t be able to use SSE or VMX or the SPU either, because they&#8217;re not perfectly portable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/comment-page-1/#comment-4920</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Tue, 19 May 2009 22:45:06 +0000</pubDate>
		<guid isPermaLink="false">http://assemblyrequired.crashworks.org/?p=50#comment-4920</guid>
		<description>The C and C++ standards leave the result of right-shifting a negative signed value implementation-defined. I believe the correctness of your isel function is at the mercy of the compiler/hardware platform.</description>
		<content:encoded><![CDATA[<p>The C and C++ standards leave the result of right-shifting a negative signed value implementation-defined. I believe the correctness of your isel function is at the mercy of the compiler/hardware platform.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
