<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some Assembly Required &#187; C++</title>
	<atom:link href="http://assemblyrequired.crashworks.org/tag/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://assemblyrequired.crashworks.org</link>
	<description>Technical Notes On Game Development</description>
	<lastBuildDate>Sun, 16 Oct 2011 04:04:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>More on __restrict</title>
		<link>http://assemblyrequired.crashworks.org/2008/09/06/more-on-__restrict/</link>
		<comments>http://assemblyrequired.crashworks.org/2008/09/06/more-on-__restrict/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 11:30:17 +0000</pubDate>
		<dc:creator>Elan</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[LHS]]></category>

		<guid isPermaLink="false">http://assemblyrequired.wordpress.com/?p=14</guid>
		<description><![CDATA[The __restrict keyword doesn't affect how the compiler itself aliases pointers: instead, it's about what the compiler can assume you, the programmer, may or may not have aliased.]]></description>
			<content:encoded><![CDATA[<p>An anonymous commenter asks:</p>
<blockquote><p>I thought that only compatable types could alias? So for example, an int array can’t alias a float array, and a CFoo* can’t alias it’s own member?</p></blockquote>
<p>Something to remember about the <tt>__restrict</tt> keyword is that it doesn&#8217;t affect how the compiler itself aliases pointers: instead, it&#8217;s about what the compiler can assume you, the programmer, may or may not have aliased.</p>
<p>If there is any chance two pointers alias each other, the compiler is forced to reload the contents of each from memory after <em>every store operation</em>.</p>
<p>In the case of class member functions, the compiler of course knows that different data members of <tt>this</tt> can&#8217;t alias one another. The issue with MemberFunc in CFoo below isn&#8217;t whether &amp;m_iBar and &amp;m_iBaz alias one another, but whether <tt>a == &amp;m_iBar</tt>.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> CFoo
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
	<span style="color: #0000ff;">int</span> MemberFunc<span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a, <span style="color: #0000ff;">int</span> x <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">int</span> MemberFuncWithRestrict<span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span> __restrict a, <span style="color: #0000ff;">int</span> x <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">int</span> m_iBar<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">int</span> m_iBaz<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> CFoo<span style="color: #008080;">::</span><span style="color: #007788;">MemberFunc</span><span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a, <span style="color: #0000ff;">int</span> x <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	m_iBar <span style="color: #000080;">=</span> x<span style="color: #008080;">;</span>
	<span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">7</span><span style="color: #008080;">;</span> <span style="color: #666666;">// if a == &amp;m_iBar, then m_iBar might be 7 now.</span>
	        <span style="color: #666666;">// the processor must load it back from memory.</span>
	<span style="color: #0000ff;">return</span> m_iBar <span style="color: #000040;">+</span> <span style="color: #000040;">*</span>a<span style="color: #008080;">;</span> <span style="color: #666666;">// load-hit-store</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>MemberFunc compiles to something like:</p>
<pre>; int CFoo::MemberFunc( int *a, int x );
; by convention, CFoo * this is on register 3,
; int *a is on register 4, and
; x is on register 5
li r10,7       ; set r10 to 7
stw  r5,0(r3)  ; store r5 (x) to this-&gt;m_iBar
stw  r10,0(r4) ; store r10 through pointer a
; now, because a might or might not point to m_iBar,
; we need to load back the contents of m_iBar just in case.
; the compiler *cannot* assume that a != &amp;m_iBar, and
; the load below will stall until the previous store to
; m_iBar is completely finished -- at least forty cycles!
lwz r11,0(r3)  ; load this-&gt;m_iBar from memory to r11
add r3,r11,r10 ; r3 = r11 + r10
blr            ; return r3</pre>
<p>Now, if we use the <tt>__restrict</tt> keyword to promise the compiler that <tt>a</tt> doesn&#8217;t alias anything under <tt>this</tt> (in particular, that <tt>a != &amp;m_iBar</tt>), it can make some assumptions and avoid loading m_iBar back from memory after it has been stored:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> CFoo<span style="color: #008080;">::</span><span style="color: #007788;">MemberFuncWithRestrict</span><span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span> __restrict a, <span style="color: #0000ff;">int</span> x <span style="color: #008000;">&#41;</span> __restrict
<span style="color: #008000;">&#123;</span>
	m_iBar <span style="color: #000080;">=</span> x<span style="color: #008080;">;</span>
	<span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">7</span><span style="color: #008080;">;</span> <span style="color: #666666;">// __restrict THIS promises that a != &amp;m_iBar</span>
	<span style="color: #0000ff;">return</span> m_iBar <span style="color: #000040;">+</span> <span style="color: #000040;">*</span>a<span style="color: #008080;">;</span> <span style="color: #666666;">// no load-hit-store</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>compiles to</p>
<pre>li r10,7        ; set r10 to 7
stw r5,0(r3)    ; store r5 to this-&gt;m_iBar
stw r10,0(r4)   ; store r10 through pointer a
;; because the compiler knows that the write to a
;; cannot affect the contents of m_iBar, there is
;; no need for it to load m_iBar back from memory;
;; the store above cannot have changed it
add r3, r5, r10 ; r3 = r5 + r10
blr             ; return r3</pre>
<p>This function runs much faster, avoiding the load-hit-store on reading back m_iBar, but <em>it is only correct so long as the promise is true</em>. If you call the function like this:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">CFoo foo<span style="color: #008080;">;</span>
out <span style="color: #000080;">=</span> foo.<span style="color: #007788;">MemberFuncWithRestrict</span><span style="color: #008000;">&#40;</span> <span style="color: #000040;">&amp;</span>foo.<span style="color: #007788;">m_iBar</span>, <span style="color: #0000dd;">5</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p><em>you will get incorrect results</em>, in this case a return value of 12 instead of 14.</p>
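<p>One way to catch a broken promise early is to assert it in debug builds. The following is only a minimal sketch under stated assumptions: the class name <tt>CFooChecked</tt> and the <tt>assert</tt> are hypothetical additions for illustration, not part of the original code, and the trailing <tt>__restrict</tt> on the member function is a compiler extension that your compiler may spell differently.</p>

```cpp
#include <cassert>

// Hypothetical variant of CFoo, for illustration only.
class CFooChecked
{
public:
	int MemberFuncWithRestrict( int * __restrict a, int x ) __restrict
	{
		// Check the no-alias promise before relying on it.
		// This check is compiled out when NDEBUG is defined.
		assert( a != &m_iBar && a != &m_iBaz );
		m_iBar = x;
		*a = 7;
		return m_iBar + *a; // the compiler may reuse x instead of reloading m_iBar
	}
	int m_iBar;
	int m_iBaz;
};
```

<p>With a check like this, the aliased call shown above would trip the assertion in a debug build instead of quietly returning the wrong sum.</p>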
<p style="text-align:center;">&spades;</p>
<p>Even if two pointers are different types, the compiler still has no way of knowing whether they actually point to different locations in memory.</p>
<div style="padding-left:1em;"><strong>EDIT</strong>: the following section is only true if your compiler does not have <a href="http://xania.org/200712/cpp-strict-aliasing">strict aliasing</a> turned on. In GCC &ge;4 it is on by default at -O2 and above unless you specify -fno-strict-aliasing. In MSVC it is off by default and I could not find how to turn it on. <a href="http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html">Mike Acton discusses strict aliasing in great detail here</a>. Thanks, <a href="http://zeuxcg.blogspot.com/">Arseny</a>!</div>
<p>For example, consider the function</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">float</span> slow<span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a, <span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span>b <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
	<span style="color: #000040;">*</span>b <span style="color: #000080;">=</span> <span style="color:#800080;">7.0</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">return</span> <span style="color: #000040;">*</span>a<span style="color: #000040;">+*</span>b<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>This function could be called from a different .cpp file in a way that aliased the pointers to the same location:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> foo<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span> <span style="color: #0000ff;">float</span> bar<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
bar<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> slow<span style="color: #008000;">&#40;</span> foo <span style="color: #000040;">+</span> <span style="color: #0000dd;">0</span>, bar <span style="color: #000040;">+</span> <span style="color: #0000dd;">0</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// pointers are not aliased</span>
foo<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> slow<span style="color: #008000;">&#40;</span> foo <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span>, <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#40;</span>foo <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// pointers are aliased</span></pre></div></div>

<p>Granted, this is a contrived example, but also one that&#8217;s completely legal. It underscores how the compiler can&#8217;t make any assumptions about pointer aliasing unless you hold its hand. Even though a and b are different types, the <em>programmer</em> can still deliberately alias them. The function <tt>slow</tt> above compiles to something like this on the PowerPC:</p>
<pre>; assembly for float slow( int *a, float *b )
; by convention, parameter a is stored on r3 and parameter b on r4.
lis r11,__real@40e00000 ; load address of constant 7.0f
li           r10,5     ; load 5 onto r10
stw          r10,0(r3) ; save r10 through pointer a
lfs          fr0,__real@40e00000(r11) ; load 7.0f onto fr0
stfs         fr0,0(r4) ; save 7.0f through pointer b
;; the following line causes the load-hit-store:
;; because a and b might be aliased, the compiler must
;; now load back through pointer a in case the write to
;; b overwrote its contents!
;; the following instruction will cause a pipeline stall.
lwz          r9,0(r3)    ; load integer pointed to by a
std          r9,-10h(r1) ; convert it to a float and save to memory
;; here is another load-hit-store: most processors have no way
;; to move data between integer and floating point registers
;; other than by saving to memory and loading back again.
;; this load will stall the pipeline:
lfd          fr13,-10h(r1) ; load from that address again into fr13
fadds        fr1,fr13,fr0  ; add a and b and
blr                        ; return on fr1</pre>
<p>This simple function has <em>two</em> load-hit-store stalls, each of which stalls the processor for the entire length of the pipeline. By contrast, if we write our function like this:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">float</span> slowWithRestrict<span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span> __restrict a, <span style="color: #0000ff;">float</span> <span style="color: #000040;">*</span> __restrict b <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	<span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
	<span style="color: #000040;">*</span>b <span style="color: #000080;">=</span> <span style="color:#800080;">7.0</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">return</span> <span style="color: #000040;">*</span>a<span style="color: #000040;">+*</span>b<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>&#8230; then we will get incorrect results if <tt>a</tt> and <tt>b</tt> happen to point to the same address, but in every other case it will avoid stalls completely. This is because we have promised the compiler that the write to <tt>b</tt> cannot possibly overwrite the contents of <tt>*a</tt>, <em>something it could not know otherwise</em>. It compiles to something like this:</p>
<pre>; assembly for float slowWithRestrict( )
; pointer a is on r3. pointer b is on r4.
lis r10,__real@40e00000 ; load address of constant "7.0f" to r10
lis r11,__real@41400000 ; load address of constant "12.0f" to r11
li  r9,5                ; set r9 to "5"
stw r9,0(r3)            ; store r9 through pointer a
lfs fr0,__real@40e00000(r10) ; load constant "7.0" from memory onto fr0
lfs fr1,__real@41400000(r11) ; load constant "12.0" from memory onto fr1
stfs fr0,0(r4)          ; store fr0 through pointer b
blr                     ; return fr1</pre>
<p>No load-hit-stores at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://assemblyrequired.crashworks.org/2008/09/06/more-on-__restrict/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Load-hit-stores and the __restrict keyword</title>
		<link>http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/</link>
		<comments>http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/#comments</comments>
		<pubDate>Tue, 08 Jul 2008 17:00:24 +0000</pubDate>
		<dc:creator>Elan</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[LHS]]></category>

		<guid isPermaLink="false">http://assemblyrequired.wordpress.com/?p=8</guid>
		<description><![CDATA[A load-hit-store is a large stall that occurs when the processor writes data to an address x and then tries to load that data from x again too soon. __restrict is a C++ compiler directive that helps avoid load-hit-store stalls.]]></description>
			<content:encoded><![CDATA[<p>The load-hit-store is one of those quirky CPU implementation details that can cause significant performance problems in high-level code without it really being clear why, especially on in-order cores such as the PowerPCs inside the Xbox 360 and PS3. It&#8217;s especially insidious because it&#8217;s exactly the sort of thing we expect our compilers to transparently handle for us, and yet the compiler <em>can&#8217;t</em> handle it correctly without explicit hinting from the programmer. Fortunately, once you know what&#8217;s going on underneath the hood, there is a C++ keyword that helps avoid this problem without having to resort to inline assembly and darker arts.</p>
<p>Basically, a load-hit-store is a large stall that occurs when the CPU writes data to an address <em>x</em> and then tries to load that data from <em>x</em> again too soon afterwards. The reason for this problem has to do with the deep <a href="http://en.wikipedia.org/wiki/Instruction_pipeline">instruction pipeline</a> of a modern processor, but the consequence is clear: the processor&#8217;s execution comes to a complete halt for between 40 and 80 cycles. The simplest way to think of it is that the data has to &#8220;bounce off of the L1 cache&#8221;. That is, when you write data from a CPU register to memory, the &#8220;store&#8221; operation has to complete, writing the data all the way out to the L1 cache, before the data can be read back into a register again. Here&#8217;s a trivial example in C:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> CauseLHS<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>ptrA<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
   <span style="color: #0000ff;">int</span> a, b<span style="color: #008080;">;</span>
   <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>ptrB <span style="color: #000080;">=</span> ptrA<span style="color: #008080;">;</span> <span style="color: #666666;">// B and A point to the same address</span>
   <span style="color: #000040;">*</span>ptrA <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span> <span style="color: #666666;">// write data to address ptrA</span>
   b <span style="color: #000080;">=</span> <span style="color: #000040;">*</span>ptrB<span style="color: #008080;">;</span> <span style="color: #666666;">// read that data back in again (won't be</span>
                  <span style="color: #666666;">// available for 80 cycles)</span>
   a <span style="color: #000080;">=</span> b <span style="color: #000040;">+</span> <span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> <span style="color: #666666;">// stalls! the data isn't available yet.</span>
   <span style="color: #0000ff;">return</span> a<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>This seems like the sort of thing the compiler should notice and fix by simply keeping the contents of *ptrA in a register, but in most real-world cases the compiler won&#8217;t be able to tell that ptrA and ptrB point to the same address. Thus, it&#8217;s obliged to read memory back from a pointer every time you dereference it, because any other pointer in the function might have aliased and modified that data. There is, however, a way to help out the compiler a little: <a href="http://msdn.microsoft.com/en-us/library/5ft82fed(VS.80).aspx">the __restrict keyword</a>.</p>
<p>__restrict is a compiler directive that helps avoid load-hit-store stalls.</p>
<p>__restrict on a pointer promises the compiler that it has no aliases: nothing else in the function points to that same data. Thus the compiler knows that if it writes data to a pointer, it doesn&#8217;t need to read it back into a register later on because nothing else could have written to that address. Without __restrict, the compiler is forced to read data from every pointer every time it is used, because another pointer may have aliased it.</p>
<p>For example, this code will run slowly:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> slow<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a, <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>b<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
   <span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
   <span style="color: #000040;">*</span>b <span style="color: #000080;">=</span> <span style="color: #0000dd;">7</span><span style="color: #008080;">;</span>
   <span style="color: #0000ff;">return</span> <span style="color: #000040;">*</span>a <span style="color: #000040;">+</span> <span style="color: #000040;">*</span>b<span style="color: #008080;">;</span> <span style="color: #666666;">// LHS stall: the compiler doesn't</span>
                  <span style="color: #666666;">// know whether a == b, so it has to</span>
                  <span style="color: #666666;">// reload *a before the add</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Whereas this code will run quickly:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> fast<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span> __restrict a, <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span> __restrict b<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
   <span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
   <span style="color: #000040;">*</span>b <span style="color: #000080;">=</span> <span style="color: #0000dd;">7</span><span style="color: #008080;">;</span> <span style="color: #666666;">// __restrict promises that a != b</span>
   <span style="color: #0000ff;">return</span> <span style="color: #000040;">*</span>a <span style="color: #000040;">+</span> <span style="color: #000040;">*</span>b<span style="color: #008080;">;</span> <span style="color: #666666;">// no stall; the compiler hangs onto</span>
                         <span style="color: #666666;">// 5 and 7 in the registers.</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>It bears repeating that <strong>__restrict is a <em>promise</em> you make to your compiler</strong>. If you break your promise, you can get incorrect results. If pointers pA and pB are __restrict, and <code>pA == pB</code>, that will cause mysterious bugs.</p>
<p>For example, in the <code>fast()</code> function above, if <code>a</code> <em>does</em> equal <code>b</code>, then the correct return value would be 14. However, the compiler won&#8217;t know that <code>*b = 7</code> has changed the value of <code>*a</code>, and so you might actually get a return value of 12.</p>
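<p>Because __restrict is a compiler extension rather than standard C++, its spelling can differ between compilers. The following is a minimal sketch of one way to wrap it: the macro name <code>RESTRICT</code> and the function name <code>fastPortable</code> are assumptions made for illustration, not something from the code above.</p>

```cpp
// Hypothetical portability macro; the name RESTRICT is an assumption.
#if defined(_MSC_VER)
   #define RESTRICT __restrict
#elif defined(__GNUC__) || defined(__clang__)
   #define RESTRICT __restrict__
#else
   // Unknown compiler: expand to nothing, so no promise is made.
   #define RESTRICT
#endif

int fastPortable(int * RESTRICT a, int * RESTRICT b)
{
   *a = 5;
   *b = 7;         // the caller promises that a != b
   return *a + *b; // 12 whenever the promise holds
}
```

<p>On a compiler the macro does not recognize, the function still compiles and is still correct; it just loses the optimization.</p>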
<p>When working with C++ member functions, you can tell the compiler that the implicit this pointer is __restrict like so:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> CFoo
<span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">int</span> MemberFunc<span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">int</span> m_iBar<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">int</span> CFoo<span style="color: #008080;">::</span><span style="color: #007788;">MemberFunc</span><span style="color: #008000;">&#40;</span> <span style="color: #0000ff;">int</span> <span style="color: #000040;">*</span>a <span style="color: #008000;">&#41;</span> __restrict
<span style="color: #008000;">&#123;</span>
  m_iBar <span style="color: #000080;">=</span> <span style="color: #0000dd;">5</span><span style="color: #008080;">;</span>
  <span style="color: #000040;">*</span>a <span style="color: #000080;">=</span> <span style="color: #0000dd;">7</span><span style="color: #008080;">;</span> <span style="color: #666666;">// __restrict THIS promises that a != &amp;m_iBar</span>
  <span style="color: #0000ff;">return</span> m_iBar <span style="color: #000040;">+</span> <span style="color: #000040;">*</span>a<span style="color: #008080;">;</span> <span style="color: #666666;">// no load-hit-store</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Not every compiler lets you mark a C++ reference as __restrict (GCC accepts <code>int &amp;__restrict__ a</code>, but older versions of MSVC do not), so function parameters declared like <code>int foo(int &amp;a, int &amp;b)</code> may not be able to benefit from __restrict. In those cases, either copy <code>a</code> and <code>b</code> to local variables inside your function, <a href="http://www.gamasutra.com/view/feature/3687/sponsored_feature_common_.php?page=2">then write the final values back out again at the end</a>; or change your function signature to use pointers instead.</p>
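<p>The copy-to-locals approach for reference parameters might look something like the sketch below. The function <code>SumEvenOdd</code> and its parameters are hypothetical, invented only to illustrate the pattern, and the sketch assumes the two references do not alias each other or the input array.</p>

```cpp
// Hypothetical example: accumulate array entries into two reference outputs.
void SumEvenOdd(const int *values, int count, int &evenSum, int &oddSum)
{
   // Work on local copies. Nothing else can point at these locals,
   // so the compiler is free to keep them in registers for the whole loop.
   int localEven = evenSum;
   int localOdd  = oddSum;
   for (int i = 0; i < count; ++i)
   {
      if (i % 2 == 0)
         localEven += values[i]; // entries at even indices
      else
         localOdd += values[i];  // entries at odd indices
   }
   // Write the final values back out once, at the end.
   evenSum = localEven;
   oddSum  = localOdd;
}
```

<p>Had the loop added directly into <code>evenSum</code> and <code>oddSum</code>, the compiler would have to assume that each store might change the other total or the array itself, and reload accordingly.</p>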
<p>In summary, __restrict makes code involving pointers faster, so long as the pointers never alias one another. It is the usual first step when your profiler complains that a function has too many load-hit-stores, but it is not magic: it depends on the programmer&#8217;s attention to where all the data is going, and on that no-alias promise being kept.</p>
<p><b>See also</b> <a href="http://assemblyrequired.crashworks.org/2008/09/06/more-on-__restrict/">the followup to this article here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

