rsqrt(y) gives this iterate: x += 0.5*(x - y*x*x*x)

– The second scheme needs no division, which may be why rsqrt() is faster in both software and hardware implementations.
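To make the "no division" point concrete, here is a minimal sketch of the rsqrt iterate in C. The function names (rsqrt_step, rsqrt_refine) and the starting guess are my own, purely for illustration; real hardware seeds the iteration from a lookup table or similar, which this does not model.

```c
/* One Newton step for f(x) = 1/(x*x) - y, whose positive root is 1/sqrt(y).
 * x - f(x)/f'(x) simplifies to x + 0.5*(x - y*x*x*x), so the step
 * needs only multiplies and adds, never a divide. */
float rsqrt_step(float x, float y)
{
    return x + 0.5f * (x - y * x * x * x);
}

/* Hypothetical helper: refine a starting guess x0 for 1/sqrt(y).
 * The guess has to be reasonably close to the root; for this
 * iteration, 0 < x0 < sqrt(3/y) is sufficient for convergence. */
float rsqrt_refine(float y, float x0, int steps)
{
    float x = x0;
    for (int i = 0; i < steps; ++i)
        x = rsqrt_step(x, y);
    return x;
}
```

Calling rsqrt_refine(4.0f, 0.4f, 6) walks the guess 0.4 toward 1/sqrt(4) = 0.5. Convergence is quadratic, so a handful of steps is enough from a decent guess.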

1/y gives this iterate: x += x*(1 - y*x)

– Maybe this is why sqrt() is an unpipelined instruction (two interleaved iterative loops).
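The reciprocal iterate can be sketched the same way. Again the names (recip_step, recip_refine) are hypothetical, and the starting guess is simply passed in by the caller rather than taken from a hardware table.

```c
/* One Newton step for f(x) = 1/x - y, whose root is 1/y.
 * x - f(x)/f'(x) simplifies to x + x*(1 - y*x): the division
 * is replaced by multiplies and adds. */
float recip_step(float x, float y)
{
    return x + x * (1.0f - y * x);
}

/* Hypothetical helper: refine a starting guess x0 for 1/y.
 * The iteration converges when 0 < y*x0 < 2. */
float recip_refine(float y, float x0, int steps)
{
    float x = x0;
    for (int i = 0; i < steps; ++i)
        x = recip_step(x, y);
    return x;
}
```

For example, recip_refine(4.0f, 0.2f, 6) moves the guess 0.2 toward 1/4. A division built this way is itself a loop, so a square root that divides on every step ends up with one iteration nested inside another.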

sqrt(x) = x*rsqrt(x) is definitely fast, but it produces a NaN for x == 0:

sqrt(0) = 0*rsqrt(0) = 0*(1/sqrt(0)) = 0*(1/0) = 0*INF = NaN

Using the rcp instruction (the _mm_rcp_ss intrinsic), which approximates 1/x, the square root can be computed as

sqrt(x) = rcp( rsqrt(x) )

This doesn’t produce a NaN:

sqrt(0) = rcp(rsqrt(0)) = 1/(1/sqrt(0)) = 1/INF = 0

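It’s still very fast, though not very accurate: rsqrt and rcp are both approximations (Intel documents a relative error of up to 1.5*2^-12 for each), and chaining them compounds the error.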
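Both formulations can be tried side by side with the SSE intrinsics. This is only a sketch, assuming an x86 target with SSE and default (non fast-math) floating-point settings; the wrapper names sqrt_mul and sqrt_rcp are made up for this example.

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_rsqrt_ss, _mm_rcp_ss, _mm_mul_ss, _mm_cvtss_f32 */

/* sqrt(x) ~= x * rsqrt(x): one approximate instruction plus a multiply.
 * For x == 0 this evaluates 0 * INF, which is NaN. */
float sqrt_mul(float x)
{
    __m128 v = _mm_set_ss(x);
    return _mm_cvtss_f32(_mm_mul_ss(v, _mm_rsqrt_ss(v)));
}

/* sqrt(x) ~= rcp(rsqrt(x)): two chained approximate instructions.
 * For x == 0 this evaluates rcp(INF), which is 0. */
float sqrt_rcp(float x)
{
    __m128 v = _mm_set_ss(x);
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_rsqrt_ss(v)));
}
```

With these, sqrt_mul(0.0f) comes back as NaN while sqrt_rcp(0.0f) comes back as 0. For nonzero inputs, both results are only approximate, and the exact bits can differ between CPU vendors.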