Precise and IEEE Strictness in HLSL

Last week on HLSL there was relatively little activity, given the Memorial Day weekend and the awesome weather we had in Seattle. However, there were a few changes related to precise that I thought were worth commenting on.

First, 'precise' doesn't change the numerical precision of any individual operation. Instead, it indicates that shader computations should be performed as written in the program, without any optimizations that might reorder them, and that denorms, NaNs and INFs should be honored strictly. The precision at which each operation runs is the same with or without this modifier. Note that reordering the steps of symbolically equivalent computations can produce different results, as can ignoring NaN or INF behavior.
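To make the reordering point concrete, here is a small CPU-side sketch in C++ rather than HLSL. It is only an analogy, under the assumption that the host compiler follows IEEE 754 float rules (no fast-math flags); the function names are made up for illustration.

```cpp
// Adds left to right: (a + b) + c. The volatile temporary forces the
// intermediate sum to be rounded to float before the second addition.
float sum_left_first(float a, float b, float c) {
    volatile float partial = a + b;
    return partial + c;
}

// Adds right to left: a + (b + c). Symbolically the same expression,
// but the intermediate rounding happens at a different point.
float sum_right_first(float a, float b, float c) {
    volatile float partial = b + c;
    return a + partial;
}
```

With a = 16777216 (2^24) and b = c = 1, the first version returns 16777216 because each 1 is lost to rounding, while the second returns 16777218. An optimizer that is free to reassociate can silently switch between the two.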

Three things determine how operations will behave in this regard:

1. The refactoring allowed flag in the global flags.

2. The precise modifier on an instruction, which in HLSL is applied to a declaration and propagates to the values that contribute to it.

3. The IEEE strict flag /Gis as specified to fxc/dxc, which marks everything as precise.

The refactoring allowed flag has effectively been set for a long time now in recent shader models. In the current state of affairs, any operation can be refactored unless otherwise specified, and /Gis makes every computation in the shader precise. So it all boils down to where the precise modifier is applied. (In DXIL it's encoded somewhat differently, but the concept is the same.)

Let's look at some disassembly!

[code lang="cpp"]
// $> fxc /T ps_5_0 samples.hlsl
float a, b, c;
float4 main() : SV_Target {
  float4 result = 0 * a;
  return result;
}
[/code]

That results in the following:

[code lang="cpp"]
 ps_5_0
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
mov o0.xyzw, l(0,0,0,0)
ret
[/code]

In this first example, we see the compiler optimize a multiplication by zero. Now let's recompile with /Gis.

[code lang="cpp"]
// $> fxc /T ps_5_0 /Gis samples.hlsl
// ...

 ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.xyzw
mul [precise] o0.xyzw, cb0[0].xxxx, l(0.000000, 0.000000, 0.000000, 0.000000)
ret
[/code]

You can see that the shader now actually multiplies by a constant of zero, and that the instruction carries the precise qualifier so the drivers know to preserve it.
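Why keep a multiplication by zero at all? Because under IEEE rules the result is not always zero. The following is a CPU-side C++ sketch, not HLSL, and it assumes a host compiler in its default IEEE-conforming mode (no fast-math flags); the helper names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Multiplies by a literal zero, like the shader above. A conforming
// compiler may not fold this to 0 because the result depends on x.
float zero_times(float x) {
    return 0.0f * x;
}

// True when replacing 0 * x with a literal 0 would change the result,
// which happens when x is INF or NaN (the product is then NaN).
bool folding_changes_result(float x) {
    return std::isnan(zero_times(x));
}

// Convenience constants for trying the special values.
const float kInf = std::numeric_limits<float>::infinity();
const float kNaN = std::numeric_limits<float>::quiet_NaN();
```

For an ordinary input such as 3.0f the product compares equal to zero and folding is harmless, but for kInf or kNaN the product is NaN. That difference is what the non-precise version of the shader gives up when it emits a plain mov of zero.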

You can get the same bytecode by not using /Gis and marking the value as precise instead.

[code lang="cpp"]
// $> fxc /T ps_5_0 samples.hlsl
float a, b, c;
float4 main() : SV_Target {
  precise float4 result = 0 * a;
  return result;
}
[/code]

A more obvious example is the following, where mul and add aren't fused when precise is set.

[code lang="cpp"]
// $> fxc /T ps_5_0 samples.hlsl
float a, b, c;
float4 main() : SV_Target {
  float f = a;
  f += b * c;
  precise float4 result = f.xxxx;
  return result;
}

 ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.xyzw
dcl_temps 1
mul [precise(x)] r0.x, cb0[0].z, cb0[0].y
add [precise] o0.xyzw, r0.xxxx, cb0[0].xxxx
ret
[/code]

Note how it differs when the precise specifier is removed.

[code lang="cpp"]
// $> fxc /T ps_5_0 samples.hlsl
float a, b, c;
float4 main() : SV_Target {
  float f = a;
  f += b * c;
  float4 result = f.xxxx;
  return result;
}

 ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.xyzw
mad o0.xyzw, cb0[0].yyyy, cb0[0].zzzz, cb0[0].xxxx
ret
[/code]
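The mul/add versus mad difference can change the numbers, not just the instruction count. Here is a rough CPU-side illustration in C++, not HLSL. It uses std::fma as a stand-in for a mad that rounds only once; whether a given GPU's mad actually behaves that way is up to the hardware, so treat this as a sketch under that assumption. The function names are made up.

```cpp
#include <cmath>

// Two rounding steps: the product is rounded to float, then the sum is.
// The volatile temporary keeps the compiler from contracting the two
// operations into a single fused multiply-add.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// One rounding step: std::fma evaluates a * b + c as if to infinite
// precision and rounds only the final result.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

Take a = b = 1 + 2^-12 and c = -(1 + 2^-11). The exact product is 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 as a float, so mul_then_add returns 0 while fused_mul_add returns 2^-24. With precise, the result is pinned to the separate mul and add spelled out in the source.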

Enjoy!