Tuesday, July 27, 2021

Map the range [1:2) of IEEE 32-bit floats to an arbitrary [a:b)

Backstory: uniform PRNG with arbitrary endpoints

I've got a fast uniform pseudo-random number generator that produces uniformly distributed float32 numbers in the range [1:2), i.e. u such that 1 <= u <= 2-eps. Unfortunately, mapping the endpoints of [1:2) onto those of an arbitrary range [a:b) is non-trivial in floating-point arithmetic. I'd like to match the endpoints exactly with a simple affine calculation.
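The generator itself isn't shown, so as an assumption here is one common way such a [1:2) generator is built: keep the exponent bits of 1.0f (0x3F800000) and fill the 23 mantissa bits with random bits. The xorshift32 bit source and the names XorShift32 and uniform_1_2 are hypothetical stand-ins. The useful consequence is that the largest possible output is exactly 2 - 2^-23, i.e. nextlower(2), which is why that value is the upper endpoint that matters.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical bit source: Marsaglia's xorshift32 (state must be nonzero).
struct XorShift32 {
    std::uint32_t state;
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};

// Build a float in [1, 2): exponent/sign bits of 1.0f plus the top 23 random
// bits as the mantissa. Each value 1 + k * 2^-23, k in [0, 2^23), is equally
// likely, so the largest output is 2 - 2^-23 = nextlower(2).
float uniform_1_2(XorShift32& rng) {
    std::uint32_t bits = 0x3F800000u | (rng.next() >> 9);
    float u;
    std::memcpy(&u, &bits, sizeof u);  // well-defined type pun
    return u;
}
```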

Formally stated

I want an IEEE-754 32-bit floating-point affine function f(x,a,b), defined for 1 <= x < 2 and arbitrary a, b, that maps 1 -> a and nextlower(2) -> nextlower(b) exactly

where nextlower(q) is the next lower FP representable number (e.g. in C++ std::nextafter(float(q),float(q-1)))
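The two endpoint conditions can be written as an executable predicate, which makes it easy to test any candidate mapping. This is a sketch: meets_endpoints is a hypothetical name, and nextlower steps toward -INFINITY rather than toward float(q-1), because for large |q| the expression q-1 rounds back to q and std::nextafter(q, q) then returns q unchanged.

```cpp
#include <cmath>

// nextlower(q): the largest float strictly below q.
float nextlower(float q) {
    return std::nextafter(q, -INFINITY);
}

// The formal statement as a predicate over any callable f(x, a, b):
// f(1) must equal a and f(nextlower(2)) must equal nextlower(b), bit-exactly.
template <typename F>
bool meets_endpoints(F f, float a, float b) {
    return f(1.0f, a, b) == a && f(nextlower(2.0f), a, b) == nextlower(b);
}
```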

What I've tried

The simple mapping f(x,a,b) = (x-1)*(b-a) + a always satisfies the f(1) = a condition, but because of floating-point rounding it sometimes fails the f(nextlower(2)) = nextlower(b) condition.
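A concrete failure, worked by hand assuming round-to-nearest-even float arithmetic: take a = 3, b = 5 and x = 2 - 2^-23. Then x - 1 = 1 - 2^-23 and b - a = 2 are exact, the product 2 - 2^-22 is exact, and the sum 5 - 2^-22 lies exactly halfway between 5 - 2^-21 and 5 (float spacing in [4, 8) is 2^-21). The tie goes to the even mantissa, 5, so f returns b itself, which is outside [a:b). The sketch below reproduces this and adds a hypothetical count_failures helper that scans a small integer grid of (a, b) pairs; the grid is arbitrary and no particular failure count is claimed.

```cpp
#include <cmath>

// The simple mapping from the question.
float simple_map(float x, float a, float b) {
    return (x - 1.0f) * (b - a) + a;
}

// Count pairs lo <= a < b <= hi (integers) where the top endpoint misses.
int count_failures(int lo, int hi) {
    const float x_hi = std::nextafter(2.0f, -INFINITY);  // nextlower(2)
    int failures = 0;
    for (int ia = lo; ia < hi; ++ia) {
        for (int ib = ia + 1; ib <= hi; ++ib) {
            float a = static_cast<float>(ia);
            float b = static_cast<float>(ib);
            if (simple_map(x_hi, a, b) != std::nextafter(b, -INFINITY)) {
                ++failures;
            }
        }
    }
    return failures;
}
```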

I've tried replacing the 1 with a free design parameter to cancel FP errors in the spirit of Kahan summation, i.e. with f(x,c0,c1,c2) = (x-c0)*c1 + c2, one mathematical solution is c0=1, c1=(b-a), c2=a (the simple mapping above), but the extra parameter lets me play around with the constants c0, c1, c2 to match the endpoints. I'm not sure I understand the principles behind Kahan summation well enough to apply them to determine the parameters, or even to be confident that a solution exists. It feels like I'm bumping around in the dark where others might have found the light already.
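One way to do the "bumping around" mechanically is a brute-force neighborhood search over the constants. This is a heuristic, not Kahan summation and not a proof that a solution exists: it fixes c0 = 1 and c2 = a (so f(1) = a holds exactly, since 1 - 1 = 0) and steps c1 outward from fl(b - a) one ulp at a time, up to a hypothetical max_steps limit, until f(nextlower(2)) == nextlower(b). It assumes plain float evaluation with round-to-nearest. Because each rounded operation is monotone and c1 > 0, any c1 it returns also keeps every f(x) for 1 <= x < 2 inside [a, b).

```cpp
#include <cmath>
#include <optional>

float nextlower(float q) { return std::nextafter(q, -INFINITY); }

// The parametrized form from the question.
float affine(float x, float c0, float c1, float c2) {
    return (x - c0) * c1 + c2;
}

// Hypothetical search: c0 = 1, c2 = a, and c1 tried at fl(b - a), then one
// ulp below/above, two ulps below/above, ... Returns nullopt if none works.
std::optional<float> search_c1(float a, float b, int max_steps = 16) {
    const float x_hi = nextlower(2.0f);
    const float target = nextlower(b);
    float down = b - a;
    float up = b - a;
    for (int i = 0; i <= max_steps; ++i) {
        if (affine(x_hi, 1.0f, down, a) == target) return down;
        if (affine(x_hi, 1.0f, up, a) == target) return up;
        down = std::nextafter(down, -INFINITY);
        up = std::nextafter(up, INFINITY);
    }
    return std::nullopt;
}
```

For a = 3, b = 5 the search should settle on c1 = 2 - 2^-23: the product becomes 2 - 3 * 2^-23 and the sum 5 - 3 * 2^-23 now rounds down to nextlower(5) instead of tying up to 5. Nudging c1 by an ulp changes the scale by about one part in 2^23, which is far below the PRNG's own resolution.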

Aside: I'm fine assuming the following:

  • a < b
  • both a and b are far from zero, i.e. OK to ignore subnormals
  • a and b are far enough apart (measured in representable FP values) to mitigate non-uniform quantization and avoid degenerate cases
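The "measured in representable FP values" distance can be computed with the usual ordered-bits trick: reinterpret each float's bits as an integer whose order matches the float's order, then subtract. The names ordered_key, ulp_distance and assumptions_hold and the min_gap threshold are hypothetical, and std::isnormal is only a proxy for "far from zero" (it rejects zero, subnormals, infinities and NaN). NaNs are not handled by ordered_key.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Monotone integer key for a float: non-negative floats keep their bit
// pattern, negative floats map to minus their magnitude bits (-0 maps to 0).
std::int64_t ordered_key(float f) {
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits < 0 ? -static_cast<std::int64_t>(bits & 0x7FFFFFFF)
                    : static_cast<std::int64_t>(bits);
}

// Number of representable-float steps from a up to b.
std::int64_t ulp_distance(float a, float b) {
    return ordered_key(b) - ordered_key(a);
}

// Hypothetical check of the three stated assumptions; min_gap is arbitrary.
bool assumptions_hold(float a, float b, std::int64_t min_gap = 1 << 20) {
    return std::isnormal(a) && std::isnormal(b) && a < b &&
           ulp_distance(a, b) >= min_gap;
}
```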
