I had some code that random-initialized some numpy arrays with:
rng = np.random.default_rng(seed=seed)
new_vectors = rng.uniform(-1.0, 1.0, target_shape).astype(np.float32) # [-1.0, 1.0)
new_vectors /= vector_size
And all was working well, all project tests passing.
Unfortunately, uniform() returns np.float64, though downstream steps only want np.float32, and in some cases, this array is very large (think millions of 400-dimensional word-vectors). So the temporary np.float64 return-value, held alongside its np.float32 copy during the conversion, momentarily pushes RAM use to 3X what's necessary.
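To make the memory arithmetic concrete, here is a rough sketch. The array size (1 million vectors of 400 dimensions) is a hypothetical stand-in for "millions", not a measured figure from the project:

```python
import numpy as np

# Hypothetical size: 1 million 400-dimensional vectors.
n_values = 1_000_000 * 400

float32_bytes = n_values * np.dtype(np.float32).itemsize  # what downstream needs
float64_bytes = n_values * np.dtype(np.float64).itemsize  # temporary from uniform()

# During .astype(np.float32), the float64 source and the float32 copy coexist.
peak_bytes = float64_bytes + float32_bytes

print(f"float32 result:     {float32_bytes / 1e9:.1f} GB")
print(f"peak during astype: {peak_bytes / 1e9:.1f} GB "
      f"({peak_bytes / float32_bytes:.0f}X)")
```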
Thus, I replaced the uniform() version with what definitionally should be equivalent:
rng = np.random.default_rng(seed=seed)
new_vectors = rng.random(target_shape, dtype=np.float32) # [0.0, 1.0)
new_vectors *= 2.0 # [0.0, 2.0)
new_vectors -= 1.0 # [-1.0, 1.0)
new_vectors /= vector_size
And after this change, all closely-related functional tests still pass, but a single distant, fringe test relying on far-downstream calculations from the vectors so-initialized has started failing. And failing in a very reliable way. It's a stochastic test, and it passes with a large margin-for-error with the first (uniform()) version, but always fails with the second (random()) version. So: something has changed, but in some very subtle way.
The superficial values of new_vectors seem properly and similarly distributed in both cases. And again, all the "close-up" tests of functionality still pass.
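One probe that looks past the superficial distribution, offered as a sketch of a guess rather than a confirmed diagnosis: check whether the two paths give the same values for the same seed, and how finely each resolves values. The assumption being tested is that float32 draws are built from fewer random bits than float64 draws, so that after scaling they may land only on a coarse grid (multiples of 2**-23 here), whereas float64 draws rounded to float32 can fall between those grid points, especially near zero. The helper names (init_via_uniform, init_via_random32, fraction_on_grid), the seed, and the shape are all hypothetical, and the final division by vector_size is left out because it only rescales both results:

```python
import numpy as np

def init_via_uniform(seed, shape):
    # Original approach: float64 draws in [-1.0, 1.0), then cast down.
    rng = np.random.default_rng(seed=seed)
    return rng.uniform(-1.0, 1.0, shape).astype(np.float32)

def init_via_random32(seed, shape):
    # Replacement approach: native float32 draws in [0.0, 1.0), rescaled in place.
    rng = np.random.default_rng(seed=seed)
    v = rng.random(shape, dtype=np.float32)
    v *= 2.0
    v -= 1.0
    return v

def fraction_on_grid(x, bits=23):
    # Share of values that are exact integer multiples of 2**-bits.
    scaled = x.astype(np.float64) * 2.0 ** bits
    return float(np.mean(scaled == np.round(scaled)))

seed, shape = 42, (1000, 100)  # hypothetical test values
a = init_via_uniform(seed, shape)
b = init_via_random32(seed, shape)

print("same values for same seed:", np.array_equal(a, b))
print("smallest |value|:", np.abs(a).min(), np.abs(b).min())
print("fraction on 2**-23 grid:", fraction_on_grid(a), fraction_on_grid(b))
```

If the second fraction comes out at (or very near) 1.0 while the first is well below it, then the two initializations differ in granularity even though their histograms look alike, which is the kind of difference a far-downstream calculation might be sensitive to.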
So I'd love theories for what non-intuitive changes this 3-line change may have made that could show up far-downstream.
(I'm still trying to find a minimal test that detects whatever's different. If you'd enjoy doing a deep-dive into the affected project, the exact close-up tests that succeed, the one fringe test that fails, and the commits with/without the tiny change are at https://github.com/RaRe-Technologies/gensim/pull/2944#issuecomment-704512389. But really, I'm just hoping a numpy expert might recognize some tiny corner-case where something non-intuitive happens, or offer some testable theories of same.)
Any ideas, proposed tests, or possible solutions?