I was going through the BERT repo and found the following piece of code:
for _ in range(10):
    random_document_index = rng.randint(0, len(all_documents) - 1)
    if random_document_index != document_index:
        break
The idea here is to generate a random integer on [0, len(all_documents) - 1] that cannot equal document_index. Because len(all_documents) is supposed to be a very large number, the first iteration is almost guaranteed to produce a valid index, but just to be safe, they try up to 10 times. If all 10 draws happen to equal document_index, the loop still ends with an invalid value, so the exclusion is only very likely, not guaranteed. I can't help but think there has to be a better way to do this.
I found this answer, which is easy enough to implement in Python:
random_document_index = rng.randint(0, len(all_documents) - 2)
random_document_index += 1 if random_document_index >= document_index else 0
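To make the shift trick concrete, here is a minimal sketch that wraps it in a function and checks that the excluded index never comes up. The function name random_index_excluding and its parameters are my own for illustration and are not from the BERT repo; it assumes 0 <= excluded < n and uses only the standard random module.

```python
import random
from collections import Counter


def random_index_excluding(n, excluded, rng=random):
    """Return a uniform random integer in [0, n - 1] that is never `excluded`.

    Hypothetical helper for illustration; assumes 0 <= excluded < n.
    """
    if n < 2:
        raise ValueError("need at least two candidates to exclude one")
    # Draw from the n - 1 allowed slots (randint is inclusive on both ends),
    # then shift every value at or above the excluded one up by one.
    index = rng.randint(0, n - 2)
    if index >= excluded:
        index += 1
    return index


rng = random.Random(12345)  # seeded only to make the demo repeatable
counts = Counter(random_index_excluding(5, 2, rng) for _ in range(10_000))
print(sorted(counts.items()))  # index 2 should be absent from the counts
```

Because each of the n - 1 draws maps to exactly one allowed index, the result stays uniform over the allowed values, and it needs a single call to randint with no retry loop.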
I was just wondering if there's a better way to achieve this in Python using the built-in functions (or even with numpy), or if this is the best you can do.