random: Numpy randint not really random?

Monday, 13 February 2017

The situation: I have a big dataset with more than 18 million examples. I train several models and want to track the accuracy.

When I forward all examples and compute the accuracy, it comes out at approximately 83 percent. But this takes a long time.

So I try to sample a small subset of the whole dataset and compute the accuracy on that. I expect to see approximately the same number (around 80 percent).

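import numpy as np

total = 4096
N = dataset.shape[0]
# randint's upper bound is exclusive, so pass N (not N-1) to be able to reach the last row
indices = np.random.randint(N, size=total)
batch = dataset[indices, :]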

However, when I run this for 10 'random' batches, the output looks like this:

> satisfied 4096/4096
> 1.0
> satisfied 4095/4096
> 0.999755859375
> satisfied 4095/4096
> 0.999755859375
> satisfied 4094/4096
> 0.99951171875
> satisfied 4095/4096
> 0.999755859375
> satisfied 4095/4096
> 0.999755859375
> satisfied 4094/4096
> 0.99951171875
> satisfied 4096/4096
> 1.0
> satisfied 4095/4096
> 0.999755859375
> satisfied 4096/4096
> 1.0

So here it always performs far too well, and it seems to sample almost exclusively from the roughly 80 percent of examples that are classified correctly. What can I do to make the sampling really random, so that it gives a fair view of the accuracy?

This also makes the training go wrong, because only the good examples are sampled for the next training batch.
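One way to check whether the index sampling itself is at fault is to run it on synthetic data where the true accuracy is known. The sketch below is only an illustration under assumptions: `correct` is a hypothetical boolean array standing in for "the model got this example right" (it is not my real dataset), `subset_accuracy` is a made-up helper name, and the figures 0.83 and 1,000,000 are chosen just to mimic the situation above.

```python
import numpy as np

# Seeded generator so the check is repeatable.
rs = np.random.RandomState(0)

# Hypothetical stand-in for the real data: one flag per example,
# True where the prediction is correct (about 83 percent overall).
N = 1000000
correct = rs.rand(N) < 0.83


def subset_accuracy(correct, total, rs):
    """Estimate accuracy from `total` uniformly sampled examples."""
    n = correct.shape[0]
    # Upper bound is exclusive: indices fall in [0, n).
    indices = rs.randint(n, size=total)
    return correct[indices].mean(), indices


full_accuracy = correct.mean()
estimate, indices = subset_accuracy(correct, 4096, rs)

print("full accuracy:   %.4f" % full_accuracy)
print("subset estimate: %.4f" % estimate)
```

If the subset estimate lands close to the full figure on synthetic data like this but not on the real data, the gap probably comes from how the batch is built or evaluated rather than from `randint` itself.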
