Tuesday, December 11, 2018

CIFAR-10: randomize the train and test sets

I would like to randomize the 60,000 observations of the CIFAR-10 dataset available in the keras.datasets module. I know this may not matter much for building a neural network, but I'm a Python novice and I would like to learn how to handle data in this language.

So, to import the dataset, I run

from keras.datasets import cifar10
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()

which automatically gives me the default split into a training set (50,000 images) and a test set (10,000 images), but I would like to mix them. The steps I have in mind are:

  • concatenate the train and test sets into an array X of shape (60000, 32, 32, 3) and an array Y of shape (60000, 1)
  • generate random indices to split X and Y into, say, a training set of 50000 observations and a test set of 10000 observations
  • create new datasets (as ndarrays) X_train, X_test, Y_train, Y_test with the same shapes as the original ones, so that I can start training my convolutional neural network
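The three steps above can be sketched with NumPy alone. This is a minimal sketch, not a definitive implementation: the function name `reshuffle_split` and its `n_test` and `seed` parameters are hypothetical, and the usage at the bottom runs on small dummy arrays with CIFAR-10-like shapes instead of the real data, so nothing has to be downloaded. The key point is to draw a single permutation and use it to index both X and Y, so every image stays paired with its label.

```python
import numpy as np


def reshuffle_split(X_train, Y_train, X_test, Y_test, n_test=10000, seed=None):
    """Pool the train and test sets, shuffle them, and split them again.

    Hypothetical helper: returns ((X_train, Y_train), (X_test, Y_test)) in the
    same layout as cifar10.load_data(), with n_test observations in the test set.
    """
    # Step 1: stack along axis 0 (the observation axis). For CIFAR-10 this
    # gives X of shape (60000, 32, 32, 3) and Y of shape (60000, 1).
    X = np.concatenate((X_train, X_test), axis=0)
    Y = np.concatenate((Y_train, Y_test), axis=0)

    # Step 2: one random permutation of 0..N-1, shared by X and Y, so each
    # image keeps its own label.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Step 3: fancy indexing returns new ndarrays with the original layout.
    return (X[train_idx], Y[train_idx]), (X[test_idx], Y[test_idx])


# Dummy data with CIFAR-10-like shapes (60 images instead of 60000):
# image i is filled with the value i, so the image/label pairing can be checked.
labels = np.arange(60).reshape(-1, 1)
images = np.tile(labels.astype(np.uint8)[:, None, None], (1, 32, 32, 3))
X_tr, Y_tr = images[:50], labels[:50]
X_te, Y_te = images[50:], labels[50:]

(X_train, Y_train), (X_test, Y_test) = reshuffle_split(
    X_tr, Y_tr, X_te, Y_te, n_test=10, seed=42)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
# (50, 32, 32, 3) (50, 1) (10, 32, 32, 3) (10, 1)
```

With the real data, calling `reshuffle_split(X_train, Y_train, X_test, Y_test)` right after `cifar10.load_data()` would give back arrays of shapes (50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3) and (10000, 1), as in the original split.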

But maybe there is an even quicker approach.
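One commonly used shortcut, assuming scikit-learn is installed (an extra dependency beyond Keras and NumPy), is `sklearn.model_selection.train_test_split`, which shuffles and splits several arrays in one call. A minimal sketch, assuming X and Y already hold the pooled observations; the dummy arrays below only mimic the CIFAR-10 shapes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy pooled arrays with CIFAR-10-like shapes (600 images, 10 classes);
# with the real data, X would be (60000, 32, 32, 3) and Y (60000, 1).
X = np.zeros((600, 32, 32, 3), dtype=np.uint8)
Y = np.repeat(np.arange(10), 60).reshape(-1, 1)

# An integer test_size is an absolute number of test observations;
# shuffle=True is the default, and random_state makes the shuffle repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=100, random_state=0)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
# (500, 32, 32, 3) (100, 32, 32, 3) (500, 1) (100, 1)
```

With the real data, `test_size=10000` reproduces a 50,000/10,000 split. Passing `stratify=Y.ravel()` as well keeps the proportion of each class the same in both sets.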

I have tried different methods for a couple of hours without success. Can somebody help me? I would really appreciate it, thanks.