I would like to shuffle the 60000 observations of the CIFAR-10 dataset available in the keras.datasets library. I know this may not matter much for building a neural network, but I'm a Python novice and I would like to learn data handling in this language.
To import the dataset, I run:
from keras.datasets import cifar10
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
which automatically gives me a default split into training and test sets, but I would like to mix them. The steps I have in mind are:
- concatenate the training and test sets into an array X of shape (60000, 32, 32, 3) and an array Y of shape (60000, 1)
- generate random indices to split X and Y into, say, a training set of 50000 observations and a test set of 10000 observations
- create new ndarrays X_train, X_test, Y_train, Y_test with the same shapes as the original ones, so that I can start training my convolutional neural network
but maybe there's even a quicker approach to this.
I have tried different methods for a couple of hours but I didn't manage to achieve anything. Can somebody help me? I would really appreciate it, thanks.
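For reference, the three steps above can be sketched with NumPy alone. This is a minimal sketch rather than a tested solution on the real data: the helper name reshuffle_split, its n_test and seed parameters, and the small synthetic arrays are all made up for illustration, and only the array shapes follow CIFAR-10.

```python
import numpy as np

def reshuffle_split(X_train, Y_train, X_test, Y_test, n_test=10000, seed=0):
    """Pool the default train/test arrays, shuffle them, and split again.

    Hypothetical helper written for illustration; not part of Keras.
    """
    # Step 1: concatenate along the sample axis (axis 0).
    X = np.concatenate([X_train, X_test], axis=0)
    Y = np.concatenate([Y_train, Y_test], axis=0)

    # Step 2: draw one random permutation of the row indices and use it
    # for both X and Y, so every image keeps its own label.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) - n_test
    train_idx, test_idx = idx[:n_train], idx[n_train:]

    # Step 3: index with the permuted rows to build the new ndarrays.
    return (X[train_idx], Y[train_idx]), (X[test_idx], Y[test_idx])

# Synthetic stand-ins with CIFAR-10-like shapes (60 samples, not real data).
# Image i is filled with the value i and has label i, so pairing is checkable.
labels = np.arange(60).reshape(60, 1)
images = np.broadcast_to(labels.reshape(60, 1, 1, 1), (60, 32, 32, 3)).astype(np.uint8)

(new_X_train, new_Y_train), (new_X_test, new_Y_test) = reshuffle_split(
    images[:50], labels[:50], images[50:], labels[50:], n_test=10, seed=0
)
print(new_X_train.shape, new_Y_train.shape, new_X_test.shape, new_Y_test.shape)
```

With the real data, the same call would take the four arrays returned by cifar10.load_data() and keep the default n_test=10000. A possibly quicker route, assuming scikit-learn is installed, is to concatenate first and then call sklearn.model_selection.train_test_split(X, Y, test_size=10000), which shuffles by default.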