Sunday, October 25, 2020

How do I randomly get a certain number of elements of a numpy array with at least one element from each class?

I have a dataset of 400 images: 10 images of each of 40 different people. There are two NumPy arrays: "olivetti_faces" contains the images (shape 400x64x64), and "olivetti_faces_target" contains the class of each image (shape 400), one class per person. So "olivetti_faces" is of the form array([<img1>, <img2>, ..., <img400>]), where each <img> is a 64x64 array of numbers, and "olivetti_faces_target" is of the form array([0, 0, ..., 39]).

You can access the dataset here. You can load them after downloading as follows:

import numpy as np
data=np.load("olivetti_faces.npy")
target=np.load("olivetti_faces_target.npy")

I would like to randomly choose 100 of the images, with at least one image of each of the 40 people. How can I achieve this in NumPy?

So far I could randomly get 100 images using the following code:

n = 100  # number of images to retrieve
# draw n distinct row indices uniformly at random
rand_indices = np.random.choice(data.shape[0], n, replace=False)
data_random = data[rand_indices]
target_random = target[rand_indices]  # index the original target array

But it does not guarantee that at least one image of each of the 40 classes is included in data_random.
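
One possible way to get that guarantee, as a minimal sketch rather than a definitive answer: first draw one random index from each class, then fill the remaining slots from the indices that have not been picked yet. The helper name sample_with_all_classes and its rng parameter are hypothetical, and a synthetic target array stands in for olivetti_faces_target so the snippet runs without the downloaded files. Note that this two-step draw is not uniform over all valid subsets of size n.

```python
import numpy as np

def sample_with_all_classes(target, n, rng=None):
    """Return n distinct indices into target covering every class at least once."""
    rng = np.random.default_rng() if rng is None else rng
    classes = np.unique(target)
    if n < classes.size or n > target.size:
        raise ValueError("n must be between the number of classes and len(target)")
    # One randomly chosen index per class guarantees coverage.
    guaranteed = np.array([rng.choice(np.flatnonzero(target == c)) for c in classes])
    # Fill the remaining slots from the indices not chosen yet.
    remaining = np.setdiff1d(np.arange(target.size), guaranteed)
    extra = rng.choice(remaining, size=n - guaranteed.size, replace=False)
    indices = np.concatenate([guaranteed, extra])
    rng.shuffle(indices)  # avoid having the guaranteed picks always come first
    return indices

# Stand-in for olivetti_faces_target (assumption): 40 classes, 10 images each.
target = np.repeat(np.arange(40), 10)
idx = sample_with_all_classes(target, 100, rng=np.random.default_rng(0))
```

With the real arrays, data[idx] and target[idx] would then give the selected images and their classes.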



