Wednesday, January 20, 2021

Quickly sample a varying number of ones per row from a one-hot encoded matrix

I have a one-hot encoded M x N matrix, A, with the following properties:

  • Each row contains one or more cells equal to 1
  • Every column in the matrix will have exactly one cell with a value of one (all other cells will be zero)
  • M << N

I also have an M x 1 array, B, of integers giving the number of random samples I want to select from each row. Each cell of B has the following property:

  • B[i] <= np.sum(A[i])
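The properties of A and B above can be checked mechanically before sampling. The following is a minimal sketch; the function name `validate_inputs` is hypothetical and not part of the question. It checks that A is binary, that every column holds exactly one 1, that every row has at least one 1, and that each B[i] is between 0 and the number of ones in row i. The M << N property concerns performance and is not checked.

```python
import numpy as np

def validate_inputs(A: np.ndarray, B: np.ndarray) -> None:
    """Raise ValueError if A or B violates the stated properties (hypothetical helper)."""
    M, N = A.shape
    # A must contain only 0s and 1s.
    if not np.isin(A, (0, 1)).all():
        raise ValueError("A must contain only 0s and 1s")
    # Every column holds exactly one 1.
    if not (A.sum(axis=0) == 1).all():
        raise ValueError("each column of A must contain exactly one 1")
    # Every row holds at least one 1.
    if not (A.sum(axis=1) >= 1).all():
        raise ValueError("each row of A must contain at least one 1")
    # B (M x 1 or flat) has one entry per row and never exceeds that row's count of ones.
    B = np.ravel(B)
    if B.shape != (M,):
        raise ValueError("B must have one entry per row of A")
    if (B < 0).any() or (B > A.sum(axis=1)).any():
        raise ValueError("need 0 <= B[i] <= np.sum(A[i]) for every row i")

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
validate_inputs(A, B)  # passes silently for the example inputs
```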

I'm looking for the most efficient way to randomly sample a subset of the ones in each row of A. The number of ones sampled from row i is given by the integer B[i]. The output will be an M x N matrix, call it C, such that B == np.sum(C, axis=1).
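For reference, a straightforward baseline (not taken from the question; the name `sample_ones_loop` is hypothetical) loops over the M rows, finds the column indices of that row's ones, and draws B[i] of them without replacement with `numpy.random.Generator.choice`. Because M << N the loop runs only a few Python iterations, but each iteration still scans all N columns of its row, so the total work is O(M * N).

```python
import numpy as np

def sample_ones_loop(A, B, rng=None):
    """Baseline sketch: per row, keep B[i] of the row's ones chosen uniformly at random."""
    rng = np.random.default_rng(rng)   # accepts None, an int seed, or a Generator
    B = np.ravel(B)                    # allow B to be M x 1 or flat
    C = np.zeros_like(A)
    for i in range(A.shape[0]):
        ones = np.flatnonzero(A[i])    # column indices of the ones in row i
        keep = rng.choice(ones, size=B[i], replace=False)
        C[i, keep] = 1
    return C

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
C = sample_ones_loop(A, B, rng=0)
print(C.sum(axis=1))  # [1 3 2]
```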

Example
import numpy as np

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])

One valid output of the algorithm would be:

array([[0, 0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 1],
       [1, 0, 0, 0, 1, 0, 0, 0]])

Another valid output would be:

array([[0, 0, 0, 0, 0, 1, 0, 0], 
       [0, 1, 0, 0, 0, 0, 1, 1], 
       [0, 0, 0, 1, 1, 0, 0, 0]])
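Both outputs above satisfy the same two conditions: C has ones only where A has ones, and row i of C contains exactly B[i] ones. A small checker makes this concrete. It is a sketch; the name `is_valid_sample` is hypothetical.

```python
import numpy as np

def is_valid_sample(A, B, C):
    """Return True if C picks only ones of A and row i of C has exactly B[i] ones."""
    A, C = np.asarray(A), np.asarray(C)
    if C.shape != A.shape:
        return False
    only_zeros_and_ones = np.isin(C, (0, 1)).all()
    subset_of_A = ((C == 0) | (A == 1)).all()          # C may be 1 only where A is 1
    right_counts = (C.sum(axis=1) == np.ravel(B)).all()
    return bool(only_zeros_and_ones and subset_of_A and right_counts)

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
C1 = np.array([[0, 0, 1, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0, 1, 1],
               [1, 0, 0, 0, 1, 0, 0, 0]])
print(is_valid_sample(A, B, C1))  # True
print(is_valid_sample(A, B, A))   # False: row sums of A are [2 3 3], not [1 3 2]
```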

I'm looking for a way to generate X such random samples as fast as possible.
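One way to avoid the per-row loop is to exploit the property that every column has exactly one 1: each column then belongs to exactly one row. The sketch below shuffles all N columns, stable-sorts them by owning row (so columns stay in random order within each row), computes each column's rank within its row, and keeps the columns whose rank is below B[row]. This shuffle-and-rank technique is a suggestion, not something from the question, and the name `sample_ones_vectorized` is hypothetical. Because a uniformly random permutation induces a uniformly random order on each row's columns, the first B[i] columns of row i form a uniform random subset. The cost is dominated by the O(N log N) sort plus the O(M * N) `argmax` over the dense A; to draw X samples, call it X times.

```python
import numpy as np

def sample_ones_vectorized(A, B, rng=None):
    """Sketch: assumes every column of A contains exactly one 1."""
    rng = np.random.default_rng(rng)
    B = np.ravel(B)
    M, N = A.shape
    row_of_col = A.argmax(axis=0)                  # owning row of each column
    perm = rng.permutation(N)                      # random order of all columns
    # Stable sort of the shuffled columns by owning row; within a row the
    # columns keep their random order.
    order = perm[np.argsort(row_of_col[perm], kind="stable")]
    counts = np.bincount(row_of_col, minlength=M)  # number of ones per row
    starts = np.cumsum(counts) - counts            # first slot of each row in `order`
    sorted_rows = row_of_col[order]
    rank = np.arange(N) - starts[sorted_rows]      # position of each column within its row
    keep = order[rank < B[sorted_rows]]            # first B[i] shuffled columns of row i
    C = np.zeros_like(A)
    C[row_of_col[keep], keep] = 1
    return C

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
C = sample_ones_vectorized(A, B, rng=0)
```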



