I have a one-hot encoded M x N matrix, A, with the following properties:
- One or more columns in each row can equal 1
- Every column in the matrix will have exactly one cell with a value of one (all other cells will be zero)
- M << N
I also have an M x 1 array, B, that contains integers (i.e. the number of random samples I want to select from each row). Each cell of B has the following property: B[i] <= np.sum(A[i])
I’m looking for the most efficient way to randomly sample a subset of the ones in each row of A. The number of samples returned for each row is given by the integer value in the corresponding cell of B. The output will be an M x N matrix, let's call it C, where B == np.sum(C, axis=1).
import numpy as np
A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
A valid output of running this algorithm would be:
array([[0, 0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 1],
       [1, 0, 0, 0, 1, 0, 0, 0]])
Another possible output would be:
array([[0, 0, 0, 0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1, 0, 0, 0]])
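To make the acceptance criteria concrete, here is a minimal sketch of a check that a candidate C is a valid sample. The helper name is_valid_sample is hypothetical (it is not from any library), and it assumes C should only contain ones where A does, which the description implies but does not state outright.

```python
import numpy as np

def is_valid_sample(A, B, C):
    """Hypothetical helper: True if C is a valid sample of A for counts B."""
    A, B, C = np.asarray(A), np.asarray(B), np.asarray(C)
    if A.shape != C.shape:
        return False
    is_binary = np.all((C == 0) | (C == 1))          # C is one-hot style (0/1 only)
    is_subset = np.all(C <= A)                       # assumed: C has ones only where A does
    counts_ok = np.array_equal(C.sum(axis=1), B)     # B == np.sum(C, axis=1)
    return bool(is_binary and is_subset and counts_ok)

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])

# The two example outputs from the question.
C1 = np.array([[0, 0, 1, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0, 1, 1],
               [1, 0, 0, 0, 1, 0, 0, 0]])
C2 = np.array([[0, 0, 0, 0, 0, 1, 0, 0],
               [0, 1, 0, 0, 0, 0, 1, 1],
               [0, 0, 0, 1, 1, 0, 0, 0]])

print(is_valid_sample(A, B, C1), is_valid_sample(A, B, C2))
```

Passing A itself as C fails the check, because its row sums are [2, 3, 3] rather than [1, 3, 2].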
I need to be able to generate X such random samples as fast as possible.
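For reference, one possible approach (a sketch, not a benchmarked answer) relies on the stated property that every column contains exactly one 1: shuffle the columns, group them by the row that owns them, and keep the first B[i] columns of each group. The function name sample_ones and the rng parameter are my own assumptions. The cost per sample is dominated by one sort over N columns rather than a sort per row.

```python
import numpy as np

def sample_ones(A, B, rng=None):
    """Sketch: keep B[i] uniformly chosen ones from row i of A.

    Assumes every column of A contains exactly one 1 and
    B[i] <= A[i].sum() for every row i.
    """
    rng = np.random.default_rng() if rng is None else rng
    A, B = np.asarray(A), np.asarray(B)
    M, N = A.shape

    owner = A.argmax(axis=0)                  # row holding the single 1 of each column
    perm = rng.permutation(N)                 # uniformly random column order
    # A stable sort by owning row keeps the random order inside each row's group.
    order = perm[np.argsort(owner[perm], kind="stable")]

    counts = np.bincount(owner, minlength=M)  # number of ones per row
    starts = np.cumsum(counts) - counts       # offset of each row's group in `order`
    sorted_owner = owner[order]
    rank = np.arange(N) - starts[sorted_owner]  # position of each column within its group

    keep = order[rank < B[sorted_owner]]      # first B[i] shuffled columns of row i
    C = np.zeros_like(A)
    C[owner[keep], keep] = 1
    return C

A = np.array([[0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1, 0, 0, 0]])
B = np.array([1, 3, 2])
C = sample_ones(A, B, np.random.default_rng(0))
print(C)
```

Generating X samples would then be X calls with the same rng. The owner, counts, and starts arrays depend only on A, so they could be computed once and reused across calls.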