Wednesday, 12 June 2019

Downsample numpy array while preserving distribution

I'm trying to write a function that randomly samples a numpy.ndarray of floating-point numbers while preserving the distribution of the values in the array. This is what I have so far:

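import numpy as np
# NOTE: A is treated as per-bin integer counts (a histogram), not as raw values.

def sample(A, N):
    population = np.zeros(sum(A))
    counter = 0
    for i, x in enumerate(A):
        # Expand the counts: bin index i appears x times in the population.
        for j in range(x):
            population[counter] = i
            counter += 1

    # Draw N entries with replacement, then re-bin them into counts per index.
    sampling = population[np.random.randint(0, len(population), N)]
    return np.histogram(sampling, bins=np.arange(len(A) + 1))[0]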

I would like the function to work something like this (this example ignores the distribution requirement):

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
new_a = sample(a,3)

new_a
array([1.94, 2.77, 7.39])
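
For reference, the behaviour in this example (picking N of the original values and keeping them in their original order) could be sketched as follows. The name `sample_values` and the `seed` parameter are hypothetical, introduced only for illustration, and the sketch does nothing special to preserve the distribution:

```python
import numpy as np

def sample_values(a, n, seed=None):
    """Return n elements of a, drawn uniformly without replacement,
    kept in their original order. (Hypothetical helper, not from the post.)"""
    rng = np.random.default_rng(seed)
    # Sample positions rather than values so duplicates in a are handled correctly.
    idx = rng.choice(len(a), size=n, replace=False)
    return a[np.sort(idx)]

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
new_a = sample_values(a, 3, seed=0)
print(new_a)  # three of the five original values, in original order
```

Which three values come back depends on the seed; only the length and the ordering are fixed.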

However, when I apply the function to an array like this, I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-74-07e3aa976da4> in <module>
----> 1 sample(a, 3)

<ipython-input-63-2d69398e2a22> in sample(A, N)
      3 
      4 def sample(A, N):
----> 5     population = np.zeros(sum(A))
      6     counter = 0
      7     for i, x in enumerate(A):

TypeError: 'numpy.float64' object cannot be interpreted as an integer
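
As far as I can tell, the error comes from passing a float to `np.zeros`, which expects an integer shape: `sum(A)` over a float array is a `numpy.float64`. A minimal reproduction, assuming the same array as above:

```python
import numpy as np

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
total = sum(a)  # a numpy.float64 (about 20.29), not an int

try:
    np.zeros(total)  # np.zeros needs an integer shape, so this raises
    error = None
except TypeError as exc:
    error = exc

print(type(total).__name__, "->", error)
```

So the function only makes sense when `A` holds integer counts per bin; with raw float values there is no population size to allocate.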

Any help modifying this function, or writing a new one that works for this case, would be really appreciated!
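
One possible direction, offered as a rough sketch rather than a definitive answer: sort the values, split them into N nearly equal quantile groups, and pick one value at random from each group, so the sample covers the whole range in proportion to where the data is dense. The name `downsample_stratified` and the `seed` parameter are hypothetical, and stratifying by quantile is my own assumption about what "preserving the distribution" should mean here:

```python
import numpy as np

def downsample_stratified(a, n, seed=None):
    """Pick n values from a, one from each of n quantile groups.
    (Hypothetical helper; quantile stratification is an assumed approach.)"""
    a = np.asarray(a, dtype=float)
    if not 0 < n <= a.size:
        raise ValueError("n must be between 1 and len(a)")
    rng = np.random.default_rng(seed)

    # Indices of a in ascending order of value.
    order = np.argsort(a, kind="stable")
    # n contiguous groups of nearly equal size; none is empty because n <= a.size.
    strata = np.array_split(order, n)
    # One randomly chosen index per group.
    picked = np.array([rng.choice(s) for s in strata])
    # Return the chosen values in their original order.
    return a[np.sort(picked)]

a = np.array([1.94, 5.68, 2.77, 7.39, 2.51])
print(downsample_stratified(a, 3, seed=0))
```

For a small array like this one the groups hold only one or two values each, so the result is nearly deterministic; the approach is more meaningful on larger arrays. Plain uniform sampling without replacement also preserves the distribution in expectation, just with more variance between runs.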



