jeudi 21 mai 2020

Skewed random sample from Numpy random generator sample (numpy.random.Generator.choice)

I have made a piece of Python to generate mixture of normal distributions and I would want to sample from it. As the result is my probability density function I would want the sample to be representative of the original distribution. So I have developped the function to create the pdf:

def gaussian_pdf(amplitude, mean, std, sample_int):
    coeff = (amplitude / std) / np.sqrt(2 * np.pi)
    if len(amplitude > 1):
        # create mixture distribution
        # get distribution support
        absciss_array = np.linspace(np.min(mean) - 4 * std[np.argmin(mean)],
                                    np.max(mean) + 4 * std[np.argmax(mean)],
                                    sample_int)
        normal_array = np.zeros(len(absciss_array))
        for index in range(0, len(amplitude)):
            normal_array += coeff[index] * np.exp(-((absciss_array - mean[index]) / std[index]) ** 2)
    else:
        # create simple gaussian distribution
        absciss_array = np.linspace(mean - 4*std, mean + 4*std, sample_int)
        normal_array = coeff * np.exp(-((absciss_array - mean) / 2*std) ** 2)

    return np.ascontiguousarray(normal_array / np.sum(normal_array))

An I have tested a sampling with the main part of the script :

def main():
    amplitude = np.asarray([1, 2, 1])
    mean = np.asarray([0.5, 1, 2.5])
    std = np.asarray([0.1, 0.2, 0.3])
    no_sample = 10000

    # create mixture gaussian array
    gaussian_array = gaussian_pdf(amplitude, mean, std, no_sample)

    # pot data
    fig, ax = plt.subplots()
    absciss = np.linspace(np.min(gaussian_array), np.max(gaussian_array), no_sample)
    ax.plot(absciss, gaussian_array)

    # create random generator to sample from distribution
    rng = np.random.default_rng(424242)
    # sample from distribution
    sample = rng.choice(a=gaussian_array, size=100, replace=True, p=gaussian_array)
    # plot results
    ax.plot(sample, np.full_like(sample, -0.00001), '|k', markeredgewidth=1)

    plt.show()

    return None

I then have the result :

enter image description here

You can see with the dark lines the samples that have been extracted from the distribution. The problem is that, even if I specify to use the probability array in the numpy function, the sampling is skewed towards the end of the distribution. I have tried several times with other seeds but the result does not change... I expect to have more samples in the area where the probability density is greater...

Would someone please help me ? Am I missing something here ? Thanks in advance.




Aucun commentaire:

Enregistrer un commentaire