Before getting to the topic, let's first take a look at Python's default sampling method:
>>> import random
>>> c=[1,2,3,100,101,102,103,104,105,106,109,110,111,112,113,114]
>>> random.sample(c,1)
[103]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[2]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[2]
>>> random.sample(c,1)
[106]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[105]
>>> random.sample(c,1)
[110]
>>> random.sample(c,1)
[103]
>>> random.sample(c,1)
From the source code we can easily see what it actually does (below is the main part of the code from the link):
selected = set()
selected_add = selected.add
for i in xrange(k):
    j = _int(random() * n)
    while j in selected:
        j = _int(random() * n)
    selected_add(j)
    result[i] = population[j]
This sampling method picks an index uniformly at random, so every member of the population is equally likely to be chosen. That means even a very atypical member can get selected, for example 1 in the example above.
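To check that every member really is equally likely, here is a small sketch that tallies many single draws. It is written for Python 3, whereas the session above is Python 2. The helper name draw_counts, the number of draws, and the seed are my own choices for illustration.

```python
import random
from collections import Counter

c = [1, 2, 3, 100, 101, 102, 103, 104, 105, 106, 109, 110, 111, 112, 113, 114]


def draw_counts(population, draws=16000, seed=0):
    """Count how often each member is returned by sample(population, 1)."""
    rng = random.Random(seed)  # seeded only so that the run is repeatable
    return Counter(rng.sample(population, 1)[0] for _ in range(draws))


counts = draw_counts(c)
for value in c:
    # With 16 members and 16000 draws, each count should be close to 1000.
    print(value, counts[value])
```

The outliers 1, 2 and 3 come up about as often as any other member, which is the behaviour I want to avoid.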
But let's concentrate on a more realistic scenario. Assume you have 16 numbers, each giving the frequency of a label from 0 to 15.
freq_array = [1, 2, 3, 100, 100, 100, 102, 102, 102, 100, 99, 50, 20, 1, 2, 3]
The index of each position is the label. From the list above, label 0 has 1 member, label 3 has 100 members, label 2 has 3 members, and so on.
Now, if I want to select 5 members from the population, can I generate a new list that tells me to take X members from label Y?
A sample output (possibly not the right answer):
new_array = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
This means we should take 1 member from each of labels 4-8.
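To make the shape of the desired output concrete, here is a minimal sketch of one simple baseline: draw members without replacement so that each label is picked in proportion to its frequency. This is not the normal-distribution behaviour I am asking about. The function name allocate and the seed are hypothetical, chosen only for this illustration.

```python
import random

freq_array = [1, 2, 3, 100, 100, 100, 102, 102, 102, 100, 99, 50, 20, 1, 2, 3]


def allocate(freq, k, seed=None):
    """Return a list whose entry i is how many of the k picks fall on label i."""
    rng = random.Random(seed)
    # One entry per population member, holding that member's label.
    members = [label for label, count in enumerate(freq) for _ in range(count)]
    # Uniform over members, without replacement, so a label can never be
    # asked for more members than it actually has.
    picked = rng.sample(members, k)
    allocation = [0] * len(freq)
    for label in picked:
        allocation[label] += 1
    return allocation


new_array = allocate(freq_array, 5, seed=1)
print(new_array)  # 16 counts that add up to 5
```

With this baseline the rare labels (0, 1, 2, 13, 14, 15) are unlikely to be picked but still can be, since together they hold 12 of the 887 members.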
So maybe the question is better asked as follows:
How can I sample members from a population according to some normal distribution? (For the time being, let's restrict it to the normal distribution.)
I searched for functions in both Python's random module
and np.random
but could not find anything useful. Any ideas or suggestions are highly appreciated, with code if possible.
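To show what I have in mind, here is a rough sketch of one possible reading, not a confirmed solution. It fits a normal curve to the label frequencies (mean and standard deviation of the label index, weighted by frequency) and then draws labels with weights taken from that curve. The function name normal_allocation is hypothetical, and fitting the curve to the data is my own assumption. It needs Python 3.8+ for statistics.NormalDist, and it draws with replacement, so it does not cap a label at its actual population.

```python
import random
from statistics import NormalDist

freq_array = [1, 2, 3, 100, 100, 100, 102, 102, 102, 100, 99, 50, 20, 1, 2, 3]


def normal_allocation(freq, k, seed=None):
    """Draw k labels with weights from a normal curve fitted to freq."""
    rng = random.Random(seed)
    labels = list(range(len(freq)))
    total = sum(freq)
    # Assumption: mean and variance of the label index, weighted by frequency.
    mu = sum(label * f for label, f in zip(labels, freq)) / total
    var = sum(f * (label - mu) ** 2 for label, f in zip(labels, freq)) / total
    curve = NormalDist(mu, var ** 0.5)
    # The weight of each label is the height of the fitted curve at its index.
    weights = [curve.pdf(label) for label in labels]
    picked = rng.choices(labels, weights=weights, k=k)  # with replacement
    allocation = [0] * len(freq)
    for label in picked:
        allocation[label] += 1
    return allocation, mu, var ** 0.5


allocation, mu, sigma = normal_allocation(freq_array, 5, seed=1)
print(allocation)
print(mu, sigma)
```

Labels far from the fitted mean get very small weights, so the tails are picked much less often than under uniform sampling. Whether this is the right way to do it is exactly what I am unsure about.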