I retrieved some category values from a JSON file and saved them in a pois object.
import numpy as np

def subsamples(object):
    data = []
    # n stands for the number of categories
    n, pois = read_json(object)
    count = 0
    while count < n:
        count += 10
        if count > n:
            break
        # draw `count` values with replacement and tally each category
        subsample = np.random.choice(pois, count, replace=True)
        unique, counts = np.unique(subsample, return_counts=True)
        group_cat = dict(zip(unique, counts))
        data.append(group_cat)
    return data
The output of the pois object looks as follows:
pois = [u'208' u'469' u'566' u'570' u'156' u'395' u'566' u'570' u'426' u'564'
u'391' u'156' u'134' u'518' u'426' u'570' u'156' u'156' u'192' u'570'
u'426' u'469' u'133' u'134' u'192' u'564' u'280' u'208' u'395' u'564'
u'459' u'271']
The subsamples function randomly draws 10, 20, 30, ... values (with replacement) from the pois object and, for each sample size, groups equal values into a dictionary of category/count pairs. I save the dictionaries in a list called data. You can see the output below:
data = [{u'469': 1, u'156': 1, u'570': 2, u'566': 2, u'518': 1, u'395': 2, u'426': 1},
{u'156': 3, u'208': 1, u'570': 5, u'564': 1, u'566': 2, u'192': 2, u'271': 1, u'395': 2, u'134': 1, u'426': 2},
{u'459': 3, u'469': 2, u'156': 4, u'208': 1, u'570': 3, u'564': 5, u'566': 1, u'192': 2, u'133': 2, u'271': 1, u'395': 3, u'134': 1, u'280': 1, u'426': 1}]
The number_of_single_doubletons function stores, for each sample size (in my example 10, 20, 30), the number of categories which are represented by exactly one individual/value (singletons) and by exactly two (doubletons).
import collections

def number_of_single_doubletons(object):
    data = subsamples(object)
    f_1 = []
    f_2 = []
    for b in data:
        # total number of individuals in this sample
        m = sum(b.values())
        individuals = [xi for xi in b.values() if xi != 0]
        # number of categories which are represented by k individuals
        counter = collections.Counter(individuals)
        cat = len(b)
        # reset for every sample so a missing count is 0, not a stale value
        f1 = 0
        f2 = 0
        # k is the number of individuals in a category
        # v is the number of categories that have k individuals
        for k, v in counter.items():
            if k == 1:
                f1 = v
            if k == 2:
                f2 = v
        f_1.append(f1)
        f_2.append(f2)
    return f_1, f_2

print(number_of_single_doubletons(contents))
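As a quick check of what the function should return, the singleton and doubleton counts can be worked out directly from the three dictionaries shown in data above. This is only a sketch: the helper name count_singletons_doubletons is made up for illustration, and the dictionaries are copied by hand from the example output.

```python
import collections

# The three example dictionaries from the post (sample sizes 10, 20 and 30).
data = [
    {'469': 1, '156': 1, '570': 2, '566': 2, '518': 1, '395': 2, '426': 1},
    {'156': 3, '208': 1, '570': 5, '564': 1, '566': 2, '192': 2, '271': 1,
     '395': 2, '134': 1, '426': 2},
    {'459': 3, '469': 2, '156': 4, '208': 1, '570': 3, '564': 5, '566': 1,
     '192': 2, '133': 2, '271': 1, '395': 3, '134': 1, '280': 1, '426': 1},
]

def count_singletons_doubletons(group_cat):
    """Hypothetical helper: return (f1, f2) for one category/count dict."""
    # Counter maps "k individuals" -> "number of categories with k individuals".
    counter = collections.Counter(group_cat.values())
    return counter.get(1, 0), counter.get(2, 0)

results = [count_singletons_doubletons(d) for d in data]
print(results)  # [(4, 3), (4, 4), (6, 3)]
```

Using counter.get(k, 0) means a sample without any singletons or doubletons yields 0 for that count.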
My aim is to randomly select 10, 20, 30, ... pois several times, say 100 times, and calculate the average number of singletons in a sample of size 10, then of size 20, then of size 30, and so on. But I don't know how to bring all the functions together.
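One possible way to combine the steps is to loop over the sample sizes, repeat the random draw for each size, and average the singleton counts. The following is a minimal sketch, not a definitive solution: the function name mean_singletons and its parameters sizes, repeats and seed are assumptions introduced here, and pois is typed in by hand from the output above instead of being read with read_json. Like subsamples, it draws with replacement.

```python
import numpy as np

def mean_singletons(pois, sizes=(10, 20, 30), repeats=100, seed=None):
    """Hypothetical helper: average number of singletons per sample size."""
    rng = np.random.RandomState(seed)  # seed only makes runs repeatable
    pois = np.asarray(pois)
    averages = {}
    for size in sizes:
        singleton_counts = []
        for _ in range(repeats):
            # Draw `size` values with replacement, as subsamples() does.
            subsample = rng.choice(pois, size, replace=True)
            _, counts = np.unique(subsample, return_counts=True)
            # Singletons are categories that occur exactly once.
            singleton_counts.append(int(np.sum(counts == 1)))
        averages[size] = float(np.mean(singleton_counts))
    return averages

# The 32 values from the example pois output.
pois = ['208', '469', '566', '570', '156', '395', '566', '570', '426', '564',
        '391', '156', '134', '518', '426', '570', '156', '156', '192', '570',
        '426', '469', '133', '134', '192', '564', '280', '208', '395', '564',
        '459', '271']

print(mean_singletons(pois, repeats=100, seed=0))
```

The result is a dictionary mapping each sample size to its mean singleton count. Doubletons could be averaged the same way by testing counts == 2.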