Sunday, 7 October 2018

Repeatedly resampling at a given sample size and calculating an average value

I retrieved some category values from a JSON file and saved them into a pois object.

import numpy as np

def subsamples(source):
    data = []
    # n stands for the number of categories
    n, pois = read_json(source)
    # sample sizes 10, 20, 30, ... up to at most n
    for count in range(10, n + 1, 10):
        # draw `count` values with replacement
        subsample = np.random.choice(pois, count, replace=True)
        # count how often each category occurs in this subsample
        unique, counts = np.unique(subsample, return_counts=True)
        group_cat = dict(zip(unique, counts))
        data.append(group_cat)
    return data

The output of the pois object looks as follows:

pois = [u'208' u'469' u'566' u'570' u'156' u'395' u'566' u'570' u'426' u'564'
u'391' u'156' u'134' u'518' u'426' u'570' u'156' u'156' u'192' u'570'
u'426' u'469' u'133' u'134' u'192' u'564' u'280' u'208' u'395' u'564'
u'459' u'271']

The subsamples function randomly selects 10, 20, 30 values from the pois object and, for each sample size, groups identical values into a dictionary of key/value pairs. I saved the dictionaries into a list called data. You can see the output below:

data = [{u'469': 1, u'156': 1, u'570': 2, u'566': 2, u'518': 1, u'395': 2, u'426': 1},
{u'156': 3, u'208': 1, u'570': 5, u'564': 1, u'566': 2, u'192': 2, u'271': 1, u'395': 2, u'134': 1, u'426': 2},
{u'459': 3, u'469': 2, u'156': 4, u'208': 1, u'570': 3, u'564': 5, u'566': 1, u'192': 2, u'133': 2, u'271': 1, u'395': 3, u'134': 1, u'280': 1, u'426': 1}] 

The number_of_single_doubletons function records, for each sample size (10, 20, 30 in my example), the number of categories that are represented by exactly one individual/value (singletons) and by exactly two (doubletons).

import collections

def number_of_single_doubletons(source):
    data = subsamples(source)
    f_1 = []
    f_2 = []

    for b in data:
        # counter maps k (individuals per category) to the
        # number of categories that have exactly k individuals
        counter = collections.Counter(b.values())
        # default to 0 so a sample without singletons or doubletons
        # does not reuse a stale value or raise a NameError
        f_1.append(counter.get(1, 0))
        f_2.append(counter.get(2, 0))

    return f_1, f_2

print(number_of_single_doubletons(contents))
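
As a sanity check, the counting step can be run on its own against the three dictionaries shown in the data output. This is only a sketch: singletons_doubletons is a hypothetical helper name, and the dictionaries are copied from the example output rather than drawn at random.

```python
import collections

# The three dictionaries from the example `data` output (sample sizes 10, 20, 30).
data = [
    {u'469': 1, u'156': 1, u'570': 2, u'566': 2, u'518': 1, u'395': 2, u'426': 1},
    {u'156': 3, u'208': 1, u'570': 5, u'564': 1, u'566': 2, u'192': 2, u'271': 1,
     u'395': 2, u'134': 1, u'426': 2},
    {u'459': 3, u'469': 2, u'156': 4, u'208': 1, u'570': 3, u'564': 5, u'566': 1,
     u'192': 2, u'133': 2, u'271': 1, u'395': 3, u'134': 1, u'280': 1, u'426': 1},
]

def singletons_doubletons(group_cat):
    """Hypothetical helper: return (f1, f2) for one category -> count dictionary."""
    # freq[k] is the number of categories that occur exactly k times
    freq = collections.Counter(group_cat.values())
    return freq.get(1, 0), freq.get(2, 0)

results = [singletons_doubletons(d) for d in data]
print(results)  # [(4, 3), (4, 4), (6, 3)]
```

For the sample of size 10, for instance, four categories occur once (469, 156, 518, 426) and three occur twice (570, 566, 395).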

My aim is to randomly select 10, 20, 30 pois several times, say 100 times, and calculate the average number of singletons in a sample of size 10, then of size 20, then of size 30, and so on. But I don't know how to bring all the functions together.
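
One possible way to combine the two steps is sketched below, under some assumptions: the pois values are hard-coded from the example output instead of being read with read_json, and the function name mean_singletons and its parameters (sizes, n_repeats, seed) are hypothetical. For each sample size it repeats the draw n_repeats times and averages the singleton and doubleton counts.

```python
import collections
import numpy as np

# pois values copied from the example output (assumption: stands in for read_json).
pois = np.array([
    '208', '469', '566', '570', '156', '395', '566', '570', '426', '564',
    '391', '156', '134', '518', '426', '570', '156', '156', '192', '570',
    '426', '469', '133', '134', '192', '564', '280', '208', '395', '564',
    '459', '271',
])

def mean_singletons(pois, sizes=(10, 20, 30), n_repeats=100, seed=None):
    """Return {size: (mean singletons, mean doubletons)} over n_repeats resamples."""
    rng = np.random.RandomState(seed)  # seeded generator for reproducible draws
    averages = {}
    for size in sizes:
        f1_total = 0
        f2_total = 0
        for _ in range(n_repeats):
            # draw `size` values with replacement, as in subsamples()
            subsample = rng.choice(pois, size, replace=True)
            _, counts = np.unique(subsample, return_counts=True)
            # freq[k] is the number of categories that occur exactly k times
            freq = collections.Counter(counts.tolist())
            f1_total += freq.get(1, 0)
            f2_total += freq.get(2, 0)
        averages[size] = (f1_total / float(n_repeats),
                          f2_total / float(n_repeats))
    return averages

averages = mean_singletons(pois, seed=0)
for size in sorted(averages):
    print(size, averages[size])
```

Looping over the sample sizes on the outside and the repetitions on the inside keeps one running total per size, so no intermediate list of dictionaries is needed. Passing a seed is optional and only makes the run repeatable.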



