Tuesday, 24 July 2018

Passing two input files to a function leads to bad results

First of all, sorry for my English; I am not a native English speaker. In general, I am trying to compute an index for samples of different sizes drawn from each input file, and afterwards I want to compare the values between the two input files. So my script takes two input files as arguments. The first function randomly selects values for a series of increasing sample sizes. The second function takes the output of the first function and calculates an index for each sample.

To do this, I first pass the two input files as arguments to my script:

import sys

file_name1 = sys.argv[1]
file_name2 = sys.argv[2]

I read the values of each input file and saved them to a list, as follows:

data1 = [2, 6, 4, 8, 9, 8, 6, 6, 6, 7, 7, 4, 2, 2, 2, ...]  # sample size 835
data2 = [7, 7, 5, 3, 4, 2, 8, 6, 5, 1, 1, 9, 7, ...]  # sample size 2010
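The question does not show how the files are read into these lists. A minimal sketch, assuming each file contains whitespace-separated integers (the real file format is not given), could look like this; `read_values` is a hypothetical helper name:

```python
import sys


def read_values(file_name):
    """Hypothetical helper: read whitespace-separated integers from a text file.

    Assumes content such as "2 6 4 8 ..." or one value per line; adjust the
    parsing if the real input format differs.
    """
    with open(file_name) as handle:
        return [int(token) for token in handle.read().split()]


if __name__ == "__main__" and len(sys.argv) >= 3:
    file_name1 = sys.argv[1]
    file_name2 = sys.argv[2]
    data1 = read_values(file_name1)
    data2 = read_values(file_name2)
    print(len(data1), len(data2))
```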

I wrote a function that randomly picks numbers from the list for increasing sample sizes (200, 400, 600, ..., up to n). For each sample, I grouped identical values and stored each value and the number of times it occurs as key/value pairs in a dictionary.

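import numpy as np

pois_group = []  # module-level list; its definition is not shown in the question


def subsamples(list_object):
    val = np.array(list_object)
    n = len(val)
    count = 0
    while count < n:
        count += 200
        if count > n:
            break
        # draw `count` values without replacement and count each distinct value
        subsample = np.random.choice(val, count, replace=False)
        unique, counts = np.unique(subsample, return_counts=True)
        group_cat = dict(zip(unique, counts))
        pois_group.append(group_cat)

    return pois_group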
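To see which sample sizes the `while` loop in `subsamples` visits and what each dictionary holds, here is a small standalone sketch. `sample_sizes` and `count_values` are hypothetical helper names; the step of 200 comes from the question's code.

```python
import numpy as np


def sample_sizes(n, step=200):
    """Sizes visited by the loop in subsamples(): step, 2*step, ... while <= n."""
    return list(range(step, n + 1, step))


def count_values(sample):
    """Group identical values the same way subsamples() does with np.unique."""
    unique, counts = np.unique(sample, return_counts=True)
    # Convert NumPy scalars to plain ints so the dict prints cleanly.
    return {int(u): int(c) for u, c in zip(unique, counts)}


print(sample_sizes(835))                                  # [200, 400, 600, 800]
print(len(sample_sizes(2010)), sample_sizes(2010)[-1])    # 10 2000
print(count_values([2, 6, 4, 6, 2, 2]))                   # {2: 3, 4: 1, 6: 2}
```

This is why the first file yields four samples and the second ten.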

Additionally, I have a second function that calculates an index for each sample size.

import math

x, y = [], []  # module-level lists; their definitions are not shown in the question


def list_sample_size(list_object):
    data = subsamples(list_object)

    def p(n, N):
        if n == 0:
            return 0
        return (float(n) / N) * math.log(float(n) / N)

    for i in data:
        N = sum(i.values())
        # calculate the index: exp of the Shannon entropy of the sample
        sh = -sum(p(n, N) for n in i.values() if n != 0)
        index = round(math.exp(sh), 2)
        print("Index: %f, sample size: %s" % (index, N))
        y.append(index)
        x.append(N)
    return x, y
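The index computed in `list_sample_size` is the exponential of the Shannon entropy: with p_i = n_i / N for each distinct value, it is exp(-sum(p_i * ln p_i)), sometimes called the Hill number of order 1 or the effective number of categories. A useful sanity check is that k equally frequent values give an index of exactly k. The sketch below (the name `diversity_index` is hypothetical) computes the same quantity with NumPy for a single {value: count} dictionary:

```python
import numpy as np


def diversity_index(counts):
    """exp of the Shannon entropy of a {value: count} dict, rounded to 2 decimals.

    Mirrors the computation in list_sample_size(): p = n / N, H = -sum(p * ln p).
    Zero counts are dropped because p * ln(p) -> 0 as p -> 0.
    """
    n = np.array([c for c in counts.values() if c > 0], dtype=float)
    p = n / n.sum()
    shannon = -np.sum(p * np.log(p))
    return round(float(np.exp(shannon)), 2)


# Four equally frequent values give 4.0; one dominant value pulls the index towards 1.
print(diversity_index({2: 5, 4: 5, 6: 5, 8: 5}))  # 4.0
print(diversity_index({2: 97, 4: 1, 6: 1, 8: 1}))
```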

x_1, y_1 = list_sample_size(data1)
print("--------------------")
x_2, y_2 = list_sample_size(data2)

But I get the following output when I call the function for each input file. The results for the first input are correct, but for the second input the script first prints the results of input1 again and only then its own results.

Input1 has 835 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
---------------------------
Input1 has 2010 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
Index: 22.610000 , sample size: 200
Index: 21.110000 , sample size: 400
Index: 25.920000 , sample size: 600
Index: 27.670000 , sample size: 800
Index: 28.630000 , sample size: 1000
Index: 28.110000 , sample size: 1200
Index: 28.380000 , sample size: 1400
Index: 28.610000 , sample size: 1600
Index: 28.910000 , sample size: 1800
Index: 29.120000 , sample size: 2000

Does anyone know what I am doing wrong?
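A likely cause: `pois_group`, `x` and `y` are never created inside the functions, so they must be module-level (global) lists defined elsewhere in the script. Every call appends to the same lists, so the second call returns the four results of the first file followed by its own ten, which matches the 4 + 10 lines printed above. A minimal sketch of a fix, assuming the rest of the script is as shown, creates the lists inside the functions and returns them (function names are from the question; the loop is rewritten with `range` but visits the same sample sizes):

```python
import math

import numpy as np


def subsamples(list_object, step=200):
    """Return one {value: count} dict per sample size step, 2*step, ... <= n."""
    val = np.array(list_object)
    pois_group = []  # created fresh on every call, so results do not leak between files
    for size in range(step, len(val) + 1, step):
        subsample = np.random.choice(val, size, replace=False)
        unique, counts = np.unique(subsample, return_counts=True)
        pois_group.append(dict(zip(unique, counts)))
    return pois_group


def list_sample_size(list_object):
    """Return (sample sizes, indices) for one input list."""
    x, y = [], []  # local lists instead of module-level ones
    for group in subsamples(list_object):
        N = sum(group.values())
        shannon = -sum((n / N) * math.log(n / N) for n in group.values() if n != 0)
        index = round(math.exp(shannon), 2)
        print("Index: %f, sample size: %s" % (index, N))
        x.append(N)
        y.append(index)
    return x, y
```

With the lists created locally, `list_sample_size(data1)` returns 4 sample sizes (200 to 800) for 835 values and `list_sample_size(data2)` returns 10 (200 to 2000) for 2010 values, with nothing left over from the previous call. Because the subsamples are random, the index values themselves will differ slightly from run to run.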



