First of all, sorry for my English; I am not a native speaker. I am trying to compute an index for subsamples of increasing size drawn from each input file, and afterwards I want to compare the values between the two input files. Thus, I have two input files as arguments. The first function randomly selects values at a series of sample sizes. The second function takes the output of the first function and calculates an index for each sample.
To do this, I first pass the two input files as arguments to my script.
    import sys

    file_name1 = sys.argv[1]
    file_name2 = sys.argv[2]
I read the values of each input file and saved them to a list, as follows:
    data1 = [2, 6, 4, 8, 9, 8, 6, 6, 6, 7, 7, 4, 2, 2, 2, ...]  # sample size 835
    data2 = [7, 7, 5, 3, 4, 2, 8, 6, 5, 1, 1, 9, 7, ...]  # sample size 2010
I wrote a function that randomly picks values from the list at increasing sample sizes in steps of 200 (200, 400, ... up to n). After that, it groups equal values and saves each value and its number of occurrences as key-value pairs in a dictionary.
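    import numpy as np

    def subsamples(list_object):
        val = np.array(list_object)
        n = len(val)
        count = 0
        while count < n:
            count += 200
            if count > n:
                break
            # draw `count` values without replacement
            subsample = np.random.choice(val, count, replace=False)
            # group equal values: {value: number of occurrences}
            unique, counts = np.unique(subsample, return_counts=True)
            group_cat = dict(zip(unique, counts))
            # pois_group is a list created outside this function (not shown)
            pois_group.append(group_cat)
        return pois_group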
Additionally, I have a second function that calculates an index for each sample size.
    import math

    def list_sample_size(object):
        data = subsamples(object)

        def p(n, N):
            # contribution of one group: (n/N) * ln(n/N)
            if n == 0:
                return 0
            else:
                return (float(n) / N) * math.log(float(n) / N)

        for i in data:
            N = sum(i.values())
            # calculate the index
            sh = -sum(p(n, N) for n in i.values() if n != 0)
            index = round(math.exp(sh), 2)
            print("Index: %f, sample size: %s" % (index, N))
            # x and y are lists created outside this function (not shown)
            y.append(index)
            x.append(N)
        return x, y

    x_1, y_1 = list_sample_size(data1)
    print("--------------------")
    x_2, y_2 = list_sample_size(data2)
However, I get the following output when I call the function for each input file. The output for the first input is correct, but the output for the second input first repeats the results of the first input and only then prints its own:
Input1 has 835 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
---------------------------
Input1 has 2010 pois
Index: 37.720000 , sample size: 200
Index: 43.590000 , sample size: 400
Index: 46.010000 , sample size: 600
Index: 48.770000 , sample size: 800
Index: 22.610000 , sample size: 200
Index: 21.110000 , sample size: 400
Index: 25.920000 , sample size: 600
Index: 27.670000 , sample size: 800
Index: 28.630000 , sample size: 1000
Index: 28.110000 , sample size: 1200
Index: 28.380000 , sample size: 1400
Index: 28.610000 , sample size: 1600
Index: 28.910000 , sample size: 1800
Index: 29.120000 , sample size: 2000
Does anyone know what I am doing wrong?
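One possible explanation, offered only as a guess from the code shown: pois_group, x and y are never created inside the functions, so they presumably exist once at module level and keep growing between calls. The second call would then append its results to the lists already filled by the first call, which matches the repeated lines in the output. Below is a minimal sketch of that idea in Python 3, where each function builds its own fresh lists. The step and rng parameters, and the randomly generated data1 and data2, are stand-ins I introduced for illustration; they are not part of the original script.

```python
import math
import numpy as np


def subsamples(values, step=200, rng=None):
    """Return one {value: count} dict per sample size step, 2*step, ... <= len(values)."""
    rng = np.random.default_rng() if rng is None else rng
    val = np.asarray(values)
    groups = []  # fresh list on every call, so nothing carries over
    for size in range(step, len(val) + 1, step):
        # draw `size` values without replacement
        subsample = rng.choice(val, size, replace=False)
        unique, counts = np.unique(subsample, return_counts=True)
        groups.append(dict(zip(unique.tolist(), counts.tolist())))
    return groups


def list_sample_size(values, step=200, rng=None):
    """Return (sample sizes, index values) for a single input."""
    x, y = [], []  # local lists, so a second call starts empty
    for group in subsamples(values, step, rng):
        N = sum(group.values())
        # counts from np.unique are never zero, so no zero check is needed
        sh = -sum((n / N) * math.log(n / N) for n in group.values())
        x.append(N)
        y.append(round(math.exp(sh), 2))
    return x, y


# Hypothetical stand-ins for the two input files (835 and 2010 values).
rng = np.random.default_rng(0)
data1 = rng.integers(1, 10, size=835).tolist()
data2 = rng.integers(1, 10, size=2010).tolist()

x_1, y_1 = list_sample_size(data1, rng=rng)
x_2, y_2 = list_sample_size(data2, rng=rng)
print(x_1)  # sample sizes for the first input only
print(x_2)  # sample sizes for the second input only
```

With local lists, x_2 and y_2 contain only the results for the second input, so the two inputs can be compared directly. If the lists really must stay global, emptying them before each call would be another option.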