I am using Line Profiler to identify the performance bottleneck in my code, and I found that most of the time is spent on random sampling and random number generation. Here are the two parts, which take 26.3% and 57.3% of the total time, respectively.
limit = 4950
change_num = random.randint(1, limit)
indexSet = random.sample(range(limit), change_num)

upper_bound = 1000
# iter, i, and k are indices coming from for loops
temp = my_array[iter][edge_list[i][k]]
my_array[iter][edge_list[i][k]] = temp + random.randint(1, upper_bound - temp)
The report indicates that my function takes 133.993 s in total, which is unexpectedly long considering that I haven't even used large-scale data. Is there any way to fix this?
Here is an MWE.
import random
import numpy as np

def foo():
    iter = 0  # row counter (note: shadows the built-in iter)
    upper_bound = 1000
    limit = 4950
    first_set_size = 100
    sec_set_size = 20

    my_array = np.zeros((first_set_size*sec_set_size, limit))
    for i in range(100):
        for j in range(20):
            if j <= sec_set_size/2:
                # draw a new random subset of column indices
                change_num = random.randint(1, limit)
                indexSet = random.sample(range(limit), change_num)
                iter += 1
            else:
                # one random increment per element of the last subset
                for k in indexSet:
                    temp = my_array[iter][4]
                    my_array[iter][4] = temp + random.randint(1, upper_bound)
                iter += 1
When I run Line Profiler on it, I get the following report:
Timer unit: 1e-07 s

Total time: 34.288 s
File: <ipython-input-18-6ee4c9ba5a53>
Function: foo at line 1

Line #      Hits          Time   Per Hit  % Time  Line Contents
==============================================================
     1                                            def foo():
     2         1          19.0      19.0     0.0      iter =0
     3         1          12.0      12.0     0.0      upper_bound = 1000
     4         1           6.0       6.0     0.0      limit = 4950
     5         1           5.0       5.0     0.0      first_set_size= 100
     6         1           5.0       5.0     0.0      sec_set_size = 20
     7
     8         1        5437.0    5437.0     0.0      my_array = np.zeros((first_set_size*sec_set_size, limit))
     9       101         662.0       6.6     0.0      for i in range(100):
    10      2100       18483.0       8.8     0.0          for j in range(20):
    11      2000       38713.0      19.4     0.0              if j <= sec_set_size/2:
    12      1100      112967.0     102.7     0.0                  change_num = random.randint(1, limit)
    13      1100   120611194.0  109646.5    35.2                  indexSet = random.sample(range(limit),change_num)
    14      1100       22579.0      20.5     0.0                  iter +=1
    15                                                        else:
    16   2389545    15598263.0       6.5     4.5                  for k in indexSet:
    17   2388645    24359732.0      10.2     7.1                      temp = my_array[iter][4]
    18   2388645   182104698.0      76.2    53.1                      my_array[iter][4] =temp + random.randint(1,upper_bound)
    19       900        7512.0       8.3     0.0                  iter +=1
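
The report shows that lines 13 and 18 dominate, and that line 18 is hit about 2.4 million times, once per random.randint call. One direction that might help is drawing the random numbers in batches with NumPy instead of one at a time. Below is a minimal sketch of that idea, assuming NumPy's Generator API is acceptable as a replacement for the random module. The names foo_vectorized, rng, row, and index_set, and the function parameters, are hypothetical and not part of my original code. It produces different random values than foo(), so it is not a drop-in replacement.

```python
import numpy as np

def foo_vectorized(first_set_size=100, sec_set_size=20, limit=4950,
                   upper_bound=1000, seed=None):
    """Batched variant of the MWE (hypothetical sketch, not the original foo)."""
    rng = np.random.default_rng(seed)
    my_array = np.zeros((first_set_size * sec_set_size, limit))
    row = 0  # plays the role of `iter` in the MWE
    index_set = np.empty(0, dtype=np.int64)

    for i in range(first_set_size):
        for j in range(sec_set_size):
            if j <= sec_set_size / 2:
                # integers() excludes the upper end by default, so limit + 1
                # gives the same inclusive range as random.randint(1, limit)
                change_num = int(rng.integers(1, limit + 1))
                # sampling without replacement, like random.sample(range(limit), ...)
                index_set = rng.choice(limit, size=change_num, replace=False)
            else:
                # one batched draw instead of len(indexSet) separate randint calls;
                # the MWE adds every draw to the same cell, so the sum is enough
                draws = rng.integers(1, upper_bound + 1, size=index_set.size)
                my_array[row, 4] += draws.sum()
            row += 1
    return my_array
```

In my real code each increment depends on the current cell value (upper_bound - temp), so the batched draw above would only carry over directly if every k touches a different cell. I have not verified how much time this actually saves.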