Tuesday, 7 September 2021

How to increase the performance of random number generation

I am using Line Profiler to find the performance bottleneck in my code, and it shows that most of the time is spent on random sampling and random number generation. The two parts below take 26.3% and 57.3% of the total time, respectively.

limit = 4950 
change_num = random.randint(1, limit) 
indexSet = random.sample(range(limit),change_num) 
upper_bound = 1000
# iter, i, and k are indices coming from the enclosing for loops
temp = my_array[iter][edge_list[i][k]]                    
my_array[iter][edge_list[i][k]] = temp + random.randint(1,upper_bound-temp) 

The report gives Total time: 133.993 s for my function, which is unexpectedly long considering that I am not even using large-scale data yet. Is there a way to speed this up?
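For comparison, here is a minimal sketch of how the two hot statements might look with NumPy's `Generator` API, which draws whole arrays in a single call instead of one value per Python-level call. It is an assumption that NumPy 1.17 or later is available. The names `rng`, `row`, and `cols` are hypothetical stand-ins for `my_array[iter]` and the `edge_list[i][k]` columns. It also assumes that the columns are distinct and that the current values stay below `upper_bound`. No speed-up was measured for this sketch.

```python
import numpy as np

rng = np.random.default_rng()  # assumption: NumPy >= 1.17

limit = 4950
upper_bound = 1000

# Counterpart of random.randint(1, limit); endpoint=True keeps the upper bound inclusive.
change_num = int(rng.integers(1, limit, endpoint=True))

# Counterpart of random.sample(range(limit), change_num): distinct indices, no replacement.
index_set = rng.choice(limit, size=change_num, replace=False)

# Counterpart of the per-element update, applied to all selected columns at once.
# `row` stands in for my_array[iter]; `cols` stands in for the edge_list[i][k] columns.
row = np.zeros(limit, dtype=np.int64)
cols = index_set
temp = row[cols]
# One draw per selected column, each from 1..(upper_bound - temp), bounds inclusive.
row[cols] = temp + rng.integers(1, upper_bound - temp, endpoint=True)
```

If the same column can appear more than once in `cols`, this batched update is not equivalent to the loop, because the loop would re-read the updated value on each visit.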

Here is an MWE.

import random

import numpy as np

def foo():
    iter =0
    upper_bound = 1000
    limit = 4950 
    first_set_size= 100
    sec_set_size = 20

    my_array = np.zeros((first_set_size*sec_set_size, limit))
    for i in range(100):
        for j in range(20):
            if j <=  sec_set_size/2:
                change_num = random.randint(1, limit) 
                indexSet = random.sample(range(limit),change_num)
                iter +=1
            else:
                for k in indexSet:
                    temp = my_array[iter][4]   
                    my_array[iter][4] =temp + random.randint(1,upper_bound) 
                iter +=1

When I run Line Profiler on it, I get the following report:

Timer unit: 1e-07 s

Total time: 34.288 s
File: <ipython-input-18-6ee4c9ba5a53>
Function: foo at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def foo():
     2         1         19.0     19.0      0.0      iter =0
     3         1         12.0     12.0      0.0      upper_bound = 1000
     4         1          6.0      6.0      0.0      limit = 4950 
     5         1          5.0      5.0      0.0      first_set_size= 100
     6         1          5.0      5.0      0.0      sec_set_size = 20
     7                                           
     8         1       5437.0   5437.0      0.0      my_array = np.zeros((first_set_size*sec_set_size, limit))
     9       101        662.0      6.6      0.0      for i in range(100):
    10      2100      18483.0      8.8      0.0          for j in range(20):
    11      2000      38713.0     19.4      0.0              if j <=  sec_set_size/2:
    12      1100     112967.0    102.7      0.0                  change_num = random.randint(1, limit) 
    13      1100  120611194.0 109646.5     35.2                  indexSet = random.sample(range(limit),change_num)
    14      1100      22579.0     20.5      0.0                  iter +=1
    15                                                       else:
    16   2389545   15598263.0      6.5      4.5                  for k in indexSet:
    17   2388645   24359732.0     10.2      7.1                      temp = my_array[iter][4]   
    18   2388645  182104698.0     76.2     53.1                      my_array[iter][4] =temp + random.randint(1,upper_bound) 
    19       900       7512.0      8.3      0.0                  iter +=1
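In the MWE, the inner loop on lines 16 to 18 adds one independent `random.randint(1, upper_bound)` draw to the same cell for every element of `indexSet`, so the roughly 2.4 million Python-level calls could in principle be replaced by one array draw per row. The sketch below is one possible NumPy rewrite under that observation, not a measured fix. `foo_vectorized`, `rng`, `row`, and the `seed` parameter are hypothetical names, and NumPy 1.17 or later is assumed. It uses a different random number generator than the `random` module, so individual values will differ even though the distributions match.

```python
import numpy as np

def foo_vectorized(seed=None):
    """Hypothetical NumPy rewrite of foo(); same sampling scheme, batched draws."""
    rng = np.random.default_rng(seed)
    upper_bound = 1000
    limit = 4950
    first_set_size = 100
    sec_set_size = 20

    my_array = np.zeros((first_set_size * sec_set_size, limit))
    row = 0  # plays the role of `iter` in foo()
    index_set = None
    for i in range(first_set_size):
        for j in range(sec_set_size):
            if j <= sec_set_size / 2:
                # change_num distinct indices, drawn without replacement,
                # as random.sample(range(limit), change_num) does.
                change_num = int(rng.integers(1, limit, endpoint=True))
                index_set = rng.choice(limit, size=change_num, replace=False)
            else:
                # foo() adds one randint(1, upper_bound) per element of indexSet
                # to column 4; draw them all in one call and add their sum.
                draws = rng.integers(1, upper_bound, size=len(index_set), endpoint=True)
                my_array[row, 4] += draws.sum()
            row += 1
    return my_array
```

Calling `foo_vectorized(seed=0)` returns the filled array. Returning it, rather than discarding it as `foo()` does, makes the result easy to check. The real code indexes a different column per `k`, so there the batched update from a single `rng.integers` call with array bounds would be the closer match.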


