I wrote this code, which creates all combinations of n lists with m_i elements each in Python, samples a given number of unique combinations (the maximum possible or 1000, whichever is smaller), and writes them to Excel. It basically works, but when product(m_i) becomes very large, it is extremely slow.
A realistic use case: I have 32 lists with 2-3 elements each, from which I need to sample 1000 unique combinations. That could be 10 billion combinations, and it is slow to create all of them when I actually only need 1000.
I did consider just drawing random combinations and checking whether I had already drawn each one, but that would become slow when the number of samples approaches the number of possible combinations.
import pandas as pd

# Each row of the "Variables" sheet is one list; empty cells are NaN and are dropped below.
df = pd.read_excel('Variables.xlsx', sheet_name="Variables", index_col=0)
df_out = pd.DataFrame(columns=df.index)

def for_recursive(number_of_loops, range_list, execute_function, current_index=0, iter_list=None):
    # Use None rather than a mutable default argument.
    if iter_list is None:
        iter_list = [0] * number_of_loops
    if current_index == number_of_loops - 1:
        for iter_list[current_index] in range_list.iloc[current_index].dropna():
            execute_function(iter_list)
    else:
        for iter_list[current_index] in range_list.iloc[current_index].dropna():
            for_recursive(number_of_loops, iter_list=iter_list, range_list=range_list, current_index=current_index + 1, execute_function=execute_function)

def do_whatever(index_list):
    # Append the current combination as a new row.
    df_out.loc[len(df_out)] = index_list

for_recursive(range_list=df, execute_function=do_whatever, number_of_loops=len(df))
# Keep at most 1000 of the generated combinations.
df_out = df_out.sample(n=min(len(df_out), 1000))
with pd.ExcelWriter("Variables.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df_out.to_excel(writer, sheet_name='Simulations', index=False)
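One possible way around both problems (enumerating everything, and rejection sampling slowing down near the maximum) is to sample unique integer indices in the range 0 .. product(m_i) - 1 and decode each index into one combination, treating it as a mixed-radix number. The following is only a minimal sketch of that idea, not a drop-in replacement: the function name `sample_unique_combinations` and its parameters are made up for illustration, and it assumes that the elements within each list are distinct and that product(m_i) fits in a machine-sized integer (true for 3**32).

```python
import math
import random

def sample_unique_combinations(lists, k, seed=None):
    """Return up to k distinct combinations, one element per list (hypothetical helper)."""
    rng = random.Random(seed)
    total = math.prod(len(lst) for lst in lists)
    k = min(k, total)
    # random.sample on a range object does not materialise the range,
    # so this stays cheap even when total is in the billions.
    indices = rng.sample(range(total), k)
    combos = []
    for idx in indices:
        combo = []
        # Decode idx as a mixed-radix number, last list = least significant digit.
        for lst in reversed(lists):
            idx, pos = divmod(idx, len(lst))
            combo.append(lst[pos])
        combo.reverse()
        combos.append(tuple(combo))
    return combos
```

Because distinct indices decode to distinct combinations, no duplicate check is needed, and asking for more samples than exist simply returns every combination once. With the DataFrame from the question, the input lists could presumably be built with something like `[row.dropna().tolist() for _, row in df.iterrows()]`.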