I have a table after randomization:
parameter1 | parameter2 | random |
---|---|---|
Q1 | 12 | A3 |
L3 | 15 | A4 |
K3 | 13 | A1 |
O1 | 14 | A2 |
N2 | 12 | A1 |
L33 | 19 | A3 |
O7 | 11 | A4 |
E3 | 16 | A2 |
I would like to calculate the count, mean, median, standard deviation, and relative standard deviation of parameter2 for each group in random, repeat this over n iterations, keep the data from every iteration, and pick the table with the most optimal relative standard deviation. How do I do that?
What I have tried so far:
import pandas as pd

def randomization(df):
    # do randomization

data = [randomization(df) for i in range(10)]  # assuming 10 iterations

def statistical_analysis(data):
    # per-group statistics of parameter2, indexed by the random group label
    stat_table = pd.DataFrame()
    stat_table['N'] = data.groupby('random')['parameter2'].count()
    stat_table['mean'] = data.groupby('random')['parameter2'].mean()
    stat_table['median'] = data.groupby('random')['parameter2'].median()
    stat_table['std'] = data.groupby('random')['parameter2'].std()
    stat_table['relative std'] = stat_table['std'] / stat_table['mean']
    return stat_table

# one statistics table per randomized iteration
temp_list = []
for i in data:
    temp = statistical_analysis(i)
    temp_list.append(temp)
How do I automatically select the table with the optimal relative standard deviation from the n iterations?
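One possible way to do the selection, as a minimal sketch rather than a definitive answer: reduce each statistics table to a single score and take the iteration with the smallest one. The question does not define "optimal", so this assumes it means the lowest mean relative standard deviation across the groups; swapping `.mean()` for `.max()` would minimize the worst group instead. `randomize` and `pick_best` are hypothetical names, and `randomize` is only a stand-in for the real `randomization` function, which is not shown above.

```python
import numpy as np
import pandas as pd


def randomize(df, rng, groups=("A1", "A2", "A3", "A4")):
    """Hypothetical stand-in for randomization(): shuffled, balanced group labels."""
    labels = np.resize(np.array(groups), len(df))  # repeat labels to cover all rows
    out = df.copy()
    out["random"] = rng.permutation(labels)
    return out


def statistical_analysis(data):
    """Per-group count, mean, median, std and relative std of parameter2."""
    stats = data.groupby("random")["parameter2"].agg(
        N="count", mean="mean", median="median", std="std"
    )
    stats["relative std"] = stats["std"] / stats["mean"]
    return stats


def pick_best(tables, stat_tables):
    """Return (index, table, stats) with the lowest mean relative std (assumed criterion)."""
    scores = [s["relative std"].mean() for s in stat_tables]
    best = int(np.nanargmin(scores))  # ignores iterations whose score is NaN
    return best, tables[best], stat_tables[best]


df = pd.DataFrame({
    "parameter1": ["Q1", "L3", "K3", "O1", "N2", "L33", "O7", "E3"],
    "parameter2": [12, 15, 13, 14, 12, 19, 11, 16],
})
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

tables = [randomize(df, rng) for _ in range(10)]         # all iterations are kept
stat_tables = [statistical_analysis(t) for t in tables]  # one stats table per iteration
best_index, best_table, best_stats = pick_best(tables, stat_tables)

print(best_index)
print(best_stats)
```

Keeping `tables` and `stat_tables` as parallel lists means every iteration stays available, and the same index retrieves both the randomized table and its statistics. Note that a group with a single row has an undefined (NaN) standard deviation, which pandas skips when averaging.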