How can I generate a DataFrame of random values with the same covariance and mean as my input DataFrame?
I have an input dataframe, "my_input_df", with shape (240, 8). I want an output dataframe, "my_output_df", with shape (10000, 8), where each of the 10,000 rows holds a random draw for each column, such that the covariance and mean of "my_output_df" are the same as the covariance and mean of "my_input_df".
That is:
my_input_df:
A B C D
a1 b1 c1 d1
a2 b2 c2 d2
a3 b3 c3 d3
....
a240 b240 c240 d240
my_output_df:
A B C D
rand_a1 rand_b1 rand_c1 rand_d1
rand_a2 rand_b2 rand_c2 rand_d2
rand_a3 rand_b3 rand_c3 rand_d3
...
rand_a10000 rand_b10000 rand_c10000 rand_d10000
my_output_df must have random values for each column, satisfying:
my_output_df.cov() = my_input_df.cov()
and:
mean of my_output_df['A'] = mean of my_input_df['A']
mean of my_output_df['B'] = mean of my_input_df['B']
mean of my_output_df['C'] = mean of my_input_df['C']
.... etc
I suspect it is related to numpy.random.multivariate_normal, but it is not
clear to me how to use it with dataframes.
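For what it's worth, here is a minimal sketch of how that could look, assuming the data can reasonably be modelled as multivariate normal and that the input covariance matrix is positive definite. The stand-in data, the column names A to H, and the name my_exact_df are hypothetical, used only for illustration. Plain draws from multivariate_normal match the input mean and covariance only approximately (sampling noise), so the second half rescales the draws so that the sample mean and covariance match up to floating-point error.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for my_input_df: 240 rows, 8 correlated columns.
my_input_df = pd.DataFrame(
    rng.normal(size=(240, 8)) @ rng.normal(size=(8, 8)),
    columns=list("ABCDEFGH"),
)

mean = my_input_df.mean().to_numpy()
cov = my_input_df.cov().to_numpy()

# Approximate match: 10,000 draws from N(mean, cov).
draws = rng.multivariate_normal(mean, cov, size=10_000)
my_output_df = pd.DataFrame(draws, columns=my_input_df.columns)

# Near-exact match: whiten the centered draws with the Cholesky factor of
# their own sample covariance, then recolor with the factor of the target.
centered = draws - draws.mean(axis=0)
chol_sample = np.linalg.cholesky(np.cov(centered, rowvar=False))
chol_target = np.linalg.cholesky(cov)
exact = centered @ np.linalg.inv(chol_sample).T @ chol_target.T + mean
my_exact_df = pd.DataFrame(exact, columns=my_input_df.columns)
```

Both np.cov and DataFrame.cov normalize by N - 1 by default, so the two covariance matrices are directly comparable. If the input covariance is singular (for example, one column is an exact linear combination of others), the Cholesky step will fail and a different factorization would be needed.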