Thursday, 17 October 2019

Python: simulate time series satisfying a covariance matrix

How can I generate a DataFrame of random values that has the same covariance and mean as my input DataFrame?

I have an input dataframe "my_input_df" with shape (240, 8). I want an output dataframe "my_output_df" with shape (10000, 8), where each of the 10,000 rows is a random draw across the columns, such that the covariance and mean of "my_output_df" are the same as the covariance and mean of "my_input_df".

That is:
my_input_df:

A       B      C       D
a1      b1     c1     d1
a2      b2     c2     d2
a3      b3     c3     d3
....
a240   b240    c240   d240

my_output_df:
A              B             C               D
rand_a1       rand_b1        rand_c1         rand_d1
rand_a2       rand_b2        rand_c2         rand_d2
rand_a3       rand_b3        rand_c3         rand_d3
... 
rand_a10000   rand_b10000    rand_c10000     rand_d10000

my_output_df must have random values for each column, satisfying:
my_output_df.cov() = my_input_df.cov()
and:
mean of my_output_df['A'] = mean of my_input_df['A']
mean of my_output_df['B'] = mean of my_input_df['B']
mean of my_output_df['C'] = mean of my_input_df['C']
.... etc

I suspect it is related to numpy.random.multivariate_normal, but it is not clear to me how to use it with dataframes.
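
For reference, here is a minimal sketch of how the multivariate normal approach might look with dataframes. It is not a definitive answer. It assumes the data can reasonably be modelled as jointly normal and that the input covariance matrix is positive definite. The stand-in input data, the column names A to H, and the variable names other than my_input_df and my_output_df are made up for illustration. It uses the Generator method rng.multivariate_normal, which takes the same mean, cov and size arguments as numpy.random.multivariate_normal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the real data (assumption): 240 rows, 8 correlated columns.
my_input_df = pd.DataFrame(
    rng.normal(size=(240, 8)) @ rng.normal(size=(8, 8)),
    columns=list("ABCDEFGH"),
)

# Target moments estimated from the input dataframe.
mean = my_input_df.mean().to_numpy()  # shape (8,)
cov = my_input_df.cov().to_numpy()    # shape (8, 8), ddof=1 like np.cov

# Draw 10,000 rows from a multivariate normal with those moments.
samples = rng.multivariate_normal(mean, cov, size=10_000)
my_output_df = pd.DataFrame(samples, columns=my_input_df.columns)

# The draws match the target moments only approximately. If the sample
# moments must match exactly, one option is to whiten the centered draws
# with their own Cholesky factor and recolor them with the target's.
centered = samples - samples.mean(axis=0)
chol_sample = np.linalg.cholesky(np.cov(centered, rowvar=False))
chol_target = np.linalg.cholesky(cov)
whitened = np.linalg.solve(chol_sample, centered.T).T  # sample cov = identity
my_exact_df = pd.DataFrame(
    mean + whitened @ chol_target.T,
    columns=my_input_df.columns,
)

print(my_output_df.shape)
print(np.allclose(my_exact_df.cov(), my_input_df.cov()))
print(np.allclose(my_exact_df.mean(), my_input_df.mean()))
```

In this sketch my_output_df reproduces the input mean and covariance only up to sampling error, which shrinks as the number of rows grows. my_exact_df reproduces them up to floating-point error, at the cost of the rows no longer being strictly independent draws.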


