Tuesday, February 18, 2020

Pandas: fill missing values of a feature based on its correlation with another feature

If I know the correlation between two features (say f1 and f2) of a DataFrame, how can I fill the missing values so that f1 and f2 end up with that correlation? The motivation is that f1 and f2 come from measurements that contain many NaN values, so I'd like to simulate the ideal case of having the full data.

import pandas as pd
import numpy as np

# Both columns go in a single dict; missing entries are written as np.nan
data = pd.DataFrame({'f1': [0.01, 0.2, 0.4, np.nan, 0.06], 'f2': [0.5, 0.022, np.nan, np.nan, np.nan]})

The desired correlation is corr(f1, f2) = 0.8 (for example). How can I replace the NaN values in both f1 and f2?

I found some related questions here, but none is exactly what I'm looking for. Thank you in advance!
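To make the question concrete, here is a minimal sketch of one possible approach, not a definitive answer. It assumes both features are roughly Gaussian, fills missing f1 values by sampling from a normal fitted to the observed f1, and then fills missing f2 values from a linear model of f1 plus noise, scaled so that the filled pairs have correlation rho in expectation. The helper name fill_with_target_corr and its seed parameter are hypothetical, and each column needs at least two observed values so that a standard deviation can be estimated.

```python
import numpy as np
import pandas as pd


def fill_with_target_corr(df, col_x, col_y, rho, seed=0):
    """Hypothetical helper: fill NaNs in col_x and col_y so that the
    filled pairs follow a linear-Gaussian model with correlation rho."""
    rng = np.random.default_rng(seed)
    out = df.copy()

    # Step 1: fill missing x with draws from a normal fitted to observed x.
    # (Assumption: x is roughly Gaussian; mean/std skip NaN by default.)
    mu_x, sd_x = out[col_x].mean(), out[col_x].std()
    x = out[col_x].to_numpy(dtype=float)
    x_draws = rng.normal(mu_x, sd_x, len(out))
    out[col_x] = np.where(np.isnan(x), x_draws, x)

    # Step 2: fill missing y conditionally on the (now complete) x.
    # y = mu_y + sd_y * (rho * z_x + sqrt(1 - rho^2) * noise) has
    # correlation rho with x when z_x and noise are independent standard normals.
    mu_y, sd_y = out[col_y].mean(), out[col_y].std()
    z_x = (out[col_x].to_numpy() - mu_x) / sd_x
    noise = rng.standard_normal(len(out))
    y_model = mu_y + sd_y * (rho * z_x + np.sqrt(1 - rho**2) * noise)
    y = out[col_y].to_numpy(dtype=float)
    out[col_y] = np.where(np.isnan(y), y_model, y)
    return out


data = pd.DataFrame({'f1': [0.01, 0.2, 0.4, np.nan, 0.06],
                     'f2': [0.5, 0.022, np.nan, np.nan, np.nan]})
filled = fill_with_target_corr(data, 'f1', 'f2', rho=0.8)
print(filled)
print(filled['f1'].corr(filled['f2']))
```

The observed values are left untouched, so the correlation of the completed columns only approaches 0.8 when most rows are filled and there are many of them. With five rows, as in the toy frame above, the sample correlation can land far from the target.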



