I have been playing with pandas lately, and now I am trying to replace the NaN values in a DataFrame with different random values drawn from a normal distribution.
Assume I have this CSV file without a header:
     0
0  343
1  483
2  101
3  NaN
4  NaN
5  NaN
My expected result is something like this:
               0
0            343
1            483
2            101
3  randomnumber1
4  randomnumber2
5  randomnumber3
But instead I got the following:
               0
0            343
1            483
2            101
3  randomnumber1
4  randomnumber1
5  randomnumber1  # all NaN filled with same number
My code so far:
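import numpy as np
import pandas as pd

df = pd.read_csv("testfile.csv", header=None)
mu, sigma = df.mean(), df.std()
# Draws a single sample (size=1), so the loop below runs only once
norm_dist = np.random.normal(mu, sigma, 1)
for i in norm_dist:
    # fillna(i) puts the same value i into every NaN
    print(df.fillna(i))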
I am thinking of counting the NaN rows in the DataFrame and replacing the 1 in np.random.normal(mu, sigma, 1) with that count, so that each NaN can get a different value.
But I want to ask whether there is a simpler way to do this.
Thank you for your help and suggestions.
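For reference, the idea of drawing one sample per NaN could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the inline DataFrame stands in for testfile.csv (using the values shown above), and the names col, mask and rng are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("testfile.csv", header=None),
# built from the example values in the question (an assumption).
df = pd.DataFrame({0: [343, 483, 101, np.nan, np.nan, np.nan]})

col = df[0]
mu, sigma = col.mean(), col.std()  # NaN values are skipped by default

mask = col.isna()                  # True where a value is missing
rng = np.random.default_rng()      # pass a seed here for reproducible draws

# Draw one sample per missing value instead of a single sample for all of them
df.loc[mask, 0] = rng.normal(mu, sigma, size=mask.sum())

print(df)
```

Assigning through the boolean mask fills only the NaN positions, so the existing values are left untouched and each gap receives its own draw.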