Tuesday, October 3, 2017

How to replace every NaN in a column with different random values using pandas?

I have been playing with pandas lately, and I am now trying to replace the NaN values in a DataFrame with different random values drawn from a normal distribution.

Assume I have this CSV file with no header:

      0
0    343
1    483
2    101
3    NaN
4    NaN
5    NaN

My expected result is something like this:

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber2
5     randomnumber3

But instead I got the following:

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber1
5     randomnumber1    # all NaN filled with same number

My code so far:

import numpy as np
import pandas as pd

df = pd.read_csv("testfile.csv", header=None)
mu, sigma = df.mean(), df.std()
# Only one sample is drawn here, so every NaN gets the same value
norm_dist = np.random.normal(mu, sigma, 1)
for i in norm_dist:
    print(df.fillna(i))

I am thinking of counting the NaN rows in the DataFrame and replacing the 1 in np.random.normal(mu, sigma, 1) with that count, so that each NaN can get a different value.
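
The idea just described could look roughly like the sketch below. It is only an illustration under some assumptions: the DataFrame is built inline as a stand-in for testfile.csv, the names col, mask, rng and filled are made up for the example, and the seed is arbitrary. It counts the NaN entries with a boolean mask, draws that many samples, and assigns them only to the masked rows.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("testfile.csv", header=None); values taken from the example above
df = pd.DataFrame({0: [343.0, 483.0, 101.0, np.nan, np.nan, np.nan]})

col = df[0]
mask = col.isna()                   # True where the value is NaN
mu, sigma = col.mean(), col.std()   # mean() and std() skip NaN by default

rng = np.random.default_rng(0)      # arbitrary seed, only for repeatability
filled = df.copy()
# Draw one sample per NaN and write them into the NaN positions only
filled.loc[mask, 0] = rng.normal(mu, sigma, size=mask.sum())

print(filled)
```

Since one value is drawn per missing row, each NaN ends up with its own random number, and the non-missing rows are left untouched.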

But I want to ask whether there is a simpler way to do this.

Thank you for your help and suggestions.



