Thursday, June 17, 2021

How to replace null values with randomly sampled values in PySpark?

I have a Spark data frame and I want to replace its null values with values randomly sampled from the same column. I know how to do this in pandas, but when I try the same thing in PySpark I get an error. Since I am new to PySpark, I am wondering where I am going wrong.

df =
   Name   age
0  Jhon  20.0
1   NaN  30.0
2  jack   NaN
3  jhon  40.0
4  jack   NaN
5  prem  20.0

# Sample as many non-null names as there are nulls
random_sample = df['Name'].dropna().sample(df['Name'].isnull().sum(), random_state=0)
print(random_sample)
# Align the sampled values with the null positions, then fill them in
random_sample.index = df[df['Name'].isnull()].index
df.loc[df['Name'].isnull(), 'Name'] = random_sample
df
3 jhon
    Name    age
0   Jhon    20.0
1   jhon    30.0
2   jack    NaN
3   jhon    40.0
4   jack    NaN
5   prem    20.0

PySpark:

rand = df.filter(df['Name'].isNull())
null = df.where(col("Name").isNull()).count()
rand.sample(null, random_state=1)


TypeError: sample() got an unexpected keyword argument 'random_state'

Is the sampling function different in PySpark? How do I fill the null values using random sampling in PySpark?



