Friday, September 10, 2021

I ran into an issue with random value manipulation in PySpark. The task is:

1. Create a Spark session object.
2. Create 10 random values as a column and name the column rand1.
3. Create another 10 random values as a column and name the column rand2.
4. Calculate the covariance and correlation between these two columns.
5. Create a new DataFrame with the header names "stats" and "value".
6. Fill the new DataFrame with the obtained values, labeled "Co-variance" and "Correlation".
7. Save the resulting DataFrame in a folder named Result.



