Thursday, April 9, 2020

PySpark: How do I fix 'function' object has no attribute 'rand' error?

I am trying to randomly select 100 rows from my PySpark DataFrame. To do that, I would like to use the code described in this post:

training_data = data.orderBy(F.rand()).limit(100)

However I get the error:

AttributeError: 'function' object has no attribute 'rand'

I imported rand() the following way:

from pyspark.sql.functions import rand as F

I tried to import rand the same way as described in the post, but then I get the error:

ModuleNotFoundError: No module named 'org'

I also tried to use the function just as such:

training_data = data.orderBy(rand()).limit(100)

But then I get the following name error:

NameError: name 'rand' is not defined

Does anyone know how to fix this? I am new to PySpark and I think I am missing something obvious here. Note that I am working on Databricks.

Thank you
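
Note: the first error message suggests that F is bound to the rand function itself rather than to the pyspark.sql.functions module, because "import rand as F" renames the function. The sketch below reproduces the same message with a standard-library function (os.path.join, used only as a stand-in so that this part runs without Spark) and then shows the module-alias import. sample_rows is a hypothetical helper name that does not appear in the post, and it assumes pyspark is installed and that data is a PySpark DataFrame.

```python
# Stand-in for the failing import: os.path.join is an ordinary Python
# function, just as pyspark.sql.functions.rand is.
from os.path import join as F  # F is now the function itself, not a module

try:
    F.join("a", "b")  # same mistake as calling F.rand()
except AttributeError as exc:
    message = str(exc)
    print(message)


def sample_rows(data, n=100):
    """Return n randomly ordered rows from a PySpark DataFrame.

    Hypothetical helper for illustration; assumes pyspark is available.
    """
    # Alias the module, not the function, so that F.rand() resolves.
    from pyspark.sql import functions as F

    return data.orderBy(F.rand()).limit(n)
```

With the module aliased as F, the original line data.orderBy(F.rand()).limit(100) should work unchanged. Writing "from pyspark.sql.functions import rand" (without "as F") would instead allow calling rand() directly; the NameError is consistent with the name rand never having been bound under the "as F" import.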



