I am trying to randomly select 100 rows from my PySpark DataFrame. To do that, I would like to use the code described in this post:
training_data = data.orderBy(F.rand()).limit(100)
However I get the error:
AttributeError: 'function' object has no attribute 'rand'
I imported rand() the following way:
from pyspark.sql.functions import rand as F
I tried to import rand the same way as described in the post, but I get the error:
ModuleNotFoundError: No module named 'org'
I also tried to use the function just as such:
training_data = data.orderBy(rand()).limit(100)
But then I get the following name error:
NameError: name 'rand' is not defined
Does anyone know how to fix this? I am new to PySpark and I think I am missing something obvious here. Note that I am working on Databricks.
Thank you
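A possible explanation for the first and third errors, offered as a sketch rather than a confirmed diagnosis: `from pyspark.sql.functions import rand as F` binds the name `F` to the `rand` function itself, so `F.rand` looks up an attribute on a function and the bare name `rand` is never defined. The snippet below reproduces that mechanism with a standard-library function standing in for `rand` (the stand-in is an assumption made so the example runs without a Spark session); the PySpark forms are shown in the comments.

```python
# Stand-in for: from pyspark.sql.functions import rand as F
# Here F is a single function, not a module.
from os.path import join as F

try:
    F.join("a", "b")  # same shape as F.rand() in the question
    alias_error = None
except AttributeError as exc:
    alias_error = str(exc)

print(alias_error)

# Stand-in for: from pyspark.sql import functions as F
# Now F is the module, so attribute access on it resolves.
import os.path as F

joined = F.join("a", "b")
print(joined)
```

Under that reading, either `from pyspark.sql import functions as F` followed by `data.orderBy(F.rand()).limit(100)`, or `from pyspark.sql.functions import rand` followed by `data.orderBy(rand()).limit(100)`, should avoid both errors.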