I would like to create a sample of a specific size from a DataFrame in PySpark. I am aware of proportional sampling, but it doesn't always give the desired sample size. For example, df.sample(0.1)
gives a sample of roughly 10% of the rows, not an exact count. Is there a way to define the size of a random sample, i.e. code that returns a random sample of exactly X rows?