Wednesday, November 24, 2021

How to create a sample of a specific size from a dataframe in PySpark?

I would like to create a sample of a specific size from a dataframe in PySpark. I am aware of proportional sampling, but it doesn't always give the desired sample size. For example, df.sample(0.1) gives an approximate 10% sample, not an exact row count. Is there a way to specify the size of a random sample, i.e., code that returns a random sample of exactly X rows?



