Wednesday, 12 February 2020

Modifying probability of selection in PySpark sampling without replacement

I understand that, to sample efficiently without replacement, Spark uses Bernoulli sampling, in which every row has the same probability of being included in the sample.

I would like to know whether PySpark offers a way to sample with a modified selection probability, for example one taken from a mantissa, instead of every row having the same probability of being selected.
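To make the question concrete, here is a minimal sketch of what I mean by unequal selection probabilities (sometimes called Poisson sampling): each row is kept independently with its own probability. The function names `poisson_sample` and `poisson_sample_spark`, and the probability column `p`, are hypothetical and only for illustration. The Spark variant assumes the DataFrame already has a numeric column holding values in [0, 1]; it is a sketch I have not benchmarked, not a built-in sampling mode.

```python
import random


def poisson_sample(rows, prob_key="p", seed=42):
    """Keep each row independently with its own probability row[prob_key].

    Plain-Python illustration of the idea; `rows` is a list of dicts.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so a row with p = 1.0 is always
    # kept and a row with p = 0.0 is never kept.
    return [row for row in rows if rng.random() < row[prob_key]]


def poisson_sample_spark(df, prob_col="p", seed=42):
    """Same idea on a PySpark DataFrame (assumes `prob_col` exists in df).

    pyspark is imported lazily so the plain-Python part runs without it.
    """
    from pyspark.sql import functions as F

    # F.rand draws one uniform value in [0, 1) per row; the row is kept
    # when that draw falls below the row's own probability.
    return df.filter(F.rand(seed) < F.col(prob_col))
```

With this approach the sample size is random rather than fixed, since each row is decided independently. If the probability only needs to differ between groups rather than between individual rows, `DataFrame.sampleBy(col, fractions, seed)` accepts a separate fraction per stratum.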



