I understand that, to sample efficiently, Spark uses Bernoulli sampling, in which every row has the same probability of being included in the sample.
I would like to know whether there is a way to sample in PySpark with a different selection probability per row, driven by, say, a mantissa, instead of every row having the same probability of being selected.
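To make the question concrete, here is a rough sketch of the kind of thing I have in mind, not a confirmed Spark feature. It assumes the per-row value lives in a non-negative numeric column (the column name passed as `weight_col` is hypothetical) and keeps each row independently with a probability proportional to that value, capped at 1. The function names `inclusion_probability` and `weighted_sample` are my own, not part of PySpark.

```python
def inclusion_probability(weight, total_weight, n_rows, fraction):
    """Keep probability for one row: proportional to its weight, capped at 1.

    With equal weights this reduces to plain Bernoulli sampling at `fraction`.
    """
    if total_weight <= 0:
        raise ValueError("total_weight must be positive")
    return min(1.0, fraction * n_rows * weight / total_weight)


def weighted_sample(df, weight_col, fraction, seed=None):
    """Sketch: filter a DataFrame with a per-row keep probability.

    Assumes `weight_col` is a non-negative numeric column (hypothetical name).
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    # One pass to get the total weight and the row count.
    stats = df.agg(
        F.sum(weight_col).alias("total"),
        F.count(F.lit(1)).alias("n"),
    ).first()
    if stats["total"] is None or stats["total"] <= 0:
        raise ValueError("weights must sum to a positive value")

    # Same formula as inclusion_probability, expressed as a column.
    scale = fraction * stats["n"] / stats["total"]
    keep_prob = F.least(F.lit(1.0), F.col(weight_col) * F.lit(scale))

    # Independent draw per row: keep it when the uniform draw is below keep_prob.
    return df.filter(F.rand(seed) < keep_prob)
```

Usage would look like `sampled = weighted_sample(df, "weight", 0.1, seed=42)`, where `df` and `"weight"` stand in for my actual data. The sample size is random, and it is only about `fraction` of the rows on average when no probability hits the cap. As far as I can tell, `DataFrame.sample` takes a single fraction and `DataFrame.sampleBy` takes one fraction per stratum key, so neither accepts per-row probabilities directly.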