Tuesday, 29 August 2023

Pick n random records at the same index from 2 different list columns in PySpark

Hi, I have a DataFrame like this:

cc  doc_1      doc_2
4   [a, b,..]  [1, 6,..]
9   [t, s,..]  [4, 5,..]
4   [q, f,..]  [6, 7,..]

I want to pick n random records from both doc_1 and doc_2, where n is half the value in the cc column and the records come from the same index positions in each list. I can use f.rand() to pick one record from a column, but I'm not sure how to pick multiple records at the same indices from two different columns.

Expected output, with randomly picked values in each column:

cc  doc_1                 doc_2
4   [c, f]                [5, 3]
9   [s, g,..](4 records)  [6, 5,..](4 records)
4   [r, g]                [7, 9]
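
The core of the problem is keeping the two lists aligned while sampling. A minimal sketch of that step in plain Python, assuming "1/2 of cc" means integer division (which matches 9 giving 4 records above) and that doc_1 and doc_2 have equal length in every row. The function name sample_pairs and the sample values are hypothetical, for illustration only:

```python
import random


def sample_pairs(cc, doc_1, doc_2, rng=random):
    """Pick cc // 2 random positions and return the elements of both
    lists at those same positions."""
    # Zipping keeps the lists aligned: each sampled item is a (doc_1, doc_2) pair.
    pairs = list(zip(doc_1, doc_2))
    # Assumption: never request more items than the row actually holds.
    n = min(cc // 2, len(pairs))
    # Sample without replacement, so no index is picked twice.
    picked = rng.sample(pairs, n)
    # Split the sampled pairs back into two parallel lists.
    return [a for a, _ in picked], [b for _, b in picked]


# Hypothetical row with cc = 4: two aligned values come back from each list.
picked_1, picked_2 = sample_pairs(4, ["a", "b", "c", "d"], [1, 6, 5, 3])
print(picked_1, picked_2)
```

In PySpark, a function like this could probably be wrapped as a UDF with a struct-of-two-arrays return type. Alternatively, the built-in functions arrays_zip, shuffle and slice (available since Spark 2.4) might express the same zip, shuffle, take-first-n idea without a UDF. Neither variant has been checked against this data.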


