Thursday, October 4, 2018

Scala Window partitionBy: Update a Random Record per Group

I have the following data:

group_id    id  name
----        --  ----
G1          1   apple
G1          2   orange
G1          3   apple
G1          4   banana
G1          5   apple
G2          6   orange
G2          7   apple
G2          8   apple
G3          7   banana
G3          8   orange

I want to set one random record in each group to 1 and all the others to 0, like this:

group_id    id  name   random_pick
----        --  ----   -------------------
G1          1   apple       0
G1          2   orange      0
G1          3   apple       0
G1          4   banana      0
G1          5   apple       1
G2          6   orange      0
G2          7   apple       1
G2          8   apple       0
G3          7   banana      0
G3          8   orange      1

My thoughts:

  1. Add a column with 0 as the default value.
  2. Use Window.partitionBy("group_id") to get the row count of each group, pick a random number between 1 and that count, and set the matching record to 1.
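
A minimal sketch of one way this could look in Spark, not necessarily the only or best one. Instead of drawing a random index between 1 and the group count, it orders each partition by a random value and flags the first row, which should give the same "one uniformly random row per group" result in a single pass. The object name RandomPick, the function withRandomPick, and the temporary column "_rnd" are made up for illustration, and the local SparkSession settings are assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rand, row_number, when}

object RandomPick {

  // Flags exactly one randomly chosen row per group_id with 1, all others with 0.
  def withRandomPick(df: DataFrame): DataFrame = {
    // "_rnd" is a hypothetical helper column; assumed not to clash with real columns.
    val withRand = df.withColumn("_rnd", rand())

    // Within each group, rows are ordered by their random value.
    val w = Window.partitionBy("group_id").orderBy(col("_rnd"))

    withRand
      // row_number() is unique inside a partition, so only one row gets 1.
      .withColumn("random_pick", when(row_number().over(w) === 1, 1).otherwise(0))
      .drop("_rnd")
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying it out; adjust master/appName as needed.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-pick")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      ("G1", 1, "apple"), ("G1", 2, "orange"), ("G1", 3, "apple"),
      ("G1", 4, "banana"), ("G1", 5, "apple"),
      ("G2", 6, "orange"), ("G2", 7, "apple"), ("G2", 8, "apple"),
      ("G3", 7, "banana"), ("G3", 8, "orange")
    ).toDF("group_id", "id", "name")

    // The row flagged in each group changes from run to run.
    withRandomPick(df).orderBy("group_id", "id").show()

    spark.stop()
  }
}
```

If reproducible picks are needed, rand also accepts a seed, e.g. rand(42L). Spark does not guarantee row order across runs, though, so a seed alone may not pin down the same row every time.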

But how do I do this in Scala?! :(

Thanks in advance!



