I'm using PySpark (a new thing for me). Now, suppose I have the following table:

+-------+-------+----------+
| Col1  | Col2  | Question |
+-------+-------+----------+
| val11 | val12 | q1       |
| val21 | val22 | q2       |
| val31 | val32 | q3       |
+-------+-------+----------+
and I would like to append to it a new column, random_question,
which is in fact a permutation of the values in the Question
column, so the result might look like this:

+-------+-------+----------+-----------------+
| Col1  | Col2  | Question | random_question |
+-------+-------+----------+-----------------+
| val11 | val12 | q1       | q2              |
| val21 | val22 | q2       | q3              |
| val31 | val32 | q3       | q1              |
+-------+-------+----------+-----------------+
I've tried to do that as follows:

    df.withColumn(
        'random_question',
        df.orderBy(rand(seed=0))['Question']
    ).createOrReplaceTempView('with_random_questions')
The problem is that the above code does append the required column but WITHOUT permuting the values in it.
What am I doing wrong and how can I fix this?
Thank you,
Gilad