Let's say the DataFrame is built like this:
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

ls = [
    ['1', -9.78],
    ['2', 5.38],
    ['1', 8.86],
    ['2', -0.47],
    ['1', -0.19],
    ['1', 4.78],
    ['1', -9.23],
    ['2', -89.32],
]
test = spark.createDataFrame(pd.DataFrame(ls, columns=['col1', 'col2']))
test.show()
Output:
+----+------+
|col1| col2|
+----+------+
| 1| -9.78|
| 2| 5.38|
| 1| 8.86|
| 2| -0.47|
| 1| -0.19|
| 1| 4.78|
| 1| -9.23|
| 2|-89.32|
+----+------+
I want to replace the value of col1 in every row where col1 == '1' with a random pick from the list ['a', 'b', 'c'] (with replacement, so each matching row draws independently).
For example, the result would look like this:
+----+------+
|col1| col2|
+----+------+
| a| -9.78|
| 2| 5.38|
| a| 8.86|
| 2| -0.47|
| c| -0.19|
| b| 4.78|
| a| -9.23|
| 2|-89.32|
+----+------+
How can I do this in PySpark?