jeudi 25 juillet 2019

Pandas: select value from random column on each row

Suppose I have the following Pandas DataFrame:

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
    'c': [7, 8, 9]
})

    a   b   c
0   1   4   7
1   2   5   8
2   3   6   9


I want to generate a new pandas.Series so that the values of this series are selected, row by row, from a random column in the DataFrame. So, a possible output for that would be the series:

0    7
1    2
2    9
dtype: int64

(where in row 0 it randomly chose 'c', in row 1 it randomly chose 'a' and in row 2 it randomly chose 'c' again).

I know this can be done by iterating over the rows and using random.choice to choose each row, but iterating over the rows not only has bad performance but also is "unpandonic", so to speak. Also, df.sample(axis=1) would choose whole columns, so all of them would be chosen from the same column, which is not what I want. Is there a better way to do this with vectorized pandas methods?




Aucun commentaire:

Enregistrer un commentaire