Thursday, October 14, 2021

How can I remap values in a pandas column using a random draw from a list?

Context

I have a dataframe in which I need to remap one column to different values. For some values the mapping is ambiguous: the resulting value should be chosen randomly from a list every time the value to be mapped is encountered.

For example, the values in the column should be remapped in the following way:

  • 1 ➝ 'a'
  • 2 ➝ 'b' or 'c', chosen at random
  • 3 ➝ 'd'

If two rows contain a 2, a separate random draw should be made for each row to decide whether the value is mapped to 'b' or to 'c'.

Example data

Here is some example data:

import pandas as pd
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7, 8], "col2": [2, 2, 2, 3, 1, 2, 2, 1]})

What I've looked into

I've tried using map and a random.choice call with a mapping dictionary (as described in this answer):

import random

choice_list = ["b", "c"]
map_dict = {1: "a", 2: random.choice(choice_list), 3: "d"}
df["remap"] = df.col2.map(map_dict)

I found that the value 2 was always mapped to the same single element of choice_list in every row, e.g. all b's:

   col1  col2 remap
0     1     2     b
1     2     2     b
2     3     2     b
3     4     3     d
4     5     1     a
5     6     2     b
6     7     2     b
7     8     1     a

Something similar happens when I use the replace method.
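
A likely explanation, sketched here with plain Python and no pandas: random.choice(choice_list) is evaluated once, at the moment the dictionary literal is built, so the dictionary stores a fixed string under key 2 and every lookup returns that same string. The fixed seed and the name looked_up are assumptions added only for illustration.

```python
import random

random.seed(0)  # assumption: fixed seed, only so the run is repeatable

choice_list = ["b", "c"]

# random.choice runs here, exactly once, while the dict literal is evaluated
map_dict = {1: "a", 2: random.choice(choice_list), 3: "d"}

# The dict holds a plain string under key 2, so repeated lookups cannot differ
looked_up = [map_dict[2] for _ in range(5)]

print(map_dict)
print(looked_up)
```

Whichever element was drawn, all five lookups return it, which matches the behaviour seen with map and replace.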

My expected outcome would be something like:

   col1  col2 remap
0     1     2     b
1     2     2     c
2     3     2     b
3     4     3     d
4     5     1     a
5     6     2     b
6     7     2     c
7     8     1     a
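
One possible way to get this behaviour, as a minimal sketch rather than a definitive solution: keep a list of candidates for every key and pass a function to map, so that random.choice runs once per row instead of once per dictionary. The names choices and draw are hypothetical, and the seed is an assumption that only makes the run repeatable.

```python
import random

import pandas as pd

random.seed(42)  # assumption: fixed seed, only so the run is repeatable

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7, 8], "col2": [2, 2, 2, 3, 1, 2, 2, 1]})

# Every key maps to a list of candidates; unambiguous keys get a one-element list
choices = {1: ["a"], 2: ["b", "c"], 3: ["d"]}


def draw(value):
    """Pick one candidate for this value; called separately for each row."""
    return random.choice(choices[value])


# Passing a function (not a dict) to map makes pandas call it per element
df["remap"] = df["col2"].map(draw)

print(df)
```

In this sketch a value of col2 that is missing from choices raises a KeyError; choices.get(value, [value]) would be one way to pass unknown values through unchanged.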


