Monday, 10 May 2021

Shuffle columns in pairs

I have the following dataframe:

import numpy as np
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'], 'Count1': [1, 5, 2], 'Count2': [2, 6, 8]}

data = pd.DataFrame(d)

  Name1 Name2  Count1  Count2
0  jaap   bas       1       2
1  piet   max       5       6
2   tim  piet       2       8

Now I want to randomly shuffle the columns in pairs, row by row. Count1 belongs to Name1 and Count2 belongs to Name2, so if the names in Name1 and Name2 are swapped in a given row, the values in Count1 and Count2 must be swapped in that row as well.

Example output would be:

  Name1 Name2  Count1  Count2
0   bas  jaap       2       1
1  piet   max       5       6
2  piet   tim       8       2

Here, rows 0 and 2 have been swapped.

What I have tried:

np.apply_along_axis(np.random.permutation, 1, data[['Name1','Name2']])

np.apply_along_axis(np.random.permutation, 1, data[['Count1','Count2']])

However, this does not guarantee that the same permutation is applied to Name1/Name2 as to Count1/Count2, because each call draws its own random permutations.
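A minimal sketch of one way to keep the pairs together, assuming NumPy's `default_rng` is available: draw a single boolean mask per row ("swap or not") and reuse that same mask for every pair with `np.where`. The seed and the loop over pairs are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
     'Count1': [1, 5, 2], 'Count2': [2, 6, 8]}
data = pd.DataFrame(d)

# One random draw per row: True means "swap slot 1 and slot 2 in this row".
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
swap = rng.random(len(data)) < 0.5

# Reuse the SAME mask for every pair so names and counts move together.
for left, right in [('Name1', 'Name2'), ('Count1', 'Count2')]:
    a = data[left].to_numpy()
    b = data[right].to_numpy()
    # Compute both new columns before assigning, so neither overwrites the other.
    new_left = np.where(swap, b, a)
    new_right = np.where(swap, a, b)
    data[left] = new_left
    data[right] = new_right

print(data)
```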

I also tried:

# Draw 0 or 1 per row; random1 is its complement
data['random'] = np.random.choice(2, len(data))
data['random1'] = data['random'].replace([1, 0], [0, 1])

name1 = data['Name1'].copy()
name2 = data['Name2'].copy()
count1 = data['Count1'].copy()
count2 = data['Count2'].copy()
# random == 1 keeps the original order, random == 0 swaps the pair
# (for the Name columns this relies on string repetition: 'jaap' * 0 == '')
data['Name1'] = name1 * data['random'] + name2 * data['random1']
data['Name2'] = name1 * data['random1'] + name2 * data['random']
data['Count1'] = count1 * data['random'] + count2 * data['random1']
data['Count2'] = count1 * data['random1'] + count2 * data['random']

The second approach works, but I am looking for a cleaner method that scales easily to multiple column pairs.
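A minimal sketch of one way to generalize, assuming every group of columns is listed in the same slot order; `shuffle_slots` is a hypothetical helper name, not a pandas function. It draws one random permutation per row and applies it to each group with `np.take_along_axis`, so any number of column groups (and groups wider than two columns) share the same reordering.

```python
import numpy as np
import pandas as pd


def shuffle_slots(df, groups, rng=None):
    """Return a copy of df where, row by row, the slots of every group are
    reordered with one shared random permutation.

    groups: list of equally long column lists, e.g.
        [['Name1', 'Name2'], ['Count1', 'Count2']]
    Position j in each list is assumed to describe the same slot j.
    (Hypothetical helper, not part of pandas.)
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(groups[0])
    if any(len(cols) != k for cols in groups):
        raise ValueError("every group needs the same number of columns")

    # One random permutation of 0..k-1 per row (argsort of uniform keys).
    perm = rng.random((len(df), k)).argsort(axis=1)

    out = df.copy()
    for cols in groups:
        values = df[cols].to_numpy()  # shape (n_rows, k)
        # Row i becomes values[i, perm[i]]: the same reordering for every group.
        shuffled = np.take_along_axis(values, perm, axis=1)
        for j, col in enumerate(cols):
            out[col] = shuffled[:, j]
    return out


d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
     'Count1': [1, 5, 2], 'Count2': [2, 6, 8]}
data = pd.DataFrame(d)
result = shuffle_slots(data, [['Name1', 'Name2'], ['Count1', 'Count2']],
                       rng=np.random.default_rng(42))
print(result)
```

Sorting a row of uniform random numbers with `argsort` yields a random permutation for that row, which plays the role of `np.random.permutation` in the first attempt but is computed once and shared by all groups. With two slots per group the effect is the same as swapping each row with probability 1/2; adding another pair only means adding another column list to `groups`.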



