I have the following dataframe:
import numpy as np
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'], 'Count1': [1, 5, 2], 'Count2': [2, 6, 8]}
data = pd.DataFrame(d)
  Name1 Name2  Count1  Count2
0  jaap   bas       1       2
1  piet   max       5       6
2   tim  piet       2       8
Now I want to randomly shuffle the columns in pairs, row by row. Count1 belongs to Name1 and Count2 belongs to Name2. So if, in a given row, the name in column Name1 is swapped with the name in Name2, then the value in column Count1 must also be swapped with the value in column Count2.
Example output would be:
  Name1 Name2  Count1  Count2
0   bas  jaap       2       1
1  piet   max       5       6
2  piet   tim       8       2
Here rows 0 and 2 are swapped.
What I have tried so far:
np.apply_along_axis(np.random.permutation, 1, data[['Name1','Name2']])
np.apply_along_axis(np.random.permutation, 1, data[['Count1','Count2']])
However, the two calls draw independent permutations, so this doesn't ensure that the same shuffle is applied to Name1 and Name2 as to Count1 and Count2.
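One possible way around this, sketched here rather than offered as a definitive fix, is to draw a single random column order per row and reuse it for both column groups. The names `rng`, `order`, `name_cols`, `count_cols` and `shuffled` are illustrative only, and the random generator is left unseeded.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Name1': ['jaap', 'piet', 'tim'],
                     'Name2': ['bas', 'max', 'piet'],
                     'Count1': [1, 5, 2],
                     'Count2': [2, 6, 8]})

name_cols = ['Name1', 'Name2']     # illustrative grouping of the columns
count_cols = ['Count1', 'Count2']  # must be listed in the same order as name_cols

rng = np.random.default_rng()

# One random ordering of the column positions per row:
# argsort of uniform noise yields an independent permutation in every row.
order = rng.random((len(data), len(name_cols))).argsort(axis=1)

shuffled = data.copy()
# Apply the SAME per-row ordering to both groups so names and counts stay paired.
shuffled[name_cols] = np.take_along_axis(data[name_cols].to_numpy(), order, axis=1)
shuffled[count_cols] = np.take_along_axis(data[count_cols].to_numpy(), order, axis=1)

print(shuffled)
```

Because `order` is computed once and applied to both groups, each name keeps its own count in every row, whichever way that row happens to be ordered.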
And:
data['random'] = np.random.choice(2,len(data))
data['random1'] = data['random'].replace([1,0],[0,1])
name1 = data['Name1'].copy()
name2 = data['Name2'].copy()
count1 = data['Count1'].copy()
count2 = data['Count2'].copy()
data['Name1'] = name1 * data['random'] + name2 *data['random1']
data['Name2'] = name1 * data['random1'] + name2 * data['random']
data['Count1'] = count1 * data['random'] + count2 * data['random1']
data['Count2'] = count1 * data['random1'] + count2 * data['random']
The second approach works, but I am looking for a cleaner method that can easily be applied to multiple column pairs.
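To illustrate what "easily applied to multiple column pairs" could look like, here is a minimal sketch that draws one boolean swap flag per row and applies it to every pair with `np.where`. The helper name `swap_pairs_rowwise`, its parameters and the 50% swap probability are assumptions made for this example, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def swap_pairs_rowwise(df, pairs, rng=None):
    """Return a copy of df with each (left, right) column pair swapped
    in a random subset of rows, plus the boolean mask that was used.

    Hypothetical helper: the same mask is reused for every pair, so all
    pairs are swapped together within a row.
    """
    rng = np.random.default_rng() if rng is None else rng
    swap = rng.random(len(df)) < 0.5  # True means: swap this row
    out = df.copy()
    for left, right in pairs:
        left_vals = df[left].to_numpy()
        right_vals = df[right].to_numpy()
        out[left] = np.where(swap, right_vals, left_vals)
        out[right] = np.where(swap, left_vals, right_vals)
    return out, swap


data = pd.DataFrame({'Name1': ['jaap', 'piet', 'tim'],
                     'Name2': ['bas', 'max', 'piet'],
                     'Count1': [1, 5, 2],
                     'Count2': [2, 6, 8]})

result, swapped_rows = swap_pairs_rowwise(
    data, [('Name1', 'Name2'), ('Count1', 'Count2')])
print(swapped_rows)
print(result)
```

Adding another pair only means adding another tuple to the list, and no helper columns such as `random` or `random1` are left behind in the frame.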