Wednesday, 22 August 2018

Pandas: anonymize client_id column without any possibility to roll back [duplicate]


I have a DataFrame with a client_id column that I want to anonymize irreversibly, with no possibility of rolling back.

I want to drop client_id and replace it with a new column that holds the same value for every row belonging to the same client.
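If the new identifier only has to be consistent per client and does not have to be 1, 2, 3, one common way to make the mapping one-way is a salted hash whose salt is thrown away. This is an alternative I am assuming, not something the question prescribes: `anonymize` is a hypothetical helper that uses SHA-256 from `hashlib` with a random salt from `secrets` that exists only for the duration of the call.

```python
import hashlib
import secrets

import pandas as pd


def anonymize(df: pd.DataFrame, column: str = "client_id") -> pd.DataFrame:
    """Hypothetical helper: replace `column` with a salted SHA-256 pseudonym."""
    # The salt exists only inside this call and is never stored, so a guessed
    # client_id cannot be re-hashed later to find its matching pseudonym.
    salt = secrets.token_bytes(16)
    ids = df[column].map(
        lambda value: hashlib.sha256(salt + str(value).encode()).hexdigest()
    )
    out = df.drop(columns=column)
    out.insert(0, "id", ids)  # identical hash for every row of the same client
    return out


df = pd.DataFrame({
    "client_id": [111, 222, 111, 222, 333, 222, 111, 333],
    "date": ["2018-08-20", "2018-08-22", "2018-08-21", "2018-08-21",
             "2018-08-18", "2018-08-20", "2018-08-18", "2018-08-19"],
    "action": ["test1", "test2", "test3", "test4",
               "test5", "test6", "test7", "test8"],
})

anonymized = anonymize(df)
print(anonymized)
```

Because a fresh salt is drawn on every call, the same client gets a different hash each time the function runs; that is the point if no link back should survive, but it also means two separately anonymized exports cannot be joined on `id`.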

import pandas as pd

df = pd.DataFrame({
    'client_id':[111, 222, 111, 222, 333, 222, 111, 333], 
    'date':['2018-08-20', '2018-08-22', '2018-08-21', '2018-08-21', '2018-08-18', '2018-08-20', '2018-08-18', '2018-08-19'], 
    'action':['test1', 'test2', 'test3', 'test4', 'test5', 'test6', 'test7', 'test8']
    })

My dataframe:

client_id | date         | action
----------------------------------
111       | '2018-08-20' | test1
222       | '2018-08-22' | test2
111       | '2018-08-21' | test3
222       | '2018-08-21' | test4
333       | '2018-08-18' | test5
222       | '2018-08-20' | test6
111       | '2018-08-18' | test7
333       | '2018-08-19' | test8

Expected result:

id | date         | action
---------------------------
1  | '2018-08-20' | test1
2  | '2018-08-22' | test2
1  | '2018-08-21' | test3
2  | '2018-08-21' | test4
3  | '2018-08-18' | test5
2  | '2018-08-20' | test6
1  | '2018-08-18' | test7
3  | '2018-08-19' | test8
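One way to get exactly this output is `pd.factorize`, which assigns integer codes to distinct values in order of first appearance. This is a sketch, not necessarily the answer in the duplicate the question points to, and it assumes the mapping is irreversible only as long as the second return value of `factorize` (the original `client_id` values) is discarded and no copy of the original frame is kept.

```python
import pandas as pd

df = pd.DataFrame({
    'client_id': [111, 222, 111, 222, 333, 222, 111, 333],
    'date': ['2018-08-20', '2018-08-22', '2018-08-21', '2018-08-21',
             '2018-08-18', '2018-08-20', '2018-08-18', '2018-08-19'],
    'action': ['test1', 'test2', 'test3', 'test4',
               'test5', 'test6', 'test7', 'test8'],
})

# factorize returns (codes, uniques); codes start at 0 in order of first
# appearance. `uniques` is the only link back to client_id, so drop it.
codes, _ = pd.factorize(df['client_id'])

df.insert(0, 'id', codes + 1)        # shift to 1-based and make it the first column
df = df.drop(columns='client_id')    # remove the original identifier
print(df)
```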

I tried to use pandas.core.groupby.DataFrameGroupBy.rank, but it did not produce the expected result:

df['id'] = df.groupby("client_id")["date"].rank(ascending=True)  # ranks the dates within each client_id group
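The attempt above ranks the dates inside each `client_id` group, so every row gets its position in that client's history rather than a label for the client. The sketch below shows that effect, assuming the dates are first parsed with `pd.to_datetime`, and then numbers the groups themselves with `GroupBy.ngroup`; passing `sort=False` makes the numbering follow first appearance, which matches the expected table.

```python
import pandas as pd

df = pd.DataFrame({
    "client_id": [111, 222, 111, 222, 333, 222, 111, 333],
    "date": ["2018-08-20", "2018-08-22", "2018-08-21", "2018-08-21",
             "2018-08-18", "2018-08-20", "2018-08-18", "2018-08-19"],
    "action": ["test1", "test2", "test3", "test4",
               "test5", "test6", "test7", "test8"],
})
df["date"] = pd.to_datetime(df["date"])

# What rank() computes: the position of each date within its own client group
# (client 111's earliest date gets 1.0, its next 2.0, ...), not a client label.
within_client_rank = df.groupby("client_id")["date"].rank(ascending=True)
print(within_client_rank.tolist())

# What is wanted: one number per client. ngroup() labels groups 0, 1, 2, ...
# and sort=False assigns those labels in order of first appearance.
df["id"] = df.groupby("client_id", sort=False).ngroup() + 1
df = df.drop(columns="client_id")[["id", "date", "action"]]
print(df)
```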



