I have a pandas DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'user': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'i': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'value': [0.64, 0.93, 0.53, 0, 0.74, 0.61, 0.41, 0.64, 0, 0.51, 0.78, 0.21]})
df
Output:
user i value
0 1 1 0.64
1 2 1 0.93
2 3 1 0.53
3 1 2 0.00
4 2 2 0.74
5 3 2 0.61
6 1 3 0.41
7 2 3 0.64
8 3 3 0.00
9 1 4 0.51
10 2 4 0.78
11 3 4 0.21
For each value of i, I want to select either no user or exactly one random user (e.g. 1, 2, or 3). If a user is selected and its value at that i is less than 0.5, that user should be removed from the DataFrame (df) entirely, i.e. all of its rows are dropped.
For example, when i=1, let's say we randomly select user 2. Its value, 0.93, is higher than 0.5, so we keep user 2 and df stays the same:
user i value
0 1 1 0.64
1 2 1 0.93
2 3 1 0.53
3 1 2 0.00
4 2 2 0.74
5 3 2 0.61
6 1 3 0.41
7 2 3 0.64
8 3 3 0.00
9 1 4 0.51
10 2 4 0.78
11 3 4 0.21
When i=2, let's say we randomly select user 1. Its value, 0, is less than 0.5, so we remove user 1 from df. Now df contains only users 2 and 3:
user i value
1 2 1 0.93
2 3 1 0.53
4 2 2 0.74
5 3 2 0.61
7 2 3 0.64
8 3 3 0.00
10 2 4 0.78
11 3 4 0.21
When i=3, let's say we select no user, so users 2 and 3 both stay and df is unchanged.
When i=4, let's say we randomly select user 3. Its value, 0.21, is lower than 0.5, so we remove user 3. In the end, df contains only user 2.
The final DataFrame:
user i value
1 2 1 0.93
4 2 2 0.74
7 2 3 0.64
10 2 4 0.78
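The steps above could be sketched roughly as follows. This is only one possible reading of the procedure, not a definitive solution: the function name drop_random_low_users, the threshold and seed parameters, and the choice to make "no user" exactly as likely as each remaining user are all assumptions of mine, since the description does not say how the random pick is weighted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'i': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'value': [0.64, 0.93, 0.53, 0, 0.74, 0.61, 0.41, 0.64, 0, 0.51, 0.78, 0.21]})


def drop_random_low_users(df, threshold=0.5, seed=None):
    """Hypothetical helper: walk over i, pick at most one remaining user at
    random, and drop that user entirely if its value at this i is below threshold."""
    rng = np.random.default_rng(seed)
    remaining = sorted(df['user'].unique())  # users still in the frame

    for i in sorted(df['i'].unique()):
        if not remaining:
            break
        # None stands for "select no user"; assumption: every option is equally likely.
        options = [None] + remaining
        pick = options[rng.integers(len(options))]
        if pick is None:
            continue
        # Look up the picked user's value for the current i.
        value = df.loc[(df['user'] == pick) & (df['i'] == i), 'value']
        if not value.empty and value.iloc[0] < threshold:
            remaining.remove(pick)  # the user is gone for all later i as well

    # Keep every row of the users that survived.
    return df[df['user'].isin(remaining)]


result = drop_random_low_users(df, seed=0)
print(result)
```

Because the picks are random, the result changes with the seed. With this data, user 2 always survives (none of its values is below 0.5), while users 1 and 3 are dropped only if they happen to be picked at an i where their value is low.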