I'm trying to apply random oversampling to an unbalanced dataset in order to predict the appropriate 'category' for a given 'description'.
df_1['Category'].value_counts().loc[lambda x : x>1]
There are too many categories and their sizes are very uneven. I want to bring them all to the same level so that the machine learning model does not always predict, say, 'iam~ki-000' just because it has the most rows.
iam~ki-000    378
iam~ki-002    180
iam~ki-049     99
iam~ki-050     91
iam~ki-057     91
...
iam~ki-077      2
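From these counts, the per-category numbers can be derived for all categories at once instead of one by one. The following is a minimal sketch that assumes the goal is to bring every category up to the size of the largest one. The Series is typed in from a few of the counts shown above and stands in for the real `df_1['Category'].value_counts()` result.

```python
import pandas as pd

# Subset of the counts shown above; in practice use df_1['Category'].value_counts().
counts = pd.Series(
    {"iam~ki-000": 378, "iam~ki-002": 180, "iam~ki-049": 99, "iam~ki-077": 2}
)

target = counts.max()  # size of the largest category (378 here)

# How many rows each category must gain to reach the target size.
extra_rows = target - counts

# Whole extra copies, like the manual "[df_try] * 4" approach, plus the
# leftover rows that a partial random sample would have to supply.
full_copies = target // counts - 1
remainder = target % counts

print(pd.DataFrame({"count": counts, "extra_rows": extra_rows,
                    "full_copies": full_copies, "remainder": remainder}))
```

Whole copies alone rarely land exactly on the target (iam~ki-049 would need 2 extra copies plus 81 sampled rows), which is why sampling with replacement is usually simpler than multiplying.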
So far I have come up with only one solution, and it is very inefficient :(
My approach is to work out, for each category individually, how many times its rows must be duplicated to oversample the dataset. There are almost 90 categories in total, so doing this by hand does not scale. Can someone help me write a function that balances all categories evenly?
import pandas as pd
# Manual approach for one category: append 4 extra copies of its rows
mask = df['Category'] == 'iam~ki-057'
df_try = df[mask]
df = pd.concat([df] + [df_try] * 4, ignore_index=True)
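One way to generalize this to every category is random oversampling with replacement: keep all original rows of each category and draw extra rows from that same category until it matches the largest one. This is a minimal sketch assuming pandas 1.x or later; the function name `oversample_to_max` and the toy data are hypothetical and only for illustration.

```python
import pandas as pd


def oversample_to_max(df: pd.DataFrame, col: str = "Category",
                      random_state: int = 0) -> pd.DataFrame:
    """Hypothetical helper: randomly oversample every class in `col`
    up to the size of the largest class, keeping all original rows."""
    target = df[col].value_counts().max()
    parts = []
    # Note: groupby drops rows whose category is NaN by default.
    for _, group in df.groupby(col):
        parts.append(group)  # keep every original row
        extra = target - len(group)
        if extra > 0:
            # Draw the missing rows from this category, with replacement.
            parts.append(group.sample(n=extra, replace=True,
                                      random_state=random_state))
    balanced = pd.concat(parts, ignore_index=True)
    # Shuffle so that duplicated rows are not clustered together.
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Usage with hypothetical toy data:
toy = pd.DataFrame({
    "Description": ["a", "b", "c", "d", "e", "f"],
    "Category": ["iam~ki-000"] * 4 + ["iam~ki-077"] * 2,
})
balanced = oversample_to_max(toy)
print(balanced["Category"].value_counts())
```

Oversampling like this is normally applied only to the training split, after the train/test split; otherwise duplicated rows can end up in both sets and inflate the evaluation scores.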