I have a dataframe that looks like this:
    revisionId  itemId wikidataType
1    307190482      23           Q5
6    305019084      80           Q5
8    303692414     181           Q5
9    306600439     192           Q5
11   294597048     206           Q5
The complete dataframe has 100 distinct values in the wikidataType column. It is a large dataframe, so I want to restrict it to 1000 records per wikidataType. To do that, I used the following:
df = df[df.groupby('wikidataType')['wikidataType'].cumcount() < 1000]
This gives me the first 1000 records for each wikidataType, but I want to choose the 1000 records randomly. So I tried:
df = df[random.sample(list(df.groupby('wikidataType')['wikidataType']), 1000)]
But it raised an error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I also tried:
df = df[df.groupby('wikidataType')['wikidataType'].cumcount().random() < 1000]
But that didn't work either. Does anyone know how I can do this?
Thanks in advance.
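One possible approach, as a minimal sketch rather than a definitive answer: shuffle all rows first with DataFrame.sample(frac=1), then reuse the same cumcount filter on the shuffled frame, so that the "first n" rows of each group are a random selection. The helper name sample_per_group, the seed parameter, and the toy data are hypothetical and only there for illustration. Groups with fewer than n rows are kept whole.

```python
import pandas as pd


def sample_per_group(df, column, n, seed=None):
    """Return up to n randomly chosen rows for each distinct value in `column`."""
    # Shuffle every row; frac=1 keeps all rows, only their order changes.
    shuffled = df.sample(frac=1, random_state=seed)
    # cumcount numbers the rows within each group in the shuffled order,
    # so "< n" keeps n random rows per group (or the whole group if smaller).
    keep = shuffled.groupby(column).cumcount() < n
    return shuffled[keep]


# Toy data standing in for the real dataframe (hypothetical values).
toy = pd.DataFrame({
    "revisionId": range(10),
    "itemId": range(100, 110),
    "wikidataType": ["Q5"] * 6 + ["Q7"] * 4,
})

subset = sample_per_group(toy, "wikidataType", 3, seed=0)
counts = subset["wikidataType"].value_counts().to_dict()
```

For the dataframe in the question this would be called as df = sample_per_group(df, 'wikidataType', 1000). The result comes back in shuffled order; calling sort_index() on it restores the original row order if that matters.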