Saturday, October 21, 2017

Picking random elements from groupby using pandas

I have a dataframe that looks like this:

    revisionId  itemId wikidataType
1    307190482      23           Q5
6    305019084      80           Q5
8    303692414     181           Q5
9    306600439     192           Q5
11   294597048     206           Q5

In the complete dataframe, the wikidataType column contains 100 different values. It's a large dataframe, so I want to restrict it to 1000 records per wikidataType. To do that, I used the following:

df = df[df.groupby('wikidataType')['wikidataType'].cumcount() < 1000]

This gives me the first 1000 records for each wikidataType. Instead, I want to choose these 1000 records randomly. So I tried using

df = df[random.sample(list(df.groupby('wikidataType')['wikidataType']), 1000)]

But that gave an error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I even tried

df = df[df.groupby('wikidataType')['wikidataType'].cumcount().random() < 1000]

But that didn't work either. Does anyone know how I can do this?

Thanks in advance.
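
One possible approach, sketched here on made-up data, is to shuffle the rows first and then reuse the same cumcount() filter, so that the "first N" rows of each group are a random N. The toy dataframe, the helper name sample_per_group, and the seed parameter are assumptions for illustration only, and N is set to 10 instead of 1000 to keep the example small. Groups with fewer than N rows are kept whole.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 wikidataType values with 50 rows each.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "revisionId": rng.integers(290_000_000, 310_000_000, size=150),
    "itemId": np.arange(150),
    "wikidataType": np.repeat(["Q5", "Q515", "Q6256"], 50),
})


def sample_per_group(df, key, n, seed=None):
    """Return up to n randomly chosen rows for each value of df[key]."""
    # Shuffle all rows; frac=1 keeps every row but in random order.
    shuffled = df.sample(frac=1, random_state=seed)
    # cumcount() numbers rows within each group in their shuffled order,
    # so keeping the rows numbered below n keeps a random n per group.
    picked = shuffled[shuffled.groupby(key).cumcount() < n]
    # Restore the original row order.
    return picked.sort_index()


sampled = sample_per_group(toy, "wikidataType", n=10, seed=42)
print(sampled["wikidataType"].value_counts())
```

On the real dataframe, the equivalent call would be sample_per_group(df, 'wikidataType', n=1000).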



