mercredi 13 avril 2022

Randomly selecting a subset of rows from a pandas dataframe based on existing column values

I have a dataframe that contains multiple columns and would like to randomly select an equal number of rows based on the values of specific column. I thought of using a df.groupby['...'] but it did not work. Here is an example:

dataframe

Assume that I would like to randomly select one row per GroupID, how can I accomplish this? For example, let's say I select one random row per GroupID, the result would yield the following:

dataframe random selection

such that it outputs a single row based on the values in GroupID. Assume, for example, that rows are sorted by GroupID (from asc to desc), then select an "n" number of rows from ones that pertain to GroupID 1, 2, 3, and so on. Any information would definitely be helpful.




Aucun commentaire:

Enregistrer un commentaire