Saturday, April 17, 2021

Python: Randomly split a DataFrame into smaller chunks based on multiple (and similar) string conditions (rows can only be used once)

I am new to Python, and I need some help!

I have a data frame with 800 items (rows), and each row is located in one of several areas. The areas are: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton'.

Example pandas DataFrame:

       area       price      location                   Bedroom
1      boston     3074        1 Devonshire Place        1
2      boston     3310       72 Staniford Street        2
3      allston    1825  1156 Commonwealth Avenue        1
4      cambridge  3895         39 Clinton Street        3
5      fenway     2325     98 Queensberry Street        1
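
To make the example reproducible, the five sample rows above can be rebuilt roughly like this. It is only a sketch of the rows shown, not the full 800-row data, and the variable name `df` is just a placeholder:

```python
import pandas as pd

# Only the five example rows shown above; the real frame has 800 rows.
df = pd.DataFrame(
    {
        "area": ["boston", "boston", "allston", "cambridge", "fenway"],
        "price": [3074, 3310, 1825, 3895, 2325],
        "location": [
            "1 Devonshire Place",
            "72 Staniford Street",
            "1156 Commonwealth Avenue",
            "39 Clinton Street",
            "98 Queensberry Street",
        ],
        "Bedroom": [1, 2, 1, 3, 1],
    },
    index=[1, 2, 3, 4, 5],  # same row labels as in the printout
)
print(df)
```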

I am trying to divide the rows of this data frame into three groups at random:

  • Group A has 60% of the rows from the data frame and can only contain the following areas: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton'.

  • Group B has 30% of the rows from the data frame and can only contain the following areas: 'Allston', 'Boston', 'Brighton', 'Fenway'.

  • Group C has 10% of the rows from the data frame and can only contain the following areas: 'Boston', 'Brighton', 'Fenway'.

Every item/row can only be assigned once. It does not matter if some of the allowed areas are not represented in a group. If group C only has items that are in 'Boston' and/or 'Brighton', that would be okay. But group C cannot have an item that is in 'Newton', for instance.

I have tried dataframe.sample(), np.split(), and np.random.choice(); however, with all of these techniques, rows get duplicated. I plan to write a loop so that the randomly selected rows are different every time the groups are created.

Any idea on how to solve it?

Your help is appreciated!
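
One possible approach, as a minimal sketch rather than a definitive answer: fill the most restrictive group first (C, then B), remove the chosen rows from the pool before the next draw, and let group A take whatever is left. Removing rows between draws is what keeps any row from appearing twice. The sketch assumes the column is named `area`, that the index labels are unique, and that group sizes are rounded from the 10% / 30% fractions. The function name `split_groups` and the 800-row demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Allowed areas for the two restricted groups (lowercase, as in the example).
ALLOWED = {
    "C": ["boston", "brighton", "fenway"],
    "B": ["allston", "boston", "brighton", "fenway"],
}
FRACTIONS = {"C": 0.10, "B": 0.30}  # group A receives every remaining row


def split_groups(df, seed=None):
    """Return {"C": ..., "B": ..., "A": ...}; each row lands in one group."""
    rng = np.random.default_rng(seed)
    remaining = df
    groups = {}
    for name in ("C", "B"):  # most restrictive group first
        size = round(FRACTIONS[name] * len(df))
        mask = remaining["area"].str.lower().isin(ALLOWED[name])
        eligible = remaining[mask]
        if len(eligible) < size:
            raise ValueError(f"not enough eligible rows for group {name}")
        # Draw index labels without replacement (assumes a unique index).
        picked = rng.choice(eligible.index.to_numpy(), size=size, replace=False)
        groups[name] = remaining.loc[picked]
        remaining = remaining.drop(picked)  # these rows cannot be reused
    groups["A"] = remaining  # all areas are allowed in group A
    return groups


# Synthetic stand-in for the real data (hypothetical values).
areas = ["allston", "boston", "brighton", "fenway",
         "brookline", "cambridge", "newton"]
demo_rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area": demo_rng.choice(areas, size=800),
    "price": demo_rng.integers(1500, 4000, size=800),
})

groups = split_groups(df, seed=42)
for name in ("A", "B", "C"):
    print(name, len(groups[name]), sorted(groups[name]["area"].unique()))
```

Calling `split_groups(df)` without a seed inside a loop should give a different random split on each pass, while passing a fixed `seed` makes a split repeatable. If the data has too few rows in the allowed areas for B or C, the sketch raises a `ValueError` instead of silently breaking the area rule.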



