
Saturday, April 17, 2021

Python: Randomly Split Dataframe into smaller chunks based on multiple (and similar) string conditions (rows can only be used once)

I am new to Python and need some help.

I have a DataFrame with 800 items (rows), and each row belongs to one of seven areas: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton'.

Example pandas DataFrame:

   area       price  location                  Bedroom
1  boston     3074   1 Devonshire Place        1
2  boston     3310   72 Staniford Street       2
3  allston    1825   1156 Commonwealth Avenue  1
4  cambridge  3895   39 Clinton Street         3
5  fenway     2325   98 Queensberry Street     1
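To make the question reproducible, the five example rows can be rebuilt as a small DataFrame. This is only the sample printed above, not the full 800-row data; the column names are copied from that printout, and the index starts at 1 to match it.

```python
import pandas as pd

# The five example rows from the question (not the full 800-row dataset).
df = pd.DataFrame(
    {
        "area": ["boston", "boston", "allston", "cambridge", "fenway"],
        "price": [3074, 3310, 1825, 3895, 2325],
        "location": [
            "1 Devonshire Place",
            "72 Staniford Street",
            "1156 Commonwealth Avenue",
            "39 Clinton Street",
            "98 Queensberry Street",
        ],
        "Bedroom": [1, 2, 1, 3, 1],
    },
    index=[1, 2, 3, 4, 5],  # the printout above numbers rows from 1
)

# Rows per area: useful for checking whether a group can be filled at all.
print(df["area"].value_counts())
```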

I am trying to randomly divide the rows of this DataFrame into three groups:

Group A gets 60% of the rows and may only contain rows from these areas: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton' (that is, any of the seven areas).
Group B gets 30% of the rows and may only contain rows from these areas: 'Allston', 'Boston', 'Brighton', 'Fenway'.
Group C gets 10% of the rows and may only contain rows from these areas: 'Boston', 'Brighton', 'Fenway'.
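The three rules can be written down as data, which also makes the group sizes explicit. This is a minimal sketch, assuming the area strings are lowercase as in the example frame and that "Brighton's" and "Brighton" refer to the same area; `ALLOWED`, `FRACTIONS` and `group_sizes` are hypothetical names. Note that C's areas are a subset of B's, which are a subset of A's, and that for 800 rows the 60/30/10 split works out to 480/240/80 rows.

```python
# Hypothetical rule table: allowed areas per group (lowercase, like the "area" column).
ALLOWED = {
    "A": {"allston", "boston", "brighton", "fenway", "brookline", "cambridge", "newton"},
    "B": {"allston", "boston", "brighton", "fenway"},
    "C": {"boston", "brighton", "fenway"},
}
FRACTIONS = {"A": 0.6, "B": 0.3, "C": 0.1}


def group_sizes(n_rows: int) -> dict:
    """Turn the 60/30/10 fractions into whole row counts that add up to n_rows."""
    size_c = round(FRACTIONS["C"] * n_rows)
    size_b = round(FRACTIONS["B"] * n_rows)
    # A accepts every area, so it absorbs any rounding remainder
    # (its 0.6 share is implied rather than rounded separately).
    return {"A": n_rows - size_b - size_c, "B": size_b, "C": size_c}


print(group_sizes(800))  # {'A': 480, 'B': 240, 'C': 80}
```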

Each row can be assigned to only one group. It does not matter if some areas are missing from a group: if group C only contains rows from 'Boston' and/or 'Brighton', that is fine. However, group C must not contain a row from 'Newton', for instance.
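One way to guarantee that no row is used twice is to sample without replacement from a shrinking pool: draw the most restrictive group first, remove those rows with `drop`, then draw the next group from what is left. This is a minimal sketch, assuming the DataFrame has a unique index and lowercase area names; `split_by_area` and `ALLOWED` are hypothetical names, not an existing pandas API. Because C's areas sit inside B's and B's sit inside A's, filling C, then B, then A should only fail when no valid split exists at all (too few Boston/Brighton/Fenway rows for C, or too few rows in B's four areas for B and C together).

```python
import numpy as np
import pandas as pd

# Hypothetical rule table: allowed areas per group, lowercase to match "area".
ALLOWED = {
    "A": {"allston", "boston", "brighton", "fenway", "brookline", "cambridge", "newton"},
    "B": {"allston", "boston", "brighton", "fenway"},
    "C": {"boston", "brighton", "fenway"},
}


def split_by_area(df: pd.DataFrame, seed=None) -> dict:
    """Assign every row of df to exactly one of the groups A (60%), B (30%), C (10%)."""
    if not df.index.is_unique:
        raise ValueError("index must be unique; call df.reset_index(drop=True) first")

    rng = np.random.default_rng(seed)
    n = len(df)
    size_c = round(0.1 * n)
    size_b = round(0.3 * n)
    # A accepts every area, so it takes whatever is left after C and B.
    targets = {"C": size_c, "B": size_b, "A": n - size_b - size_c}

    remaining = df
    groups = {}
    # Most restrictive group first, so it is not starved by the larger groups.
    for name in ("C", "B", "A"):
        eligible = remaining[remaining["area"].isin(ALLOWED[name])]
        if len(eligible) < targets[name]:
            raise ValueError(
                f"group {name} needs {targets[name]} rows, "
                f"but only {len(eligible)} eligible rows are left"
            )
        # Sample without replacement from the rows still in the pool...
        picked = eligible.sample(n=targets[name], random_state=rng)
        groups[name] = picked
        # ...then remove them, so no row can be picked for a later group.
        remaining = remaining.drop(picked.index)
    return groups
```

Calling `split_by_area(df)` inside a loop with `seed=None` (or a different seed each time) gives a fresh, non-overlapping split on every iteration, while a fixed seed reproduces the same split.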

I have tried dataframe.sample(), np.split(), and np.random.choice(), but with all of these techniques some rows end up in more than one group. I plan to run this in a loop, so the randomly selected rows should be different each time the groups are created.
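The duplicates most likely come from drawing each group independently from the full frame: every `sample()` call sees all rows, so nothing stops a row from being picked twice. Shuffling once and slicing the shuffled frame avoids that. This sketch deliberately ignores the area rules and uses a hypothetical 800-row stand-in frame; it only shows the no-overlap mechanics, and how passing one `np.random.default_rng()` generator to `sample` gives a different shuffle on each loop iteration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 800-row frame (only the index matters here).
df = pd.DataFrame({"price": np.arange(800)})

# Independent draws: each call samples from the full frame,
# so the same row can land in more than one group.
a = df.sample(frac=0.6, random_state=1)
b = df.sample(frac=0.3, random_state=2)
print("rows shared by a and b:", len(a.index.intersection(b.index)))

# Shuffle once, then cut into consecutive slices: every row lands in exactly
# one slice. The area rules are NOT applied here.
rng = np.random.default_rng()  # one generator, advanced by every sample() call
for run in range(3):
    shuffled = df.sample(frac=1, random_state=rng)
    n = len(shuffled)
    cut1, cut2 = round(0.6 * n), round(0.9 * n)
    part_a = shuffled.iloc[:cut1]
    part_b = shuffled.iloc[cut1:cut2]
    part_c = shuffled.iloc[cut2:]
    print(run, len(part_a), len(part_b), len(part_c))
```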

Any ideas on how to solve this?

Your help is appreciated!
