I am new to Python and need some help!
I have a data frame with 800 items (rows), and each row is located in one of the following areas: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton'.
Example pandas DataFrame:

       area       price  location                  Bedroom
    1  boston     3074   1 Devonshire Place        1
    2  boston     3310   72 Staniford Street       2
    3  allston    1825   1156 Commonwealth Avenue  1
    4  cambridge  3895   39 Clinton Street         3
    5  fenway     2325   98 Queensberry Street     1
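For reference, the five sample rows can be rebuilt as a small DataFrame. This is a minimal sketch: the full 800-row data set is not reproduced, and the column names are copied from the sample as shown.

```python
import pandas as pd

# The five example rows from the question; the index starts at 1 as shown.
df = pd.DataFrame(
    {
        "area": ["boston", "boston", "allston", "cambridge", "fenway"],
        "price": [3074, 3310, 1825, 3895, 2325],
        "location": [
            "1 Devonshire Place",
            "72 Staniford Street",
            "1156 Commonwealth Avenue",
            "39 Clinton Street",
            "98 Queensberry Street",
        ],
        "Bedroom": [1, 2, 1, 3, 1],
    },
    index=range(1, 6),
)

# Row counts per area; on the real data this shows whether the group quotas can be met.
print(df["area"].value_counts())
```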
I am trying to divide the rows of this data frame into three groups at random:

- Group A gets 60% of the rows and may only contain the following areas: 'Allston', 'Boston', 'Brighton', 'Fenway', 'Brookline', 'Cambridge', 'Newton'.
- Group B gets 30% of the rows and may only contain the following areas: 'Allston', 'Boston', 'Brighton', 'Fenway'.
- Group C gets 10% of the rows and may only contain the following areas: 'Boston', 'Brighton', 'Fenway'.
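Two things follow from these rules. With 800 rows the group sizes are 480, 240 and 80. And because the allowed area sets are nested (C's areas are a subset of B's, which are a subset of A's), a valid split exists exactly when every row is in one of A's areas, at least 80 rows are in Boston, Brighton or Fenway, and at least 320 rows are in those three areas plus Allston. A minimal feasibility check, assuming the `area` column holds lowercase names and that "Brighton's" means 'brighton'; `quotas` and `is_feasible` are hypothetical helper names:

```python
import pandas as pd

# Assumed area sets (lowercase); note they are nested: C ⊂ B ⊂ A.
C_AREAS = {"boston", "brighton", "fenway"}
B_AREAS = C_AREAS | {"allston"}
A_AREAS = B_AREAS | {"brookline", "cambridge", "newton"}


def quotas(n):
    """Sizes for a 60/30/10 split; A absorbs any rounding remainder."""
    c = round(0.1 * n)
    b = round(0.3 * n)
    return {"A": n - b - c, "B": b, "C": c}


def is_feasible(df):
    """True if some assignment satisfies both the sizes and the area rules."""
    area = df["area"].str.lower()
    q = quotas(len(df))
    return bool(
        area.isin(A_AREAS).all()
        and area.isin(C_AREAS).sum() >= q["C"]
        and area.isin(B_AREAS).sum() >= q["B"] + q["C"]
    )


print(quotas(800))  # {'A': 480, 'B': 240, 'C': 80}
```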
Every row can be assigned to only one group. It does not matter if some areas are not covered in one of the groups: if group C only has rows in 'Boston' and/or 'Brighton', that is fine. But group C cannot have a row in 'Newton', for instance.
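Both rules (each row used exactly once, each group restricted to its areas) can be checked mechanically on any candidate split. A sketch, assuming the groups are held in a dict of DataFrames keyed 'A', 'B', 'C' and that the index has unique labels; `check_split` and `ALLOWED` are hypothetical names:

```python
import pandas as pd

# Assumed allowed areas per group, lowercased to match the 'area' column.
ALLOWED = {
    "A": {"allston", "boston", "brighton", "fenway", "brookline", "cambridge", "newton"},
    "B": {"allston", "boston", "brighton", "fenway"},
    "C": {"boston", "brighton", "fenway"},
}


def check_split(df, groups):
    """Return True if every row of df lands in exactly one group and each
    group contains only its allowed areas (hypothetical helper)."""
    labels = [label for g in groups.values() for label in g.index]
    # No label repeated across groups, and no row of df left out.
    each_row_once = len(labels) == len(set(labels)) and set(labels) == set(df.index)
    areas_ok = all(
        g["area"].str.lower().isin(ALLOWED[name]).all()
        for name, g in groups.items()
    )
    return bool(each_row_once and areas_ok)
```

For example, a split that puts a Newton row into group C returns False, as does one where the same row appears in two groups.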
I have tried DataFrame.sample(), np.split(), and np.random.choice(), but with all of these techniques, rows get duplicated across groups. I plan to run this in a loop so that the randomly selected rows are different every time the groups are created.
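The duplicates come from drawing each group independently from the full data frame: each `df.sample()` call sees every row again. One way around it is to fill the most restrictive group first (C, then B, then A) and drop the picked rows from the pool before the next draw. This is a sketch under the same lowercase-area assumption, with a unique index; `split_groups` is a hypothetical helper, and passing a `numpy.random.Generator` as `random_state` to `DataFrame.sample` assumes pandas 1.4 or later.

```python
import numpy as np
import pandas as pd

# Assumed area sets, lowercased to match the 'area' column.
# They are nested: C_AREAS ⊂ B_AREAS ⊂ A_AREAS.
C_AREAS = {"boston", "brighton", "fenway"}
B_AREAS = C_AREAS | {"allston"}
A_AREAS = B_AREAS | {"brookline", "cambridge", "newton"}


def split_groups(df, seed=None):
    """Randomly assign every row of df to exactly one of the groups A, B, C."""
    rng = np.random.default_rng(seed)
    n = len(df)
    sizes = {"C": round(0.1 * n), "B": round(0.3 * n)}
    sizes["A"] = n - sizes["B"] - sizes["C"]  # A absorbs any rounding remainder

    remaining = df
    groups = {}
    # Most restrictive group first; picked rows are dropped from the pool,
    # so no row can be drawn into a second group.
    for name, allowed in [("C", C_AREAS), ("B", B_AREAS), ("A", A_AREAS)]:
        eligible = remaining[remaining["area"].str.lower().isin(allowed)]
        if len(eligible) < sizes[name]:
            raise ValueError(f"only {len(eligible)} eligible rows for group {name}")
        picked = eligible.sample(n=sizes[name], random_state=rng)
        groups[name] = picked
        remaining = remaining.drop(picked.index)
    return groups
```

In the loop, call `split_groups(df, seed=i)` with a different `i` on each pass (or `seed=None` for a fresh draw every time). Because the area sets are nested, this order finds a valid split whenever one exists, and every valid split is equally likely: once C is fixed, the number of ways to choose B does not depend on which rows went to C.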
Any idea how to solve this?
Your help is appreciated!