I am fairly new to Python and I would like to sample sets of data from the following DataFrame by group, without selecting the same group twice. The code I have written samples whole groups correctly; however, it can select the same group twice.
Please note: the following is test data. The actual data I am running the code on is much larger, so using indexes will not be possible.
DATA:
import pandas as pd
d = {'group': ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'], 'number': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3], 'weather': ['hot','hot','hot','cold','cold','cold','hot','hot','hot','cold','cold','cold','hot','hot','hot']}
df = pd.DataFrame(data=d)
df
group number weather
A 1 hot
A 2 hot
A 3 hot
B 1 cold
B 2 cold
B 3 cold
C 1 hot
C 2 hot
C 3 hot
D 1 cold
D 2 cold
D 3 cold
E 1 hot
E 2 hot
E 3 hot
MY CODE
df_s=[]
for typ in df.group.sample(3,replace=False):
    df_s.append(df[df['group']==typ])
df_s=pd.concat(df_s)
df_s
OUTCOME
group number weather
E 1 hot
E 2 hot
E 3 hot
E 1 hot
E 2 hot
E 3 hot
D 1 cold
D 2 cold
D 3 cold
The outcome should contain data for 3 different groups; however, as can be seen, there are only 2 (E and D), meaning the code can select the same group more than once.
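The likely cause is that `df.group.sample(3, replace=False)` draws 3 rows from the 15-row `group` column. Sampling without replacement only guarantees that no row is drawn twice, and two different rows can carry the same label. One possible fix, offered as a minimal sketch rather than the definitive answer, is to reduce the column to its distinct labels first and sample from those. The name `picked` is introduced here purely for illustration.

```python
import pandas as pd

d = {
    'group': ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
    'number': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
    'weather': ['hot','hot','hot','cold','cold','cold','hot','hot','hot',
                'cold','cold','cold','hot','hot','hot'],
}
df = pd.DataFrame(data=d)

# Reduce the column to one entry per distinct label, then sample from
# those labels so that no group can be drawn more than once.
# "picked" is a hypothetical name used only in this sketch.
picked = df['group'].drop_duplicates().sample(3, replace=False)

# Keep every row whose group is one of the sampled labels.
df_s = df[df['group'].isin(picked)]
print(df_s)
```

With this approach the rows in `df_s` keep their original order from `df` rather than the order in which the groups were drawn. Passing `random_state=<some integer>` to `sample` should make the selection reproducible between runs.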