Tuesday, 20 July 2021

Can I sample sets of data within a DataFrame without selecting the same set twice (without replacement)?

I am fairly new to Python and I would like to sample sets of data from the following DataFrame by group, without selecting the same group twice. The code I have written samples the sets of data correctly; however, it can select the same set twice.

Please note: the following is test data. The actual data I am running the code on is much larger, so using indexes will not be possible.

DATA:

import pandas as pd

d = {'group': ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'], 'number': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3], 'weather': ['hot','hot','hot','cold','cold','cold','hot','hot','hot','cold','cold','cold','hot','hot','hot']}
df = pd.DataFrame(data=d)
df
group   number  weather
A       1       hot
A       2       hot
A       3       hot
B       1       cold
B       2       cold
B       3       cold
C       1       hot
C       2       hot
C       3       hot
D       1       cold
D       2       cold
D       3       cold
E       1       hot
E       2       hot
E       3       hot

MY CODE

df_s=[]
for typ in df.group.sample(3,replace=False):
    df_s.append(df[df['group']==typ])
df_s=pd.concat(df_s)
df_s

OUTCOME

group   number  weather
E       1       hot
E       2       hot
E       3       hot
E       1       hot
E       2       hot
E       3       hot
D       1       cold
D       2       cold
D       3       cold

The outcome should contain the data for 3 different groups; however, as can be seen, there are only 2 (E and D), meaning the code can select the same group more than once.
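
A likely explanation, offered as a sketch rather than a definitive answer: `df.group.sample(3, replace=False)` draws three distinct rows from the `group` column, but each label appears on three rows, so distinct rows can still carry the same label. One way around this is to sample from the de-duplicated labels and then keep every row whose group was drawn. The helper name `sample_groups` and its `seed` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd

d = {
    'group': ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
    'number': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
    'weather': ['hot','hot','hot','cold','cold','cold','hot','hot','hot',
                'cold','cold','cold','hot','hot','hot'],
}
df = pd.DataFrame(data=d)


def sample_groups(frame, n, seed=None):
    """Return all rows belonging to n distinct, randomly chosen groups.

    Hypothetical helper: `seed` is only there to make runs repeatable.
    """
    # Reduce the column to one entry per label, then sample without replacement,
    # so the same group cannot be drawn twice.
    chosen = frame['group'].drop_duplicates().sample(n, replace=False, random_state=seed)
    # Keep every row whose group label is among the chosen ones.
    return frame[frame['group'].isin(chosen)]


df_s = sample_groups(df, 3)
print(df_s)
```

This does not rely on index positions, only on the label values, so it should carry over to a larger DataFrame. Note that the rows come back in their original order, not in the order the groups were drawn.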



