Monday, May 22, 2017

Sample random rows from a data frame when the number of samples exceeds the number of rows, with an assigned sampling probability

Consider the following example data, stored in a dataframe called df

df
x  y
2  4
1  5
0  8

As you can see, this data frame has 3 rows. What I'd like to do is draw 100 row samples, where each row has an equal probability of being selected (in this case 1/3). My output, let's call it df_result, would look something like this:

df_result
x  y
0  8
2  4
0  8
1  5
1  5
2  4

and so on, until 100 samples have been taken.

I saw a previous Stack Overflow post that showed how to take random samples from a data frame: df[sample(nrow(df), 3), ]

However, when I tried to sample 100 rows this way, it (predictably) failed, since there are only 3 rows to draw from without replacement. It also gave me no way to assign a sampling probability to each row.

Any tips?

Thanks
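
One possible approach, sketched here under the assumption that base R's sample() is acceptable: pass replace = TRUE so the same row can be drawn more than once, and use the prob argument to give each row its weight. The seed value and the names idx and probs are illustrative choices, not part of the original question.

```r
# Example data from the question
df <- data.frame(x = c(2, 1, 0), y = c(4, 5, 8))

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# One weight per row; equal weights of 1/3 here, but any
# non-negative weights of length nrow(df) would work
probs <- rep(1 / nrow(df), nrow(df))

# Draw 100 row indices WITH replacement, so the sample size
# may exceed the number of rows
idx <- sample(nrow(df), size = 100, replace = TRUE, prob = probs)

# Index the data frame by the sampled row numbers
df_result <- df[idx, ]
rownames(df_result) <- NULL  # drop the duplicated row names such as "3.1"

head(df_result)
table(idx)  # how often each original row was drawn
```

With equal weights the prob argument could be left out entirely, since sample() draws uniformly by default. It is kept here because the question asks how to assign the probabilities explicitly.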



