Thursday, 23 March 2017

R: Randomly sampling (with replacement) each column of a data frame independently

I am trying to create a new data frame by randomly sampling an existing data frame. Specifically, I want to create a data frame that is the same size as the original, but in which each column is a random sample (with replacement) of the corresponding column in the original data frame. My first attempt looked like this:

# Load dplyr for %>% and sample_n()
library(dplyr)

# Create toy data set
data.set <- as.data.frame(matrix(1:50, ncol = 5)) 

# Change names
colnames(data.set) <- c("Stuff", "Things", "Foo", "Bar", "Guff")

# Try to create randomly sampled data frame
data.set %>% sample_n(replace = TRUE, size = nrow(data.set))

The problem is that this samples whole rows rather than the elements within each column individually. For example, here is some output:

    Stuff Things Foo Bar Guff
2       2     12  22  32   42
10     10     20  30  40   50
2.1     2     12  22  32   42
3       3     13  23  33   43
5       5     15  25  35   45
3.1     3     13  23  33   43
8       8     18  28  38   48
9       9     19  29  39   49
1       1     11  21  31   41
6       6     16  26  36   46 

Notice that the first and third rows are exactly the same, as are the fourth and sixth rows. What I would like is for every column to be randomly sampled independently of the others. So I tried this:

apply(data.set, MARGIN = 2, sample_n, replace = TRUE, size = nrow(data.set))

which produced the following error:

Error: Don't know how to sample from objects of class integer

I don't see what I did incorrectly, though. Can anyone offer a concise way of achieving my goal?
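For reference, a likely reason for the error is that apply() converts the data frame to a matrix and hands each column to the function as a plain integer vector, while sample_n() only knows how to sample rows of a data frame. Below is a minimal sketch of one possible approach using base R only, not necessarily the most concise one. The helper name resample_columns is hypothetical (it is not part of dplyr or base R), and set.seed() is there only to make the toy run repeatable.

```r
# Toy data set from the question
data.set <- as.data.frame(matrix(1:50, ncol = 5))
colnames(data.set) <- c("Stuff", "Things", "Foo", "Bar", "Guff")

# Hypothetical helper: resample every column independently, with replacement.
# lapply() visits each column as a plain vector; sampling row indices with
# sample.int() avoids the surprise that sample(x) treats a single number x
# as the range 1:x.
resample_columns <- function(df) {
  n <- nrow(df)
  df[] <- lapply(df, function(col) col[sample.int(n, size = n, replace = TRUE)])
  df
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
resampled <- resample_columns(data.set)
print(resampled)
```

Assigning into df[] keeps the result a data frame with the original column names, and because each column gets its own call to sample.int(), the rows of the result no longer line up with rows of the original.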



