Monday, 28 October 2019

Generating n new datasets by randomly sampling existing data, and then applying a function to the new datasets

I'm very grateful in advance for any advice.

For a paper I'm writing, I have subset a larger dataset into 3 groups because I expected the strength of the correlation between 2 variables to differ across those groups (it did). I want to see whether subsetting my data into random groupings would also significantly affect the strength of the correlation (i.e. whether what I'm seeing is just an effect of subsetting, or whether those groupings are actually significant).

To this end, I am trying to generate n new data frames by randomly sampling 150 rows from an existing dataset, and then calculate the correlation coefficient between two variables in each of those n data frames, saving the coefficient and its significance in a new file.

But...HOW?

I can do it manually, e.g. with dplyr, something like

newdata <- sample_n(Random_sample_data, 150)
output <- cor.test(newdata$x, newdata$y, method="kendall")

I'd obviously like to not type this out 1000 or 100000 times, and have been trying things with loops and lapply (see below) but they've not worked (undoubtedly due to something really obvious that I'm missing!!!).
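For reference, the repeated version of the manual step above could be sketched roughly as follows in base R. This is only a sketch under assumptions: the simulated `Random_sample_data`, the helper name `one_draw`, the value of `n_reps`, and the output file name are all placeholders that do not come from my real data.

```r
# Hypothetical stand-in for Random_sample_data (the real data are not shown)
set.seed(1)
Random_sample_data <- data.frame(x = rnorm(1229), y = rnorm(1229))

n_reps      <- 1000  # number of random subsets (assumed value)
subset_size <- 150   # rows per subset, as described above

# Draw one random subset and return Kendall's tau and its p-value
one_draw <- function(df, size) {
  rows <- sample(nrow(df), size)  # row indices, sampled without replacement
  test <- cor.test(df$x[rows], df$y[rows], method = "kendall")
  c(tau = unname(test$estimate), p.value = test$p.value)
}

# replicate() gives a 2 x n_reps matrix; transpose to one row per subset
results <- as.data.frame(
  t(replicate(n_reps, one_draw(Random_sample_data, subset_size)))
)

# Save the coefficients and p-values (placeholder file name)
write.csv(results, "random_subset_correlations.csv", row.names = FALSE)
```

Each row of `results` then holds the coefficient and p-value for one random subset of 150 rows, so the spread of `results$tau` can be compared against the correlations seen in the 3 original groups.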

Here I have tried to assign each row to a different group, with 10 groups in total, and then to do correlations between x and y by those groups:

library(plyr)   # daply() and .(); loaded before dplyr to limit masking
library(dplyr)  # select() and sample_n()

Random_sample_data <- select(Range_corrected, x, y)
cat <- sample(1:10, 1229, replace = TRUE)
Random_sample_cats <- cbind(Random_sample_data, cat)

correlation <- function(c) {
  c <- cor.test(x, y, method = "kendall")
  return(c)
}
b <- daply(Random_sample_cats, .(cat), correlation)

Error message:

Error in cor.test(x, y, method = "kendall") : 
      object 'x' not found
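The error seems to arise because `correlation` never uses its argument: inside the function, `x` and `y` are looked up as standalone objects rather than as columns of the per-group data frame that gets passed in. A minimal base R sketch of the grouped version, assuming simulated data in place of `Range_corrected` (which is not shown here), might look like this:

```r
# Hypothetical stand-in for the real data
set.seed(1)
Random_sample_data <- data.frame(x = rnorm(1229), y = rnorm(1229))

# Assign each row to one of 10 random groups
Random_sample_cats <- cbind(
  Random_sample_data,
  cat = sample(1:10, nrow(Random_sample_data), replace = TRUE)
)

# The function receives one group's rows as a data frame and
# refers to the columns through that data frame
correlation <- function(df) {
  test <- cor.test(df$x, df$y, method = "kendall")
  c(tau = unname(test$estimate), p.value = test$p.value)
}

# Split by group, apply the function to each piece, one row per group
pieces   <- split(Random_sample_cats, Random_sample_cats$cat)
by_group <- t(sapply(pieces, correlation))
```

Because `correlation` now returns a plain named vector instead of the full `cor.test` object, it should also be usable with the original `daply(Random_sample_cats, .(cat), correlation)` call, though that has not been checked here.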


