Wednesday, August 9, 2023

How to resample a given number of rows and calculate the mean, variance and confidence intervals?

I need to select a specific number of rows per resample, for every size from 4 to 50 rows.

So, in the first run I need the function to select 4 random rows and calculate the mean, variance and confidence interval of a given variable, repeated 1000 times. In the second run I need the same thing, but selecting 5 random rows 1000 times, and so on up to 50 rows.

The example:

set.seed(123)  # make the random example reproducible
x1 <- matrix(rnorm(200, mean = 10), nrow = 100, ncol = 2)
x2 <- rep(c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH", "II", "JJ"),
          times = c(5, 15, 15, 10, 10, 10, 10, 5, 5, 15))
# build the columns directly: cbind(x1, x2) would turn the numbers into text
df <- data.frame(variable1 = x1[, 1], variable2 = x1[, 2], group = x2)
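Building `df` with `data.frame(cbind(x1, x2))` has a side effect worth checking: `cbind()` needs a single type, so combining the numeric matrix with the character `group` vector turns every value into text, which is why `as.numeric()` is needed when sampling. A minimal check, assuming R 4.0 or later (where `data.frame()` no longer converts strings to factors by default; in older versions the columns would be factors and `as.numeric()` would return level codes instead of the values). `df_ok` is just an illustrative name:

```r
x1 <- matrix(rnorm(6, mean = 10), nrow = 3, ncol = 2)
x2 <- c("AA", "BB", "CC")

# cbind() needs one common type, so the numbers are coerced to character
m <- cbind(x1, x2)
print(typeof(m))  # "character"

# Building the data frame column by column keeps the numeric columns numeric
df_ok <- data.frame(variable1 = x1[, 1], variable2 = x1[, 2], group = x2)
print(sapply(df_ok, class))
```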

I'm running the code below manually, and it seems to be right:

samples <- vector(mode = "list", length = 1000)
for (i in 1:1000) {
  samples[[i]] <- sample(as.numeric(df$variable1), size = 4, replace = FALSE)
}
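The filling loop can also be written with `replicate()`, which evaluates the same expression a given number of times; `simplify = FALSE` keeps the result as a list, like `samples`. A sketch using a stand-in vector `x` in place of `as.numeric(df$variable1)`:

```r
set.seed(42)
x <- rnorm(100, mean = 10)   # stand-in for as.numeric(df$variable1)

# 1000 draws of 4 values without replacement, returned as a list of vectors
samples <- replicate(1000, sample(x, size = 4, replace = FALSE), simplify = FALSE)

length(samples)        # number of simulations
lengths(samples)[1:3]  # size of the first three draws
```

Changing `size = 4` is the only edit needed for another number of rows.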

# function to calculate a normal-approximation 95% confidence interval
conf <- function(x) {
  error <- qnorm(0.975) * sd(x) / sqrt(length(x))
  return(data.frame("lower" = mean(x) - error,
                    "upper" = mean(x) + error))
}
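`conf()` uses the normal quantile `qnorm(0.975)` (about 1.96). With only 4 to 50 values per draw, an interval based on the Student t quantile `qt(0.975, n - 1)` is the usual small-sample alternative and is noticeably wider at n = 4. A sketch for comparison; `conf_t` is a hypothetical name, not part of the question, and the four values of `x` are arbitrary illustration input:

```r
# Normal-approximation interval, as in the question
conf <- function(x) {
  error <- qnorm(0.975) * sd(x) / sqrt(length(x))
  data.frame(lower = mean(x) - error, upper = mean(x) + error)
}

# Hypothetical t-based variant: same formula, t quantile with n - 1 degrees of freedom
conf_t <- function(x) {
  error <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
  data.frame(lower = mean(x) - error, upper = mean(x) + error)
}

x <- c(9.2, 10.1, 10.8, 9.7)
conf(x)
conf_t(x)
```

The t-based bounds match the interval reported by `t.test(x)$conf.int`, which is a convenient cross-check.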

# calculating mean, variance and confidence interval of the simulations

mean1 <- lapply(samples, mean) # mean of the 4 selected rows in each simulation
mean2 <- unlist(mean1)         # unlist the means into a numeric vector
mean_4rows <- mean(mean2)      # overall mean across the 1000 simulations

var1 <- lapply(samples, var) # variance of the 4 selected rows in each simulation
var2 <- unlist(var1)
var_4rows <- var(var2)       # variance of the 1000 per-simulation variances
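`var(var2)` is the variance *of* the 1000 sample variances, i.e. how much the per-draw variance fluctuates. Two other summaries are often what "the variance" of such a simulation means: `mean(var2)`, the average within-draw variance (it estimates the variance of the data), and `var(mean2)`, the spread of the draw means (it shrinks roughly like the data variance divided by the number of rows). A sketch comparing the three on a stand-in vector:

```r
set.seed(7)
x <- rnorm(100, mean = 10)   # stand-in for as.numeric(df$variable1)
samples <- replicate(1000, sample(x, size = 4), simplify = FALSE)

means <- sapply(samples, mean)
vars  <- sapply(samples, var)

c(var_of_vars  = var(vars),    # what var(var2) computes
  mean_of_vars = mean(vars),   # average within-draw variance, close to var(x)
  var_of_means = var(means),   # sampling variance of the mean, roughly var(x) / 4
  var_x        = var(x))
```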

conf1 <- lapply(samples, conf) # confidence interval of the 4 selected rows per simulation
conf2 <- do.call(rbind, conf1) # stack into a 1000 x 2 data frame (lower, upper)
conf_4rows <- colMeans(conf2)  # average lower and upper bound across the simulations
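`unlist()` is not a good fit for the list of intervals returned by `lapply(samples, conf)`: it flattens the one-row data frames into a single vector that alternates lower and upper bounds, so any further summary treats both bounds as one sample. Stacking the rows with `do.call(rbind, ...)` keeps the two columns apart. A small demonstration with two made-up intervals:

```r
conf1 <- list(data.frame(lower = 9.1, upper = 10.9),
              data.frame(lower = 9.4, upper = 11.2))

flat <- unlist(conf1)
print(flat)          # names alternate: lower, upper, lower, upper

stacked <- do.call(rbind, conf1)
print(stacked)       # 2 x 2 data frame with separate lower and upper columns
colMeans(stacked)    # average lower bound and average upper bound
```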

However, I need to automate this so that it runs for every number of rows from 4 to 50 (1000 random selections for each size) and calculates the mean, variance and CIs of the simulations.
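One way to automate this, sketched under a few assumptions: the column of interest is numeric; `conf()` is rewritten to return a named vector instead of a data frame; the variance summary is `var()` of the per-draw variances, as in the manual code; and the CI summary is the average lower and upper bound. The helper `simulate_size()` and its arguments are hypothetical names. It summarises one sample size into one row, and `lapply()` over `4:50` plus `do.call(rbind, ...)` builds the full table:

```r
set.seed(123)
x1 <- matrix(rnorm(200, mean = 10), nrow = 100, ncol = 2)
df <- data.frame(variable1 = x1[, 1], variable2 = x1[, 2])

# Normal-approximation 95% CI, returning a named vector
conf <- function(x) {
  error <- qnorm(0.975) * sd(x) / sqrt(length(x))
  c(lower = mean(x) - error, upper = mean(x) + error)
}

# Hypothetical helper: B draws of n values, summarised into one row
simulate_size <- function(n, x, B = 1000) {
  samples <- replicate(B, sample(x, size = n, replace = FALSE), simplify = FALSE)
  means <- vapply(samples, mean, numeric(1))
  vars  <- vapply(samples, var, numeric(1))
  cis   <- t(vapply(samples, conf, c(lower = 0, upper = 0)))  # B x 2: lower, upper
  data.frame(rows        = n,
             meanSim     = mean(means),
             varianceSim = var(vars),   # as in the question; mean(vars) is an alternative
             lowerCISim  = mean(cis[, "lower"]),
             upperCISim  = mean(cis[, "upper"]))
}

results <- do.call(rbind, lapply(4:50, simulate_size, x = df$variable1))
head(results)
```

Keeping one function per sample size makes it easy to change `B` or swap a summary without touching the outer loop.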

In the end I would like an object with the overall mean, variance and CI from the simulations for each number of selected rows, with one row for 4 selected rows, one for 5, and so on up to 50:

#>    rows  meanSim varianceSim  lowerCISim   upperCISim
#>    4     1.84     0.410        0.105       0.300
#>    5     1.69     0.951        1.023       2.098  
#>    6     1.99     0.714        1.234       1.987
#>   ..... 
#>    50    2.58     0.242        2.098       2.999

Any ideas on how I can automate this and save the results?
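For saving, `write.csv()` produces a plain-text file readable outside R, while `saveRDS()` and `readRDS()` store and restore the exact R object. A sketch using a small stand-in table (mean of 1000 draw means for 4, 5 and 6 rows only); the paths come from `tempfile()` and stand in for a real file location:

```r
set.seed(1)
x <- rnorm(100, mean = 10)   # stand-in for df$variable1

# Small stand-in results table with the "rows" and "meanSim" columns
sizes <- 4:6
results <- data.frame(
  rows    = sizes,
  meanSim = vapply(sizes, function(n) mean(replicate(1000, mean(sample(x, n)))), numeric(1))
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

write.csv(results, csv_file, row.names = FALSE)  # plain text, opens in a spreadsheet
saveRDS(results, rds_file)                       # binary, restores the object exactly

from_csv <- read.csv(csv_file)
from_rds <- readRDS(rds_file)
```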

Thank you!