Thursday, 14 April 2022

Conditional sampling from a data frame

I am new to R and want to perform a specific task, but I am lost on the details and would really appreciate some help.

What I would like to do: sample, say, 2 random couples from year 2000 so that every unique male in that year is included (e.g. in 2000, if the first random couple sampled is M1-F2, then the second couple must be M2, the other unique male, paired with any female other than F2). Then sample another 2 from 2001, and 3 from 2002. Once the loop has sampled across all years, I want to store the mean of "Diff_ages" and "Diff_weights". I then want to repeat this loop 100 times to get a distribution and compare it to my real data.

So the conditions are: each year I want to sample a different number of random couples (those numbers are stored in another data frame), and the males and females must be unique within each year but can repeat across years.

I started the code like this (I know it only samples twice from year 2000 and can pick the same male twice, which I don't want):

couples <- data.frame(
  Male = c("M1","M1","M1","M2","M2","M2","M1","M1","M1","M1","M3","M3","M3","M3","M3","M3","M3","M4","M4","M4","M5","M5","M5"),
  Female = c("F1","F2","F3","F1","F2","F3","F2","F3","F4","F5","F2","F3","F4","F5","F3","F6","F7","F3","F6","F7","F3","F6","F7"),
  Year = c(2000,2000,2000,2000,2000,2000,2001,2001,2001,2001,2001,2001,2001,2001,2002,2002,2002,2002,2002,2002,2002,2002,2002),
  Diff_ages = c(0,2,1,3,2,1,1,0,1,1,0,5,4,3,0,1,2,2,3,2,1,0,2),
  Diff_weights = c(0.20,0.25,0.24,0.34,0.21,0.24,0.21,0.25,0.26,0.24,0.26,0.23,0.22,0.21,0.18,0.21,0.23,0.24,0.25,0.23,0.28,0.25,0.24)
)

Nb_true_couples <- data.frame(Year = c(2000,2001,2002), Nb_true_couples = c(2,2,3))

# Seed for reproducibility
set.seed(2022)

# Number of iterations to build the null distribution
n_rep <- 100

# Number of samples to draw for each year (not used in the loop yet)
sample_size <- Nb_true_couples

# Vectors to store the means
mean_diff_ages <- rep(NA, n_rep)
mean_diff_weights <- rep(NA, n_rep)

for (i in seq_len(n_rep)) {
  # So far this only draws 2 rows from year 2000, and the same male can repeat
  random_couples_sample <- couples[sample(which(couples$Year == 2000), size = 2, replace = FALSE), ]

  # Calculate and store the means
  mean_diff_ages[i] <- mean(random_couples_sample$Diff_ages)
  mean_diff_weights[i] <- mean(random_couples_sample$Diff_weights)
}
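
For illustration, here is a rough sketch of one way the loop could be extended to respect the conditions described above. It is only a sketch under stated assumptions, not a definitive solution. The helper `sample_unique_couples` and the vector `n_per_year` are hypothetical names that are not part of my original attempt. The per-year counts follow the description in the text (2, 2 and 3), since 3 couples with distinct males cannot be drawn from 2001, which has only two males. The helper picks rows one at a time and drops every remaining row that shares the chosen male or female, so it probably does not sample uniformly over all valid sets of couples.

```r
couples <- data.frame(
  Male = c("M1","M1","M1","M2","M2","M2","M1","M1","M1","M1","M3","M3",
           "M3","M3","M3","M3","M3","M4","M4","M4","M5","M5","M5"),
  Female = c("F1","F2","F3","F1","F2","F3","F2","F3","F4","F5","F2","F3",
             "F4","F5","F3","F6","F7","F3","F6","F7","F3","F6","F7"),
  Year = rep(c(2000, 2001, 2002), times = c(6, 8, 9)),
  Diff_ages = c(0,2,1,3,2,1,1,0,1,1,0,5,4,3,0,1,2,2,3,2,1,0,2),
  Diff_weights = c(0.20,0.25,0.24,0.34,0.21,0.24,0.21,0.25,0.26,0.24,0.26,0.23,
                   0.22,0.21,0.18,0.21,0.23,0.24,0.25,0.23,0.28,0.25,0.24)
)

# Assumed number of couples to draw per year (hypothetical name),
# taken from the description in the text: 2 in 2000, 2 in 2001, 3 in 2002
n_per_year <- c("2000" = 2, "2001" = 2, "2002" = 3)

# Hypothetical helper: draw k couples from one year's rows so that no male
# and no female appears twice. If a draw runs out of compatible rows before
# reaching k couples, it starts over, up to max_tries times.
sample_unique_couples <- function(year_df, k, max_tries = 100) {
  for (attempt in seq_len(max_tries)) {
    pool <- year_df
    picked <- year_df[0, ]
    while (nrow(picked) < k && nrow(pool) > 0) {
      # sample.int avoids the surprise of sample(x) when x has length 1
      row <- pool[sample.int(nrow(pool), 1), ]
      picked <- rbind(picked, row)
      # Drop every remaining row that reuses this male or this female
      pool <- pool[pool$Male != row$Male & pool$Female != row$Female, ]
    }
    if (nrow(picked) == k) return(picked)
  }
  stop("could not draw ", k, " couples with unique males and females")
}

set.seed(2022)
n_rep <- 100
mean_diff_ages <- numeric(n_rep)
mean_diff_weights <- numeric(n_rep)

# One data frame per year, named "2000", "2001", "2002"
by_year <- split(couples, couples$Year)

for (i in seq_len(n_rep)) {
  # Draw the requested number of couples within each year, then stack them
  drawn <- do.call(rbind, lapply(names(n_per_year), function(y) {
    sample_unique_couples(by_year[[y]], n_per_year[[y]])
  }))

  # Store the means over all couples drawn in this iteration
  mean_diff_ages[i] <- mean(drawn$Diff_ages)
  mean_diff_weights[i] <- mean(drawn$Diff_weights)
}
```

After the loop, `mean_diff_ages` and `mean_diff_weights` each hold 100 values, which could be plotted with `hist()` and compared against the means from the real couples.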

This is my first question on SO, so please be patient with me if I have not asked it as clearly as I could. I would really appreciate it if someone could guide me through this. Thanks in advance.



