Background
Here's d, an R dataframe:
d <- data.frame(ID = c("a","a","b","b","c","d","d"),
                gender = c(0,0,0,0,0,1,1),
                zip = c(48601,48601,NA,29910,54220,NA,44663),
                stringsAsFactors = FALSE)
It looks like this:
ID gender zip
a 0 48601
a 0 48601
b 0 NA
b 0 29910
c 0 54220
d 1 NA
d 1 44663
The Problem
I'd like to sample conditionally from d, but I'm getting tripped up on the details.
Specifically, I'd like to sample ...
- All the rows of a certain number (2, in this case) of unique d$ID ...
- ... in rows for which d$gender is zero
Phrased differently, I'm saying to R: "sample 2 distinct IDs who have gender = 0".
What I want is a dataframe d2 that could look like this:
ID gender zip
a 0 48601
a 0 48601
b 0 NA
b 0 29910
Because it's sampling, of course, it could also look something like this:
ID gender zip
b 0 NA
b 0 29910
c 0 54220
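As a minimal sketch of the request above (not a definitive solution), the sampling could be done in three base R steps: find the IDs that have gender 0, draw 2 of them, then keep every row belonging to the drawn IDs. The names is_zero, eligible_ids and sampled_ids are made up for illustration, and the sketch assumes at least 2 eligible IDs exist.

```r
# Recreate the example data frame from the Background section
d <- data.frame(ID = c("a","a","b","b","c","d","d"),
                gender = c(0,0,0,0,0,1,1),
                zip = c(48601,48601,NA,29910,54220,NA,44663),
                stringsAsFactors = FALSE)

set.seed(123)  # only so the sketch is reproducible

# Rows where gender is zero (a missing gender counts as "not zero")
is_zero <- !is.na(d$gender) & d$gender == 0

# Step 1: distinct IDs that appear in at least one gender == 0 row
eligible_ids <- unique(d$ID[is_zero])

# Step 2: draw 2 distinct IDs; sample() draws without replacement by default
# (assumption: eligible_ids has at least 2 elements, otherwise this errors)
sampled_ids <- sample(eligible_ids, size = 2)

# Step 3: keep all gender == 0 rows that belong to a sampled ID
d2 <- d[is_zero & d$ID %in% sampled_ids, ]
d2
```

Sampling the IDs first and filtering rows afterwards keeps all rows of each chosen ID together, and only the small vector of distinct IDs is passed to sample().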
The real dataset I'm working with has hundreds of thousands of unique IDs; I want to sample from them (instead of just subsetting all of them) because it'll take too much memory to use them all in my analysis and, for statistical reasons, I don't need all those IDs.
What I've tried
I've attempted things like this:
library(magrittr)  # provides %>% (dplyr re-exports it as well)
set.seed(123)
d2 <- sample(subset(unique(d$ID), d$gender==0), size = 2) %>% as.data.frame()
This runs, but the output is odd:
.
a
d
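A likely reason for the odd output, offered as a hedged sketch rather than a diagnosis: unique(d$ID) has one element per distinct ID, while d$gender == 0 has one element per row, so the condition no longer lines up with the IDs it is filtering. The result is also just a vector of IDs, so converting it to a data frame cannot bring back the gender and zip columns. The names ids, mask and leaked below are made up for illustration.

```r
# Recreate the example data frame from the Background section
d <- data.frame(ID = c("a","a","b","b","c","d","d"),
                gender = c(0,0,0,0,0,1,1),
                zip = c(48601,48601,NA,29910,54220,NA,44663),
                stringsAsFactors = FALSE)

ids  <- unique(d$ID)    # one entry per distinct ID
mask <- d$gender == 0   # one entry per row of d

length(ids)   # 4
length(mask)  # 7

# The mask is applied by position, so the 4th ID ("d") is matched with the
# 4th row (a "b" row with gender 0) and is not filtered out
leaked <- subset(ids, mask)
"d" %in% leaked  # TRUE
```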
I've also seen several posts asking about conditional sampling (in fact, I've made one myself before), but my parameters are slightly different and I can't quite find what I need. I think I'm not too far from a solution, but it eludes me enough to ask for your help. Thanks.