Thursday, 5 August 2021

In R, sample n rows from a df in which a certain column has non-null values (sample conditionally)

Background

Here's a toy df:

df <- data.frame(ID = c("a","b","c","d","e","f"), 
                gender = c("f","f","m","f","m","m"), 
                zip = c(48601,NA,29910,54220,NA,44663),stringsAsFactors=FALSE)

As you can see, I've got a couple of NA values in the zip column.

Problem

I'm trying to randomly sample 2 entire rows from df -- but I want them to be rows for which zip is not NA.

What I've tried

This code gets me a basic (i.e. non-conditional) random sample:

df2 <- df[sample(nrow(df), 2), ]

But of course, that only gets me halfway to my goal -- it will often return a row with an NA value in zip. This code attempts to add the condition:

df2 <- df[sample(nrow(df$zip != NA), 2), ]

I think I'm close, but this yields the error "invalid first argument".

Any ideas?
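
For reference, here is a minimal sketch of one possible approach, not necessarily the only or best one. It assumes the toy df above; the name keep is made up for this illustration. The idea is that a comparison with NA never returns TRUE or FALSE, and nrow() of a plain vector is NULL, which appears to be what sample() is complaining about. Testing with is.na() and sampling from the matching row numbers avoids both problems.

```r
# Toy data frame from the question
df <- data.frame(ID = c("a","b","c","d","e","f"),
                 gender = c("f","f","m","f","m","m"),
                 zip = c(48601,NA,29910,54220,NA,44663), stringsAsFactors=FALSE)

# Comparing anything with NA gives NA, so this vector is all NA
# rather than TRUE/FALSE
df$zip != NA

# nrow() of a plain vector is NULL, so sample() receives nothing to sample from
nrow(df$zip != NA)

# Row numbers where zip is present (keep is a hypothetical helper name)
keep <- which(!is.na(df$zip))

# Sample 2 positions within keep, then map them back to row numbers of df
df2 <- df[keep[sample(length(keep), 2)], ]
df2
```

Sampling positions with sample(length(keep), 2) rather than calling sample(keep, 2) directly is a deliberate choice: if keep ever held a single number, sample() would treat it as the range 1:keep instead of a one-element vector.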



