Tuesday, 15 September 2020

Adding a new column to a data.frame whose values are random samples of one column and conditioned on another

I want to add a new column (category) whose values (a/b) are assigned to rows picked by randomly sampling the id column (without replacement), conditioned on the value (1/2) in the group column. When I try this, however, some values in the id column change, and I don't understand why this is happening.

set.seed(123)
df <- data.frame(id = LETTERS[1:10], group = sample(c("1", "2"), size = 10, replace = TRUE))
df$category <- NA

> table(df$group)
1 2 
6 4

df[df$id %in% sample(df[df$group == "1", ]$id, size = 4, replace = FALSE), ]$category <- "a"
df[df$id %in% sample(df[df$group == "2", ]$id, size = 2, replace = FALSE), ]$category <- "b"

> df
   id group category
1   A     1        a
2   B     1     <NA>
3   B     1        a
4   D     2        b
5   E     1     <NA>
6   F     2     <NA>
7   G     2     <NA>
8   H     2        b
9   C     1        a
10  E     1        a

> df$id==LETTERS[1:10]
 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
# this should be all TRUE

(Please feel free to edit title and question, if it is not expressed clearly enough)
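
A possible explanation, offered as a sketch rather than a confirmed diagnosis: R evaluates a nested replacement such as df[i, ]$category <- "a" as an extract-modify-reassign sequence, in which the index expression i is evaluated twice. Because i contains a call to sample(), the two evaluations can select different rows, so whole rows (id included) taken from one selection get written back over another. Under that assumption, drawing each sample once and storing it first should avoid the problem. The names ids_a and ids_b are hypothetical helpers, and stringsAsFactors = FALSE is added only so the sketch behaves the same across R versions.

```r
set.seed(123)
df <- data.frame(id = LETTERS[1:10],
                 group = sample(c("1", "2"), size = 10, replace = TRUE),
                 stringsAsFactors = FALSE)  # assumption: keep id as character
df$category <- NA

# Draw each sample exactly once and store the result (hypothetical helper
# names), so the set of selected ids is fixed before any assignment happens.
ids_a <- sample(df$id[df$group == "1"], size = 4, replace = FALSE)
ids_b <- sample(df$id[df$group == "2"], size = 2, replace = FALSE)

# Assign into the category column only; no other column is on the left-hand side.
df$category[df$id %in% ids_a] <- "a"
df$category[df$id %in% ids_b] <- "b"

# TRUE, because the id column is never assigned to
all(df$id == LETTERS[1:10])
```

With the samples stored up front, the logical index df$id %in% ids_a yields the same rows however many times it is evaluated, and only the category vector is modified.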



