Thursday, July 1, 2021

R: "case_when" statement bins all values into a single bin

I am using the R programming language.

I have some data that I am trying to "bin" into 3 bins.

In a previous post ("binning" rows into ranges (dplyr/R)), I learned how to make these bins using the "case_when" statement:

library(dplyr)

## create the data
set.seed(111)
df = data.frame(var1 = abs(rnorm(50,10,10)), var2 = abs(rnorm(50,2,8)))

## core
df <- df %>%
  mutate(var3 = case_when(var1 < 5 & var2 < 5 ~ 'a', 
                          var1 < 10 & var2 < 10 ~ 'b',
                          TRUE ~ 'c'))
## plot to check
with(df, plot(var1, var2, col = c(2:4)[as.numeric(as.factor(var3))], cex = 0.7))
abline(h = c(5, 10), v = c(5, 10), lty = 2)


The above code works perfectly.

In the above code, the bins are defined by fixed thresholds (5 and 10). Now I am trying to do the same thing with randomly drawn thresholds:

# create data
a1 = rnorm(1000, 100, 10)
b1 = rnorm(1000, 200, 5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1, b1, c1)

# random criteria
random_1 = runif(1, 0, 100)
random_2 = runif(1, 100, 200)

df <- train_data %>%
  mutate(var3 = case_when(a1 < random_1 & b1 < random_1 ~ 'a',
                          a1 < random_2 & b1 < random_2 ~ 'b',
                          TRUE ~ 'c'))

However, everything is now labelled as "c":

table(df$var3)

   c 
1000

I re-ran this code several times with different random bin definitions, but it always places everything into a single bin. Shouldn't this place the data into at least 2 bins?
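
A quick diagnostic, sketched here with a fixed seed (the `set.seed(123)` call is an assumption and not part of the original code), is to compare the two random thresholds with the observed range of each column and to check how often each half of each condition holds. Since `b1` is drawn around 200 with a standard deviation of 5, while `random_1` never exceeds 100, the test `b1 < random_1` should be false for every row:

```r
set.seed(123)  # assumption: seed added only so the check is reproducible

# same data-generating step as in the question (c1 omitted, it is not used in the bins)
a1 <- rnorm(1000, 100, 10)
b1 <- rnorm(1000, 200, 5)
train_data <- data.frame(a1, b1)

# same random thresholds as in the question
random_1 <- runif(1, 0, 100)
random_2 <- runif(1, 100, 200)

# compare the thresholds with the observed range of each column
range(train_data$a1)
range(train_data$b1)
c(random_1 = random_1, random_2 = random_2)

# share of rows for which each half of each condition is TRUE
colMeans(cbind(
  a1_lt_r1 = train_data$a1 < random_1,
  b1_lt_r1 = train_data$b1 < random_1,
  a1_lt_r2 = train_data$a1 < random_2,
  b1_lt_r2 = train_data$b1 < random_2
))
```

If `b1_lt_r1` is 0 and `b1_lt_r2` is 0 or close to it, both compound conditions fail for nearly every row, and those rows fall through to `TRUE ~ 'c'`.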

Can someone please show me how to bin the data using random definitions?
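
One possible approach, offered only as a sketch: draw the random cut points separately for each variable from that variable's own observed range, so that the thresholds are on the same scale as the data. The helper `random_cuts` and the seed below are hypothetical, introduced for illustration and not part of the original code:

```r
library(dplyr)

set.seed(123)  # assumption: seed added only for reproducibility

train_data <- data.frame(
  a1 = rnorm(1000, 100, 10),
  b1 = rnorm(1000, 200, 5)
)

# hypothetical helper: two ordered random cut points inside the observed range of x
random_cuts <- function(x) sort(runif(2, min(x), max(x)))

cuts_a <- random_cuts(train_data$a1)
cuts_b <- random_cuts(train_data$b1)

# same case_when structure as before, but with per-variable thresholds
df <- train_data %>%
  mutate(var3 = case_when(
    a1 < cuts_a[1] & b1 < cuts_b[1] ~ "a",
    a1 < cuts_a[2] & b1 < cuts_b[2] ~ "b",
    TRUE ~ "c"
  ))

table(df$var3)
```

This does not guarantee that all three bins are populated on every run, but because the cut points fall inside the range of each column, each individual comparison can be true for at least some rows.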

Thanks



