I am using the R programming language.
I have some data that I am trying to "bin" into 3 bins based on two variables.
In a previous post ("binning" rows into ranges (dplyr/R)), I learned how to make these bins using dplyr's case_when() function:
library(dplyr)
## make the data
set.seed(111)
df = data.frame(var1 = abs(rnorm(50,10,10)), var2 = abs(rnorm(50,2,8)))
## core
df <- df %>%
  mutate(var3 = case_when(var1 < 5 & var2 < 5 ~ 'a',
                          var1 < 10 & var2 < 10 ~ 'b',
                          TRUE ~ 'c'))
## plot to check
with(df, plot(var1, var2, col = c(2:4)[as.numeric(as.factor(var3))], cex = 0.7))
abline(h = c(5, 10), v = c(5, 10), lty = 2)
The above code works perfectly.
In the above code, the bins are made using "fixed definitions". Now, I am trying to do the same thing using "random definitions":
#create data
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,200,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
#random criteria
random_1 = runif(1, 0, 100)
random_2 = runif(1, 100, 200)
df <- train_data %>%
  mutate(var3 = case_when(a1 < random_1 & b1 < random_1 ~ 'a',
                          a1 < random_2 & b1 < random_2 ~ 'b',
                          TRUE ~ 'c'))
However, everything is now labelled as "c":
table(df$var3)
c
1000
I re-ran this code several times with different "random bin definitions", but it always places everything into a single bin. Shouldn't this place the data into at least 2 bins?
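One way to see why this happens is to compare the random thresholds with the values each column actually takes. The sketch below regenerates data like the above (the seed is an assumption added only for reproducibility, and the n_* names are illustrative) and counts how many rows pass each half of each condition. Because b1 is centred on 200 with a standard deviation of 5, it should essentially never fall below random_1 (at most 100), and it can fall below random_2 only when random_2 happens to land near the top of its 100 to 200 range.

```r
set.seed(123)  # assumed seed, only so the check is reproducible

# Same distributions as in the question
a1 <- rnorm(1000, 100, 10)
b1 <- rnorm(1000, 200, 5)

# Same random thresholds as in the question
random_1 <- runif(1, 0, 100)
random_2 <- runif(1, 100, 200)

# Compare the thresholds with the range each variable covers
print(range(a1))
print(range(b1))
print(c(random_1 = random_1, random_2 = random_2))

# Count the rows that satisfy each half of each condition
n_a1_bin_a <- sum(a1 < random_1)  # a1 part of the 'a' rule
n_b1_bin_a <- sum(b1 < random_1)  # b1 part of the 'a' rule
n_a1_bin_b <- sum(a1 < random_2)  # a1 part of the 'b' rule
n_b1_bin_b <- sum(b1 < random_2)  # b1 part of the 'b' rule

print(c(n_a1_bin_a = n_a1_bin_a, n_b1_bin_a = n_b1_bin_a,
        n_a1_bin_b = n_a1_bin_b, n_b1_bin_b = n_b1_bin_b))
```

If n_b1_bin_a is 0, no row can be labelled 'a', because each case_when condition requires both halves to hold at once. The same reasoning applies to 'b' whenever n_b1_bin_b is 0.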
Can someone please show me how to bin the data using random definitions?
Thanks
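A possible approach, offered as a minimal sketch rather than a definitive answer, is to draw the random cut points separately for each variable, from within that variable's observed range, so that the thresholds can actually split the data. The helper random_cuts and the names a_cuts and b_cuts are hypothetical, and the seed is an assumption added for reproducibility.

```r
library(dplyr)

set.seed(123)  # assumed seed, only for reproducibility

# Same data as in the question
train_data <- data.frame(
  a1 = rnorm(1000, 100, 10),
  b1 = rnorm(1000, 200, 5),
  c1 = sample.int(1000, 1000, replace = TRUE)
)

# Hypothetical helper: two sorted random cut points inside the observed range of x
random_cuts <- function(x) sort(runif(2, min(x), max(x)))

a_cuts <- random_cuts(train_data$a1)  # thresholds for a1 only
b_cuts <- random_cuts(train_data$b1)  # thresholds for b1 only

# Same case_when structure as before, but each variable uses its own thresholds
df <- train_data %>%
  mutate(var3 = case_when(
    a1 < a_cuts[1] & b1 < b_cuts[1] ~ "a",
    a1 < a_cuts[2] & b1 < b_cuts[2] ~ "b",
    TRUE ~ "c"
  ))

print(table(df$var3))
```

This does not guarantee that all three bins are populated on every run, since both cut points for a variable can land close to its minimum, but the thresholds are now on the same scale as the data they are compared against.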