Thursday, October 8, 2020

How to make randomly generated data less uniform on plot in R

I'm not sure whether this is even possible. I'm plotting randomly generated data for a demonstration, so I need two groups: grey observations with a strong positive correlation, and red observations with no correlation that form a cloud-shaped scatter.

The grey observations are no problem, but the red observations come out on the plot in the shape of a square. I need them to look less square and more like a random cloud. I've tried several random number generators and none of them seems to work. The plot does look less square when I decrease the sample size, but I'd still like a fairly comparable number of samples (i.e. grey has 2000 samples, red has between 500 and 2000). I've tried rnorm, runif, sample, and truncnorm, but they either keep producing that red box or don't stay in the general area where the points should be (x between 2 and 4, y between 20 and 30).

Does anyone know how I can decrease the box-iness and make this look less uniform?


library(ggplot2)

# generate correlated data
n <- 2000
beta_0 <- 15 # the true intercept
beta_1 <- 3.4 # the true slope
sigma <- 2 # the true standard deviation
t_x <- rnorm(n)
t_y <- beta_0 + beta_1*t_x + rnorm(n, sd=sigma)
trended <- data.frame(t_x, t_y)
trended$indicator <- 'trended'
colnames(trended) <- c("x", "y", "indicator")

# generate noisy data
n <- 500
seq_x <- seq(from=2, to=4.1, by=.001)
b_x <- sample(seq_x, size=n, replace=TRUE)
seq_y <- seq(from=20, to=30.1, by=.001)
b_y <- sample(seq_y, size=n, replace=TRUE)
biased <- data.frame(b_x, b_y)
biased$indicator <- 'biased'
colnames(biased) <- c("x", "y", "indicator")

# put together on plot
dummy_data <- rbind(trended, biased)
ggplot(dummy_data, aes(x=x, y=y, color=indicator)) + 
  geom_point(show.legend = FALSE) +
  scale_color_manual(values=c("#FF0000", "#999999")) +
  theme_bw() +
  theme(plot.title = element_text(size=9, face='bold'), legend.position = "none") +
  labs(title= "The Impact of Selection Bias", x="X", y="Y")
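For what it's worth, the square is what independent uniform draws produce: sampling x and y separately from evenly spaced sequences fills the whole rectangle evenly. One possible alternative, sketched here under assumptions, is to draw both coordinates from independent normal distributions centred on the middle of the target region. The centre (3, 25), the standard deviations (0.35 and 1.7), the seed, and the name n_biased are illustrative choices that do not come from the code above; a few points may still land slightly outside x in 2..4 and y in 20..30.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n_biased <- 500  # hypothetical name; any size in the 500-2000 range works

# Assumed centre: the midpoint of the target region (x in 2..4, y in 20..30).
# Assumed spread: sd chosen so that roughly 3 sd covers the half-width
# of the region on each axis.
b_x <- rnorm(n_biased, mean = 3, sd = 0.35)
b_y <- rnorm(n_biased, mean = 25, sd = 1.7)

# x and y are drawn independently, so there is no built-in correlation.
biased <- data.frame(x = b_x, y = b_y, indicator = "biased")
```

The resulting data frame has the same columns as the one built above, so it can be passed to rbind(trended, biased) and plotted unchanged. Points are densest near the centre and thin out towards the edges, which tends to read as a cloud rather than a box.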


