I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to sample from slices of the 2D distribution along the x-axis, much as described here. Example code would look like this:
library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2D KDE (persp is in base graphics, no extra package needed):
# persp(den)
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
#to plot the slice:
#plot(conditional_probabilty_density)
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)
The den looks like this
My data has known areas with a lot of fluctuation, which require a fine grid granularity. Other areas have basically no data points, and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number in order to get a good resolution of my data everywhere. Alas, this is not possible due to memory constraints.
That's why I thought I could modify the kde2d function to have a non-constant granularity.
Here is the source code of the kde2d function. One can modify the line
gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
and put whatever granularity is wished for on the y-axis. For example
gy <- c(seq(0.0, 1.0, 0.1),
        seq(1.0, n[2L], 0.5))
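For illustration, here is a minimal sketch of such a modification, assuming the grid vectors are passed in directly instead of being derived from n and lims. The name kde2d_grid and its gx and gy arguments are hypothetical and not part of MASS; the body follows the kernel computation in the kde2d source, and the grid values are chosen only as an example.

```r
library(MASS)  # only needed for bandwidth.nrd

# Hypothetical variant of kde2d that evaluates the density on
# user-supplied (possibly non-uniform) grid vectors gx and gy.
kde2d_grid <- function(x, y, gx, gy,
                       h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  stopifnot(length(y) == nx, all(is.finite(c(x, y, gx, gy))))
  h <- rep(h, length.out = 2L) / 4  # same bandwidth scaling as kde2d
  ax <- outer(gx, x, "-") / h[1L]   # standardized distances, x direction
  ay <- outer(gy, y, "-") / h[2L]   # standardized distances, y direction
  z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

gx <- seq(-2, 2, length.out = 50)
# Fine spacing between 0 and 1, coarse spacing above (example values).
gy <- c(seq(0.0, 1.0, by = 0.1), seq(1.5, 2.0, by = 0.5))
den_fine <- kde2d_grid(x, y, gx, gy)
```

With a uniform gx and gy this should reproduce what kde2d returns for the same limits, since only the way the grid is built differs.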
The modified kde2d returns the kernel density estimate at the specified positions. This works very well. Suppose I have now
The problem is that I can no longer use sample to sample from slices along the x-axis: the left side of the distribution has a much finer grid, so it contains more grid points and therefore has a higher probability of being drawn by sample.
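The effect can be reproduced in a small sketch. It uses a standard normal density as a stand-in for a kde2d slice (an assumption made so the true answer is known) on a grid that is fine on the left and coarse on the right. The second call weights each density value by the width of its grid cell; that weighting is itself an assumption about what a suitable correction might look like, not a confirmed solution.

```r
set.seed(1)

# Non-uniform grid: fine on the left, coarse on the right.
g <- c(seq(-3, 0, by = 0.05), seq(0.5, 3, by = 0.5))
dens <- dnorm(g)  # stand-in for one slice such as den$z[40, ]

# Naive sampling: every grid point counts equally, so the densely
# gridded left half is drawn far too often.
naive <- sample(g, size = 1e4, replace = TRUE, prob = dens)

# Assumed correction: each point stands for an interval whose width is
# half the distance to each neighbour (trapezoid-style weights).
w <- (c(diff(g), 0) + c(0, diff(g))) / 2
weighted <- sample(g, size = 1e4, replace = TRUE, prob = dens * w)

mean(naive)     # clearly negative although the density is symmetric
mean(weighted)  # close to 0
```

In this toy case the naive sample is pulled towards the finely gridded side, while the width-weighted sample stays roughly centered.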
What can I do to have a fine grid where I need it, but sample from the distribution according to its proper densities? Thank you a lot.