random: Generate random data from arbitrary CDF in R?

Thursday, January 4, 2018


This is my first post here, and I hope this hasn't already been answered. I can't post images yet because of low rep, so I will link to them instead.

I have an arbitrary CDF that is applied to a point estimate. I have a number of these point estimates, each with an associated CDF, and I need to simulate random data from them for a Monte Carlo simulation.

I generate the CDF by fitting a spline to the points provided in a table. For example, the 0.1 quantile is 0.13 * point estimate, and the 0.9 quantile is 7.57 * point estimate. It is fairly crude and is based on a large study comparing these models to real-world systems; please ignore that for now.

I fit the CDF using a spline fit as shown here.

If I take the derivative of this, I get the shape of the pdf (image).
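For reference, here is a minimal sketch of this kind of construction, assuming a monotone cubic spline from base R's `splinefun` (method `"hyman"`). Only the 0.1 and 0.9 multipliers come from the table described above; the point estimate and the remaining rows are hypothetical placeholders, and the actual fit may have been done differently.

```r
# Hypothetical point estimate; the real ones come from the models.
point_estimate <- 100

# Quantile table: probability -> multiplier of the point estimate.
# Only the 0.10 and 0.90 rows (0.13 and 7.57) are from the post;
# the other rows are made-up placeholders for illustration.
probs <- c(0.01, 0.10, 0.50, 0.90, 0.99)
mult  <- c(0.02, 0.13, 1.00, 7.57, 30.0)
x     <- mult * point_estimate

# Monotone cubic spline through the (x, p) pairs.
# method = "hyman" keeps the interpolant non-decreasing between knots.
cdf <- splinefun(x, probs, method = "hyman")

# The first derivative of the spline gives the shape of the pdf.
pdf <- function(t) cdf(t, deriv = 1)

# Evaluate both on a grid, e.g. for plotting.
grid <- seq(min(x), max(x), length.out = 200)
cdf_vals <- cdf(grid)
pdf_vals <- pdf(grid)
```

A non-monotone spline method (such as the default `"fmm"`) can overshoot between widely spaced knots, which would show up as negative values in `pdf_vals`.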

I modified the function "samplepdf" found here, Sampling from an Arbitrary Density, as follows:

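# Draw n samples by numerically inverting the CDF (inverse transform sampling).
# Note: endsign() is not defined here; it is presumably the bracket-finding
# helper that accompanies samplepdf in the linked source.
samplecdf <- function(n, cdf, spdf.lower = -Inf, spdf.upper = Inf) {
  my_fun <- match.fun(cdf)
  invcdf <- function(u) {
    # The root of cdf(t) - u is the u-th quantile.
    subcdf <- function(t) my_fun(t) - u
    if (spdf.lower == -Inf)
      spdf.lower <- endsign(subcdf, -1)
    if (spdf.upper == Inf)
      spdf.upper <- endsign(subcdf)
    uniroot(subcdf, c(spdf.lower, spdf.upper))$root
  }
  sapply(runif(n), invcdf)
}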

This seems to work reasonably well: when I compare the quantiles estimated from the randomly generated data with the initial values, they are fairly close. However, when I look at the histogram, something odd is happening in the tail, where my function appears to generate more values than it should according to the pdf. It does this consistently across all my point estimates. Even though the individual quantiles look close, the overall Monte Carlo simulation produces higher estimates for the 50th percentile than I expect. Here is a plot of my histogram of the random samples.
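One way to cross-check a sampler like this is to interpolate the inverse CDF directly and compare empirical quantiles with the table. The sketch below is a different technique from the `uniroot` approach above: it fits a monotone spline through (probability, value) pairs rather than inverting the fitted CDF, so the two interpolants will not agree exactly between knots. Only the 0.1 and 0.9 multipliers are from the post; the point estimate, the median row, and the assumed support limits at p = 0 and p = 1 are hypothetical.

```r
set.seed(1)

# Hypothetical point estimate and quantile table. Only the 0.10 and 0.90
# multipliers (0.13 and 7.57) are from the post; the other rows, including
# the assumed support limits at p = 0 and p = 1, are placeholders.
point_estimate <- 100
probs <- c(0, 0.10, 0.50, 0.90, 1)
mult  <- c(0, 0.13, 1.00, 7.57, 30)
x     <- mult * point_estimate

# Monotone spline for the inverse CDF: probability in, value out.
qfun <- splinefun(probs, x, method = "hyman")

# Inverse transform sampling: push uniforms through the quantile function.
samples <- qfun(runif(1e4))

# Compare empirical quantiles with the interior table rows.
check <- cbind(target  = x[2:4],
               sampled = quantile(samples, probs[2:4], names = FALSE))
print(check)
```

If the empirical quantiles match the table but the histogram still looks heavier in the tail than the pdf suggests, the difference is more likely to come from the shape of the interpolant between knots than from the sampling step.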

Any tips or advice would be very welcome. I think the best route would be to fit an exponential distribution to the CDF, but I'm struggling to do that. All the "fitting" approaches I find assume that you have data to fit, whereas here I only have an arbitrary set of quantiles.
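A parametric distribution can be fitted to quantiles without any raw data by minimizing the mismatch between the model's quantile function and the table. The sketch below uses a lognormal purely as an assumption for illustration (the post mentions an exponential; replacing `qlnorm` with `qexp` and a single rate parameter would follow the same pattern). The two quantile rows are from the post; the point estimate is hypothetical.

```r
# Quantile table as multipliers of the point estimate (both rows from the post).
probs <- c(0.10, 0.90)
mult  <- c(0.13, 7.57)

# Squared error between model and table quantiles, on the log scale.
# par[1] = meanlog, par[2] = log(sdlog), so that sdlog stays positive.
loss <- function(par) {
  fitted <- qlnorm(probs, meanlog = par[1], sdlog = exp(par[2]))
  sum((log(fitted) - log(mult))^2)
}

# Nelder-Mead (optim's default) from a neutral starting point.
fit     <- optim(c(0, 0), loss)
meanlog <- fit$par[1]
sdlog   <- exp(fit$par[2])

# Quantiles implied by the fitted distribution, for comparison with mult.
fitted_mult <- qlnorm(probs, meanlog, sdlog)

# Sample multipliers, then scale by a hypothetical point estimate.
set.seed(1)
point_estimate <- 100
samples <- point_estimate * rlnorm(1e4, meanlog, sdlog)
```

With only two table rows a two-parameter family can match them exactly; with more rows the same loss gives a least-squares compromise, and the residuals show how well the chosen family suits the table.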

Thanks!
