Friday, July 15, 2022

data.table sample with probabilities stored in columns

I have a data.table in which each row holds the probabilities of a discrete distribution, one outcome per column.

For example:

library(data.table)
dt <- data.table(p1 = c(0.5, 0.25, 0.1), p2 = c(0.25, 0.5, 0.1), p3 = c(0.25, 0.25, 0.8))

I'd like to add a new column holding one random draw per row, sampled with that row's probabilities. In data.table syntax I imagine it working like this:

dt[, sample := sample(1:3, 1, prob = c(p1, p2, p3))]
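As written, c(p1, p2, p3) concatenates the three entire columns, so prob gets 3 * nrow(dt) values instead of 3 and sample() stops with an error. A minimal sketch that keeps the imagined syntax is to group by a row index, so that inside each group p1, p2 and p3 are single values. It still makes one sample() call per row, so it is not expected to be meaningfully faster than apply.

```r
library(data.table)

dt <- data.table(p1 = c(0.5, 0.25, 0.1),
                 p2 = c(0.25, 0.5, 0.1),
                 p3 = c(0.25, 0.25, 0.8))

# Grouping by the row index makes p1, p2 and p3 length-1 vectors inside j,
# so c(p1, p2, p3) is exactly that row's probability vector.
set.seed(1)
dt[, sample := sample(1:3, 1, prob = c(p1, p2, p3)), by = seq_len(nrow(dt))]
print(dt)
```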

This would work if there were a row-wise 'psample' function analogous to 'pmin' and 'pmax'. I was able to make it work using apply, but on my real data set that takes longer than I would like. Is there a way to do this with data.table? The apply solution is given below.

# apply() converts the probability columns to a matrix and calls sample() once per row
dt[, sample := apply(.SD, 1, function(x) sample(1:3, 1, prob = x)), .SDcols = c('p1', 'p2', 'p3')]
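The apply call loops in R, one sample() call per row. A minimal vectorized sketch of the psample idea, assuming only a handful of probability columns, is inverse-CDF sampling: build the row-wise cumulative probabilities, draw one uniform number per row, and count how many cumulative thresholds fall below it. Note that `psample` here is a hypothetical helper written for illustration, not a function in base R or data.table, and because it does not call sample() it will not reproduce apply's draws for the same seed (the per-row distribution is the same).

```r
library(data.table)

# Hypothetical helper (not part of base R or data.table): draws one category
# index per row of a probability matrix by inverse-CDF sampling, using
# whole-column operations instead of one sample() call per row.
psample <- function(prob) {
  prob <- as.matrix(prob)
  prob <- prob / rowSums(prob)            # normalise each row to sum to 1
  cum <- prob                             # cum[i, k] = P(X_i <= k)
  for (k in seq_len(ncol(prob))[-1]) {
    cum[, k] <- cum[, k - 1] + prob[, k]
  }
  u <- runif(nrow(prob))                  # one uniform draw per row
  # u > cum compares u[i] with every cum[i, k] (u is recycled down columns);
  # the number of thresholds below u, plus one, is the sampled category.
  idx <- 1L + as.integer(rowSums(u > cum))
  pmin(idx, ncol(prob))                   # guard against rounding in the last column
}

dt <- data.table(p1 = c(0.5, 0.25, 0.1),
                 p2 = c(0.25, 0.5, 0.1),
                 p3 = c(0.25, 0.25, 0.8))

set.seed(42)
dt[, sample := psample(.SD), .SDcols = c("p1", "p2", "p3")]
print(dt)
```

The only R-level loop runs over the probability columns, so the cost is driven by vectorized arithmetic over all rows at once rather than by one function call per row.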
