Saturday, 29 June 2019

Weighted random sampling for Monte Carlo simulation in R

I would like to run a Monte Carlo simulation. I have a data.frame in which each row is a unique ID and each column is a possible outcome; the value in each cell is the weight (probability) of that ID being associated with that column. I want to randomly sample one column for each row according to that row's weights. Each row should return exactly one value per run. The data.frame looks like this:

ID,    X2000,  X2001,  X2002,  X2003,  X2004
X11,   0,      0,      0.5,    0.5,    0
X33,   0.25,   0.25,   0.25,   0.25,   0
X55,   0,      0,      0,      0,      1
X77,   0.5,    0,      0,      0,      0.5

Given these weights, "X11" should return either X2002 or X2003; "X33" should have an equal probability of returning X2000, X2001, X2002, or X2003, with no chance of returning X2004; and the only possible return for "X55" is X2004.
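
One way this per-row weighted draw might be sketched in base R is with sample() and its prob argument, applied row by row. The helper name draw_once and the sampled column are hypothetical names chosen for illustration; the data are the example values from the table above.

```r
# Example data from the question; ID is kept as an ordinary column.
df <- data.frame(
  ID    = c("X11", "X33", "X55", "X77"),
  X2000 = c(0,   0.25, 0, 0.5),
  X2001 = c(0,   0.25, 0, 0),
  X2002 = c(0.5, 0.25, 0, 0),
  X2003 = c(0.5, 0.25, 0, 0),
  X2004 = c(0,   0,    1, 0.5),
  stringsAsFactors = FALSE
)

weights <- as.matrix(df[, -1])  # numeric weight matrix, one row per ID
cols <- colnames(weights)       # candidate outcomes: "X2000" ... "X2004"

# draw_once (hypothetical helper): for each row, pick one column label
# with probability proportional to that row's weights.
draw_once <- function(w, labels) {
  apply(w, 1, function(p) sample(labels, size = 1, prob = p))
}

set.seed(1)  # only so the illustration is reproducible
run <- data.frame(ID = df$ID, sampled = draw_once(weights, cols),
                  stringsAsFactors = FALSE)
print(run)
```

Columns with a weight of 0 are never selected, so "X55" always yields X2004 in this sketch.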

The output I am interested in is the IDs and the column that was sampled for each in that run, although it would probably be simpler to return something like this:

ID,    X2000,  X2001,  X2002,  X2003,  X2004
X11,   0,      0,      1,      0,      0
X33,   1,      0,      0,      0,      0
X55,   0,      0,      0,      0,      1
X77,   1,      0,      0,      0,      0
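
A 0/1 layout like the one above could be produced, as one possible approach, with rmultinom(), which draws a single multinomial trial per row and returns an indicator vector directly. The names one_hot_draw, n_runs, counts, and freq are hypothetical, and repeating the draw 1000 times is only an assumption about what a Monte Carlo run might look like here.

```r
# Example data from the question.
df <- data.frame(
  ID    = c("X11", "X33", "X55", "X77"),
  X2000 = c(0,   0.25, 0, 0.5),
  X2001 = c(0,   0.25, 0, 0),
  X2002 = c(0.5, 0.25, 0, 0),
  X2003 = c(0.5, 0.25, 0, 0),
  X2004 = c(0,   0,    1, 0.5),
  stringsAsFactors = FALSE
)
weights <- as.matrix(df[, -1])

# one_hot_draw (hypothetical helper): one multinomial trial per row,
# returned as a 0/1 matrix with the same shape as the weight matrix.
one_hot_draw <- function(w) {
  out <- t(apply(w, 1, function(p) as.vector(rmultinom(1, size = 1, prob = p))))
  colnames(out) <- colnames(w)
  out
}

set.seed(42)  # only so the illustration is reproducible
result <- data.frame(ID = df$ID, one_hot_draw(weights),
                     stringsAsFactors = FALSE)
print(result)

# Repeat the draw and sum the indicator matrices to get, per ID,
# how often each column was selected across runs.
n_runs <- 1000
counts <- Reduce(`+`, replicate(n_runs, one_hot_draw(weights), simplify = FALSE))
freq <- counts / n_runs  # empirical frequencies; should approach the weights
```

Keeping each run as an indicator matrix makes accumulation across runs a plain matrix sum, which is why this layout may indeed be the simpler one to return.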



