Suppose we have n random variables x1, ..., xn. Each xi takes values in {a, b, c, d} with probabilities p1, p2, p3, p4 respectively, where p1 + p2 + p3 + p4 = 1; the probabilities may differ from one variable to the next. The goal is to generate m representative random sequences without duplicates. The problem is that when each variable has one dominant outcome, sampling the variables independently produces many duplicate sequences. Example:
x1: P(a)=0.88, P(b)=0.04, P(c)=0.04, P(d)=0.04
x2: P(a)=0.04, P(b)=0.88, P(c)=0.04, P(d)=0.04
x3: P(a)=0.04, P(b)=0.04, P(c)=0.88, P(d)=0.04
x4: P(a)=0.04, P(b)=0.04, P(c)=0.04, P(d)=0.88
If we sample 1000 sequences independently, a large share of them are 'abcd' (each draw is 'abcd' with probability 0.88^4 ≈ 0.60). What is the best way to generate the sequences without duplicates?
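To make the setup concrete, here is a minimal Python sketch of one possible approach, not necessarily the best one: draw sequences independently and keep only the first occurrence of each. The names `sample_unique_sequences`, `PROBS`, `seed`, and `max_draws` are made up for illustration. Note that with n = 4 and four outcomes there are only 4^4 = 256 distinct sequences, so m cannot exceed 256 in the example above.

```python
import random

OUTCOMES = "abcd"

# Per-variable probabilities from the example (one row per x1..x4).
PROBS = [
    [0.88, 0.04, 0.04, 0.04],
    [0.04, 0.88, 0.04, 0.04],
    [0.04, 0.04, 0.88, 0.04],
    [0.04, 0.04, 0.04, 0.88],
]


def sample_unique_sequences(probs, m, seed=None, max_draws=1_000_000):
    """Draw sequences independently and discard repeats until m are unique.

    `max_draws` is an assumed safety cap so the loop cannot run forever.
    """
    n = len(probs)
    if m > len(OUTCOMES) ** n:
        raise ValueError("m exceeds the number of distinct sequences")
    rng = random.Random(seed)
    seen = set()
    result = []
    for _ in range(max_draws):
        # Sample each variable independently from its own distribution.
        seq = "".join(rng.choices(OUTCOMES, weights=w)[0] for w in probs)
        if seq not in seen:
            seen.add(seq)
            result.append(seq)
            if len(result) == m:
                return result
    raise RuntimeError("draw budget exhausted before m unique sequences were found")


sequences = sample_unique_sequences(PROBS, 50, seed=0)
print(len(sequences), len(set(sequences)))
```

Discarding repeats keeps the high-probability sequences likely to appear early, but the resulting set no longer follows the original distribution exactly, and the number of draws grows quickly as m approaches the total number of distinct sequences.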