Friday, September 25, 2015

How to draw without replacement to fill in a data set

I am generating a data set in which I first want to draw a value at random for each observation from a discrete distribution and store it in var1. Next, I want to draw another value from the same distribution for each row, with the catch that the value already in var1 for that observation is no longer eligible to be drawn. I want to repeat this a relatively large number of times.

To make this concrete, suppose that I start with:

id
1
2
3
...
999
1000

Suppose that the distribution takes the values ["A", "B", "C", "D", "E"], which occur with probabilities [.2, .3, .1, .15, .25].

I would first like to draw randomly from this distribution to fill in var1. Suppose that the result of this is:

id    var1
1     E
2     E
3     C
...   
999   B
1000  A

Now E is not eligible to be drawn for observations 1 and 2. C, B, and A are ineligible for observations 3, 999, and 1000, respectively.

After all the columns are filled in, we may end up with this:

id    var1  var2  var3  var4  var5
1     E     C     B     A     D
2     E     A     B     D     C
3     C     B     A     E     D
...        
999   B     D     C     A     E
1000  A     E     B     C     D

I am not sure how to approach this in Stata, but one way to fill in var1 is to do something like:

gen random1 = runiform()
gen var1 = ""
replace var1 = "A" if random1<.2
replace var1 = "B" if random1>=.2 & random1<.5
etc.
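One possible way to extend that idea to all five columns is sketched below. It assumes that each later draw uses the original probabilities, rescaled over the values still eligible for that row; if a different rule is intended, the weights would need to change. The helper variables avail_*, total, u, and cum, as well as the seed, are names and values made up for this illustration.

```stata
clear
set seed 12345            // arbitrary seed, only for reproducibility
set obs 1000
gen id = _n

local cats  A B C D E
local probs .2 .3 .1 .15 .25

* avail_X holds the probability mass of value X while it is still eligible
forvalues j = 1/5 {
    local c : word `j' of `cats'
    local p : word `j' of `probs'
    gen double avail_`c' = `p'
}

forvalues k = 1/5 {
    gen str1 var`k' = ""
    * probability mass still available in each row
    gen double total = avail_A + avail_B + avail_C + avail_D + avail_E
    * uniform draw scaled to the remaining mass (assumed renormalization)
    gen double u = runiform() * total
    gen double cum = 0
    foreach c of local cats {
        * walk the cumulative sum; the first eligible value whose bin contains u wins
        replace cum = cum + avail_`c'
        replace var`k' = "`c'" if var`k' == "" & u <= cum & avail_`c' > 0
    }
    * the value just drawn is no longer eligible for this row
    foreach c of local cats {
        replace avail_`c' = 0 if var`k' == "`c'"
    }
    drop total u cum
}
drop avail_*
list in 1/5
```

With five values and five columns, every row ends up as a permutation of A through E. For a longer list of values, only the two locals and the expression for total would need to be extended.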



