Monday, February 19, 2018

How to Randomly Assign to Groups of Different Size

Say I have a dataset and want to assign observations to groups whose sizes are determined by the data. For example, suppose this is the data:

sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
*Create global with regions
global region NE NCntrl South West
*Count the number in each region
bys reg (pop): gen reg_N=_N
tab reg

There are four reg groups, all of different sizes. Now, I want to randomly assign observations to the four groups. This is accomplished below by generating a random number and then assigning observations to one of the groups based on the random number.

*Generate random number
set seed 1
gen random = runiform()
sort random
*Assign observations to number based on random sorting
egen reg_rand = seq(), from(1) to(4)
*Map number to region
gen reg_new = ""
global count 1
foreach i in $region {
    replace reg_new = "`i'" if reg_rand==$count
    global count = $count + 1
}
bys reg_new: gen reg_new_N = _N
tab reg_new

This is not what I want, though. The egen function seq() creates groups of equal size (assuming N is divisible by the number of groups). Instead, I would like to randomly assign observations so that each new group has the same size as the corresponding original group, that is, reg_N. For example, 12 observations would have a reg_new value of NCntrl.

One possible solution is similar to the approach at https://stats.idre.ucla.edu/stata/faq/how-can-i-randomly-assign-observations-to-groups-in-stata/. The idea would be to save the results of tab reg in a macro or matrix, and then use a loop with replace to step through the observations after sorting them by a random number. Is there a more reasonable way to accomplish this?
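One way to avoid the loop, offered only as a sketch, is to shuffle the existing labels rather than build the groups from counts. Because a random permutation of reg contains exactly the same labels, each group keeps its original size automatically. The sketch assumes the setup code above has already been run, so that state and reg exist; the variable names u, pick, and reg_shuf are made up for illustration and are not part of the original code.

```stata
* Sketch: randomly permute the existing labels so group sizes are preserved.
* Assumes the setup above has been run (state and reg exist).
* u, pick and reg_shuf are illustrative names.
sort state                   // fix the row order so the seed reproduces the result
set seed 1
gen double u = runiform()
sort u
gen long pick = _n           // pick is a random permutation of 1.._N
sort state                   // return to the fixed order
gen reg_shuf = reg[pick]     // row i takes the label held by row pick[i]
tab reg_shuf                 // frequencies should match tab reg
```

Since pick takes each value from 1 to _N exactly once, reg_shuf is a rearrangement of reg, so tab reg_shuf should report the same counts as tab reg (for example, 12 observations in NCntrl). Sorting on state before drawing the random numbers is a design choice for reproducibility; any variable that uniquely identifies the rows would serve.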



