I'm hoping someone with more experience using SAS's SURVEYSELECT procedure can help me obtain the following results.
I'm working with a data set that needs a random stratified sample. The stratification should be by the column STATE, and each state should contribute an equal number of records to the sample whenever possible.
The sample size has to follow these set rules:
- If the total data set size is <= 50 then let the sample size = the entire data set
- Else if the total data set size is between 51 and 500 then let the sample size = 50
- Else if the total data set size is between 501 and 999 then let the sample size = 10% of the total data set size (n*.10) given that n = the total data set size.
- Else if the total data set size is > 999 then let the sample size = 100
SAMPLESIZE is currently defined in code as:
    /* sets the per-stratum sample size in accordance with standards */
    %if &num>=0 and &num<=50 %then %let samplesize=&num;
    %else %if &num<501 %then %let samplesize=%sysevalf(50/&num_states,ceil);
    %else %if &num<1000 %then %let samplesize=%sysevalf((&num*.10/&num_states),ceil);
    %else %let samplesize=100;
My test case is a data set of 550 records, so the total sample size is 55 (n*.10), or 55/5 = 11 per state. The record counts by state are:
- IN = 100
- KY = 217
- MO = 189
- OH = 8
- WI = 36
The STRATA statement in SURVEYSELECT works great when every state has at least as many records as the per-stratum sample size. In this case the SAMPLESIZE for each stratum would be 11.
You can see that the OH stratum does not meet that minimum, since there are only 8 OH records in the data set, which leads to the following error:
ERROR: The sample size, 11, is greater than the number of sampling units, 8.
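A quick way to see which strata cannot supply the requested size might look like the sketch below. It assumes the UniqueList data set and the &samplesize macro variable from the code above; the column alias n_records is made up for illustration.

```sas
/* List every state whose record count is below the per-stratum sample size */
proc sql;
    select PROVIDER_STATE,
           count(*) as n_records
    from UniqueList
    group by PROVIDER_STATE
    having count(*) < &samplesize;
quit;
```

With the counts listed earlier and a per-stratum size of 11, only OH (8 records) should show up.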
My current PROC SURVEYSELECT statement looks like this.
```
PROC SURVEYSELECT DATA=UniqueList OUT=UniqueListsamp METHOD=SRS SAMPSIZE=&samplesize NOPRINT;
    STRATA PROVIDER_STATE;
RUN;
```
What I would like in this scenario is for the code to pull the 3 records that OH cannot supply from the other 4 states, so that the total sample still reaches 55. I have spent several hours going through the SAS documentation and all of the options this procedure offers, but I cannot piece it together. It sounds like ALLOCMIN or SELECTALL might be the options to use here, perhaps along with an option that allows replacement. Any help is GREATLY appreciated.
I will be happy to provide additional information if needed.
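One possible direction, offered as a sketch rather than a confirmed solution: compute the per-state sizes yourself and pass them to SURVEYSELECT through SAMPSIZE=data-set, which reads the stratum sizes from a variable named _NSIZE_. The data set names StateCounts and Alloc and the macro variable total_sample are hypothetical. The sketch also assumes the total sample size is at least the number of states and no larger than the data set.

```sas
%let total_sample = 55;  /* hypothetical: total sample size for the n=550 example */

/* One row per state with its record count (variable COUNT) */
proc freq data=UniqueList noprint;
    tables PROVIDER_STATE / out=StateCounts(keep=PROVIDER_STATE count);
run;

/* Smallest strata first, so any shortfall rolls forward to larger strata */
proc sort data=StateCounts;
    by count;
run;

data Alloc(keep=PROVIDER_STATE _NSIZE_);
    set StateCounts nobs=n_strata;
    retain remaining &total_sample;
    strata_left = n_strata - _n_ + 1;
    /* Equal share of what is still needed, capped at the stratum size */
    _NSIZE_ = min(count, ceil(remaining / strata_left));
    remaining = remaining - _NSIZE_;
run;

/* SURVEYSELECT expects both data sets in the same STRATA order */
proc sort data=Alloc;      by PROVIDER_STATE; run;
proc sort data=UniqueList; by PROVIDER_STATE; run;

proc surveyselect data=UniqueList out=UniqueListsamp method=srs
                  sampsize=Alloc noprint;
    strata PROVIDER_STATE;
run;
```

With the state counts above, this allocation should come out as OH = 8, WI = 12, IN = 12, MO = 12, and KY = 11, for a total of 55. By comparison, adding SELECTALL to the original step with SAMPSIZE=11 would take all 8 OH records but leave the total at 52, since that option does not redistribute the shortfall.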