Sunday, December 29, 2019

Sampling 5 observations from a data set where the ranked variable does not always have 5 observations

I have a data set of bank business units (branches) and accounts (account numbers). Some branches have 2 accounts, while others can have 50; it varies. I need to randomly sample 5 accounts from each branch. I tried the code below, but I get the following error:

ERROR: The sample size, 5, is greater than the number of sampling units, 2.

I need something like SAMPSIZE < 6, i.e., if a branch has only 2 observations, return just those 2.
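A minimal sketch of the "at most 5, or all of them if fewer" rule without PROC SURVEYSELECT: give every row a uniform random key, sort each branch by that key (a random shuffle), and keep the first 5 rows per branch. Keeping the first min(5, n) rows of a uniformly shuffled branch is a simple random sample without replacement within that branch. The toy data and the names `accounts`, `accounts_keyed`, `rand_key`, `n_in_branch` and `sample_rank` are hypothetical; only `account_branch_id` and `acc` come from the question, and the seed value is arbitrary.

```sas
/* Hypothetical toy data: branch 101 has 2 accounts, branch 202 has 7. */
data accounts;
  do acc = 1 to 2; account_branch_id = 101; output; end;
  do acc = 3 to 9; account_branch_id = 202; output; end;
run;

/* Attach a uniform random key to every row; the seed makes reruns reproducible. */
data accounts_keyed;
  if _n_ = 1 then call streaminit(20191229);
  set accounts;
  rand_key = rand('uniform');
run;

/* Shuffle the accounts within each branch by sorting on the random key. */
proc sort data=accounts_keyed;
  by account_branch_id rand_key;
run;

/* Keep the first 5 rows of each branch, or all rows when a branch has fewer. */
data sample_rank(drop=rand_key n_in_branch);
  set accounts_keyed;
  by account_branch_id;
  if first.account_branch_id then n_in_branch = 0;
  n_in_branch + 1;          /* sum statement: implicitly retained counter */
  if n_in_branch <= 5;      /* subsetting IF: output only the first 5 */
run;
```

Because the sum statement retains `n_in_branch` across rows, resetting it at `first.account_branch_id` turns it into a row counter within each branch.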

This is the code:

PROC SQL ;
    CREATE TABLE FINAL_RANDOM as
    SELECT  t1.mis_division_id,
            t1.mis_wing_id,
            t1.region_id,
            t1.account_branch_id,
            t1.branch_name,
            t1.acc,
            t2.Attribute
    FROM work.ORGANIZATION_STRUC2 t1
    INNER JOIN work.UNION_ALL_RANDOM t2
    ON t1.account_id = t2.account_id
;
QUIT ;

/* STRATA in PROC SURVEYSELECT requires the data sorted by branch */
PROC SORT DATA=work.FINAL_RANDOM ;
BY Account_Branch_Id ;
RUN ;

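/* Simple random sample of 5 accounts per branch (stratum).
   Stops with an error when a branch has fewer than 5 accounts. */
PROC SURVEYSELECT DATA=FINAL_RANDOM OUT=FINAL_RANDOM_1 NOPRINT
     METHOD=srs
     SAMPSIZE = 5 ;
     STRATA Account_Branch_Id ;
RUN;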
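PROC SURVEYSELECT can also be asked for this behavior directly. In SAS/STAT releases that support it, the SELECTALL option on the PROC SURVEYSELECT statement selects every unit in a stratum whose requested sample size exceeds the number of units in that stratum, instead of stopping with the "sample size is greater than the number of sampling units" error. The sketch below assumes such a release; the toy data and the names `branch_accounts` and `sample_srs` are hypothetical stand-ins for FINAL_RANDOM and FINAL_RANDOM_1, and the SEED= value is arbitrary.

```sas
/* Hypothetical toy data standing in for FINAL_RANDOM:
   branch 101 has 2 accounts, branch 202 has 7. */
data branch_accounts;
  do acc = 1 to 2; account_branch_id = 101; output; end;
  do acc = 3 to 9; account_branch_id = 202; output; end;
run;

/* STRATA requires the input sorted by the strata variable. */
proc sort data=branch_accounts;
  by account_branch_id;
run;

/* SELECTALL: a branch with fewer than 5 accounts contributes all of them.
   SEED= makes the random draw reproducible. */
proc surveyselect data=branch_accounts out=sample_srs noprint
     method=srs
     sampsize=5
     selectall
     seed=20191229;
  strata account_branch_id;
run;
```

With SELECTALL, branch 101 contributes both of its accounts and branch 202 contributes 5 of its 7.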