I have a dataset of bank business units (branches) and accounts (account numbers). The number of accounts per branch varies: some branches have only 2, while others have 50. I need to randomly sample 5 accounts from each branch. I tried the code below, but I get the following error:
ERROR: The sample size, 5, is greater than the number of sampling units, 2.
I need it to behave like SAMPSIZE < 6, i.e. if a branch has only 2 observations, return just those 2.
This is the code:
PROC SQL ;
CREATE TABLE FINAL_RANDOM as
SELECT t1.mis_division_id,
t1.mis_wing_id,
t1.region_id,
t1.account_branch_id,
t1.branch_name,
t1.acc,
t2.Attribute
FROM work.ORGANIZATION_STRUC2 t1
INNER JOIN work.UNION_ALL_RANDOM t2
ON t1.account_id = t2.account_id
;
QUIT ;
PROC SORT DATA=work.FINAL_RANDOM ;
BY Account_Branch_Id ;
RUN ;
PROC SURVEYSELECT DATA=FINAL_RANDOM OUT=FINAL_RANDOM_1 NOPRINT
METHOD=srs
SAMPSIZE = 5 ;
STRATA Account_Branch_Id ;
RUN;
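One possible way to get that behaviour, offered as a sketch rather than a tested fix: PROC SURVEYSELECT has a SELECTALL option, which selects every unit in a stratum when the requested sample size exceeds the number of units in that stratum, instead of stopping with the error above. The dataset and variable names are the ones from the code above; the SEED value is an arbitrary assumption, added only so that the draw can be reproduced.

```sas
/* Input must already be sorted by the STRATA variable (see the PROC SORT above). */
PROC SURVEYSELECT DATA=FINAL_RANDOM OUT=FINAL_RANDOM_1 NOPRINT
    METHOD=srs
    SAMPSIZE=5
    SELECTALL        /* branches with fewer than 5 accounts: keep all of them */
    SEED=12345 ;     /* arbitrary value, assumed here for reproducibility */
    STRATA Account_Branch_Id ;
RUN ;
```

With this, a branch that has only 2 accounts should contribute both of them to FINAL_RANDOM_1, while branches with 5 or more accounts contribute a simple random sample of 5.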