I want to split my data into 70% training data and 30% test data, so I tried this query:
select A.*,
case
when rand() < 0.7 then 'training'
else 'test'
end as split
from costumer A
order by user_id
But when I count the distinct user_id values in each split, the training:test proportion is not 70%:30%. How can I get a 70%:30% split that is random by user_id?
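For context, rand() in the query above is evaluated once per row, so a user_id with several rows can land in both splits, and the share of rows labelled 'training' is only about 70% on average. One possible direction, as a rough sketch rather than a confirmed solution, is to draw the random number once per distinct user_id, rank the users by it, label roughly the first 70% as 'training', and join the label back to every row. The sketch assumes the engine supports window functions (for example MySQL 8+, Hive, or Spark SQL) and that rand() is its uniform random function. The aliases D, U, S and the column rnd are made up for illustration.

```sql
-- Sketch only: assign the split once per user_id, then join it back to all rows.
-- Assumes window functions and rand() are available in the SQL dialect in use.
select A.*,
       S.split
from costumer A
join (
    select user_id,
           case
               -- rank users in random order; the first ~70% become training
               when row_number() over (order by rnd) <= 0.7 * count(*) over ()
                   then 'training'
               else 'test'
           end as split
    from (
        -- one random number per distinct user, not per row
        select user_id, rand() as rnd
        from (select distinct user_id from costumer) D
    ) U
) S on A.user_id = S.user_id
order by A.user_id
```

With this shape every row of a given user_id gets the same label, and the distinct-user counts come out as close to 70%:30% as the number of users allows. Rows with a null user_id would be dropped by the join and would need separate handling.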