Thursday, September 5, 2019

How to subset data to achieve a desired mix?

Is there an R function that selects a random subset of the data so that it has a desired mix?

For example, I have 100 rows of 1s, 2s, 3s and 4s, with 25% in each group. Instead, I want the mix to be, say, 20%, 20%, 35% and 25%, obtained by randomly selecting as many of the 100 rows as possible.

I have tried manually cutting down the groups whose share is higher than wanted (the 1s and 2s), but this is not efficient.

Is there a simpler way to do so?
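A minimal sketch of one way to do this, written in Python (the question asks about R, but the arithmetic is the same; in R one would count groups with table() and draw rows within each group with sample()). The largest possible subset is limited by whichever group runs out first relative to its target share: with a_g rows available in group g and integer weight w_g out of a total W, the size is N = min over g of floor(a_g * W / w_g). Each group then gets its share of N, using largest-remainder rounding so the counts add up exactly to N. The names target_counts and sample_mix, and passing the mix as integer percentages, are illustrative assumptions, not an existing R or Python API.

```python
import random
from collections import Counter, defaultdict


def target_counts(available, weights):
    """Per-group sizes for the largest subset whose mix follows `weights`
    as closely as whole rows allow, never exceeding `available`."""
    total_weight = sum(weights.values())
    # Largest N with N * w_g / W <= available_g for every group g.
    n = min(available.get(g, 0) * total_weight // w for g, w in weights.items())
    # Integer part of each group's ideal share N * w_g / W.
    counts = {g: n * w // total_weight for g, w in weights.items()}
    # Give the leftover rows to the groups with the largest fractional parts
    # (largest-remainder rounding); rounding up never exceeds availability.
    leftover = n - sum(counts.values())
    by_remainder = sorted(weights, key=lambda g: n * weights[g] % total_weight,
                          reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


def sample_mix(labels, weights, seed=None):
    """Return sorted row indices of a random subset of `labels` whose
    group proportions follow `weights`."""
    rng = random.Random(seed)
    rows_by_group = defaultdict(list)
    for i, g in enumerate(labels):
        rows_by_group[g].append(i)
    available = {g: len(rows) for g, rows in rows_by_group.items()}
    chosen = []
    for g, k in target_counts(available, weights).items():
        # Draw k rows of group g uniformly at random, without replacement.
        chosen.extend(rng.sample(rows_by_group[g], k))
    return sorted(chosen)


labels = [1, 2, 3, 4] * 25                      # 100 rows, 25% per group
weights = {1: 20, 2: 20, 3: 35, 4: 25}          # desired mix in percent
print(target_counts(Counter(labels), weights))  # → {1: 14, 2: 14, 3: 25, 4: 18}
rows = sample_mix(labels, weights, seed=1)
print(len(rows))                                # → 71
```

Here 71 rows is the most the 25 available 3s allow (25 / 0.35 ≈ 71.4), giving a 14/14/25/18 split, i.e. roughly 19.7%, 19.7%, 35.2% and 25.4%. If the proportions must be exact, N must also be a multiple of 20 (so that 35% of N is a whole number), which gives 60 rows split 12/12/21/15.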

An extension of this question: suppose I have two columns, A and B, and 100 rows. A takes the values 1 to 4 with 25% each, and B takes the values c("a", "b", "c") with about 33% each. How do I randomly subset the rows so that A has proportions 20%, 20%, 35% and 25% for 1 to 4 respectively, and B has proportions 20%, 30% and 50% for a, b and c?

A  B
1  a
2  b
3  c
4  a
1  b
2  c
3  a
4  b
.....
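Matching two column mixes at once does not pin down a unique subset, so this sketch makes an explicit assumption: the target for each (A, B) cell is the product of the two marginal weights, i.e. A and B are treated as independent within the subset (other joint targets with the same margins are possible). With that assumption the problem reduces to the single-column case on the combined (A, B) key. It is written in Python, and the names target_counts and sample_two_margins are hypothetical; the toy data reproduces the row pattern in the table above (A cycling 1 to 4, B cycling a to c).

```python
import random
from collections import Counter, defaultdict


def target_counts(available, weights):
    """Largest-remainder allocation of the biggest subset matching `weights`."""
    total_weight = sum(weights.values())
    # Largest N with N * w_k / W <= available_k for every key k.
    n = min(available.get(k, 0) * total_weight // w for k, w in weights.items())
    counts = {k: n * w // total_weight for k, w in weights.items()}
    # Leftover rows go to the keys with the largest fractional parts.
    leftover = n - sum(counts.values())
    for k in sorted(weights, key=lambda k: n * weights[k] % total_weight,
                    reverse=True)[:leftover]:
        counts[k] += 1
    return counts


def sample_two_margins(col_a, col_b, weights_a, weights_b, seed=None):
    """Random row indices whose A and B mixes approximate the two targets.

    Assumption: the joint weight of cell (a, b) is weights_a[a] * weights_b[b],
    i.e. A and B are treated as independent within the subset."""
    joint_weights = {(a, b): wa * wb
                     for a, wa in weights_a.items()
                     for b, wb in weights_b.items()}
    rows_by_cell = defaultdict(list)
    for i, cell in enumerate(zip(col_a, col_b)):
        rows_by_cell[cell].append(i)
    available = {cell: len(rows) for cell, rows in rows_by_cell.items()}
    rng = random.Random(seed)
    chosen = []
    for cell, k in target_counts(available, joint_weights).items():
        # Sample k rows of this (A, B) cell without replacement.
        chosen.extend(rng.sample(rows_by_cell[cell], k))
    return sorted(chosen)


# Toy data in the layout of the example: A cycles 1-4, B cycles a-c.
col_a = [i % 4 + 1 for i in range(100)]
col_b = ["abc"[i % 3] for i in range(100)]
rows = sample_two_margins(col_a, col_b,
                          {1: 20, 2: 20, 3: 35, 4: 25},
                          {"a": 20, "b": 30, "c": 50}, seed=1)
print(len(rows))                                   # → 51
print(sorted(Counter(col_a[i] for i in rows).items()))
# → [(1, 10), (2, 10), (3, 18), (4, 13)]
print(sorted(Counter(col_b[i] for i in rows).items()))
# → [('a', 11), ('b', 15), ('c', 25)]
```

The binding cell here is (3, c), whose target share is 35% × 50% = 17.5% but which has only 9 rows, so the subset stops at 51 rows. The resulting mixes are about 19.6%, 19.6%, 35.3%, 25.5% for A and 21.6%, 29.4%, 49.0% for B: close to the targets, but not exact, because every count has to be a whole number of rows.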



