Here is the dummy data set:
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
1 -1.22049503 blue
2 1.61641224 blue
3 0.09079087 blue
4 0.32325956 blue
5 -0.62733486 red
6 0.43102051 red
7 0.61619844 red
8 -0.17718356 red
9 1.18737562 yellow
10 -0.19035444 yellow
11 -0.49158052 yellow
12 -1.47425432 yellow
13 0.22942192 pink
14 0.76779548 pink
15 0.97631652 pink
16 -0.33513712 pink
What I am trying to do is select rows with a probability that depends on df$color: if df$color is blue, the row is always selected; if it is red, the row has a higher probability of being selected; if it is yellow, a lower probability; and if it is pink, a very low probability.
This is what I came up with:
my.data.frame <- df[(df$color == 'pink') | (df$color == 'blue') & runif(1) < .6 | (df$color == 'red') & runif(1) < .6|(df$color == 'yellow') & runif(1) < .3, ]
But here is the output from two runs. In the first run:
1 -1.22049503 blue
2 1.61641224 blue
3 0.09079087 blue
4 0.32325956 blue
13 0.22942192 pink
14 0.76779548 pink
15 0.97631652 pink
16 -0.33513712 pink
In the second run:
1 -1.22049503 blue
2 1.61641224 blue
3 0.09079087 blue
4 0.32325956 blue
5 -0.62733486 red
6 0.43102051 red
7 0.61619844 red
8 -0.17718356 red
13 0.22942192 pink
14 0.76779548 pink
15 0.97631652 pink
16 -0.33513712 pink
So the blue rows are always selected, as expected, but the other colors are selected all-or-nothing: in the first run all the pink rows are selected, and in the second run all the pink and all the red rows are selected, instead of only some of the red rows and even fewer of the pink rows.
What am I missing? Or is there a better way of doing this?
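One likely cause, sketched below: `runif(1)` produces a single number, and R recycles that one value across every row, so a whole color group passes or fails together. The condition as posted also has no `runif` term on the pink test, so pink rows pass every time. A minimal sketch of a per-row version follows, assuming one independent draw per row is what is wanted. The `keep_prob` values, the helper names `row_prob` and `keep`, and the seed are all assumptions made for illustration, not part of the original code.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same construction as the dummy set in the question
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# runif(1) returns ONE number, which is recycled across all rows,
# so each color group is kept or dropped as a whole
length(runif(1))         # 1
length(runif(nrow(df)))  # 40, one independent draw per row

# Hypothetical per-color keep probabilities (assumed values, adjust as needed)
keep_prob <- c(blue = 1, red = 0.6, yellow = 0.3, pink = 0.1)

# Look up each row's probability by its color name
row_prob <- unname(keep_prob[df$color])

# Draw once per row; runif() never returns exactly 1, so blue is always kept
keep <- runif(nrow(df)) < row_prob
my.data.frame <- df[keep, ]

# Count of selected rows per color
table(my.data.frame$color)
```

With this shape, each row is kept independently, so the counts for red, yellow, and pink vary from run to run around 6, 3, and 1 out of 10 instead of jumping between 0 and 10.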