Wednesday, October 2, 2019

Create 'usable' bins from a vector in R

I have a numeric vector of integers, and:

  1. I want to transform into "bins".
  2. I want these bins to be used as sample frames from which I can then sample again, uniformly.

So far I can do both using findInterval, but I am looking for a way to do it with cut. Let's consider a random vector of integers which will be split into equally sized intervals of length 10:

df = sample(1:100,10)
df
[1] 81 11 38 95 45 14 10 61 96 88

Using findInterval I get the bins and an approximate way of sampling:

breaks <- seq(1, max(df + 1), by = 10)  # same breaks as used with cut() further down
b <- findInterval(df, breaks)
b
[1]  9  2  4 10  5  2  1  7 10  9
# If b is 1 or 10 (the first or last bin), use ifelse() to prevent leaking outside [1,100]
sam <- round(runif(10, ifelse(b == 1, 10*b - 9, 10*b - 10), ifelse(b == 10, 10*b, 10*b + 10)))
sam
[1] 85 14 39 94 50 16  7 63 93 85
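For comparison, here is a minimal sketch of drawing from exactly the bin that findInterval reports, rather than from an approximate range around it. It assumes all bins have the same width (the `width` variable is introduced here for illustration) and that bin i covers the integers from `breaks[i]` to `breaks[i] + width - 1`.

```r
df <- c(81, 11, 38, 95, 45, 14, 10, 61, 96, 88)  # the example vector from above

width  <- 10                                # assumed common bin width
breaks <- seq(1, max(df + 1), by = width)   # 1, 11, ..., 91

# Bin index of each value; bin i is [breaks[i], breaks[i + 1])
b <- findInterval(df, breaks)

# Lower bound of each value's bin
lower <- breaks[b]

# One integer drawn uniformly from lower, ..., lower + width - 1
sam <- lower + sample.int(width, length(df), replace = TRUE) - 1
```

Because each draw stays inside its own bin, the last bin is 91 to 100 and no ifelse() guard is needed for the edges.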

Using cut I get the intervals:

breaks = seq(1,max(df+1),by=10)
cut(df,breaks,right=TRUE)
 [1] (71,81] (1,11]  (31,41] <NA>    (41,51] (11,21] (1,11]  (51,61] <NA>    (81,91]
Levels: (1,11] (11,21] (21,31] (31,41] (41,51] (51,61] (61,71] (71,81] (81,91]

But I don't know how to use those values as intervals from which to sample.

If there is another approach, I would be interested to know!
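One possible way to reuse the cut output is to parse the "(a,b]" level labels back into numeric bounds and sample between them. The sketch below is only an illustration under a few assumptions: the breaks are changed to seq(0, 100, by = 10) so that no value falls outside a bin (with the breaks above, 95 and 96 become NA), the labels use cut's default format, and the helper names `lower`, `upper`, `lo` and `hi` are made up for this example.

```r
df <- c(81, 11, 38, 95, 45, 14, 10, 61, 96, 88)  # the example vector from above

# Assumed breaks covering 1..100 so that every value lands in a bin
breaks <- seq(0, 100, by = 10)
bins   <- cut(df, breaks, right = TRUE)

# Parse the "(a,b]" labels into numeric lower and upper bounds
labs  <- levels(bins)
lower <- as.numeric(sub("^\\((.+),.*$", "\\1", labs))
upper <- as.numeric(sub("^.*,(.+)\\]$", "\\1", labs))

# The factor's integer codes index into the parsed bounds
idx <- as.integer(bins)
lo  <- lower[idx]
hi  <- upper[idx]

# One integer drawn uniformly from (lo, hi], i.e. lo + 1, ..., hi
sam <- lo + ceiling(runif(length(df)) * (hi - lo))
```

Parsing labels depends on how cut formats them; for large or non-integer breaks the dig.lab argument may need to be raised, or the bounds can be taken from `breaks[idx]` and `breaks[idx + 1]` directly without touching the labels.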



