I'm trying to sample without replacement using StatsBase.sample() in Julia. Because I have my data in the following form, I can use my counts as FrequencyWeights():
using StatsBase
data = ["red", "blue", "green"]
counts = [2000, 2000, 1]
balls = StatsBase.sample(data, FrequencyWeights(counts), 1000)
One problem with this is that StatsBase.sample() defaults to replace=true, so this is possible:
countmap(balls)
Dict("blue" => 478,
"green" => 2, # <= two green balls?
"red" => 520)
Explicitly setting replace=false throws an error:
balls = StatsBase.sample(data, FrequencyWeights(counts), 1000, replace=false)
Cannot draw 3 samples from 1000 samples without replacement.
error(::String)@error.jl:33
var"#sample!#174"(::Bool, ::Bool, ::typeof(StatsBase.sample!), ::Random._GLOBAL_RNG, ::Vector{String}, ::StatsBase.FrequencyWeights{Int64, Int64, Vector{Int64}}, ::Vector{String})@sampling.jl:858
#sample#175@sampling.jl:871[inlined]
#sample#176@sampling.jl:874[inlined]
top-level scope@Local: 2[inlined]
Is my only option to reshape my data into a wide form like the one below? That seems very inefficient, because my actual data set has a lot of counts.
wide_data = [fill("red", 2000)..., fill("blue", 2000)..., "green"]
sample(wide_data, 1000, replace=false)
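For reference, here is a minimal sketch of one possible alternative that avoids building wide_data by hand. It assumes the counts are non-negative integers. It draws ball indices from the range 1:sum(counts) without replacement and maps each index back to its label through the cumulative counts. The helper name sample_counts is hypothetical and not part of StatsBase. I am also not certain how much memory StatsBase allocates internally for the index draw, so treat this as a sketch rather than a guaranteed efficiency win.

```julia
using StatsBase

# Hypothetical helper (not part of StatsBase): draw n items without
# replacement from a population given as labels plus integer counts.
function sample_counts(labels, counts, n)
    total = sum(counts)
    n <= total || error("cannot draw $n items from a population of $total")
    # Each integer in 1:total stands for one individual ball.
    idx = sample(1:total, n; replace=false)
    # Upper index boundary of each label's block, e.g. [2000, 4000, 4001].
    bounds = cumsum(counts)
    # Map every drawn index back to the label whose block contains it.
    return [labels[searchsortedfirst(bounds, i)] for i in idx]
end

data = ["red", "blue", "green"]
counts = [2000, 2000, 1]
balls = sample_counts(data, counts, 1000)
countmap(balls)  # "green" can appear at most once here
```

Because every index in 1:total is drawn at most once, no label can be returned more often than its count allows.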