Monday, 29 March 2021

StatsBase.sample() can't draw without replacement if FrequencyWeights() are provided

I'm trying to sample without replacement using StatsBase.sample() in Julia. My data is stored as labels plus a count per label, so I pass the counts as FrequencyWeights():

using StatsBase

data   = ["red", "blue", "green"]
counts = [2000, 2000, 1]

balls  = StatsBase.sample(data, FrequencyWeights(counts), 1000)

One problem with this is that StatsBase.sample() defaults to replace=true, so a result like this is possible even though there is only one green ball:

countmap(balls)
Dict("blue"  => 478,
     "green" => 2,  # <= two green balls?
     "red"   => 520)

Explicitly setting replace=false throws an error.

balls  = StatsBase.sample(data, FrequencyWeights(counts), 1000, replace=false)

Cannot draw 3 samples from 1000 samples without replacement.

error(::String)@error.jl:33
var"#sample!#174"(::Bool, ::Bool, ::typeof(StatsBase.sample!), ::Random._GLOBAL_RNG, ::Vector{String}, ::StatsBase.FrequencyWeights{Int64, Int64, Vector{Int64}}, ::Vector{String})@sampling.jl:858
#sample#175@sampling.jl:871[inlined]
#sample#176@sampling.jl:874[inlined]
top-level scope@Local: 2[inlined]
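
Reading the message, it looks as if the size check compares the number of requested draws (1000) with the number of entries in data (3), so the weights appear to act as relative selection weights rather than as multiplicities. A minimal sketch of that reading, assuming the StatsBase version shown in the trace above: asking for at most length(data) items goes through, and each label can then come back at most once.

```julia
using StatsBase

data   = ["red", "blue", "green"]
counts = [2000, 2000, 1]

# Requesting no more draws than there are entries in `data` passes the size check.
# Each label appears at most once in the result, whatever its count.
few = sample(data, FrequencyWeights(counts), 3, replace=false)
```

If that reading is right, replace=false with weights answers a different question (which distinct labels to pick) than the one I'm asking (which individual balls to pick).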

Is my only solution here to expand my data into a wide form like this? That seems very inefficient, because the counts in my actual data set are large:

wide_data = [fill("red", 2000)..., fill("blue", 2000)..., "green"]
sample(wide_data, 1000, replace=false)
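
One possible way to avoid building the wide vector, sketched here under assumptions rather than as a built-in StatsBase feature: draw distinct positions from the range 1:sum(counts), which stands in for the wide vector without allocating it, then map each position back to its label through the cumulative counts. The helper name sample_counts is hypothetical and is not part of StatsBase.

```julia
using StatsBase

# Hypothetical helper (not part of StatsBase): draw `k` items without
# replacement from labels `data`, where `data[i]` occurs `counts[i]` times.
function sample_counts(data, counts, k)
    total = sum(counts)
    k <= total || error("cannot draw $k items from a population of $total")
    upper = cumsum(counts)                    # upper[i] = last virtual position of data[i]
    idx = sample(1:total, k; replace=false)   # distinct positions in the virtual wide vector
    # searchsortedfirst finds the first label whose cumulative count reaches position i
    return [data[searchsortedfirst(upper, i)] for i in idx]
end

data   = ["red", "blue", "green"]
counts = [2000, 2000, 1]

balls = sample_counts(data, counts, 1000)
countmap(balls)   # "green" can appear at most once
```

Memory use here grows with the number of draws and the number of labels, not with the sum of the counts. If materializing the wide vector is acceptable after all, StatsBase's inverse_rle(data, counts) should build it without the splatting.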


