random: Weighted random based off integers from SQL database

mardi 22 décembre 2015

Weighted random based off integers from SQL database

I am using a SQLite3 database which keeps track of how many times a word is used:

Psudo table:

table Words {
    Word1 as text;
    NextWord as text;
    TimesUsed as integer
}

When issuing a SQL statement such as

select NextWord, TimesUsed from Words where Word1 = "cat"

It can presumably return a array of array such as

 [["Ball", 2], ["Toy", 8], ["dead", 1]]

From this array of array, the first index of the sub array being the next word, and the second index being the number of times used; which ways can I create a weighted random selection where the most used word has the highest chance of being selected?

Currently I am doing the following (below is the output, debugging lines added to show output):

nextwords = [["Ball", 2], ["Toy", 8], ["dead", 1]]

weights = nextwords.map { |_, weight| weight }
p weights

words   = nextwords.map { |word, _| word }
p words

cdf = weights.map { |weight| (Float weight) / weights.reduce(:+) }
p cdf

loaded_nextwords = words.zip(cdf).to_h
p loaded_nextwords

random_word = loaded_nextwords.max_by { |_, weight| rand ** (1.0 / weight) }.first
p random_word

---- Output ----
[2, 8, 1]
["Ball", "Toy", "dead"]
[0.18181818181818182, 0.7272727272727273, 0.09090909090909091]
{"Ball"=>0.18181818181818182, "Toy"=>0.7272727272727273, "dead"=>0.09090909090909091}
"Toy"

Which works absolutely fine, but it seems a bit long winded to create a weighted random choice from the original result from the SQL DB.

Is there any shortcuts I can take to come to the same outcome in a cleaner way? I would imagine there are ways to achieve this without separating the array of array into sub arrays holding purely weights and words.

Thank you for any tips.

random

mardi 22 décembre 2015

Weighted random based off integers from SQL database

Aucun commentaire:

Enregistrer un commentaire