mardi 22 décembre 2015

Weighted random based off integers from SQL database

I am using a SQLite3 database which keeps track of how many times a word is used:

Psudo table:

table Words {
    Word1 as text;
    NextWord as text;
    TimesUsed as integer
}

When issuing a SQL statement such as

select NextWord, TimesUsed from Words where Word1 = "cat"

It can presumably return a array of array such as

 [["Ball", 2], ["Toy", 8], ["dead", 1]]


From this array of array, the first index of the sub array being the next word, and the second index being the number of times used; which ways can I create a weighted random selection where the most used word has the highest chance of being selected?

Currently I am doing the following (below is the output, debugging lines added to show output):

nextwords = [["Ball", 2], ["Toy", 8], ["dead", 1]]

weights = nextwords.map { |_, weight| weight }
p weights

words   = nextwords.map { |word, _| word }
p words

cdf = weights.map { |weight| (Float weight) / weights.reduce(:+) }
p cdf

loaded_nextwords = words.zip(cdf).to_h
p loaded_nextwords

random_word = loaded_nextwords.max_by { |_, weight| rand ** (1.0 / weight) }.first
p random_word

---- Output ----
[2, 8, 1]
["Ball", "Toy", "dead"]
[0.18181818181818182, 0.7272727272727273, 0.09090909090909091]
{"Ball"=>0.18181818181818182, "Toy"=>0.7272727272727273, "dead"=>0.09090909090909091}
"Toy"

Which works absolutely fine, but it seems a bit long winded to create a weighted random choice from the original result from the SQL DB.

Is there any shortcuts I can take to come to the same outcome in a cleaner way? I would imagine there are ways to achieve this without separating the array of array into sub arrays holding purely weights and words.

Thank you for any tips.




Aucun commentaire:

Enregistrer un commentaire