I am using a SQLite3 database which keeps track of how many times a word is used:
Psudo table:
table Words {
Word1 as text;
NextWord as text;
TimesUsed as integer
}
When issuing a SQL statement such as
select NextWord, TimesUsed from Words where Word1 = "cat"
It can presumably return a array of array such as
[["Ball", 2], ["Toy", 8], ["dead", 1]]
From this array of array, the first index of the sub array being the next word, and the second index being the number of times used; which ways can I create a weighted random selection where the most used word has the highest chance of being selected?
Currently I am doing the following (below is the output, debugging lines added to show output):
nextwords = [["Ball", 2], ["Toy", 8], ["dead", 1]]
weights = nextwords.map { |_, weight| weight }
p weights
words = nextwords.map { |word, _| word }
p words
cdf = weights.map { |weight| (Float weight) / weights.reduce(:+) }
p cdf
loaded_nextwords = words.zip(cdf).to_h
p loaded_nextwords
random_word = loaded_nextwords.max_by { |_, weight| rand ** (1.0 / weight) }.first
p random_word
---- Output ----
[2, 8, 1]
["Ball", "Toy", "dead"]
[0.18181818181818182, 0.7272727272727273, 0.09090909090909091]
{"Ball"=>0.18181818181818182, "Toy"=>0.7272727272727273, "dead"=>0.09090909090909091}
"Toy"
Which works absolutely fine, but it seems a bit long winded to create a weighted random choice from the original result from the SQL DB.
Is there any shortcuts I can take to come to the same outcome in a cleaner way? I would imagine there are ways to achieve this without separating the array of array into sub arrays holding purely weights and words.
Thank you for any tips.
Aucun commentaire:
Enregistrer un commentaire