Monday, July 15, 2019

Elasticsearch random selection based on weighting out of 100

I have been running a Rails site for a couple of years, and some of its articles are pulled from the DB based on a weight field. The data structure is:

{name: 'Content Piece 1', weight: 50}
{name: 'Content Piece 2', weight: 25}
{name: 'Content Piece 3', weight: 25}

The Ruby code that I originally wrote looks like so:

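choices = []
# Total weight across all articles (100.0 for the sample data above).
total = articles.inject(0.0) { |sum, article|
  sum + article['weight']
}
# Random float in [0, total).
pick = rand(total)
# Walk the articles, subtracting each weight until the pick falls inside one.
choices << articles.detect { |listing|
  if pick <= listing['weight']
    true
  else
    pick -= listing['weight']
    false
  end
}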

This does a good job of picking content pieces in proportion to their weights. When I run the code 100 times against the data set, and repeat that several times, the picks are distributed fairly closely to the weights:

# Keep choices outside the loop so all 100 picks can be tallied afterwards.
choices = []
100.times do
  total = articles.inject(0.0) { |sum, article|
    sum + article['weight']
  }
  pick = rand(total)
  choices << articles.detect { |listing|
    if pick <= listing['weight']
      true
    else
      pick -= listing['weight']
      false
    end
  }
end

{:total_runs=>100, "Content Piece 1"=>51, "Content Piece 2"=>22, "Content Piece 3"=>27}
{:total_runs=>100, "Content Piece 1"=>53, "Content Piece 2"=>30, "Content Piece 3"=>17}
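To make the tallies above reproducible, here is a minimal self-contained sketch of the same idea. The names `SAMPLE_ARTICLES`, `weighted_pick` and `tally_picks` are made up for this example, and the string-keyed hashes are an assumption that stands in for the records loaded from the DB.

```ruby
# Sample data mirroring the structure shown earlier (string keys are an
# assumption, chosen to match the article['weight'] lookups in the code above).
SAMPLE_ARTICLES = [
  { 'name' => 'Content Piece 1', 'weight' => 50 },
  { 'name' => 'Content Piece 2', 'weight' => 25 },
  { 'name' => 'Content Piece 3', 'weight' => 25 }
].freeze

# Returns one article, chosen with probability weight / total weight.
def weighted_pick(articles, rng = Random.new)
  total = articles.sum { |article| article['weight'] }.to_f
  pick = rng.rand(total) # float in [0, total)
  found = articles.detect do |article|
    next true if pick < article['weight']

    pick -= article['weight']
    false
  end
  # Fall back to the last article in case float rounding leaves no match.
  found || articles.last
end

# Runs weighted_pick `runs` times and counts how often each name comes up.
def tally_picks(articles, runs, rng = Random.new)
  counts = Hash.new(0)
  runs.times { counts[weighted_pick(articles, rng)['name']] += 1 }
  { total_runs: runs }.merge(counts)
end

p tally_picks(SAMPLE_ARTICLES, 100)
```

Passing a seeded generator, for example `tally_picks(SAMPLE_ARTICLES, 100, Random.new(1))`, gives the same tally on every run, which makes it easier to compare against other approaches.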

I have recently started using Elasticsearch more often, and I was hoping I could index the data in ES and pull the content out based on the weights.

I found an SO post that discusses something very similar:

Weighted random sampling in Elasticsearch

I have copied the search query from that post and adapted it to my data structure:

{
  "sort": ["_score"],
  "size": 1, 
  "query": {
    "function_score": {
      "functions": [
        {
          "random_score": {}
        },
        {
          "field_value_factor": {
            "field": "weight",
            "modifier": "none",
            "missing": 0
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

This query definitely respects the weighting: it returns the content piece with weight 50 far more often than the two pieces with weight 25. However, it does not distribute the results in proportion to the weights out of a total of 100, if that makes sense. When I run this query 100 times I get results like these:

{:total_runs=>100, "Content Piece 1"=>70, "Content Piece 2"=>22, "Content Piece 3"=>8}
{:total_runs=>100, "Content Piece 1"=>81, "Content Piece 2"=>7, "Content Piece 3"=>12}
{:total_runs=>100, "Content Piece 1"=>90, "Content Piece 2"=>3, "Content Piece 3"=>7}
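The direction of this skew can be reproduced outside of ES. The sketch below is plain Ruby and does not call Elasticsearch. It assumes that the two functions are multiplied together (the default `score_mode` of `function_score`) and that `random_score` is uniform on [0, 1), so each document scores `weight * u` and the top hit is the one with the largest product. `ES_SAMPLE_DOCS`, `top_hit_by_product` and `simulate_product_scoring` are hypothetical names used only for this illustration.

```ruby
# Stand-in for the indexed documents (assumption: same data as the DB rows).
ES_SAMPLE_DOCS = [
  { 'name' => 'Content Piece 1', 'weight' => 50 },
  { 'name' => 'Content Piece 2', 'weight' => 25 },
  { 'name' => 'Content Piece 3', 'weight' => 25 }
].freeze

# Mimics "size: 1, sorted by _score" when the score is weight * uniform random.
def top_hit_by_product(docs, rng)
  docs.max_by { |doc| doc['weight'] * rng.rand }
end

# Repeats the simulated query and counts which document comes out on top.
def simulate_product_scoring(docs, runs, rng = Random.new)
  counts = Hash.new(0)
  runs.times { counts[top_hit_by_product(docs, rng)['name']] += 1 }
  counts
end

p simulate_product_scoring(ES_SAMPLE_DOCS, 10_000, Random.new(2019))
```

Under those assumptions, Content Piece 1 comes out on top about two thirds of the time rather than half of the time, because picking the maximum of `weight * u` is not the same as picking in proportion to `weight`. That is skewed in the same direction as the counts above, although it does not match them exactly.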

As I am new to ES and still learning the ins and outs of querying, scoring, etc., I was wondering if anyone could help with a solution that more closely mimics the Ruby code above, so that the content is distributed in proportion to the weights out of 100.

I hope this makes sense. Let me know if you have any questions that would help clarify what I am trying to achieve. Thanks!



