mercredi 10 juillet 2019

Finding a combination of hashes to match a desired percentage distribution of values

Given an array of hashes, I am looking for a a way to select a random subset of these hashes, such that the distribution of attributes of the subset matches desired percentages.

For example, given the following array:

[
  {
    question_id: 1,
    grade: 1,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'ratios', ao: 2 }
    ]
  },
  {
    question_id: 2,
    grade: 3,
    marks: [
      { topic: 'number', ao: 2 },
      { topic: 'number', ao: 2 }
    ]
  },
  {
    question_id: 3,
    grade: 2,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'geometry', ao: 1 },
      { topic: 'ratios', ao: 1 },
      { topic: 'number', ao: 2 },
      { topic: 'geometry', ao: 2 }
    ]
  },
  {
    question_id: 4,
    grade: 3,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'ratios', ao: 2 },
      { topic: 'geometry', ao: 2 },
      { topic: 'geometry', ao: 2 }
    ]
  },
  {
    question_id: 5,
    grade: 1,
    marks: [
      { topic: 'ratios', ao: 1 },
      { topic: 'ratios', ao: 2 }
    ]
  },
  {
    question_id: 6,
    grade: 1,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'number', ao: 2 },
      { topic: 'number', ao: 2 },
      { topic: 'ratios', ao: 2 }
    ]
  },
  {
    question_id: 7,
    grade: 3,
    marks: [
      { topic: 'number', ao: 2 }
    ]
  },
  {
    question_id: 8,
    grade: 3,
    marks: [
      { topic: 'geometry', ao: 1 }
    ]
  }
]

I would like the find a random combination that satisfies the following:

Total number of marks = 10

50% of the marks are topic number
20% of the marks are topic ratios
30% of the marks are topic geometry

40% of the marks are grade 1
50% of the marks are grade 2
10% of the marks are grade 3

50% of the marks are ao 1
50% of the marks are ao 2

An example outcome that satisfies these requirements is:

[
  {
    question_id: 3,
    grade: 2,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'geometry', ao: 1 },
      { topic: 'ratios', ao: 1 },
      { topic: 'number', ao: 2 },
      { topic: 'geometry', ao: 2 }
    ]
  },
  {
    question_id: 6,
    grade: 1,
    marks: [
      { topic: 'number', ao: 1 },
      { topic: 'number', ao: 2 },
      { topic: 'number', ao: 2 },
      { topic: 'ratios', ao: 2 }
    ]
  },
  {
    question_id: 8,
    grade: 3,
    marks: [
      { topic: 'geometry', ao: 1 }
    ]
  }
]

Ideally, if a combination did not exist to satisfy these requirements (with some degree of tolerance), I would expect to receive an error.

My initial approach to the problem was to find find all the possible combinations of questions which would sum to 10 marks total, and then iterate through these combinations and check each to see if it satisfied all of other the requirements.

I started with this algorithm that finds all possible combinations of numbers from an array to sum to a desired total:

def subset_sum(numbers, target, partial=[], result=[])
    s = partial.inject 0, :+

    if s == target
      result << partial
    end

    return if s >= target

    (0..(numbers.length - 1)).each do |i|
      n = numbers[i]
      remaining = numbers.drop(i+1)
      subset_sum(remaining, target, partial + [n], result)
    end

    result
  end
end

However in the real application of my problem I expect the array of questions to have a length over 1000, and total number of marks equal to 40. For these numbers this solution is far too under optimized and the runtime is very long.




Aucun commentaire:

Enregistrer un commentaire