vendredi 20 juillet 2018

Generate random intervals with given density and overlapping

I am in search of algorithm (possibly, approximate) that will generate test data.

We have large integer interval [0, n), n may be up to 10^9. We want to generate a number of smaller intervals (possibly, overlapping) of length k each, all of which fit inside this large interval and also satisfy following properties:

  1. Number of "cells" covered by these intervals divided by n must be equal to density (<=1.0)
  2. Every cell covered by at least one interval is actually covered by overlapping (>=1.0) intervals on average. E.g. degenerate case of overlap_factor=1.0 means that no two intervals intersect.
  3. Interval positions should be distributed uniformly randomly in all other respects

Achieving both (1) and (2) is what makes this problem difficult. The algorithm should produce an array of interval positions.

Image below demonstrates one of the solutions for n=20, k=4, density=0.5, overlapping=1.6:

    ◼◼◼◼
     ◼◼◼◼
   ◼◼◼◼     ◼◼◼◼
       ↓↓↓
◻◻◻◼◼◼◼◼◼◻◻◻◼◼◼◼◻◻◻◻
0                  19

density = 10/20 = 0.5
overlapping = 4*4/10 = 1.6

Real-world applications will opearte with larger values: n ≤ 10^9, k ∈ [1 .. 10^6], density ∈ [0.01 .. 1.0], overlapping ∈ [1.0..5.0].

Because this algorithm is intended to generate test data, approximate solution would be fine.




Aucun commentaire:

Enregistrer un commentaire