mercredi 28 octobre 2020

Asynchronous column iteration through an array

I'm trying to solve the following algorithmic problem:

I'm given an array (it could be a standard C/C++ array, a vector<vector<T> >, or something else... whatever is most suited to this problem is fine) populated with random non-zero values. I know the size of the array, m x n.

What I want to do is accomplish (either directly, or through an equivalent approach) the following: iterate through all n columns until a value over some threshold, t is found. Continue iterating in the remaining columns until either another value over t is found, or some maximum number of iterations has been reached.

If the max number of iterations has been reached, I continue from here iterating through all n columns until I again encounter a value exceeding threshold t. If I find a second value exceeding that threshold, I continue until I've either found a third value exceeding t, or I've reached the max number of iterations from the first value.

If I reach max iterations, I continue iteraitng through all n columns until I again encounter a value exceeding threshold t. If I find a third value, I add one to some count, and I continue iteration through all n columns.

So, in the end either I find three values, in three unique columns, exceeding the threshold, or I don't and I keep searching from the first found value + max iterations.


I'm not sure how to execute this. I have a few ideas:

  • Keep some sort of list of iterators, one for each column. Not sure where I'd go from there, though.
  • Have a for loop like for(int i = 0, j = 0, k = 0..., with one iterator for each column and changing the value of the iterator with if statements inside the loop. This quickly becomes pretty complicated, especially for large arrays, though.
  • Iterate through each row with nested for loops. When first value is found, save column index to an std::set. Start an iterator at 0 here. Continue iterating, adding found threshold value column #'s to the set each time one is found. Check each time if set size is 3. If set size is 3, add to count. When the iterator has reached the maximum value, reset to 0 and empty the set.

My goals:

  • Most important is readability. Many other people will be reading and using this. I understand that I'll likely need nested loops and such, but I'd like to avoid clutter as much as possible.
  • Speed. I'm looking at arrays with columns on the order of tens, and rows on the order of thousands or tens of thousands. Optimally, the code should be able to 'scan' the array in a few minutes time (I know that's a bit hand-wavy and there are a lot of factors at play, but hopefully that gives some sense of the 'constraints').

If it is at all useful to know, the random values are from a Gaussian distribution, with known values for mu and sigma.




Aucun commentaire:

Enregistrer un commentaire