lundi 26 septembre 2016

Weighted distribution among buckets with no look-ahead

I have N workers that need to process incoming batches of data. Each worker is configured so that it knows that it is "worker X of N".

Each incoming batch of data has a random unique ID (being random, it is uniformly distributed), and it has a different size; processing time is proportional to the size. Size can vary wildly.

When a new batch of data is available, it is immediately visible as available to all N workers, but I want only one to actually process it, without coordination among them. Right now, each worker calculates ID % N == X, and it it's true, the worker self-assigns the batch, while the others skip it. This works correctly and makes sure that, on average, each worker processes the same number of batches. Unfortunately, it doesn't take into account the batch size, so some workers can finish processing much later than others, because they might happen to self-assign very large jobs.

How can I change the algorithm so that each work self-assigns batches in a way that also takes into account the size of the batch, so that on average, each worker will self-assign the same total size of work (from different batches)?




Aucun commentaire:

Enregistrer un commentaire