vendredi 28 janvier 2022

Optimal way of picking a random number that's not in a container with C++

I have a container such as a std::vector<int> containing unique values. I'd like to pick a random integer number between 0 and a certain maximum MAX, that is not in the container. I'm concerned with the time complexity of such a program so I'd like to find something somewhat optimal.

So far I have basically two ideas that have both pros and cons. I can take a random number between 0 and MAX, and check if this number appears inside my container. If it is in the container, then try again.

// Values I don't want to get are stored in Vec
std::default_random_engine generator;
std::uniform_int_distribution<> distribution(0,MAX);
bool found = false;
int number;
while(!found) {
    number = distribution(generator);
    found = true;
    for(auto it : Vec) {
        if(it == number) {
            found = false;
            break;
        }
    }
}

That solution is pretty simple, but it will be excessively slow when most values inbetween 0 and MAX are already in Vec, and my program might come across this edge case.

The other solution would be to first sort Vec, and then pick a random number between 0 and MAX-Vec.size(). That way, the random number refers to a specific number not in the vector, and I'd have to iterate through the sorted container to recover the actual value, like this :

// Values I don't want to get are stored in Vec
std::sort(Vec.begin(),Vec.end());
std::default_random_engine generator;
std::uniform_int_distribution<> distribution(0,MAX-Vec.size());
int ref = distribution(generator);
int shift = 0;
int result;
for(auto it : Vec) {
    if(ref + shift >= it)
        shift++;
    else {
        result = ref + shift;
        break;
    }
}

The main disadvantage of this way of doing things, while always taking the same amount of time, is that it requires to sort the container first, which might be costly. So far, if I'm not mistaken, the first method's best case should be O(n) while the second method is O(n log(n)) since it requires a sort.

My guess is therefore to check which method should be the most efficient depending on the size of the container compared to the maximum MAX, which would basically come down to : if the container is too large, then the probability to have to guess multiple values is too high, therefore use the second method. Is there another way to make this more optimal, or am I on the right way ?




Aucun commentaire:

Enregistrer un commentaire