samedi 15 février 2020

Generate same random matrix in OpenMP than sequential code

I'd like to generate a random matrix with OpenMP like it were generated by a sequential program, i.e. if any sequential matrix generator outputs me a matrix like the following one:

1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
9.0 0.0 1.0 2.0
3.0 4.0 5.0 6.0

I want the parallel OpenMP version of the same program to generate the same matrix with no interleaved rows.


Here is how I gradually approached the problem.

Given my serial generator C function generating a matrix as a 1D array:

void generate_matrix_array(
    double *v,
    int rows,
    int columns,
    double min,
    double max,
    int seed
) {
    srand(seed);

    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < columns; j++) {
            v[i*rows + j] = min + (rand() / (RAND_MAX / (max - min)));
        }
    }
}

First, I naively tried the #pragma omp parallel for directive to outer for loop; however, there's no guarantee about row ordering, since thread execution gets interleaved, so they get generated in a non-deterministic order.

Adding the ordered option would solve the issue at the price of making useless multithreading in this particular case.

In order to solve the issue, I tried to partition by hand the matrix array so that thread i would generate the i-th slice of it:

void generate_matrix_array_par(
    double *v,
    int rows,
    int columns,
    double min,
    double max,
    int seed
) {
    srand(seed);

    #pragma omp parallel \
            shared(v)
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int rows_per_thread = round(rows / (double) nthreads);
        int rem_rows = rows % (nthreads - 1) != 0?
            rows % (nthreads - 1):
            rows_per_thread;
        int local_rows = (tid == 0)?
            rows_per_thread:
            rem_rows;
        int lower_row = tid * local_rows;
        int upper_row = ((tid + 1) * local_rows);

        printf(
            "[T%d] receiving %d of %d rows from row %d to %d\n",
            tid,
            local_rows,
            rows,
            lower_row,
            upper_row - 1
        );
        printf("\n");
        fflush(stdout);

        for (int i = lower_row; i < upper_row; i++) {
            for (int j = 0; j < columns; j++) {
                v[i*rows + j] = min + (rand() / (RAND_MAX / (max - min)));
            }
        }
    }
}

However, despite matrix vector gets properly divided among threads, for some reason unknown to me, every thread generates its rows into the matrix in a non-deterministic order, i.e. if I want to generate a 8x8 matrix with 4 threads and thread 3 is assigned to rows 4 and 5, he will generate two contiguous rows in the matrix array but in the wrong position every time, like if I didn't perform any partitioning and the omp parallel for directive was in place.

I skeptically tried, at last, to get back to naive approach by specifying shared(v) and schedule(static, 16) options to omp parallel for directive and it 'magically' happens to work:

void generate_matrix_array_par(
    double *v,
    int rows,
    int columns,
    double min,
    double max,
    int seed
) {
    srand(seed);

    #pragma omp parallel for \
            shared(v) \
            schedule(static, 16)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < columns; j++) {
            v[i*rows + j] = min + (rand() / (RAND_MAX / (max - min)));
        }
    }
}

The schedule option is being added since I read somewhere else that it gets rid of cache conflicts.


Any question? YES!!!

Now, I'd like to know whether I missed or failed some consideration about the problem, since I'm not convinced about the fairness of my last version of the program, despite the fact that it is working.




Aucun commentaire:

Enregistrer un commentaire