When I run this code, I have noticed that when the thread count is high (a block of 64 x 10 threads or more) the kernel does not execute. No matter what I try to print, the code just will not run.
If I lower the first block dimension from 64 to below 50, it works again. I am new to CUDA, so I am possibly doing something wrong. But if I comment out curand_init, the kernel runs fine, so the problem seems to be related to curand_init.
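__global__ void initializePart(Particle *dev_particle) {
    // Global x and y indices of this thread.
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    // Per-thread generator state; seed is defined elsewhere (not shown here).
    curandState state;
    curand_init(seed, i, j, &state);
    // Scale the uniform sample from (0, 1] to (-1000, 1000].
    double random = curand_uniform(&state) * (1000 - (-1000)) + (-1000);
}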
dim3 grid(1, 1, 1);
dim3 block(64, 10, 1);
initializePart<<<grid, block>>>(*dev_particle);
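A kernel launch that fails does not report anything on its own, which would match a printf inside the kernel never appearing. The sketch below shows one way to surface the error and to ask the runtime how many threads per block a given kernel can actually use. It is only an illustration under assumptions: `initStates`, `blockFits`, the `out` buffer and the seed value 1234 are hypothetical stand-ins, not code from the post, and whether register or local-memory pressure from curand_init is the real cause here is a guess that the printed values would confirm or rule out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical stand-in kernel using the same curand_init call pattern as the post.
__global__ void initStates(unsigned long long seed, float *out) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    curandState state;
    curand_init(seed, i, j, &state);
    // Row-major index over the whole grid so every thread writes its own slot.
    out[j * blockDim.x * gridDim.x + i] = curand_uniform(&state);
}

// Hypothetical helper: does the requested block fit the per-kernel thread limit?
bool blockFits(dim3 block, int maxThreadsPerBlock) {
    return static_cast<int>(block.x * block.y * block.z) <= maxThreadsPerBlock;
}

int main() {
    dim3 grid(1, 1, 1);
    dim3 block(64, 10, 1);

    // Query the resource usage of this particular kernel as compiled.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, initStates);
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("max threads per block for this kernel: %d\n", attr.maxThreadsPerBlock);
    printf("requested block fits: %s\n",
           blockFits(block, attr.maxThreadsPerBlock) ? "yes" : "no");

    float *out = nullptr;
    cudaMalloc(&out, 64 * 10 * sizeof(float));

    initStates<<<grid, block>>>(1234ULL, out);
    // Errors in the launch configuration are reported here...
    cudaError_t launchErr = cudaGetLastError();
    // ...and errors raised while the kernel runs are reported here.
    cudaError_t syncErr = cudaDeviceSynchronize();
    printf("launch: %s\n", cudaGetErrorString(launchErr));
    printf("sync:   %s\n", cudaGetErrorString(syncErr));

    cudaFree(out);
    return 0;
}
```

If the launch line reports something other than "no error", the message usually names the limit that was exceeded. Building with and without debug flags may change the register count, so it can be worth comparing both.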
Once compiled, the kernel will not run unless I lower the 64 to below 50. If I call printf("test") inside the kernel, nothing is printed at all. I just want each thread (each i, j pair) to print a different random number, so 640 random numbers in total between -1000 and 1000.
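The goal described above (one distinct random number per thread, 640 in total) could be sketched as follows. This is a minimal illustration rather than a fix for the original kernel: the kernel name `printRandoms`, the helper `scaleToRange` and the seed value are hypothetical, and it uses the common pattern of giving every thread the same seed with its own linear thread id as the curand subsequence, which differs from the `(seed, i, j)` arguments in the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical helper: map a sample u in (0, 1] onto (lo, hi].
__host__ __device__ float scaleToRange(float u, float lo, float hi) {
    return u * (hi - lo) + lo;
}

// Hypothetical kernel: each thread prints one value in (-1000, 1000].
__global__ void printRandoms(unsigned long long seed) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    // Unique linear id across the whole 2D grid.
    int id = j * blockDim.x * gridDim.x + i;

    curandState state;
    // Same seed, distinct subsequence per thread, offset 0.
    curand_init(seed, id, 0, &state);

    float r = scaleToRange(curand_uniform(&state), -1000.0f, 1000.0f);
    printf("thread (%d, %d): %f\n", i, j, r);
}

int main() {
    dim3 grid(1, 1, 1);
    dim3 block(64, 10, 1);  // 640 threads, as in the post

    printRandoms<<<grid, block>>>(1234ULL);

    // Device printf output is flushed at synchronization; also catches failures.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Because curand_uniform returns values in (0, 1], the scaled result can reach 1000 but never exactly -1000. If the block size turns out to be the limiting factor, the same 640 threads can also be split across several smaller blocks, since the index arithmetic already accounts for blockIdx.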
Any idea what the problem could be?