Image processing - CUDA: unable to calculate grid size


I'm writing a program to convert an RGBA image to greyscale. I've worked on it and have correctly implemented the kernel. However, the grid size is possibly wrong, even though the logic seems correct.

The kernel:

    __global__ void rgba_to_greyscale(const uchar4* const rgbaimage,
                                      unsigned char* const greyimage,
                                      int numrows, int numcols)
    {
        int x = (blockIdx.x * blockDim.x) + threadIdx.x;
        int y = (blockIdx.y * blockDim.y) + threadIdx.y;

        if (x >= numcols || y >= numrows)
            return;

        uchar4 rgba = rgbaimage[x + y];
        float channelsum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;

        greyimage[x + y] = channelsum;
    }

And the kernel launch:

    const dim3 blocksize(10, 10, 1);  // TODO
    size_t gridsizex, gridsizey;
    gridsizex = numcols + (10 - (numcols % 10));  // add a number to make it a multiple of 10
    gridsizey = numrows + (10 - (numrows % 10));  // add a number to make it a multiple of 10
    const dim3 gridsize(gridsizex, gridsizey, 1);  // TODO
    rgba_to_greyscale<<<gridsize, blocksize>>>(d_rgbaimage, d_greyimage, numrows, numcols);

I'm creating more threads than required and then applying a bounds check in the kernel.
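The usual way to size a grid is ceiling division: divide each image dimension by the matching block dimension and round up, so the grid counts *blocks* rather than threads. Below is a minimal host-side sketch in plain C++. `div_up` and `grid_for_image` are hypothetical helpers, and `Dim3` is a stand-in for CUDA's `dim3` so the code compiles without the CUDA toolkit. It assumes, as in the kernel above, that x runs along columns and y along rows.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the sketch compiles as plain C++.
struct Dim3 { unsigned x, y, z; };

// Ceiling division: the smallest number of blocks of `block` threads that
// covers `n` items. Adds no extra block when n is already a multiple of block.
unsigned div_up(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Number of blocks (not threads) in each grid dimension for a
// numrows x numcols image: x runs along columns and y along rows,
// matching the kernel's checks x < numcols and y < numrows.
Dim3 grid_for_image(unsigned numrows, unsigned numcols, Dim3 blocksize) {
    return Dim3{div_up(numcols, blocksize.x), div_up(numrows, blocksize.y), 1};
}
```

Take a 480 x 641 image (numrows = 480, numcols = 641) with 10 x 10 blocks. This sketch gives a 65 x 48 grid, or 650 x 480 threads. The formula in the question instead yields 650 x 490 *blocks*, about 100 times as many threads as needed. The bounds check keeps that correct, but the launch is wasteful.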

You are accessing the image using x+y. Think about this: the largest index you can reach this way is about numrows+numcols, far smaller than the image. You cannot simply add the two coordinates, since that would mean, for example, that (1,2) is the same image element as (3,0), which is plain rubbish. Instead, for each y-coordinate you have to skip an entire row of the image, so it should be rgbaimage[x+y*numcols] (and the same for greyimage, of course). Note that, depending on the layout of the image data, it might be the other way around (x*numrows+y). I'm assuming the usual row-major image layout here, and in this kernel it doesn't matter anyway, since all pixels are treated equally.
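To make the point concrete, here is a plain C++ host-side sketch that emulates the launch on the CPU. It is not CUDA code and not taken from the answer. `for_each_thread` visits every (block, thread) pair of a ceil-divided grid, computing coordinates the way `blockIdx * blockDim + threadIdx` does. `greyscale_thread` is the kernel body with the corrected row-major index. `count_writes` counts how many threads write each output index under either indexing rule. All of these names, and the `Uchar4` stand-in for CUDA's `uchar4`, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 so the sketch compiles as plain C++.
struct Uchar4 { unsigned char x, y, z, w; };

// Calls body(x, y) for every thread of a grid of bx x by blocks that covers
// the image; like a real launch, x and y may fall outside the image.
template <typename Body>
void for_each_thread(int numrows, int numcols, int bx, int by, Body body) {
    const int gx = (numcols + bx - 1) / bx;  // blocks along x (columns)
    const int gy = (numrows + by - 1) / by;  // blocks along y (rows)
    for (int blockY = 0; blockY < gy; ++blockY)
        for (int blockX = 0; blockX < gx; ++blockX)
            for (int ty = 0; ty < by; ++ty)
                for (int tx = 0; tx < bx; ++tx)
                    body(blockX * bx + tx, blockY * by + ty);
}

// Kernel body for the thread at (x, y), using the row-major index x + y * numcols.
void greyscale_thread(const Uchar4* rgbaimage, unsigned char* greyimage,
                      int numrows, int numcols, int x, int y) {
    if (x >= numcols || y >= numrows) return;  // same bounds check as the kernel
    const int idx = x + y * numcols;           // skip y full rows, then x pixels
    const Uchar4 rgba = rgbaimage[idx];
    const float channelsum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;
    // Truncating float-to-unsigned-char conversion, as the kernel's implicit one.
    greyimage[idx] = static_cast<unsigned char>(channelsum);
}

// Runs greyscale_thread for every emulated thread of the launch.
void emulate_greyscale(const Uchar4* rgbaimage, unsigned char* greyimage,
                       int numrows, int numcols, int bx, int by) {
    for_each_thread(numrows, numcols, bx, by, [&](int x, int y) {
        greyscale_thread(rgbaimage, greyimage, numrows, numcols, x, y);
    });
}

// Counts writes per output index, using x + y * numcols (row_major = true)
// or the question's x + y (row_major = false).
std::vector<int> count_writes(int numrows, int numcols, int bx, int by, bool row_major) {
    std::vector<int> writes(static_cast<std::size_t>(numrows) * numcols, 0);
    for_each_thread(numrows, numcols, bx, by, [&](int x, int y) {
        if (x >= numcols || y >= numrows) return;
        ++writes[row_major ? x + y * numcols : x + y];
    });
    return writes;
}
```

Take a 4 x 5 image with 2 x 2 blocks. Under x + y*numcols every index is written exactly once. Under x + y, index 3 is written by the four threads (0,3), (1,2), (2,1) and (3,0), and indices 8 to 19 are never written.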

