I'm writing a program to convert an RGBA image to greyscale. I've worked on it and think I have implemented the kernel correctly. However, the grid size is possibly wrong, even though the logic looks correct to me.
The kernel:

    __global__
    void rgba_to_greyscale(const uchar4* const rgbaimage,
                           unsigned char* const greyimage,
                           int numrows, int numcols)
    {
        int x = (blockIdx.x * blockDim.x) + threadIdx.x;
        int y = (blockIdx.y * blockDim.y) + threadIdx.y;

        if (x >= numcols || y >= numrows)
            return;

        uchar4 rgba = rgbaimage[x+y];
        float channelsum = 0.299f*rgba.x + 0.587f*rgba.y + 0.114f*rgba.z;
        greyimage[x+y] = channelsum;
    }

And the kernel launch:

    const dim3 blocksize(10, 10, 1);  //TODO
    size_t gridsizex, gridsizey;
    gridsizex = numcols + (10 - (numcols % 10));  //adding a number to make it a multiple of 10
    gridsizey = numrows + (10 - (numrows % 10));  //adding a number to make it a multiple of 10
    const dim3 gridsize(gridsizex, gridsizey, 1);  //TODO
    rgba_to_greyscale<<<gridsize, blocksize>>>(d_rgbaimage, d_greyimage, numrows, numcols);

I'm creating more threads than required and applying a bounds check in the kernel.
You are accessing the image using x+y. Think about this: the largest index you can reach that way is about numrows+numcols, far fewer than the numrows*numcols pixels in the image. You cannot just add the two coordinates, since that would mean that e.g. (1,2) is the same image element as (3,0), which is plain rubbish. Instead, for each y-coordinate you have to skip an entire row of the image, so it should be rgbaimage[x+y*numcols] (and the same for greyimage, of course). Note that, depending on the layout of your image data, it might be the other way around (x*numrows+y), but I'm assuming the usual image layout here (and in your kernel it doesn't matter anyway, since all pixels are treated equally).
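To make the indexing fix concrete, here is a minimal sketch of the kernel with row-major indexing, assuming the usual layout where each row of numcols pixels is stored contiguously. It also touches on the grid size you asked about: a dim3 grid counts blocks, not threads, so the launch in the question appears to create roughly ten times more blocks per dimension than needed (harmless thanks to the bounds check, but wasteful). The helpers div_up and launch_rgba_to_greyscale are hypothetical names introduced for illustration only; they are not part of your code.

```cuda
#include <cuda_runtime.h>

__global__
void rgba_to_greyscale(const uchar4* const rgbaimage,
                       unsigned char* const greyimage,
                       int numrows, int numcols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row

    // Threads in the partially filled edge blocks fall outside the image.
    if (x >= numcols || y >= numrows)
        return;

    // Row-major layout (assumed): skip y full rows, then move x columns.
    int idx = y * numcols + x;

    uchar4 rgba = rgbaimage[idx];
    float channelsum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;
    greyimage[idx] = static_cast<unsigned char>(channelsum);
}

// Hypothetical helper: number of blocks of `block` threads needed to cover n items.
inline unsigned int div_up(int n, int block)
{
    return static_cast<unsigned int>((n + block - 1) / block);
}

// Hypothetical wrapper around the launch; both pointers must be device pointers.
void launch_rgba_to_greyscale(const uchar4* d_rgbaimage,
                              unsigned char* d_greyimage,
                              int numrows, int numcols)
{
    const int block = 10;  // same 10x10 block as in the question
    const dim3 blocksize(block, block, 1);
    // Grid dimensions are counted in blocks, so divide (rounding up)
    // instead of padding the pixel count to a multiple of 10.
    const dim3 gridsize(div_up(numcols, block), div_up(numrows, block), 1);
    rgba_to_greyscale<<<gridsize, blocksize>>>(d_rgbaimage, d_greyimage,
                                               numrows, numcols);
}
```

For a 25x25 image this launches 3x3 blocks of 10x10 threads, and the bounds check discards the threads that land past the last row or column. A 10x10 block is kept only to match the question; block sizes that are a multiple of 32 threads are generally a better fit for the hardware.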