I have a 3D grid consisting of 3D blocks. I want to calculate the individual thread index for each coordinate every time the kernel is called. I have these parameters:
dim3 blocks_query(32,32,32);
dim3 threads_query(32,32,32);
kernel<<<blocks_query, threads_query>>>();

Inside the kernel, I want to calculate the individual values of the x, y, and z coordinates, for instance x=0,y=0,z=0, x=0,y=0,z=1, x=0,y=0,z=2, ... Thanks in advance.
The individual thread indices (the x, y, and z coordinates) can be calculated inside the kernel as follows:
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;

Keep in mind that the number of threads per block is limited by the GPU. The block size you have created is invalid.
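A minimal sketch of a kernel that uses these three formulas and records each thread's coordinates. It assumes the data is stored in one flat array with x varying fastest, and that the grid covers the data exactly (each dimension a multiple of the block size, e.g. a 64x64x64 volume launched with 8x8x8 blocks in an 8x8x8 grid); the bound check described later in the answer removes that assumption. The names `linearIndex` and `recordCoords` are hypothetical, chosen for illustration only.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: flatten (x, y, z) into one offset, x varying fastest.
// Marked __host__ __device__ so it can also be called (and checked) on the CPU.
__host__ __device__ inline int linearIndex(int x, int y, int z, int xdim, int ydim)
{
    return (z * ydim + y) * xdim + x;
}

// Hypothetical kernel: each thread computes its global 3D coordinates and
// stores them at its flattened position in three output arrays.
// Assumes gridDim * blockDim equals (xdim, ydim, zdim) exactly; no bound check.
__global__ void recordCoords(int* outX, int* outY, int* outZ, int xdim, int ydim)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    int i = linearIndex(x, y, z, xdim, ydim);
    outX[i] = x;
    outY[i] = y;
    outZ[i] = z;
}
```

Putting x first in the flattened index is a design choice: consecutive threads of a warp differ in threadIdx.x first, so they touch consecutive addresses. The order in the question (z changing fastest) would also work, but would spread neighbouring threads across memory.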
dim3 threads_query(32,32,32) amounts to 32*32*32 = 32768 threads per block, which is not supported by any current CUDA device. Currently, the maximum is 1024 threads per block on GPUs of compute capability 2.0 and above, and 512 threads per block on older GPUs (the z dimension of a block is additionally capped at 64). You should reduce the block size; otherwise the kernel will not launch. Another thing to note is that 3D grids are supported only on CUDA GPUs of compute capability 2.0 and above.
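Rather than hard-coding these limits, the block shape can be checked against what the device reports through `cudaGetDeviceProperties`. The sketch below uses the real `cudaDeviceProp` fields `maxThreadsPerBlock` and `maxThreadsDim`; the helper names `blockFits` and `blockFitsDevice0` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if a block shape respects both the per-dimension
// limits and the total threads-per-block limit of the given device properties.
inline bool blockFits(dim3 block, const cudaDeviceProp& prop)
{
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    return block.x <= (unsigned)prop.maxThreadsDim[0] &&
           block.y <= (unsigned)prop.maxThreadsDim[1] &&
           block.z <= (unsigned)prop.maxThreadsDim[2] &&
           total <= (unsigned long long)prop.maxThreadsPerBlock;
}

// Hypothetical helper: query device 0 and check a block shape against it.
// Returns false if the query itself fails (e.g. no CUDA device is present).
inline bool blockFitsDevice0(dim3 block)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    return blockFits(block, prop);
}
```

On a device reporting 1024 threads per block and per-dimension limits of (1024, 1024, 64), `dim3(32,32,32)` fails the check while `dim3(8,8,8)` (512 threads) passes. If an oversized block is launched anyway, calling `cudaGetLastError()` right after the launch typically reports `cudaErrorInvalidConfiguration`.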
Update
Suppose the dimensions of your 3D data are xdim, ydim, and zdim. A generic grid of thread blocks can then be formed as follows:
dim3 threads_query(8,8,8);
dim3 blocks_query;
blocks_query.x = (xdim + threads_query.x - 1)/threads_query.x;
blocks_query.y = (ydim + threads_query.y - 1)/threads_query.y;
blocks_query.z = (zdim + threads_query.z - 1)/threads_query.z;

The above approach creates a total number of threads equal to or greater than the total data size. The extra threads may cause invalid memory accesses, so perform bounds checks inside the kernel. You can do this by passing xdim, ydim, and zdim as kernel arguments and adding the following line inside the kernel:
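The rounding-up arithmetic above is an integer ceiling division applied to each dimension. A small sketch that packages it as a reusable host helper; the names `ceilDiv` and `makeGrid` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Integer ceiling division: smallest q with q * b >= a (for b > 0).
inline unsigned int ceilDiv(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

// Hypothetical helper: build a grid covering an xdim x ydim x zdim volume
// with the given block shape, rounding every dimension up.
inline dim3 makeGrid(unsigned int xdim, unsigned int ydim, unsigned int zdim,
                     dim3 block)
{
    return dim3(ceilDiv(xdim, block.x),
                ceilDiv(ydim, block.y),
                ceilDiv(zdim, block.z));
}
```

For example, a 100x50x10 volume with 8x8x8 blocks gives a 13x7x2 grid: 182 blocks of 512 threads, i.e. 93,184 threads for 50,000 elements. The 43,184 surplus threads are the ones the bound check has to stop.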
if(x>=xdim || y>=ydim || z>=zdim) return;
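Putting the pieces together, a sketch of a bound-checked kernel and its launch. It assumes the volume is stored with x varying fastest and that `d_data` already points to xdim*ydim*zdim floats in device memory; `inBounds`, `scaleVolume`, and `scaleVolumeOnDevice` are hypothetical names, and scaling by a factor merely stands in for whatever per-element work the real kernel does.

```cuda
#include <cuda_runtime.h>

// True when (x, y, z) lies inside an xdim x ydim x zdim volume.
// Coordinates computed from blockIdx/threadIdx are never negative,
// so only the upper bounds need checking.
__host__ __device__ inline bool inBounds(int x, int y, int z,
                                         int xdim, int ydim, int zdim)
{
    return x < xdim && y < ydim && z < zdim;
}

// Hypothetical kernel: scales every element of the volume.
// Threads created by rounding the grid up fall outside and return early.
__global__ void scaleVolume(float* data, float factor,
                            int xdim, int ydim, int zdim)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (!inBounds(x, y, z, xdim, ydim, zdim)) return;

    data[(z * ydim + y) * xdim + x] *= factor;
}

// Hypothetical host wrapper: sizes the grid as in the answer and launches.
// Returns any launch error (e.g. an invalid configuration).
cudaError_t scaleVolumeOnDevice(float* d_data, float factor,
                                int xdim, int ydim, int zdim)
{
    dim3 block(8, 8, 8);
    dim3 grid((xdim + block.x - 1) / block.x,
              (ydim + block.y - 1) / block.y,
              (zdim + block.z - 1) / block.z);
    scaleVolume<<<grid, block>>>(d_data, factor, xdim, ydim, zdim);
    return cudaGetLastError();
}
```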