CUDA - How to calculate individual thread coordinate indices in 3D grids?


I have a 3D grid consisting of 3D blocks. I wish to calculate the individual thread indices for each coordinate every time the kernel is called. I have these parameters:

dim3 blocks_query(32,32,32);
dim3 threads_query(32,32,32);
kernel<<< blocks_query,threads_query >>>();

Inside the kernel, I wish to calculate the individual values of the x, y, and z coordinates, for instance x=0,y=0,z=0; x=0,y=0,z=1; x=0,y=0,z=2; and so on. Thanks in advance.

The individual thread indices (x, y, z coordinates) can be calculated inside the kernel as follows:

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;
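As a minimal sketch of the formula above, the following kernel records each thread's global coordinates so they can be checked on the host. The names `record_coords` and `run_record_coords`, the 16x16x16 volume, and the x-fastest memory layout are illustrative assumptions, not part of the original question. The block size is 8x8x8 (512 threads) so that it stays within the per-block limit.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its own global (x, y, z) coordinate.
__global__ void record_coords(int3 *out, int xdim, int ydim)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    // Flatten (x, y, z) into a linear offset, with x varying fastest.
    int idx = (z * ydim + y) * xdim + x;
    out[idx] = make_int3(x, y, z);
}

// Hypothetical host helper: launches the kernel over a 16x16x16 volume and
// returns true if every stored coordinate matches its position in the array.
bool run_record_coords()
{
    const int dim = 16;
    const size_t n = (size_t)dim * dim * dim;

    int3 *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int3)) != cudaSuccess) return false;

    dim3 threads(8, 8, 8);  // 512 threads per block
    dim3 blocks(2, 2, 2);   // 2 * 8 = 16 threads along each axis, an exact fit
    record_coords<<<blocks, threads>>>(d_out, dim, dim);

    std::vector<int3> h(n);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d_out, n * sizeof(int3),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
            for (int x = 0; x < dim; ++x) {
                int3 c = h[(z * dim + y) * dim + x];
                if (c.x != x || c.y != y || c.z != z) return false;
            }
    return true;
}
```

Because the grid here covers the volume exactly, no bounds check is needed in this sketch; that changes as soon as the data size is not a multiple of the block size.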

Keep in mind that the number of threads per block is limited by the GPU. The block size you have created is invalid:

dim3 threads_query(32,32,32) 

It equals 32 * 32 * 32 = 32768 threads per block, which is not supported by any current CUDA device. Currently, a maximum of 1024 threads per block is supported on GPUs of compute capability 2.0 and above, while the maximum is 512 threads on older GPUs. You should reduce the block size, otherwise the kernel will not launch. Another thing to note is that creating a 3D grid is supported only on CUDA GPUs of compute capability 2.0 and above.
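Rather than hard-coding these limits, they can be read from the device at run time. The sketch below uses `cudaGetDeviceProperties` and the `maxThreadsPerBlock` and `maxThreadsDim` fields of `cudaDeviceProp`; the helper name `block_is_valid` is a hypothetical one introduced here for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns true if `block` respects the limits that the
// given device reports for total threads per block and for each dimension.
bool block_is_valid(dim3 block, int device = 0)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;

    // Total threads in the block must not exceed the per-block limit.
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    if (total > (unsigned long long)prop.maxThreadsPerBlock) return false;

    // Each dimension also has its own upper bound.
    return block.x <= (unsigned)prop.maxThreadsDim[0]
        && block.y <= (unsigned)prop.maxThreadsDim[1]
        && block.z <= (unsigned)prop.maxThreadsDim[2];
}
```

On a device limited to 1024 threads per block, `block_is_valid(dim3(32, 32, 32))` should return false and `block_is_valid(dim3(8, 8, 8))` should return true. Calling `cudaGetLastError()` right after a kernel launch is another way to detect a rejected configuration.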

Update

Suppose the dimensions of the 3D data are xdim, ydim, and zdim. A generic grid of thread blocks can then be formed as follows:

dim3 threads_query(8,8,8);
dim3 blocks_query;
blocks_query.x = (xdim + threads_query.x - 1)/threads_query.x;
blocks_query.y = (ydim + threads_query.y - 1)/threads_query.y;
blocks_query.z = (zdim + threads_query.z - 1)/threads_query.z;
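The expression `(n + b - 1) / b` is integer division rounded up, so each axis gets just enough blocks to cover the data. A small host-side sketch follows; the helper name `make_grid` and the example volume of 100x60x33 are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: smallest grid of `block`-sized blocks that covers an
// xdim x ydim x zdim volume, using round-up integer division on each axis.
dim3 make_grid(unsigned xdim, unsigned ydim, unsigned zdim, dim3 block)
{
    return dim3((xdim + block.x - 1) / block.x,
                (ydim + block.y - 1) / block.y,
                (zdim + block.z - 1) / block.z);
}
```

For a 100x60x33 volume with 8x8x8 blocks this gives a 13x8x5 grid, that is 104x64x40 = 266240 threads for 198000 data elements. The surplus threads lie outside the data along each axis.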

The above approach will create a total number of threads equal to or greater than the total data size. The extra threads may cause invalid memory accesses, so perform bounds checks inside the kernel. You can do this by passing xdim, ydim, and zdim as kernel arguments and adding the following line inside the kernel:

if(x>=xdim || y>=ydim || z>=zdim) return; 
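Putting the pieces together, the sketch below launches a bounds-checked kernel over a volume whose dimensions are not multiples of the block size. The names `mark_voxels` and `run_mark_voxels` and the x-fastest linear layout are assumptions made for this example, not something stated in the original question.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes 1 into every voxel that lies inside the volume.
__global__ void mark_voxels(int *data, int xdim, int ydim, int zdim)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    // Surplus threads beyond the data exit before touching memory.
    if (x >= xdim || y >= ydim || z >= zdim) return;

    data[(z * ydim + y) * xdim + x] = 1;
}

// Hypothetical host helper: returns how many voxels were marked,
// or -1 if any CUDA call fails.
long long run_mark_voxels(int xdim, int ydim, int zdim)
{
    const size_t n = (size_t)xdim * ydim * zdim;
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d, 0, n * sizeof(int));

    dim3 threads(8, 8, 8);
    dim3 blocks((xdim + threads.x - 1) / threads.x,
                (ydim + threads.y - 1) / threads.y,
                (zdim + threads.z - 1) / threads.z);
    mark_voxels<<<blocks, threads>>>(d, xdim, ydim, zdim);

    std::vector<int> h(n);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    long long marked = 0;
    for (size_t i = 0; i < n; ++i) marked += h[i];
    return marked;
}
```

With the check in place, a call such as `run_mark_voxels(100, 60, 33)` should mark each of the 198000 voxels exactly once, even though the launch creates more threads than that.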
