i'm trying implement maximum performance circle hough transform in cuda, whereby edge pixel coordinates cast votes in hough space. pseudo code cht follows, i'm using image sizes of 256 x 256 pixels:
int maxradius = 100; int minradius = 20; int imagewidth = 256; int imageheight = 256; int houghspace[imagewidth x imageheight * maxradius]; for(int radius = minradius; radius < maxradius; ++radius) { for(float theta = 0.0; theta < 180.0; ++theta) { xcenter = edgecoordinatex + (radius * cos(theta)); ycenter = edgecoordinatey + (radius * sin(theta)); houghspace[xcenter, ycenter, radius] += 1; } } my basic idea have each thread block calculate (small) tile of output hough space (maybe 1 block each row of output hough space). therefore, need required part of input image shared memory somehow in order carry out voting in particular output sub-hough space.
my questions follows:
how calculate , store coordinates required part of input image in shared memory?
how retrieve x,y coordinates of edge pixels, stored in shared memory?
do cast votes in shared memory array or write votes directly global memory?
thanks in advance time. i'm new cuda , gratefully received.
i don't profess know sort of filtering, basic idea of propagating characteristics source doesn't sound different marching , sweeping methods solving stationary eikonal equation. there paper on solving class of problem (pdf might still available here):
a fast iterative method eikonal equations. won-ki jeong, ross t. whitaker. siam journal on scientific computing, vol 30, no 5, pp.2512-2534, 2008
the basic idea decompose computational domain tiles, , sweep characteristic source across domain. tiles touched advancing characteristic, added list of active tiles , calculated. each time tile "solved" (converged numerical tolerance in eikonal case, state in problem) retired working set , neighbours activated. if tile touched again, re-added active list. process continues until tiles calculated , active list empty. each calculation iteration can solved kernel launch, explictly synchronizes calculation. run many kernels in series required reach empty work list.
i don't think worth trying answer questions until have more concrete algorithmic approach , getting implementation details.
Comments
Post a Comment