regex - Improve efficiency of Python os.walk + regular expression algorithm


I'm using os.walk to select files under a specific folder that match a regular expression.

import os
import re

for dirpath, dirs, files in os.walk(str(basedir)):
    files[:] = [f for f in files if re.match(regex, os.path.join(dirpath, f))]
    print(dirpath, dirs, files)

But this has to process all files and folders under basedir, which is quite time-consuming. I'm looking for a way to use the same regular expression that is used for the files to filter out unwanted directories at each step of the walk, or a way to match only part of the regex...

For example, in a structure like

/data/2013/07/19/file.dat 

using e.g. the following regular expression

/data/(?P<year>2013)/(?P<month>07)/(?P<day>19)/(?P<filename>.*\.dat)

to find the .dat files without needing to look into e.g. /data/2012
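One way to get this kind of pruning is sketched below. The standard `re` module does not offer partial matching, so the sketch splits the pattern on "/" and checks one path component per directory level, editing `dirs` in place so that os.walk never descends into rejected directories. The function name `walk_matching` is hypothetical, and the sketch assumes the pattern is written relative to the base directory and that no single component pattern contains "/" (for instance, no group spans two levels).

```python
import os
import re


def walk_matching(basedir, pattern):
    """Yield (path, match) for files under basedir whose relative path matches.

    Assumption: `pattern` is a "/"-separated regex relative to basedir, and
    each piece between slashes is a valid regex on its own.
    """
    pattern = pattern.strip("/")
    parts = [re.compile(p) for p in pattern.split("/")]
    full = re.compile(pattern)
    last = len(parts) - 1  # depth at which matching files live
    basedir = os.path.abspath(basedir)

    for dirpath, dirs, files in os.walk(basedir):
        rel = os.path.relpath(dirpath, basedir)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1

        if depth >= last:
            dirs[:] = []  # nothing deeper can match, stop descending
        else:
            # Keep only subdirectories accepted by this level's component.
            dirs[:] = [d for d in dirs if parts[depth].fullmatch(d)]

        if depth == last:
            prefix = [] if depth == 0 else rel.split(os.sep)
            for f in files:
                m = full.fullmatch("/".join(prefix + [f]))
                if m:
                    yield os.path.join(dirpath, f), m
```

With the example above, calling `walk_matching("/data", r"(?P<year>2013)/(?P<month>07)/(?P<day>19)/(?P<filename>.*\.dat)")` would skip /data/2012 entirely, because "2012" fails the first component.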

If, for example, you want only the files in /data/2013/07/19 processed, just start os.walk() from the top directory /data/2013/07/19. This is similar to Tommi Komulainen's suggestion, but you needn't modify your loop code.
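A minimal sketch of that idea, assuming the narrowed top directory is known in advance: the loop stays the same as in the question and only the starting point of os.walk changes. The helper name `find_files` is hypothetical, and the path is normalized to "/" separators so that a regex written with "/" also works where `os.sep` differs.

```python
import os
import re


def find_files(top, regex):
    """Yield a match object for every file under `top` whose path matches."""
    pattern = re.compile(regex)
    for dirpath, dirs, files in os.walk(top):
        for f in files:
            # Normalize separators so the regex can always use "/".
            path = os.path.join(dirpath, f).replace(os.sep, "/")
            m = pattern.match(path)
            if m:
                yield m


# Hypothetical usage: walk only the day directory, not all of /data.
# for m in find_files("/data/2013/07/19", regex):
#     print(m.group("filename"))
```

The trade-off is that the narrowing lives in the choice of `top` rather than in the regex, so it only helps when the fixed leading part of the path is known.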

