Is there a library out there that can take a text (like an HTML document) and a list of strings (like the names of products), find a pattern in the list of strings, and generate a regular expression that extracts all the strings in the text (HTML document) that match the pattern found?
For example, given the following HTML:
<table> <tr> <td>product 1</td> <td>product 2</td> <td>product 3</td> <td>product 4</td> <td>product 5</td> <td>product 6</td> <td>product 7</td> <td>product 8</td> </tr> </table>
and the following list of strings:
['product 1', 'product 2', 'product 3']
I'd like a function that builds a regex like the following:
'<td>(.*?)</td>'
and extracts all the information from the HTML that matches the regex. In this case, the output would be:
['product 1', 'product 2', 'product 3', 'product 4', 'product 5', 'product 6', 'product 7', 'product 8']
Clarification:
I'd like the function to look at the surroundings of the samples, not at the samples themselves. So, for example, if the HTML was:
<tr> <td>word</td> <td>more words</td> <td>101</td> <td>-1-0-1-</td> </tr>
and the samples were ['word', 'more words'], I'd like to extract:
['word', 'more words', '101', '-1-0-1-']
Your requirement is at the same time very specific and very general.
I don't think you will ever find a library for this purpose unless you write your own.
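If you do write your own, a rough starting point might look like the sketch below. It is only an illustration, not an existing library: infer_regex is a made-up name, it assumes the first occurrence of each sample in the text is the one you mean, and it uses a simple heuristic (keep only the run of non-whitespace characters touching the sample on each side) so the inferred context doesn't swallow neighbouring cells.

```python
import re
from os.path import commonprefix  # character-wise common prefix of strings


def infer_regex(text, samples):
    """Guess a regex from the text surrounding each sample (hypothetical helper)."""
    befores, afters = [], []
    for sample in samples:
        # Assumption: the first occurrence of the sample is the intended one.
        start = text.find(sample)
        if start == -1:
            raise ValueError(f"sample not found: {sample!r}")
        # Reverse the left context so a shared suffix becomes a shared prefix.
        befores.append(text[:start][::-1])
        afters.append(text[start + len(sample):])

    left = commonprefix(befores)[::-1]   # text shared just before every sample
    right = commonprefix(afters)         # text shared just after every sample

    # Heuristic: keep only the non-whitespace run adjacent to the sample.
    left = re.search(r"\S*$", left).group()
    right = re.match(r"\S*", right).group()
    if not left or not right:
        raise ValueError("no shared context found around the samples")

    return re.escape(left) + "(.*?)" + re.escape(right)


html = ("<tr> <td>word</td> <td>more words</td> "
        "<td>101</td> <td>-1-0-1-</td> </tr>")
pattern = infer_regex(html, ["word", "more words"])
print(re.findall(pattern, html))
# ['word', 'more words', '101', '-1-0-1-']
```

This breaks down quickly on real pages (attributes on the tags, nested markup, samples that appear more than once), which is why the requirement is hard to turn into a general library.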
On the other hand, if you spend a lot of time writing regexes, you could use GUI tools to build them, like: http://www.regular-expressions.info/regexmagic.html
However, if you need to extract data from HTML documents only, you should consider using an HTML parser; it should make things a lot easier.
I recommend BeautifulSoup for parsing HTML documents in Python: https://pypi.python.org/pypi/beautifulsoup4/4.2.1
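To give an idea of the parser approach: BeautifulSoup is a third-party package, so the sketch below uses the standard library's html.parser instead, just so it runs without installing anything. The names TextByTag and extract_like are made up for this example. It finds which tags directly contain the samples, then returns the text of every element with one of those tags. It assumes simple, well-nested markup and would need more care around things like <br> inside a cell.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Record (tag, text) pairs, where tag is the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.cells = []   # (tag, text) for every non-blank text node

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.cells.append((self.stack[-1], text))


def extract_like(html, samples):
    """Return the text of every element whose tag also wraps a sample."""
    parser = TextByTag()
    parser.feed(html)
    parser.close()
    wanted = {tag for tag, text in parser.cells if text in samples}
    return [text for tag, text in parser.cells if tag in wanted]


html = ("<tr> <td>word</td> <td>more words</td> "
        "<td>101</td> <td>-1-0-1-</td> </tr>")
print(extract_like(html, ["word", "more words"]))
# ['word', 'more words', '101', '-1-0-1-']
```

With BeautifulSoup the same idea gets shorter, since the library builds the tree for you and you only have to look up the parent tag of each sample.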