regex - Python library to generate regular expressions

Is there a library out there that can take a text (like an HTML document) and a list of strings (like the names of products), find a pattern in the list of strings, and generate a regular expression that extracts all the strings in the text (HTML document) that match the pattern it found?

For example, given the following HTML:

<table>
  <tr>
    <td>product 1</td>
    <td>product 2</td>
    <td>product 3</td>
    <td>product 4</td>
    <td>product 5</td>
    <td>product 6</td>
    <td>product 7</td>
    <td>product 8</td>
  </tr>
</table>

and the following list of strings:

['product 1', 'product 2', 'product 3']

I'd like a function that builds a regex like the following:

'<td>(.*?)</td>'

and then extracts all the information from the HTML that matches the regex. In this case, the output would be:

['product 1', 'product 2', 'product 3', 'product 4', 'product 5', 'product 6', 'product 7', 'product 8']

Clarification:

I'd like the function to look at the surroundings of the samples, not at the samples themselves. So, for example, if the HTML was:

<tr>
  <td>word</td>
  <td>more words</td>
  <td>101</td>
  <td>-1-0-1-</td>
</tr>

and the samples were ['word', 'more words'], I'd like to extract:

['word', 'more words', '101', '-1-0-1-']

Your requirement is at the same time very specific and very general.

I don't think you will ever find a library for this purpose unless you write your own.
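If you do decide to write your own, a minimal sketch could look like the one below. It is only one possible approach, not an established algorithm: it takes the text before and after the first occurrence of each sample, keeps what those surroundings have in common, and then trims that down to the nearest tag on each side. The function name `build_regex` and the tag-trimming heuristic are assumptions made for illustration, and the heuristic only makes sense for tag-like markup.

```python
import os
import re


def build_regex(text, samples):
    """Guess a regex from the text surrounding the first occurrence of each sample."""
    lefts, rights = [], []
    for sample in samples:
        start = text.find(sample)
        if start == -1:
            raise ValueError(f"sample not found in text: {sample!r}")
        lefts.append(text[:start])
        rights.append(text[start + len(sample):])

    # Longest suffix shared by everything that precedes the samples
    # (computed as the common prefix of the reversed strings).
    prefix = os.path.commonprefix([left[::-1] for left in lefts])[::-1]
    # Longest prefix shared by everything that follows the samples.
    suffix = os.path.commonprefix(rights)

    # Heuristic (an assumption, suited to tag-like markup only):
    # keep just the nearest tag on each side of the sample.
    if "<" in prefix:
        prefix = prefix[prefix.rindex("<"):]
    if ">" in suffix:
        suffix = suffix[:suffix.index(">") + 1]

    if not prefix or not suffix:
        raise ValueError("samples share no common surroundings")
    return re.escape(prefix) + "(.*?)" + re.escape(suffix)


html = "<table>   <tr>" + "".join(
    f"     <td>product {i}</td>" for i in range(1, 9)
) + "   </tr> </table>"
samples = ["product 1", "product 2", "product 3"]

pattern = build_regex(html, samples)
print(re.findall(pattern, html))
# ['product 1', 'product 2', 'product 3', 'product 4',
#  'product 5', 'product 6', 'product 7', 'product 8']
```

Note that this looks only at the first occurrence of each sample and breaks down quickly on real pages (attributes, nested tags, samples that appear in several places), which is part of why a general-purpose library for this is hard to find.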

On the other hand, if you spend a lot of time writing regexes, you could use GUI tools to build them, like: http://www.regular-expressions.info/regexmagic.html

However, if you only need to extract data from HTML documents, you should consider using an HTML parser; it should make things a lot easier.

I recommend BeautifulSoup for parsing HTML documents in Python: https://pypi.python.org/pypi/beautifulsoup4/4.2.1
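To give an idea of what the parser route looks like, here is a rough sketch that collects the text of every <td> cell. To keep it free of third-party dependencies it uses `html.parser` from the Python standard library rather than BeautifulSoup; the class name `CellCollector` is made up for this example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self.cells[-1] += data


html = "<tr> <td>word</td> <td>more words</td> <td>101</td> <td>-1-0-1-</td> </tr>"
collector = CellCollector()
collector.feed(html)
collector.close()
print(collector.cells)  # ['word', 'more words', '101', '-1-0-1-']
```

With BeautifulSoup installed, the equivalent is roughly `[td.get_text() for td in BeautifulSoup(html, "html.parser").find_all("td")]`, and no sample strings or generated regex are needed at all.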
