I am facing a problem with a Perl regex. On an img element, I want to match the src attribute value starting with /file?id, plus the class and alt attributes. I want to ignore the rel attribute, which may or may not exist, as below:

    <img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
    <img src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">

My question is how to handle the optional rel attribute.

I am trying this to match the rel attribute:

    (?!\s+(rel)="([^"]+)")

It works when there is no rel attribute, but fails when the img has a rel attribute.
This is trivial using a proper HTML parser. This program demonstrates using HTML::TreeBuilder and its look_down method.

It is searching for elements with:

- a tag name of 'img'
- a src attribute that matches the regex qr|^/file\?id=|
- a class attribute that matches the null regex (i.e. any class attribute value)
- an alt attribute that matches the null regex

You don't say what you want to do with the elements once you've found them. This code uses as_HTML to display them.
    use strict;
    use warnings;

    use HTML::TreeBuilder;

    # Parse the HTML that follows the __DATA__ marker
    my $html = HTML::TreeBuilder->new_from_file(\*DATA);

    # Find img elements whose src starts with /file?id= and which
    # have both a class and an alt attribute (with any value)
    my @images = $html->look_down(
        _tag  => 'img',
        src   => qr|^/file\?id=|,
        class => qr//,
        alt   => qr//
    );

    print $_->as_HTML, "\n" for @images;

    __DATA__
    <html>
    <head>
    <title>page title</title>
    </head>
    <body>
    <img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
    <img src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
    <img src="/file" class="bbc_img" alt="myimagess.jpg"> /* mismatch id="" */
    <img src="/file?id=13166" alt="myimagess.jpg"> /* no class="" */
    <img src="/file?id=13166" class="bbc_img"> /* no alt="" */
    </body>
    </html>

Output

    <img alt="myimagess.jpg" class="bbc_img" rel="lightbox[45451]" src="/file?id=13166" />
    <img alt="myimagess.jpg" class="bbc_img" src="/file?id=13166" />
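
If a regex really is required, one possible sketch is to not describe rel at all. The negative lookahead in the question asserts that rel must not follow, so it necessarily fails whenever rel is present. The pattern below instead uses one positive lookahead per required attribute, so attribute order does not matter and any other attribute is skipped by [^>]*. The name $img_re and the sample list are hypothetical, and the pattern assumes double-quoted attribute values with no > inside them, which a real parser would not need to assume.

```perl
use strict;
use warnings;

# Hypothetical pattern, not part of the program above.
# Each lookahead requires one attribute somewhere inside the tag;
# rel is never mentioned, so it may be present or absent.
my $img_re = qr{
    <img\b
    (?= [^>]* \s src   = "(/file\?id=[^"]*)" )   # capture 1: src value
    (?= [^>]* \s class = "([^"]*)" )             # capture 2: class value
    (?= [^>]* \s alt   = "([^"]*)" )             # capture 3: alt value
    [^>]* >
}x;

my @samples = (
    '<img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">',
    '<img src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">',
    '<img src="/file" class="bbc_img" alt="myimagess.jpg">',
    '<img src="/file?id=13166" alt="myimagess.jpg">',
);

for my $tag (@samples) {
    # In list context a successful match returns the three captures
    if ( my ($src, $class, $alt) = $tag =~ $img_re ) {
        print "match: src=$src class=$class alt=$alt\n";
    }
    else {
        print "no match: $tag\n";
    }
}
```

With these samples, the first two tags match (with and without rel) and the last two do not, because one lacks the /file?id= prefix and the other lacks a class attribute.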