perl - Regex to match `rel` attribute of `img` element which only exists sometimes


I am facing a problem with a Perl regex. On an img element, I want to match the src attribute value starting with /file?id, along with the class and alt attributes. I want to ignore the rel attribute, which may or may not exist, as shown below:

<img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
<img  src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">

My question is how to handle the optional rel attribute.

I am trying this to match the rel attribute:

(?!\s+(rel)="([^"]+)") 

It works when there is no rel attribute, but fails when the img has a rel attribute.

This is trivial using a proper HTML parser. This program demonstrates using HTML::TreeBuilder and its look_down method.

It searches for elements with:

  • a tag name of 'img'
  • a src attribute that matches the regex qr|^/file\?id=|
  • a class attribute that matches the null regex (i.e. any class attribute value)
  • an alt attribute that matches the null regex

You don't say what you want to do with the elements once you've found them. This code uses as_HTML to display them.

use strict;
use warnings;

use HTML::TreeBuilder;

# Parse the HTML that follows the __DATA__ marker
my $html = HTML::TreeBuilder->new_from_file(\*DATA);

# Find img elements with a matching src and any class and alt attribute
my @images = $html->look_down(
  _tag  => 'img',
  src   => qr|^/file\?id=|,
  class => qr//,
  alt   => qr//
);

print $_->as_HTML, "\n" for @images;

__DATA__
<html>
  <head>
    <title>page title</title>
  </head>
  <body>
    <img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
    <img  src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
    <img  src="/file" class="bbc_img" alt="myimagess.jpg"> /* mismatch id="" */
    <img  src="/file?id=13166" alt="myimagess.jpg">        /* no class="" */
    <img  src="/file?id=13166" class="bbc_img">            /* no alt="" */
  </body>
</html>

Output

<img alt="myimagess.jpg" class="bbc_img" rel="lightbox[45451]" src="/file?id=13166" />
<img alt="myimagess.jpg" class="bbc_img" src="/file?id=13166" />
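
If a regex is still needed, the optional rel attribute could be handled with an optional non-capturing group rather than a negative lookahead. The lookahead (?!...) asserts that rel does not follow, which is why the attempt in the question fails as soon as rel is present. The following is only a minimal sketch, assuming the attributes always appear in the order rel, src, class, alt and are always double-quoted, as in the question's samples. The names $img_re and @samples are illustrative and do not come from the original post.

```perl
use strict;
use warnings;

# Assumption: attributes come in the fixed order rel, src, class, alt,
# each double-quoted, exactly as in the samples from the question.
my $img_re = qr{
    <img
    (?: \s+ rel="[^"]*" )?          # optional rel attribute, skipped when present
    \s+ src="(/file\?id=[^"]*)"     # capture 1: src value starting with /file?id=
    \s+ class="([^"]*)"             # capture 2: class value
    \s+ alt="([^"]*)"               # capture 3: alt value
    \s* /? >
}x;

my @samples = (
    '<img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">',
    '<img  src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">',
    '<img  src="/file" class="bbc_img" alt="myimagess.jpg">',
);

for my $tag (@samples) {
    # A list assignment in boolean context is false when the match fails
    if ( my ($src, $class, $alt) = $tag =~ $img_re ) {
        print "match: src=$src class=$class alt=$alt\n";
    }
    else {
        print "no match: $tag\n";
    }
}
```

The first two samples should match and the third should not. A pattern like this breaks as soon as the attribute order or quoting changes, which is the main reason to prefer the parser approach above.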
