I have a regular expression. It contains one required named capture group and several optional named capture groups. It captures individual matches and parses the sections into the named groups I need.
Except now I need it to repeat.
Essentially, my regular expression represents a single atomic unit in a (potentially) much longer string. Instead of matching my regex exactly, the target string will usually contain repeated instances of the regex, separated by the dot '.' character.
For example, if my regular expression captures: <some match>
The actual string could be any of these:
<some match>
<some match>.<some other match>
<some match>.<some other match>.<yet another match>
What is the simplest way to modify the original regular expression so that it accounts for the repeating patterns while ignoring the dots?
I'm not sure if it's needed, but here is the regular expression I'm using to capture individual segments. Again, I'd like to enhance it to account for optional additional segments. I'd like each segment to appear as a "match" in the result set:
^(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\])?(?:\[(?<index2>[0-9]+)\])?(?:\[(?<index3>[0-9]+)\])?$
It is intended to parse a class path, with up to 3 optional index accessors per segment (e.g. "member.sub_member[0].sub_sub_member[0][1][2]").
I suspect the answer involves look-ahead or look-behind, but I'm not entirely familiar with those.
I currently use String.Split to separate the string into segments. But I figure that if the enhancement to the regex is simple enough, I could skip the extra Split step and re-use the regex as a validation mechanism as well.
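For reference, the split-based approach described above might look roughly like the sketch below. The class and method names (SegmentSplitter, IsValidPath) are made up for illustration and are not from my real code; only the segment pattern is the one shown earlier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentSplitter
{
    // Single-segment pattern from the question: one member name, up to three indexes.
    private static readonly Regex Segment = new Regex(
        @"^(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\])?(?:\[(?<index2>[0-9]+)\])?(?:\[(?<index3>[0-9]+)\])?$");

    // Hypothetical helper: true only if every dot-separated piece matches the segment pattern.
    public static bool IsValidPath(string path)
    {
        foreach (var piece in path.Split('.'))
        {
            // A leading or trailing dot produces an empty piece, which the pattern rejects.
            if (!Segment.IsMatch(piece)) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPath("member.sub_member[0].sub_sub_member[0][1][2]"));
        Console.WriteLine(IsValidPath(".member"));
        Console.WriteLine(IsValidPath("member."));
    }
}
```

This works, but it means one Split call plus one regex match per segment, which is the extra step I'd like to fold into a single pattern.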
EDIT:
As an additional wrench in the gears, I'd like to disallow any dot '.' character at the beginning or end of the string. Dots should only exist as separators between path segments.
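You don't need to use any look-arounds. You can put (^|\.) in front of your main pattern and a + after it. This allows you to match a repeating, dot-separated sequence. I'd also recommend that you combine your <index> groups into a single capture group for simplicity (I used a * to match any number of indexes, but you can just as easily use {0,3} to match at most 3). So the final pattern would be: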
(?:(?:^|\.)(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\])*)+$
For example:
var input = "member.sub_member[0].sub_sub_member[0][1][2]";
var pattern = @"(?:(?:^|\.)(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\])*)+$";
var match = Regex.Match(input, pattern);
// Flatten every capture of every group, ordered by position in the input.
// Skip(1) drops the capture of group 0, which is the overall match.
var parts = (from Group g in match.Groups
             from Capture c in g.Captures
             orderby c.Index
             select c.Value)
            .Skip(1);
foreach (var part in parts)
{
    Console.WriteLine(part);
}
Which will output:
member
sub_member
0
sub_sub_member
0
1
2
Update: This pattern will ensure that the string cannot have any leading or trailing dots. It's a monster, but it should work:
^(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3}(?:\.(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3})*$
Or this one, although I did have to give up on my 'no-look-arounds' idea:
^(?!\.)(?:(?:^|\.)(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3})*$
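To see how the two anchored patterns behave, here is a minimal sketch that runs both against a few inputs and also pulls the captures back out in input order. The class name PathPatternDemo, the constants, and the Parts helper are illustrative names of my own, not part of any library; the patterns themselves are the two shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PathPatternDemo
{
    // The two anchored patterns from the update above.
    public const string NoLookaround =
        @"^(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3}(?:\.(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3})*$";
    public const string WithLookahead =
        @"^(?!\.)(?:(?:^|\.)(?<member>[a-zA-Z_][a-zA-Z0-9_]*)(?:\[(?<index>[0-9]+)\]){0,3})*$";

    // Hypothetical helper: returns the member and index captures in the order they
    // appear in the input, or an empty list if the input does not match.
    public static List<string> Parts(string input, string pattern)
    {
        var match = Regex.Match(input, pattern);
        if (!match.Success) return new List<string>();
        return match.Groups["member"].Captures.Cast<Capture>()
            .Concat(match.Groups["index"].Captures.Cast<Capture>())
            .OrderBy(c => c.Index)
            .Select(c => c.Value)
            .ToList();
    }

    public static void Main()
    {
        string[] inputs = { "member.sub_member[0]", ".member", "member.", "member[0][1][2][3]", "" };
        foreach (var input in inputs)
        {
            // Prints the result for each pattern side by side.
            Console.WriteLine($"\"{input}\": {Regex.IsMatch(input, NoLookaround)} / {Regex.IsMatch(input, WithLookahead)}");
        }
        Console.WriteLine(string.Join(" ", Parts("a[1].b[2][3]", NoLookaround)));
    }
}
```

One difference worth knowing: because the look-ahead version ends its repeated group with *, zero repetitions are allowed, so it also accepts the empty string. If that is not wanted, changing the final * to + makes it require at least one segment.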