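Why do I get different backtracking with these Raku regexes?
I am getting unexpected backtracking behavior from the + quantifier of a Raku regex.
With this regex: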
'abc' ~~ m/(\w+) {say $0} <?{ $0.substr(*-1) eq 'b' }>/;
say $0;
I get the expected result:
｢abc｣ # inner say
｢ab｣ # inner say
｢ab｣ # final say
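That is, the (greedy) + quantifier takes all the letters, and then the condition fails. After that it starts backtracking by releasing the last matched letter, until the condition evaluates to true.
However, when I put the quantifier outside the capture group, backtracking does not seem to work the same way: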
'abc' ~~ m/[(\w)]+ {say $0} <?{ $0.tail eq 'b' }>/;
say $0;
Result:
[｢a｣ ｢b｣ ｢c｣] # inner say
[｢a｣ ｢b｣ ｢c｣] # why this extra inner say? Shouldn't this backtrack to [｢a｣ ｢b｣]?
[｢a｣ ｢b｣ ｢c｣] # why this extra inner say? Shouldn't this backtrack to [｢a｣ ｢b｣]?
[｢b｣ ｢c｣] # since we could not successfully backtrack, we go on matching by increasing the position
[｢b｣ ｢c｣] # the previous conditional fails; we get this extra inner say
[｢c｣] # since we could not successfully backtrack, we go on matching by increasing the position
Nil # final say, no match because we could not find a final 'b'
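Is this behavior expected? If so, why do the two regexes work differently? Is it possible to mimic the first regex while still keeping the quantifier outside the capture group?
Notes:
Using a lazy quantifier "solves" the problem... This is expected, since the difference seems to arise during backtracking, and that doesn't happen with the lazy quantifier.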
'abc' ~~ m/[(\w)]+? {say $0} <?{ $0.tail eq 'b' }>/;
[｢a｣]
[｢a｣ ｢b｣]
[｢a｣ ｢b｣]
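But for performance reasons I'd rather use a greedy quantifier (the example in this question is a simplification).
Answer
I don't think the problem is the backtracking itself. Rather, it looks like the $0 exposed to the intermediate code block retains the captures from previous iterations. Consider this expression: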
'abc' ~~ m/[(\w)]+ {say "Match:",$/.Str,";\tCapture:",$0} <?{ False }>/;
This is the output:
Match:abc; Capture:[｢a｣ ｢b｣ ｢c｣]
Match:ab; Capture:[｢a｣ ｢b｣ ｢c｣]
Match:a; Capture:[｢a｣ ｢b｣ ｢c｣]
Match:bc; Capture:[｢b｣ ｢c｣]
Match:b; Capture:[｢b｣ ｢c｣]
Match:c; Capture:[｢c｣]
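As you can see, the matches come in the correct order (abc, ab, a, ...), but the capture array for the ab match is still [｢a｣ ｢b｣ ｢c｣], and the same holds for the a match. I suspect this is a bug.
For your case, there are a couple of approaches.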
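To check whether a particular Rakudo build shows these stale captures, here is a minimal sketch (an illustration, not part of the original answer) that records, at every attempt, the text matched so far and how many captures $0 holds, reusing the always-failing <?{ False }> trick from above. Since each (\w) captures exactly one character, the two numbers should agree; in the output above they do not for ab and a. The @log array is just an illustrative name.

```raku
# Record, for every attempt that reaches the code block, the text matched
# so far (~$/) and the number of (\w) captures currently in $0.
my @log;
'abc' ~~ m/ [(\w)]+ { @log.push: ~$/ => $0.elems } <?{ False }> /;

# Each (\w) captures one character, so the capture count should equal
# the length of the text matched so far; flag any attempt where it doesn't.
for @log -> $attempt {
    my $text  = $attempt.key;
    my $count = $attempt.value;
    printf "%-4s captures: %d%s\n", $text, $count,
        ($count == $text.chars ?? '' !! '  <- more captures than characters');
}
```

On a build where the behavior described in this answer has been fixed, presumably no line would be flagged.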
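- Use only $/ for the conditional check:
  'abc' ~~ m/[(\w)]+ <?{ $/.Str.substr(*-1) eq 'b' }>/;
- Alternatively, also capture the quantified group itself:
  'abc' ~~ m/([(\w)]+) <?{ $0[0][*-1] eq 'b' }>/;
  Here $0 matches the outer group, $0[0] is the first (inner) capture group within it, and $0[0][*-1] is the last character matched in the current iteration.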
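A small runnable sketch of the first workaround, combining the substr check above with the .ends-with form suggested in the comments. It only inspects the overall matched text and deliberately does not rely on $0, since what $0 holds during backtracking is exactly what this bug affects; $m1 and $m2 are illustrative names.

```raku
# Check the overall match text ($/), which the output above shows is
# updated correctly while backtracking, instead of the quantified capture $0.
my $m1 = 'abc' ~~ m/ [(\w)]+ <?{ $/.Str.substr(*-1) eq 'b' }> /;

# The same condition written with .ends-with (Match is Cool, so it has it).
my $m2 = 'abc' ~~ m/ [(\w)]+ <?{ $/.ends-with('b') }> /;

say ~$m1;   # ab
say ~$m2;   # ab
```

The greedy + first takes abc, the condition fails, and backtracking releases c so the condition succeeds on ab, mirroring the first regex in the question.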
- Even easier than `$/.Str.substr(*-1) eq 'b'` is `$/.ends-with: 'b'`
- I've filed an issue [With `(foo)+` the corresponding sub-captures aren't removed during backtracking](https://github.com/rakudo/rakudo/issues/4105).
- @jubilatious1, the regex given is a simplified one for demonstrating the issue. Based on the demo output, I hope you agree there is a discrepancy. And yes, the issue occurs only when we use backtracking and `$0`.