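How do I match using :global in a Raku grammar?
I'm trying to write a Raku grammar that can parse commands asking for programming puzzles.
This is a simplified version just for my question, but the commands combine a difficulty level with an optional list of languages.
Sample valid input:
- No languages: easy
- One language: hard javascript
- Multiple languages: medium javascript python raku
I can get it to match one language, but not multiple languages. I'm not sure where to add the :g.
Here's an example of what I have so far: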
grammar Command {
    rule TOP { <difficulty> <languages>? }
    token difficulty { 'easy' | 'medium' | 'hard' }
    rule languages { <language>+ }
    token language { \w+ }
}
multi sub MAIN(Bool :$test) {
    use Test;
    plan 5;
    # These first 3 pass.
    ok Command.parse('hard', :token<difficulty>), '<difficulty> can parse a difficulty';
    nok Command.parse('no', :token<difficulty>), '<difficulty> should not parse random words';
    # Why does this parse <languages>, but <language> fails below?
    ok Command.parse('js', :rule<languages>), '<languages> can parse a language';
    # These last 2 fail.
    ok Command.parse('js', :token<language>), '<language> can parse a language';
    # Why does this not match both words? Can I use :g somewhere?
    ok Command.parse('js python', :rule<languages>), '<languages> can parse multiple languages';
}
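This works, even though my test #4 fails:
my token wrd { \w+ }
'js' ~~ &wrd; #=> 「js」
Extracting multiple languages works with a regex using this syntax, but I'm not sure how to use that in a grammar:
'js python' ~~ m:g/ \w+ /; #=> (「js」 「python」)
Also, is there an ideal way to make the order unimportant, so that difficulty could appear anywhere in the string? Example: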
rule TOP { <languages>* <difficulty> <languages>? }
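Ideally, I'd like anything that is not a difficulty to be read as a language. Example: raku python medium js should read medium as a difficulty and the rest as languages.
Answer
There are two issues here.
To specify a subrule as the starting point of a grammar parse, the named argument is always :rule, regardless of whether the grammar declares it as a rule, token, method, or regex. Your first two tests passed only because they were valid results for a full-grammar parse (that is, one starting at TOP): the :token named argument is unknown, so it is silently ignored.
That gets us: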
ok Command.parse('hard', :rule<difficulty>), '<difficulty> can parse a difficulty';
nok Command.parse('no', :rule<difficulty>), '<difficulty> should not parse random words';
ok Command.parse('js', :rule<languages> ), '<languages> can parse a language';
ok Command.parse('js', :rule<language> ), '<language> can parse a language';
ok Command.parse('js python', :rule<languages> ), '<languages> can parse multiple languages';
# Output
ok 1 - <difficulty> can parse a difficulty
ok 2 - <difficulty> should not parse random words
ok 3 - <languages> can parse a language
ok 4 - <language> can parse a language
not ok 5 - <languages> can parse multiple languages
The second issue is how implied whitespace is handled in a rule. In a token, the following are equivalent:
token foo { <alpha>+ }
token bar { <alpha> + }
But in a rule, they are different. Compare the token equivalents of the following rules:
rule foo { <alpha>+ }
token foo { <alpha>+ <.ws> }
rule bar { <alpha> + }
token bar { [<alpha> <.ws>] + }
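In your case, you have <language>+, and since language is \w+, it can never match two languages: nothing between the repetitions consumes the space, and the first match has already consumed all of the adjacent \w characters. The fix is simple: change <language>+ to <language> +.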
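Putting both fixes together, a minimal sketch of the corrected grammar might look like the following. It reuses the grammar from the question with only the whitespace before the quantifier added; the variable name $match and the sample input are just for illustration.

```raku
grammar Command {
    rule  TOP        { <difficulty> <languages>? }
    token difficulty { 'easy' | 'medium' | 'hard' }
    # The space before + makes the rule match whitespace between repetitions.
    rule  languages  { <language> + }
    token language   { \w+ }
}

my $match = Command.parse('medium javascript python raku');

# Each repetition of <language> is captured separately under <languages>.
say $match<difficulty>.Str;
say $match<languages><language>.map(*.Str);
```

If the parse succeeds, the second say should list the three languages as separate captures rather than one run of text.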
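To let the <difficulty> token float around, the first solution that comes to mind is to test for it and bail out inside the <language> token:
token language { <!difficulty> \w+ }
<!foo> fails if <foo> could match at that position. This works almost perfectly until you get a language like "easyFoo". The easy fix is to ensure that the difficulty token always ends at a word boundary: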
token difficulty {
    [
    | easy
    | medium
    | hard
    ]
    >>
}
where >> asserts a word boundary on the right.
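As a rough sketch of how these pieces could fit together, the grammar below combines the negative lookahead and the word-boundary assertion with a TOP rule that allows languages on either side of the difficulty. The shape of TOP is an assumption adapted from the one proposed in the question (using ? instead of * on the leading <languages>), and the grammar name FloatingCommand is made up for this example.

```raku
grammar FloatingCommand {
    # Assumed TOP: optional languages before and after the difficulty.
    rule  TOP        { <languages>? <difficulty> <languages>? }
    # >> keeps 'easy' from matching the start of a word like 'easyFoo'.
    token difficulty { [ easy | medium | hard ] >> }
    rule  languages  { <language> + }
    # Refuse to treat a difficulty word as a language.
    token language   { <!difficulty> \w+ }
}

my $floating = FloatingCommand.parse('raku python medium js');
say $floating<difficulty>.Str;

# 'easyFoo' is read as a language because of the >> assertion.
my $tricky = FloatingCommand.parse('easyFoo hard');
say $tricky<difficulty>.Str;
```

Because <languages> appears twice in TOP, its captures come back as a list, so collecting every language name means walking both entries of $floating<languages>.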
- See [When is white space really important in Raku grammars?](https://stackoverflow.com/questions/48892306/when-is-white-space-really-important-in-perl6-grammars) for discussion that elaborates on both the issues that @user0721090601 explains underlie all the failures you were having. The fact that unhandled named arguments are ignored, which has practical and strategic evolutionary benefits, has the downside that it's currently done without a warning. For now, that downside is just something you need to be aware of. Aiui, Raku, Rakudo, and/or CommaIDE may provide relief in years to come.