Perl regular expression repetition matching

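I ran into strange behavior when using the '?' regex quantifier. I am processing a log file and searching it for specific HTTP error responses, e.g. 401. A line may or may not contain a response body, so I want to match both cases. I have the following code.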

#!/usr/bin/perl
$match = 'response 401';
$line = '2021-04-08 07:15:01 |  INFO | [http-nio-8080-exec-11] | rId:123456789 | ip:127.0.0.1 | activationId: abcdefg | user: admin | response 401: headers: Cache-Control: [no-cache, no-store, max-age=0, must-revalidate] / Content-Length: [60] / Content-Type: [application/json;charset=UTF-8] / Date: [Thu, 08 Apr 2021 05:15:01 GMT] / Expires: [0] / Pragma: [no-cache] | body: {"errors":[{"message":"Bad credentials","repeatable":true}]}';
    
my($tstamp, $level, $thread, $body) = $line =~ m/^(.*?)\s+\|\s+(\w+)\s+\|\s+\[(.*?)\].*?$match.*?(?:body:\s+(\{.*\}))?/;
if($body) {
  print "body: $body\n";
}

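This does not print anything. I expected .*?$match.*? to match the smallest possible part of the line and leave enough of it for the body pattern, but apparently it does not. When I change the regex by removing the ? from the body pattern, making the body mandatory, the line matches: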

my($tstamp, $level, $thread, $body) = $line =~ m/^(.*?)\s+\|\s+(\w+)\s+\|\s+\[(.*?)\].*?$match.*?(?:body:\s+(\{.*\}))/;

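But this does not match lines without a body. What is wrong with the regex? I suspect that the non-greedy .*? in front of (?:body...)? eats the input, because it is allowed to when the body is optional.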

How do I write the correct regex?

Answer

With the samples you have shown, could you try the following:

^(\d{4}-\d{2}-\d{2}\s*(?:\d{2}:){2}\d{2})\s+\|\s+(\S+)\s+\|\s+\[([^]]*)\].*?(body.*)?$

Here is an online demo of the above regex.

Explanation: a detailed breakdown of the above regex.

^                                                    ## Anchor at the start of the string.
(\d{4}-\d{2}-\d{2}\s*(?:\d{2}:){2}\d{2})             ## Capture group 1: the timestamp.
\s+\|\s+                                             ## Whitespace, a literal pipe, whitespace (one or more whitespace characters on each side).
(\S+)                                                ## Capture group 2: a run of non-whitespace characters, i.e. the level (INFO/WARN/ERROR etc.).
\s+\|\s+\[                                           ## Whitespace, pipe, whitespace, then a literal [.
([^]]*)                                              ## Capture group 3: everything up to the next ].
\].*?                                                ## A literal ], then a lazy match of any characters.
(body.*)?$                                           ## Capture group 4 (optional): from "body" to the end of the line, followed by the end-of-line anchor.
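
As a rough sketch of how this pattern might be used from Perl, the snippet below applies it to one line with a body and one without. The helper name parse_line and both shortened sample lines are assumptions made up for illustration; they are not part of the original answer. Note that with this pattern group 4 includes the "body: " prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (timestamp, level, thread, body),
# or an empty list if the line does not match at all.
sub parse_line {
    my ($line) = @_;
    return $line =~ m/^(\d{4}-\d{2}-\d{2}\s*(?:\d{2}:){2}\d{2})\s+\|\s+(\S+)\s+\|\s+\[([^]]*)\].*?(body.*)?$/;
}

# Shortened, made-up sample lines (assumptions, not real log data).
my $with_body    = '2021-04-08 07:15:01 |  INFO | [http-nio-8080-exec-11] | user: admin | response 401: headers: Expires: [0] | body: {"errors":[]}';
my $without_body = '2021-04-08 07:15:02 |  WARN | [http-nio-8080-exec-12] | user: admin | response 401: headers: Expires: [0]';

for my $line ($with_body, $without_body) {
    my ($tstamp, $level, $thread, $body) = parse_line($line);
    # $body is undef when the optional group did not take part in the match.
    print "$tstamp $level $thread ", defined $body ? $body : '(no body)', "\n";
}
```

Testing the body with defined rather than plain truthiness avoids treating an empty or "0" capture as missing.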


Answer

You can make the part containing group 4 optional and assert the end of the string.

^(.*?)\s+\|\s+(\w+)\s+\|\s+\[([^][]*)\].*?(?:\s+\|\s+body:\s+(\{.*\}))?$
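  • ^ Start of string
  • (.*?)\s+\| Capture group 1: match any character, as few as possible, then match whitespace and |
  • \s+(\w+)\s+\| Match whitespace, capture 1+ word characters in group 2, then match whitespace and |
  • \s+\[([^][]*)\] Match whitespace and capture everything between [...] in group 3
  • .*? Match any character, as few as possible
  • (?:\s+\|\s+body:\s+(\{.*\}))? Optionally match | between whitespace, then body:, and capture everything between {...} in group 4
  • $ End of string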
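
To see why the end-of-string assertion matters, here is a small sketch on a toy string (the string and variable names are made up for illustration). Without $, the lazy .*? is satisfied with zero characters and the optional group is allowed to match nothing, so the overall match succeeds right away and the capture stays undefined. With $, the engine has to keep extending .*? until the rest of the pattern can reach the end of the string, which gives the body group a chance to take part.

```perl
use strict;
use warnings;

# Toy input, assumed for illustration only.
my $text = 'response 401: headers | body: {"a":1}';

# Unanchored: .*? matches nothing, the optional group matches nothing, done.
my ($unanchored) = $text =~ m/response 401.*?(?:body:\s+(\{.*\}))?/;

# Anchored: .*? must grow until the optional group plus $ can match.
my ($anchored) = $text =~ m/response 401.*?(?:body:\s+(\{.*\}))?$/;

print 'unanchored: ', defined $unanchored ? $unanchored : 'undef', "\n";
print 'anchored:   ', defined $anchored   ? $anchored   : 'undef', "\n";
```

The first line prints undef and the second prints the JSON body, so the non-greedy .*? in the question was not eating the input; it was consuming nothing at all.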

Regex demo

Using the example code:

$match = 'response 401';
$line = '2021-04-08 07:15:01 |  INFO | [http-nio-8080-exec-11] | rId:123456789 | ip:127.0.0.1 | activationId: abcdefg | user: admin | response 401: headers: Cache-Control: [no-cache, no-store, max-age=0, must-revalidate] / Content-Length: [60] / Content-Type: [application/json;charset=UTF-8] / Date: [Thu, 08 Apr 2021 05:15:01 GMT] / Expires: [0] / Pragma: [no-cache] | body: {"errors":[{"message":"Bad credentials","repeatable":true}]}';

my($tstamp, $level, $thread, $body) = $line =~ m/^(.*?)\s+\|\s+(\w+)\s+\|\s+\[([^][]*)\].*?(?:\s+\|\s+body:\s+(\{.*\}))?$/;
if($body) {
  print "body: $body\n";
}

Output

body: {"errors":[{"message":"Bad credentials","repeatable":true}]}

If there is no body, you still get the values for $tstamp, $level and $thread.
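
A minimal sketch of the no-body case, assuming a made-up log line that ends right after the headers (the line is an illustration, not taken from the question):

```perl
use strict;
use warnings;

# Made-up log line without a body section (an assumption for illustration).
my $line = '2021-04-08 07:15:01 |  INFO | [http-nio-8080-exec-11] | user: admin | response 401: headers: Expires: [0]';

my ($tstamp, $level, $thread, $body) =
    $line =~ m/^(.*?)\s+\|\s+(\w+)\s+\|\s+\[([^][]*)\].*?(?:\s+\|\s+body:\s+(\{.*\}))?$/;

# The first three groups are filled in; group 4 did not participate, so $body is undef.
print "tstamp: $tstamp\n";
print "level:  $level\n";
print "thread: $thread\n";
print 'body:   ', defined $body ? $body : '(none)', "\n";
```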

