Regular expression to parse logical expressions

I am trying to use a regular expression to parse logical expressions that contain parentheses.

For example:

((weight gt 10) OR (weight lt 100)) AND (length lt 50)

I would like it to be parsed into:

Group 1: (weight gt 10) OR (weight lt 100)
Group 2: AND
Group 3: length lt 50

If the order is reversed:

(length lt 50) AND ((weight gt 10) OR (weight lt 100))

I would like it to be parsed into:

Group 1: length lt 50
Group 2: AND
Group 3: (weight gt 10) OR (weight lt 100)

The closest I have gotten is this expression:

(\((?>[^()]+|(?1))*\))

The problem is that it only partially works:

((weight gt 10) OR (weight lt 100)) AND (length lt 50)

Group 1: ((weight gt 10) OR (weight lt 100))
Group 2: (length lt 50)

(length lt 50) AND ((weight gt 10) OR (weight lt 100))

Group 1: (length lt 50)
Group 2: ((weight gt 10) OR (weight lt 100))

The logical operator is not captured as a group.

How can I fix this so that the logical operator AND is captured as well?

Answer

With your shown samples, please try the following regex, written and tested in Python 3.8:

^(?:\((\(weight.*?\))\)|\((length[^)]*)\))\s+(AND)\s+(?:\((\(weight.*?\).*?\))\)|\((length[^)]*)\))$

Or, as a generic solution:

^(?:\((\(.*?\))\)|\((\w+[^)]*)\))\s+(\S+)\s+(?:\((\(\w+.*?\).*?\))\)|\((\w+[^)]*)\))$

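Here is the complete Python 3 code. The results below come from the sample-specific regex; change the regex to the generic one shown above and it works for generic values as well.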

import re

##Sample-specific regex (swap in the generic regex above for other field names/operators).
pattern = r'^(?:\((\(weight.*?\))\)|\((length[^)]*)\))\s+(AND)\s+(?:\((\(weight.*?\).*?\))\)|\((length[^)]*)\))$'

##Scenario 1st here...
var = """((weight gt 10) OR (weight lt 100)) AND (length lt 50)"""
li = re.findall(pattern, var)
print(li)
##[('(weight gt 10) OR (weight lt 100)', '', 'AND', '', 'length lt 50')]

##Remove null elements from 1st scenario's findall result here.
print([string for string in li[0] if string != ""])
##['(weight gt 10) OR (weight lt 100)', 'AND', 'length lt 50']

##Scenario 2nd here.
var = """(length lt 50) AND ((weight gt 10) OR (weight lt 100))"""
li = re.findall(pattern, var)
print(li)
##[('', 'length lt 50', 'AND', '(weight gt 10) OR (weight lt 100)', '')]

##Remove null elements from 2nd scenario's findall result here.
print([string for string in li[0] if string != ""])
##['length lt 50', 'AND', '(weight gt 10) OR (weight lt 100)']
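To check the claim that the generic regex also works for other values, here is a minimal sketch using the generic pattern above with its backslashes in place. The helper `split_expression` and the `height`/`width` sample are hypothetical illustrations, not part of the original answer.

```python
import re

# Generic pattern: any field name and any non-space operator token.
# Groups 1/2 and 4/5 are alternatives, so one of each pair is always None.
GENERIC = re.compile(
    r'^(?:\((\(.*?\))\)|\((\w+[^)]*)\))'          # left side: nested or simple group
    r'\s+(\S+)\s+'                                # the operator, e.g. AND / OR
    r'(?:\((\(\w+.*?\).*?\))\)|\((\w+[^)]*)\))$'  # right side: nested or simple group
)


def split_expression(text):
    """Hypothetical helper: return [left, operator, right], or None if no match."""
    match = GENERIC.match(text)
    if match is None:
        return None
    # Drop the alternative groups that did not participate (None).
    return [part for part in match.groups() if part]


print(split_expression("((weight gt 10) OR (weight lt 100)) AND (length lt 50)"))
print(split_expression("(length lt 50) AND ((weight gt 10) OR (weight lt 100))"))
print(split_expression("((height gt 1) OR (height lt 9)) OR (width lt 5)"))
```

Like the sample-specific version, this pattern does not check that parentheses are balanced, so input nested more deeply than the samples can be split in the wrong place.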

Explanation: a detailed breakdown of the regex above.

^                                          ##Checking from starting of value.
(?:                                        ##Creating 1st non-capturing group here.
  \((\(weight.*?\))\)                      ##Matching literal ( then 1st capturing group: (weight up to the ) just before the outer ), then the outer ).
  |                                        ##OR
  \((length[^)]*)\)                        ##Matching literal ( then 2nd capturing group: length and everything before the next ), then ).
)                                          ##Closing 1st non-capturing group here.
\s+                                        ##Matching 1 or more whitespace characters here.
(AND)                                      ##Matching AND and keeping it in 3rd capturing group here.
\s+                                        ##Matching 1 or more whitespace characters here.
(?:                                        ##Creating 2nd non-capturing group here.
  \((\(weight.*?\).*?\))\)                 ##Matching literal ( then 4th capturing group: (weight through its 2nd ), then the outer ).
  |                                        ##OR
  \((length[^)]*)\)                        ##Matching literal ( then 5th capturing group: length just before ), then ).
)$                                         ##Closing 2nd non-capturing group at end of value here.
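The commented breakdown can be compiled almost verbatim. A sketch assuming Python's `re.VERBOSE` flag, under which whitespace in the pattern is ignored and `#` starts a comment; the group numbers correspond to the positions in the `findall` tuples shown in the code above.

```python
import re

# Sample-specific pattern in verbose mode: whitespace is ignored and a hash
# starts a comment, so each piece sits next to its explanation.
PATTERN = re.compile(r"""
    ^
    (?:
        \( ( \(weight.*?\) ) \)       # group 1: nested (weight ...) side
      | \( ( length[^)]* ) \)         # group 2: simple (length ...) side
    )
    \s+ (AND) \s+                     # group 3: the operator
    (?:
        \( ( \(weight.*?\).*?\) ) \)  # group 4: nested (weight ...) side
      | \( ( length[^)]* ) \)         # group 5: simple (length ...) side
    )
    $
""", re.VERBOSE)

for text in ("((weight gt 10) OR (weight lt 100)) AND (length lt 50)",
             "(length lt 50) AND ((weight gt 10) OR (weight lt 100))"):
    print(PATTERN.findall(text))
```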


Answer

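You're almost there. The only missing piece is the logical operators that are not enclosed in parentheses, i.e. the AND and OR you want to capture. Your regex requires everything to be between parentheses.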

Also, what you call groups appear to actually be matches, and there are two of them:

  • The first match is ((weight gt 10) OR (weight lt 100))
  • The second match is (length lt 50)

Your expression has only two groups, and they are identical: group 1 (g1), the outermost parentheses, is in fact the entire match (g0).

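Since your expression matches any enclosed logic, I simply extended it by appending an optional non-capturing group made of two new capturing groups: the text between the parenthesized parts, and another recursion into your group 1: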

(?:([^()]+)((?1)))?

Combined, it becomes:

(\((?>[^()]+|(?1))*\))(?:([^()]+)((?1)))?
^------ g1 ----------^   ^- g2 -^^-g3-^

(?1) still refers to group 1, as in the original expression. Below are all the matches and their respective groups:

(weight gt 10)
^--- g1 -----^

(weight gt 10) OR (weight lt 100)
^--- g1 -----^ g2 ^--- g3 ------^

((weight gt 10) OR (weight lt 100)) AND (length lt 50)
^-------------- g1 ---------------^  g2 ^--- g3 -----^

(length lt 50) AND ((weight gt 10) OR (weight lt 100))
^--- g1 -----^  g2 ^------------- g3 ----------------^

(length lt 50) nonsense ((weight gt 10) OR (weight lt 100))
^--- g1 -----^    g2    ^-------------- g3 ---------------^

The character class only excludes parentheses, so g2 matches any nonsense, including the spaces around the operator.


Your expression, broken down:

(            # capturing group 1
  \(         # match a `(` literally
  (?>        # atomic/independent, non-capturing group (meaning no backtracking into the group)
    [^()]    # any character that is not `(` nor `)`
    +        # one or more times
   |         # or
    (?1)     # recurse group 1.
             # ..this is like a copy of the expression of group 1 here.
             # ..which also includes this part.
             # ..so it's sort of self-recursing
  )*         # zero or more times
  \)         # match a `)` literally
)
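Python's built-in `re` module does not support the `(?1)` recursion used here (PCRE and the third-party `regex` module do). To make the recursion concrete, here is a hand-written sketch that mirrors group 1 with a recursive function call. The names `match_group1` and `find_all_group1` are hypothetical, and this illustrates the logic rather than replacing a regex engine. On the two sample strings it returns the same two matches reported in the question.

```python
# Hand-written mirror of group 1:  (\((?>[^()]+|(?1))*\))
# The (?1) recursion becomes a recursive call of the same function.


def match_group1(text, pos):
    """Return the index just past group 1 matched at pos, or None."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1                               # \(   literal "("
    while pos < len(text):
        char = text[pos]
        if char == ")":
            return pos + 1                 # \)   literal ")"
        if char == "(":
            end = match_group1(text, pos)  # (?1) recurse into group 1
            if end is None:
                return None
            pos = end
        else:
            pos += 1                       # [^()]+ consumed one char at a time
    return None                            # input ended before the closing ")"


def find_all_group1(text):
    """Scan left to right the way re.findall would, collecting every match."""
    matches, pos = [], 0
    while pos < len(text):
        end = match_group1(text, pos)
        if end is None:
            pos += 1
        else:
            matches.append(text[pos:end])
            pos = end
    return matches


print(find_all_group1("((weight gt 10) OR (weight lt 100)) AND (length lt 50)"))
print(find_all_group1("(length lt 50) AND ((weight gt 10) OR (weight lt 100))"))
```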

The addition, broken down:

(?:          # non-capturing group     
  (          # capturing group 2
    [^()]    # any character that is not `(` nor `)` 
    +        # one or more times
  )
  (          # capturing group 3
    (?1)     # recurse group 1.
  )
)?           # zero or one time
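For use with Python's built-in `re` (which cannot run `(?1)`), a minimal stdlib-only sketch of the combined pattern. It swaps the regex recursion for a depth counter that finds each balanced group, takes the run of non-parenthesis characters between them as g2, and only emulates a match that starts at index 0. The helper `to_question_format` is a hypothetical post-processing step that strips the outer parentheses and the spaces around the operator to produce the output asked for in the question.

```python
def match_balanced(text, pos):
    """Return the index just past the balanced "(...)" group that starts at
    pos, or None. Plays the role of group 1 / (?1), using a depth counter."""
    if pos >= len(text) or text[pos] != "(":
        return None
    depth = 0
    for index in range(pos, len(text)):
        if text[index] == "(":
            depth += 1
        elif text[index] == ")":
            depth -= 1
            if depth == 0:
                return index + 1
    return None  # never closed: unbalanced


def split_top_level(text):
    """Emulate the combined pattern for a match starting at index 0.
    Returns (g1, g2, g3); g2 and g3 are None when the optional part fails."""
    end1 = match_balanced(text, 0)
    if end1 is None:
        return None
    # g2 is ([^()]+): a non-empty run of characters other than "(" and ")"
    end2 = end1
    while end2 < len(text) and text[end2] not in "()":
        end2 += 1
    end3 = match_balanced(text, end2) if end2 > end1 else None
    if end3 is None:
        return (text[:end1], None, None)
    return (text[:end1], text[end1:end2], text[end2:end3])


def to_question_format(groups):
    """Hypothetical post-processing: drop the outer parentheses of g1/g3 and
    the spaces around g2."""
    g1, g2, g3 = groups
    return [g1[1:-1], g2.strip(), g3[1:-1]]


for sample in ("((weight gt 10) OR (weight lt 100)) AND (length lt 50)",
               "(length lt 50) AND ((weight gt 10) OR (weight lt 100))"):
    print(to_question_format(split_top_level(sample)))
```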

The expression on regex101. There, I changed the character class to [^()\n] to avoid issues with newlines.

