How to match only the YouTube id in a multiline string?

How can I match only the 11-character YouTube id in each URL, and nothing else?

URLs:

test_string = """https://youtu.be/uJei9-tepRE
https://youtu.be/1m7czKyDatU
https://www.youtube.com/watch?v=Disi_5W3J8I&ab_channel=AubreyHaddard
https://www.youtube.com/watch?v=Aqhtu2HEhtU&ab_channel=ElusivityRadio
https://www.youtube.com/watch?v=9n62phJQnM4
https://www.youtube.com/watch?v=ntvPhgxHfRE
https://www.youtube.com/watch?v=t9Szz0a0UYM
https://www.youtube.com/watch?v=ExMjEwymQ3A
https://www.youtube.com/watch?v=U9u5I9625b4&feature=emb_title
https://www.youtube.com/watch?v=f4eCh2N4RIk
https://www.youtube.com/watch?v=DhtX3poCNOg
https://youtu.be/WAbFfFvKtvw
https://nervousdater.bandcamp.com/track/nothing-left
https://www.youtube.com/watch?v=0UBn3ipMq_A
https://soundcloud.com/carlosvivanco-1/diminished-all-over
https://youtu.be/7XKkmQwTF_4
https://youtu.be/1G2RbPoFFOU
https://youtu.be/imMQVdshYQg
https://cigazze.bandcamp.com/track/eastwood-2
https://www.youtube.com/watch?v=33heuMT2iUs
https://www.youtube.com/watch?v=AkjyEFbsfQ4
https://www.youtube.com/watch?v=_nqEDPQR5X0&feature=emb_title
https://www.youtube.com/watch?v=NHIaWN6mkKY
https://youtu.be/pSgXSwx3yOI
https://www.youtube.com/watch?v=FVgqHZbp-pw"""

Originally I was using:

regex = "[=/]{1}\K[_a-zA-Z0-9-]{11}"

This regex matches "diminished-" and a few others, but I want it to match only the 11-character YouTube id from the YouTube links.

Answer

(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b

See proof.

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    youtube                  'youtube'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    com/watch                'com/watch'
--------------------------------------------------------------------------------
    \?                       '?'
--------------------------------------------------------------------------------
    v=                       'v='
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    youtu                    'youtu'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    be                       'be'
--------------------------------------------------------------------------------
    /                       '/'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to 1:
--------------------------------------------------------------------------------
    [^&\n]{11}               any character except: '&', '\n'
                             (newline) (11 times)
--------------------------------------------------------------------------------
  )                        end of 1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
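
The breakdown above can also be kept next to the pattern itself. Here is a minimal sketch using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows inline comments. The names YOUTUBE_ID and sample are only illustrative, and the sample reuses two URLs from the question.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each piece
# carries its own comment.
YOUTUBE_ID = re.compile(r"""
    (?:                          # non-capturing group: one of two URL prefixes
        youtube\.com/watch\?v=   #   long form, with literal '.' and '?'
      | youtu\.be/               #   short form
    )
    ([^&\n]{11})                 # group 1: 11 characters, none '&' or newline
    \b                           # word boundary right after the id
""", re.VERBOSE)

sample = (
    "https://youtu.be/uJei9-tepRE\n"
    "https://soundcloud.com/carlosvivanco-1/diminished-all-over"
)
print(YOUTUBE_ID.findall(sample))  # ['uJei9-tepRE']
```

One thing to keep in mind: because of the trailing \b, an id whose last character is '-' and that is followed by a newline or '&' would not match, since there is no word boundary at that position. None of the ids in the question end that way.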

Python code example

import re
matches = re.findall(r'(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b', test_string)
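
For a quick check that the non-YouTube links are skipped, here is a self-contained sketch that runs the pattern on a handful of the URLs from the question. The variable names urls, pattern and ids are only for illustration.

```python
import re

# A few of the URLs from the question, including two non-YouTube links.
urls = """https://youtu.be/uJei9-tepRE
https://www.youtube.com/watch?v=Disi_5W3J8I&ab_channel=AubreyHaddard
https://nervousdater.bandcamp.com/track/nothing-left
https://soundcloud.com/carlosvivanco-1/diminished-all-over
https://www.youtube.com/watch?v=_nqEDPQR5X0&feature=emb_title
https://www.youtube.com/watch?v=FVgqHZbp-pw"""

pattern = re.compile(r'(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b')

# With a single capture group, findall returns only that group: the ids.
ids = pattern.findall(urls)
print(ids)  # ['uJei9-tepRE', 'Disi_5W3J8I', '_nqEDPQR5X0', 'FVgqHZbp-pw']
```

The bandcamp and soundcloud lines produce nothing because neither URL prefix in the non-capturing group appears in them.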

  • This was way cleaner than my attempt `youtu(?:be)?\.(?:com|be)/(?:watch\?v=)?(.{11})`. Very nice.
