How to match only the YouTube id in a multi-line string?
How can I match only the 11-character YouTube id in each URL and nothing else?
URLs:
test_string = """https://youtu.be/uJei9-tepRE
https://youtu.be/1m7czKyDatU
https://www.youtube.com/watch?v=Disi_5W3J8I&ab_channel=AubreyHaddard
https://www.youtube.com/watch?v=Aqhtu2HEhtU&ab_channel=ElusivityRadio
https://www.youtube.com/watch?v=9n62phJQnM4
https://www.youtube.com/watch?v=ntvPhgxHfRE
https://www.youtube.com/watch?v=t9Szz0a0UYM
https://www.youtube.com/watch?v=ExMjEwymQ3A
https://www.youtube.com/watch?v=U9u5I9625b4&feature=emb_title
https://www.youtube.com/watch?v=f4eCh2N4RIk
https://www.youtube.com/watch?v=DhtX3poCNOg
https://youtu.be/WAbFfFvKtvw
https://nervousdater.bandcamp.com/track/nothing-left
https://www.youtube.com/watch?v=0UBn3ipMq_A
https://soundcloud.com/carlosvivanco-1/diminished-all-over
https://youtu.be/7XKkmQwTF_4
https://youtu.be/1G2RbPoFFOU
https://youtu.be/imMQVdshYQg
https://cigazze.bandcamp.com/track/eastwood-2
https://www.youtube.com/watch?v=33heuMT2iUs
https://www.youtube.com/watch?v=AkjyEFbsfQ4
https://www.youtube.com/watch?v=_nqEDPQR5X0&feature=emb_title
https://www.youtube.com/watch?v=NHIaWN6mkKY
https://youtu.be/pSgXSwx3yOI
https://www.youtube.com/watch?v=FVgqHZbp-pw"""
I originally used:
regex = "[=/]{1}\K[_a-zA-Z0-9-]{11}"
This regex matches "diminished-" and a few other strings, but I want it to match only the 11-character YouTube ids from the YouTube links.
Answer
Use
(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b
See proof.
Explanation
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
youtube 'youtube'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
com/watch 'com/watch'
--------------------------------------------------------------------------------
\? '?'
--------------------------------------------------------------------------------
v= 'v='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
youtu 'youtu'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
be 'be'
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
( group and capture to 1:
--------------------------------------------------------------------------------
[^&\n]{11} any character except: '&', '\n'
(newline) (11 times)
--------------------------------------------------------------------------------
) end of 1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
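The same breakdown can be kept next to the pattern itself by compiling it with Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch: the name `YOUTUBE_ID` is chosen here for illustration and does not come from the answer.

```python
import re

# Assumed name for illustration; the pattern is the one from the answer,
# spread over several lines so each part can carry its own comment.
YOUTUBE_ID = re.compile(r"""
    (?:                          # group, but do not capture
        youtube\.com/watch\?v=   #   long form, literal 'youtube.com/watch?v='
      |                          #   OR
        youtu\.be/               #   short form, literal 'youtu.be/'
    )                            # end of grouping
    ([^&\n]{11})                 # group 1: 11 characters, none '&' or newline
    \b                           # word boundary right after the id
""", re.VERBOSE)

urls = (
    "https://youtu.be/uJei9-tepRE\n"
    "https://soundcloud.com/carlosvivanco-1/diminished-all-over\n"
    "https://www.youtube.com/watch?v=Disi_5W3J8I&ab_channel=AubreyHaddard"
)
found = YOUTUBE_ID.findall(urls)
print(found)  # ['uJei9-tepRE', 'Disi_5W3J8I']
```

One caveat worth keeping in mind: `\b` needs the last character of the id to be a word character, so an id that ends in `-` and is followed by `&` or a line break would not match. None of the ids in `test_string` end that way.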
Python code example
import re
matches = re.findall(r'(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b', test_string)
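For a quick check without the full `test_string`, the pattern can be run against a handful of the URLs from the question. This is a minimal sketch; the helper name `extract_youtube_ids` and the variable `sample` are hypothetical and only used for this illustration.

```python
import re

# The pattern from the answer, precompiled once for reuse.
YOUTUBE_ID_RE = re.compile(r'(?:youtube\.com/watch\?v=|youtu\.be/)([^&\n]{11})\b')

def extract_youtube_ids(text):
    """Return the captured 11-character ids in order of appearance."""
    # With exactly one capture group, findall returns just that group.
    return YOUTUBE_ID_RE.findall(text)

# Four URLs copied from test_string: a short link, a long link with an
# extra query parameter, and two non-YouTube links that should be skipped.
sample = """https://youtu.be/uJei9-tepRE
https://www.youtube.com/watch?v=Disi_5W3J8I&ab_channel=AubreyHaddard
https://nervousdater.bandcamp.com/track/nothing-left
https://soundcloud.com/carlosvivanco-1/diminished-all-over"""

ids = extract_youtube_ids(sample)
print(ids)  # ['uJei9-tepRE', 'Disi_5W3J8I']
```

Because the pattern contains a single capturing group, `re.findall` returns only the ids rather than the whole matched URL prefix, which is what keeps "diminished-" and similar path fragments out of the result.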
- This was way cleaner than my attempt, `youtu(?:be)?\.(?:com|be)/(?:watch\?v=)?(.{11})`. Very nice.