Extracting words from a paragraph that are similar to words in a list

html5 • November 28, 2022, 9:26 am • Q&A

I have the following string:

"The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"

And a list of words to extract:

["town","teddy","chicken","boy went"]

Note: town and teddy are misspelled in the given sentence.

I tried the following, but I get additional words that are not part of the answer:

import difflib

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"

list1 = ["town","teddy","chicken","boy went"]

[difflib.get_close_matches(x.lower().strip(), sent.split()) for x in list1 ]

I get the following result:

[['twn', 'to'], ['tddy'], ['chicken.', 'picked'], ['went']]

Instead of:

'twn', 'tddy', 'chicken','boy went'

Answer

Note what the documentation says about difflib.get_close_matches():

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

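Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.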

Currently, you are using the default values for both the n and cutoff parameters.
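To see why the default cutoff lets extra words such as 'to' and 'picked' through, it can help to look at the scores directly. The sketch below assumes the score compared against cutoff is difflib.SequenceMatcher(...).ratio(), which is what CPython's implementation of get_close_matches() uses as far as I know; the helper name score is made up for illustration.

```python
import difflib

def score(candidate, word):
    """Similarity in [0, 1] between a candidate token and the target word.

    Hypothetical helper: assumes get_close_matches() ranks candidates
    by SequenceMatcher.ratio().
    """
    return difflib.SequenceMatcher(None, candidate, word).ratio()

# (token from the sentence, word from the list)
checks = [
    ("twn", "town"),
    ("to", "town"),
    ("picked", "chicken"),
    ("went", "boy went"),
]

for candidate, word in checks:
    print(f"{candidate!r} vs {word!r}: {score(candidate, word):.3f}")
```

Under that assumption, 'twn' scores well above 0.75, while 'to', 'picked' and 'went' land between 0.6 and 0.75, so they pass the default cutoff but not a stricter one.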

You can specify either one (or both) to narrow down the returned matches.

For example, you could use a cutoff score of 0.75:

result = [difflib.get_close_matches(x.lower().strip(), sent.split(), cutoff=0.75) for x in list1]

Alternatively, you can specify that at most 1 match should be returned:

result = [difflib.get_close_matches(x.lower().strip(), sent.split(), n=1) for x in list1]

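In either case, you can use a list comprehension to flatten the list of lists (since difflib.get_close_matches() always returns a list, which may be empty when nothing reaches the cutoff):

matches = [r[0] for r in result if r]  # skip words that had no match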
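With a stricter cutoff, some words may have no match at all, so their result list is empty. For instance, 'boy went' finds nothing among the single tokens at cutoff=0.75. A minimal sketch that records this explicitly, rather than silently dropping the word, could map each word to its best match or None (the dictionary name best is only illustrative):

```python
import difflib

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
list1 = ["town", "teddy", "chicken", "boy went"]

tokens = sent.split()

# Map each target word to its single best match, or None if nothing
# reaches the cutoff.
best = {}
for word in list1:
    found = difflib.get_close_matches(word.lower().strip(), tokens, n=1, cutoff=0.75)
    best[word] = found[0] if found else None

print(best)
# {'town': 'twn', 'teddy': 'tddy', 'chicken': 'chicken.', 'boy went': None}
```

Keeping the None entries makes it easy to see which words still need a different strategy, such as matching against pairs of words.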

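Since you also want to check for close matches of bigrams, you can do so by extracting pairings of adjacent "words" and passing them to difflib.get_close_matches() as part of the possibilities argument.

Here is a complete working example: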

import difflib
import re

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"

list1 = ["town", "teddy", "chicken", "boy went"]

# this extracts overlapping pairings of "words"
# i.e. ['The boy', 'boy went', 'went to', 'to twn', ...
pairs = re.findall(r'(?=(\b[^ ]+ [^ ]+\b))', sent)

# we pass the sent.split() list as before
# and concatenate the new pairs list to the end of it also
result = [difflib.get_close_matches(x.lower().strip(), sent.split() + pairs, n=1) for x in list1]

matches = [r[0] for r in result]

print(matches)
# ['twn', 'tddy', 'chicken.', 'boy went']
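The output above still carries the trailing period on 'chicken.', whereas the question asked for 'chicken'. One possible variation, offered as a sketch rather than part of the original answer, strips surrounding punctuation from each token with string.punctuation and builds the adjacent pairs with zip() instead of a regular expression. The names words and candidates are illustrative.

```python
import difflib
import string

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
list1 = ["town", "teddy", "chicken", "boy went"]

# Split on whitespace, then strip leading/trailing punctuation from each
# token so that 'chicken.' becomes 'chicken'.
words = [w.strip(string.punctuation) for w in sent.split()]

# Adjacent pairs: 'The boy', 'boy went', 'went to', ...
pairs = [" ".join(pair) for pair in zip(words, words[1:])]

candidates = words + pairs

matches = []
for target in list1:
    found = difflib.get_close_matches(target.lower().strip(), candidates, n=1)
    if found:  # skip targets with no close match
        matches.append(found[0])

print(matches)
# ['twn', 'tddy', 'chicken', 'boy went']
```

Note that this simple pairing ignores sentence boundaries, so it also produces pairs such as 'chicken He'. That does no harm here, but it may matter for other inputs.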
