How do I get the top "n" most common words from a list?

html5 • November 5, 2022, 4:04 PM • Q&A

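I have two lists, each containing words. Some words appear in both lists and some do not. I want to output only the 20 most common of the shared words, but my code prints every shared word. How can I limit the output to 20? I am not allowed to use Counter.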

def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


count1 = countwords(finallist1)
count2 = countwords(finallist2)

words1 = set(count1.keys())
words2 = set(count2.keys())

common_words = words1.intersection(words2)
for i, w in enumerate(common_words, 1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")

Expected output:

common   f1 f2 sum 
1 program 5 10 15 
2 python  2  4  6 
.
.
until 20

Answer

You can use the .most_common() method of collections.Counter to achieve this:

>>> from collections import Counter
>>> word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]

>>> Counter(word_list).most_common(2)
[('four', 4), ('three', 3)]

From the Counter().most_common() documentation:

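Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered.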
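The two behaviors described in that passage can be checked with a short sketch. The second list (`"a"`, `"b"`, `"c"`) is made-up sample data, and the tie ordering shown assumes a Python version where Counter preserves insertion order (3.7 or later).

```python
from collections import Counter

word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]
counts = Counter(word_list)

# With n omitted, every element is returned, most common first.
all_ranked = counts.most_common()
print(all_ranked)

# Hypothetical sample with a tie: "a" and "b" both occur twice,
# and "a" is encountered first, so it is listed first.
tied = Counter(["a", "b", "b", "a", "c"]).most_common(2)
print(tied)
```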

Here is an alternative way to achieve the same result without importing any module:

# Step 1: Create a dictionary holding the frequency of each word.
# Similar to: `collections.Counter()`
my_counter = {}
for word in word_list:
    my_counter[word] = my_counter.get(word, 0) + 1

# where `my_counter` will hold:
# {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# -------------

# Step 2: Get a sorted list of (word, frequency) pairs in descending order.
# Similar to: `Counter.most_common()`
# Note: use `.items()` on Python 3 (`.iteritems()` exists only on Python 2).
sorted_frequency = sorted(my_counter.items(), key=lambda x: x[1], reverse=True)

# where `sorted_frequency` will hold:
# [('four', 4), ('three', 3), ('two', 2), ('one', 1)]

# -------------

# Step 3: Get the top two words by slicing the ordered list from Step 2.
# Similar to: `.most_common(2)`
top_two = sorted_frequency[:2]

# where `top_two` will hold:
# [('four', 4), ('three', 3)]

See the comments in the code snippet above for a step-by-step explanation.
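The same count, sort, and slice steps can be applied to the two-list case from the question. The following is only a sketch under some assumptions: the helper names `count_words` and `top_common_words` are hypothetical, the sample contents of `finallist1` and `finallist2` are invented to match the expected output, and ranking by the combined count (with an alphabetical tie-break) is one reasonable reading of "most common".

```python
def count_words(words):
    """Build a word -> frequency dict without collections.Counter."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_common_words(list1, list2, n=20):
    """Return up to n rows (word, f1, f2, sum) for words present in both lists."""
    count1 = count_words(list1)
    count2 = count_words(list2)
    common = set(count1) & set(count2)
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in common]
    # Assumption: rank by combined count, highest first; break ties alphabetically
    # so the output is deterministic even though sets are unordered.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows[:n]


# Hypothetical sample data standing in for the asker's lists.
finallist1 = ["program"] * 5 + ["python"] * 2 + ["only1"]
finallist2 = ["program"] * 10 + ["python"] * 4 + ["only2"]

print("\tcommon\tf1\tf2\tsum")
for i, (word, f1, f2, total) in enumerate(top_common_words(finallist1, finallist2, n=20), 1):
    print(f"{i}\t{word}\t{f1}\t{f2}\t{total}")
```

Slicing with `rows[:n]` is safe even when fewer than `n` shared words exist, since it simply returns whatever rows are available.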
