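How do I get the top "n" most common words from a list?
I have two lists, and each list contains words. Some words appear in both lists and some do not. I only want to print the 20 most frequent common words, but my code prints all of the common words. I want to limit the output to 20. I am not allowed to use Counter.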
def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct

count1 = countwords(finallist1)
count2 = countwords(finallist2)
words1 = set(count1.keys())
words2 = set(count2.keys())
common_words = words1.intersection(words2)
for i, w in enumerate(common_words, 1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")
Expected output:
common f1 f2 sum
1 program 5 10 15
2 python 2 4 6
.
.
(up to 20 rows)
Answer
You can use the .most_common() method of collections.Counter to do this:
>>> from collections import Counter
>>> word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]
>>> Counter(word_list).most_common(2)
[('four', 4), ('three', 3)]
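From the Counter.most_common() documentation:
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered.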
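A short check of the two behaviours the documentation describes: omitting n returns every element, and ties keep first-encountered order. It reuses the word_list from the REPL session above; the tied list of single letters is made-up sample data.

```python
from collections import Counter

# Same sample list as in the REPL session above.
word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]
counts = Counter(word_list)

# n omitted: every element is returned, most common first.
print(counts.most_common())
# [('four', 4), ('three', 3), ('two', 2), ('one', 1)]

# Tie: 'b' and 'a' both occur twice, but 'b' was seen first, so it comes first.
tied = Counter(["b", "a", "b", "a", "c"])
print(tied.most_common(2))
# [('b', 2), ('a', 2)]
```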
Here is an alternative that achieves the same result without importing any modules:
# Same sample list as in the REPL example above.
word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]

# Step 1: Create a dictionary mapping each word to its frequency.
# Similar to: `collections.Counter()`
my_counter = {}
for word in word_list:
    my_counter[word] = my_counter.get(word, 0) + 1
# where `my_counter` will hold (dicts keep insertion order in Python 3.7+):
# {'one': 1, 'two': 2, 'three': 3, 'four': 4}
#-------------
# Step 2: Get a list of (word, frequency) pairs sorted by frequency, descending.
# Similar to: `Counter.most_common()`
sorted_frequency = sorted(my_counter.items(), key=lambda x: x[1], reverse=True)
# where `sorted_frequency` will hold:
# [('four', 4), ('three', 3), ('two', 2), ('one', 1)]
#-------------
# Step 3: Get the top two words by slicing the sorted list from Step 2.
# Similar to: `.most_common(2)`
top_two = sorted_frequency[:2]
# where `top_two` will hold:
# [('four', 4), ('three', 3)]
See the comments in the snippet above for a step-by-step explanation.
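Applied to the question's two-list case, the same count-sort-slice idea might look like the sketch below. It is one possible approach, not the only one. It assumes that "most common" means the highest combined count (the sum column in the expected output), and it breaks ties alphabetically so the output order is deterministic. The helper name top_common_words and the contents of finallist1 and finallist2 are hypothetical stand-ins for the question's real data.

```python
def countwords(lst):
    """Build a word -> frequency dict without collections.Counter."""
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


def top_common_words(list1, list2, n=20):
    """Hypothetical helper: (word, f1, f2, sum) rows for the n most frequent shared words."""
    count1 = countwords(list1)
    count2 = countwords(list2)
    # Words present in both lists (dict key views support set operations).
    common = count1.keys() & count2.keys()
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in common]
    # Highest combined count first; ties broken alphabetically by word.
    rows.sort(key=lambda row: (-row[3], row[0]))
    # Slicing keeps at most n rows, like most_common(n).
    return rows[:n]


# Made-up sample data standing in for finallist1 / finallist2 from the question.
finallist1 = ["program"] * 5 + ["python"] * 2 + ["only", "in", "one"]
finallist2 = ["program"] * 10 + ["python"] * 4 + ["other", "words"]

print("common\tf1\tf2\tsum")
for i, (w, f1, f2, total) in enumerate(top_common_words(finallist1, finallist2), 1):
    print(f"{i}\t{w}\t{f1}\t{f2}\t{total}")
```

With this sample data it prints the header followed by the rows "1 program 5 10 15" and "2 python 2 4 6" (tab-separated), which matches the question's expected output. Sorting the whole list is simple and fine for typical vocabularies, because only the final slice limits the output to 20 rows.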