What is the fastest way to filter a JSON file larger than 2.5 GB?

html5 • September 13, 2022, 1:34 PM • Q&A

I have a 2.5 GB JSON file with 25 columns and about 4 million rows. I tried filtering the JSON with the following script, and it takes at least 10 minutes.

import json

product_list = ['Horse','Rabit','Cow']
year_list = ['2008','2009','2010']
country_list = ['USA','GERMANY','ITALY']

with open('./products/animal_production.json', 'r', encoding='utf8') as r:
    result = r.read()
result = json.loads(result)

for item in result[:]:
    if (not str(item["Year"]) in year_list) or (not item["Name"] in product_list) or (not item["Country"] in country_list):
        result.remove(item)
print(result)

I need the result ready in at most 1 minute. What do you suggest, or what is the fastest way to filter the JSON?

Answer

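Removing items from a list inside a loop is slow: each remove() call is O(n) because it has to search the list and shift the remaining elements, and doing that up to n times makes the whole loop O(n^2). Appending to a new list is O(1), so doing it n times in a loop is O(n). So you can try this: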

result = [item for item in result if str(item["Year"]) in year_list and item["Name"] in product_list and item["Country"] in country_list]

Adjust the conditions to whatever you need; only the items that satisfy all of them are added to the new list.
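As a rough sketch of the same idea, the single-pass filter can be wrapped in a small function. The name filter_records and the four sample records below are made up for illustration and are not from the original post; the sketch also converts the allowed values to sets, which should matter little for three-element lists but keeps membership checks cheap if the lists grow.

```python
def filter_records(records, years, names, countries):
    """Return the records that satisfy all three conditions, in one pass."""
    # Sets give constant-time membership checks however many values are allowed.
    years, names, countries = set(years), set(names), set(countries)
    return [
        item for item in records
        if str(item["Year"]) in years      # Year may be stored as int or str
        and item["Name"] in names
        and item["Country"] in countries
    ]


# Hypothetical sample standing in for the list returned by json.loads.
sample = [
    {"Year": 2008, "Name": "Horse", "Country": "USA"},
    {"Year": 2011, "Name": "Horse", "Country": "USA"},      # wrong year
    {"Year": "2009", "Name": "Cow", "Country": "FRANCE"},   # wrong country
    {"Year": 2010, "Name": "Rabit", "Country": "ITALY"},
]

filtered = filter_records(
    sample,
    years=['2008', '2009', '2010'],
    names=['Horse', 'Rabit', 'Cow'],
    countries=['USA', 'GERMANY', 'ITALY'],
)
print(filtered)
```

Because a new list is built instead of mutating the original, no copy via result[:] is needed and no element is ever shifted.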
