Pandas无法打开Excel(.xlsx)文件
请看我下面的代码:
import pandas
df = pandas.read_excel('cat.xlsx')
运行后,它给了我以下错误:
Traceback (most recent call last):
File "d:OneDrive??practice.py", line 4, in <module>
df = pandas.read_excel('cat.xlsx')
File "D:pythonlibsite-packagespandasutil_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "D:pythonlibsite-packagespandasioexcel_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "D:pythonlibsite-packagespandasioexcel_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "D:pythonlibsite-packagespandasioexcel_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "D:pythonlibsite-packagespandasioexcel_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "D:pythonlibsite-packagespandasioexcel_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "D:pythonlibsite-packagesxlrd__init__.py", line 170, in open_workbook
raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
我尝试使用 pip 命令卸载并重新安装 Pandas。错误仍然存在。我安装了 xlrd 2.0.1 和 Pandas 1.1.5。
回答
如发布电子邮件中所述,从发布推文链接到并在文档首页出现的大橙色警告中指出,橙色警告较少但仍存在于repo的自述文件和pypi 上的发布中:
xlrd 已明确删除对 xls 文件以外的任何内容的支持。
这是由于与使用xlrd1.2 或更早版本读取.xlsx文件相关的潜在安全漏洞。
在您的情况下,解决方案是:
- 确保您使用的是最新版本的 Pandas,至少是 1.0.1,最好是最新版本。
- 安装
openpyxl:https : //openpyxl.readthedocs.io/en/stable/ - 将您的熊猫代码更改为:
pandas.read_excel('cat.xlsx', engine='openpyxl')
编辑:目前,pandas >= 1.2 解决了这个问题。(发行说明)
-
This is a prime example on how versions should *not* be released. This change breaks a lot of existing code, and I don't recall ever seeing a deprecation warning with xlrd reading xlsx files.
If I were you, I would quickly release a new version that does support xlsx files, and gives a very clear deprecation warning saying that the *next* version will not support it. This will give people time to, you know, fix their systems before they break.
People don't usually read package readmes or listen for tweets on packages that read excel files. This is not the proper way of notifying them.
- @ChrisWithers sorry and thanks for all your hard work. Please understand that your library was mainly used as a dependency, and we don't go scouring the pages of every dependency, that's why the visibilty of your messages were low. We can simply pass `engine="openpyxl"` to `pd.ExcelFile`, but your library parses excel files in different ways and code was written to take that into account which means it is not trivial to fix. The warnings should have been in decorators in code. I wish I knew, but I did not. Again thanks for your work. Would support.
- There was a deprecation warning on the whole library for over a year, and an announcement about this four years ago. Reading documentation and mailing list announcements is important for just this type of issue. The changes needed here are trivial, especially in light on the potential security vulnerabilities.
回答
最新版本的 xlrd (2.0.1) 仅支持 .xls 文件。
如果您准备冒潜在的安全漏洞,并冒着错误解析某些文件的风险,则可以通过安装旧版本的 xlrd 来解决此错误。
在 shell 或 cmd 提示符下使用以下命令:
pip install xlrd==1.2.0
- How very incredibly useful to have an excel module that doesn't support excel files.
- Just to be clear, as the author of this package, I can safely state that this is an incredibly dangerous suggestion. The reason xlsx support was removed is because it had potential security vulnerabilities and no-one was maintaining it. If you choose this approach, rather than the trivial switch to openpyxl, you are risking exposure to these.
- @painoman102: perhaps you could read the README, or the release notes of the package, or the release email, to see why?
- @ChrisWithers Unfortunately, openpyxl does not appear to work at all with the excel files I am working with. It is completely unable to parse and always returns an empty dataframe.