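How do I list all files of a directory?
How can I list all files of a directory in Python and add them to a list?
Answers
os.listdir()
will get you everything that is in a directory - files and directories.
If you want just files, you can filter the result using os.path: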
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
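Or you can use os.walk(), which yields two lists for every directory it visits, splitting the entries into files and directories for you. If you only want the top directory, you can break the first time it yields: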
from os import walk
f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break
Lastly, as that example shows, to add one list to another you can use either .extend() or the + operator.
Personally, I prefer .extend().
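- `f.extend(filenames)` is not actually equivalent to `f = f + filenames`. `extend` modifies `f` in place, whereas adding creates a new list in a new memory location. This means `extend` is generally more efficient than `+`, but it can sometimes lead to confusion if multiple objects hold references to the list. Lastly, it is worth noting that `f += filenames` is equivalent to `f.extend(filenames)`, _not_ `f = f + filenames`.
- Slightly simpler: `(_, _, filenames) = walk(mypath).next()` (if you are confident that the walk will return at least one value, which it should). Note that the `.next()` method exists only in Python 2.
- @misterbee, your solution is the best, just one small improvement: ``_, _, filenames = next(walk(mypath), (None, None, []))``
- In Python 3.x use ```(_, _, filenames) = next(os.walk(mypath))```
- A slight modification to store full paths: `for (dirpath, dirnames, filenames) in os.walk(mypath): checksum_files.extend(os.path.join(dirpath, filename) for filename in filenames); break`
- Is there a way to make it include the full path of each file?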
- ```f += filenames``` is equivalent to extend and not the other way around??? Jeez.
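The difference discussed in these comments is easy to observe with a second reference to the same list. The following is a minimal sketch; the names `f` and `alias` and the file names are made up for illustration only.

```python
# Hypothetical names and values, used only to illustrate aliasing.
f = ["a.txt"]
alias = f                      # a second reference to the same list object

f.extend(["b.txt"])            # in place: the change is visible through alias
f += ["c.txt"]                 # for lists, += is also an in-place extend
in_place_result = list(alias)  # snapshot of what alias sees now

f = f + ["d.txt"]              # builds a new list and rebinds only the name f
alias_after_plus = list(alias) # alias still refers to the old list
same_object = f is alias
```

After the last assignment `alias` no longer tracks `f`, which is the kind of confusion the first comment warns about.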
I prefer using the glob module, as it does pattern matching and expansion.
import glob
print(glob.glob("/home/adam/*.txt"))
It will return a list with the queried files:
['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
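- To clarify, this does not return the "full path"; it simply returns the expansion of the glob, whatever that may be. E.g., given `/home/user/foo/bar/hello.txt`, if run in directory `foo`, `glob("bar/*.txt")` will return `bar/hello.txt`. There are cases when you do want the full (i.e., absolute) path; for those cases, see http://stackoverflow.com/questions/51520/how-to-get-an-absolute-file-path-in-python
- This is a shortcut for listdir + fnmatch http://docs.python.org/library/fnmatch.html#fnmatch.fnmatch
- This does not answer the question. `glob.glob("*")` would.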
import os
os.listdir("somedirectory")
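will return a list of all files and directories in "somedirectory".
- @Jixiang: `os.listdir()` always returns _mere filenames_ (not relative paths). What `glob.glob()` returns is driven by the path format of the input pattern.
- This returns bare names relative to the directory, as compared with the paths returned by "glob.glob", which include whatever directory prefix the pattern had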
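To make the difference between the two return formats concrete, here is a small sketch that builds a throwaway directory and lists it both ways. The directory layout (`bar/hello.txt`) is an assumption chosen to mirror the example in the comments above.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout for illustration: <tmp>/bar/hello.txt
    sub = os.path.join(tmp, "bar")
    os.mkdir(sub)
    open(os.path.join(sub, "hello.txt"), "w").close()

    names = os.listdir(sub)                          # bare names only
    matches = glob.glob(os.path.join(sub, "*.txt"))  # keeps the pattern's prefix
    absolute = [os.path.abspath(m) for m in matches] # explicit absolute paths

print(names)
print(matches)
```

`os.listdir()` needs `os.path.join()` before its results can be opened from another working directory, while `glob.glob()` results are usable as long as the pattern itself was.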
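Get a list of files with Python 2 and 3
I have also made a short video here: Python: how to get a list of files in a directory
os.listdir()
or..... how to get all the files (and directories) in the current directory (Python 3)
The simplest way to get the files in the current directory in Python 3 is this. It is really simple: use the os module and its listdir() function, and you will have the files in that directory (plus any folders it contains, but not the files inside subdirectories; for those you can use walk, which I discuss later).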
import os
arr = os.listdir()
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
Using glob
I find glob easier for selecting files of the same type or with something in common. Look at the following example:
import glob
txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
Using a list comprehension
import glob
mylist = [f for f in glob.glob("*.txt")]
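Getting the full path name with os.path.abspath
As you noticed, the code above does not give you the full path of the file. If you need the absolute path, you can use another function of the os.path module, abspath, passing it each file name you get from os.listdir() as the argument. There are other ways to get the full path, as we will see later (as suggested by mexmex, I replaced _getfullpathname with abspath).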
import glob
def filebrowser():
    return [f for f in glob.glob("*")]
x = filebrowser()
print(x)
>>> ['example.txt', 'fb.py', 'filebrowser.py', 'help']
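Get the full path name of a type of file in all subdirectories: glob
I find this very useful for finding things across many directories; it helped me find a file whose name I did not remember: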
import glob
def filebrowser(word=""):
    """Returns a list with all files with the word/extension in it"""
    file = []
    for f in glob.glob("*"):
        if word in f:
            file.append(f)
    return file
flist = filebrowser("example")
print(flist)
flist = filebrowser(".py")
print(flist)
>>> ['example.txt']
>>> ['fb.py', 'filebrowser.py']
os.listdir(): get files in the current directory (Python 2)
In Python 2, if you want the list of files in the current directory, you have to pass "." or os.getcwd() as the argument to os.listdir.
import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)
>>> ['F:\documentiapplications.txt', 'F:\documenticollections.txt']
Walking through the directory tree
import os
# Getting the current work directory (cwd)
thisdir = os.getcwd()
# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
Get files: os.listdir() in a particular directory (Python 2 and 3)
import os
arr = os.listdir('.')
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
Get files of a particular subdirectory with os.listdir()
# Method 1
x = os.listdir('..')
# Method 2
x= os.listdir('/')
os.walk('.') - current directory
import os
arr = os.listdir('F:\python')
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
glob module - all files
import os
x = os.listdir("./content")
next(os.walk('.')) and os.path.join('dir', 'file')
import os
arr = next(os.walk('.'))[2]
print(arr)
>>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']
next(os.walk('F:\')) - get the full path - list comprehension
import os
arr = []
# next() returns only the first (top-level) tuple produced by os.walk
r, d, f = next(os.walk(r"F:\_python"))
for file in f:
    arr.append(os.path.join(r, file))
for f in arr:
    print(f)
>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
os.walk - get full path - all files in subdirectories
[os.path.join(r, file) for r, d, f in [next(os.walk(r"F:\_python"))] for file in f]
>>> ['F:\_python\dict_class.py', 'F:\_python\programmi.txt']
os.listdir() - get only txt files
x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)
>>> ['F:\_python\dict.py', 'F:\_python\progr.txt', 'F:\_python\readl.py']
glob - get only txt files
arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)
>>> ['work.txt', '3ebooks.txt']
Using glob to get the full path of the files
If I need the absolute path of the files:
from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
    print(f)
>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:bootstrap_jquery_ecc.txt
Other uses of glob
If I want all the files in the directory:
import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)
>>> ['a simple game.py', 'data.txt', 'decorator.py']
Using os.path.isfile to avoid directories in the list
import pathlib
flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
Using pathlib (Python 3.4)
flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]
If you want to use a list comprehension
import pathlib
py = pathlib.Path().glob("*.py")
for file in py:
    print(file)
>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
* You can also use pathlib.Path() instead of pathlib.Path(".")
Using the glob method in pathlib.Path()
import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)
>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']
Output:
import os
x = next(os.walk('F://python'))[2]
print(x)
>>> ['calculator.bat','calculator.py']
Get all files, and only files, with os.walk
import os
next(os.walk('F://python'))[1] # for the current dir use ('.')
>>> ['python3','others']
Get only files with next and walk in a directory
for r, d, f in os.walk(r"F:\_python"):
    for dirs in d:
        print(dirs)
>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
Get only directories with next and walk in a directory
import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)
>>> ['calculator.bat','calculator.py']
# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.
import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)
>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
Get all the subdirectory names with os.path.abspath
import os
def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"
print(count(r"F:\python"))
>>> F:\python : 12057 files
os.scandir() from Python 3.5
import os
import shutil
from path import path
destination = "F:\\file_copied"
# os.makedirs(destination)
def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                # os.path.join inserts the right separator for the platform
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")
for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')
>>> _compiti18Compito Contabilità 1conti.txt
>>> _compiti18Compito Contabilità 1modula4.txt
>>> _compiti18Compito Contabilità 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
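Ex. 1: How many files are there in the subdirectories?
In this example, we look for the number of files contained in a directory and all of its subdirectories.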
import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
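Ex. 2: How to copy all files from a directory to another?
A script that finds all files of a given type (default: pptx) on your computer and copies them into a new folder.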
"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""
import os
# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(os.path.join(root, file))
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")
with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
Ex. 3: How to get all the files in a txt file
If you want to create a txt file with all the file names:
import os
with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(os.path.join(r, file) + "\n")
Example: txt with all the files of a hard drive
import os
def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(os.path.join(r, file) + "\n")
# looking for png files in the hard disk H:
searchfiles('.png', 'H:\\')
>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
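All the files of C:\ in one text file
This is a shorter version of the previous code. Change the folder where the search starts if you need to begin from another location. On my computer this code generated a 50 MB text file with a little under 500,000 lines, each holding a file with its complete path.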
import tkinter as tk
import os
def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, os.path.join(r, file))
def open_file():
    # os.startfile is available on Windows only
    os.startfile(lb.get(lb.curselection()[0]))
root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda: searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
A function to search for files of a specific type
- This is a mish-mash of too many answers to questions not asked here. It may also be worth explaining what the caveats or recommended approaches are. I'm no better off knowing one way versus 20 ways to do the same thing unless I also know which is more appropriate to use when.
- Such compilations can be helpful, but this answer in particular adds no value to the existing answers. Just to give an example, `[f for f in glob.glob("*.txt")]` is equivalent to `glob.glob("*.txt")` and warrants no extra section in this write up. It is also very wordy and with lots of spacing. An improvement could be made by adding explanations or pointing out differences instead of listing yet another variant.
- Ok, ASAP I will take a look at my answer and try to make it more clean and with more useful informations about the difference among the methods etc.
A one-line solution to get only the list of files (no subdirectories):
filenames = next(os.walk(path))[2]
or absolute pathnames:
paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
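- It is only a one-liner if you have already done `import os`. It seems less concise than `glob()` to me.
- The problem with glob is that glob('/home/adam/*.*') would also return a folder called 'something.something'
- On OS X, there is something called a bundle. It is a directory which should generally be treated as a file (like a .tar). Would you want those treated as files or directories? Using `glob()` would treat it as a file. Your method would treat it as a directory.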
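The point about `*.*` also matching folders can be checked, and worked around, by filtering glob results with `os.path.isfile`. This is only a sketch on a temporary directory; the entry names `something.something` and `notes.txt` are invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # One directory and one regular file, both with a dot in the name.
    os.mkdir(os.path.join(tmp, "something.something"))
    open(os.path.join(tmp, "notes.txt"), "w").close()

    pattern = os.path.join(tmp, "*.*")
    everything = sorted(os.path.basename(p) for p in glob.glob(pattern))
    files_only = sorted(
        os.path.basename(p) for p in glob.glob(pattern) if os.path.isfile(p)
    )

print(everything)
print(files_only)
```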
Getting full file paths from a directory and all its subdirectories
import os
def get_filepaths(directory):
    """
    This function will generate the file names in a directory
    tree by walking the tree either top-down or bottom-up. For each
    directory in the tree rooted at directory top (including top itself),
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.
    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.
    return file_paths  # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
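The path I provided in the function above contained 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER". The resulting list looks like this: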
['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']
If you wish, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:
for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)
/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat
Since version 3.4 there are built-in iterators for this, which are a lot more efficient than os.listdir():
pathlib: New in version 3.4.
>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]
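According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.
os.scandir(): New in version 3.5.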
>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]
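Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5 on, and its speed increased by 2-20 times according to PEP 471.
Let me also recommend reading ShadowRanger's comment below.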
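- Note: the `os.scandir` solution is going to be more efficient than `os.listdir` with an `os.path.isfile` check or the like, even if you need a `list` (so you do not benefit from lazy iteration), because `os.scandir` uses OS-provided APIs that give you the `is_file` information for free as it iterates, with no per-file round trip to the disk to `stat` the entries (on Windows, the `DirEntry` objects get you complete `stat` info for free; on *NIX systems a `stat` is needed for info beyond `is_file`, `is_dir`, etc., but `DirEntry` caches the result of the first `stat` for convenience).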
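As a rough illustration of the two approaches this comment compares, the sketch below lists the regular files of a temporary directory both ways and shows that they agree. The helper names `files_via_listdir` and `files_via_scandir` are hypothetical, and using `os.scandir` as a context manager assumes Python 3.6 or later.

```python
import os
import tempfile

def files_via_listdir(path):
    # One os.path.isfile() call (a stat) per entry.
    return sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )

def files_via_scandir(path):
    # DirEntry.is_file() can usually reuse data gathered during the scan.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    via_listdir = files_via_listdir(tmp)
    via_scandir = files_via_scandir(tmp)

print(via_listdir, via_scandir)
```

The sketch shows only that the results match; it makes no timing claim, since the actual speed-up depends on the platform and the directory size.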
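Preliminary notes
- Even though the question text clearly differentiates between the terms file and directory, some may argue that directories are actually special files
- The statement "all files of a directory" can be interpreted in two ways:
- All direct (or level 1) descendants only
- All descendants in the whole directory tree (including the ones in subdirectories)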
-
- In old Python (2) versions, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
- In Python 2.2, the concept of generator ([Python.Wiki]: Generators) - courtesy of [Python 3]: The yield statement) - was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
- In Python 3, generator is the default behavior
- Not sure if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor, will create a list out of it (and also consume it). The example below illustrates the differences on [Python 3]: map(function, iterable, ...)
>>> import sys
>>> sys.version
'2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
>>> m, type(m)
([1, 2, 3], <type 'list'>)
>>> len(m)
3

>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])
>>> m, type(m)
(<map object at 0x000001B4257342B0>, <class 'map'>)
>>> len(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
>>> lm0 = list(m)  # Build a list from the generator
>>> lm0, type(lm0)
([1, 2, 3], <class 'list'>)
>>>
>>> lm1 = list(m)  # Build a list from the same generator
>>> lm1, type(lm1)  # Empty list now - generator already consumed
([], <class 'list'>)
-
E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
Folder PATH listing for volume Work
Volume serial number is 00000029 3655:6FED
E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
¦   file0
¦   file1
¦
+---dir0
¦   +---dir00
¦   ¦   ¦   file000
¦   ¦   ¦
¦   ¦   +---dir000
¦   ¦           file0000
¦   ¦
¦   +---dir01
¦   ¦       file010
¦   ¦       file011
¦   ¦
¦   +---dir02
¦       +---dir020
¦           +---dir0200
+---dir1
¦       file10
¦       file11
¦       file12
¦
+---dir2
¦   ¦   file20
¦   ¦
¦   +---dir20
¦           file200
¦
+---dir3
Solutions
Programmatic approaches:
-
>>> import os
>>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir)  # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
['file0', 'file1']
A more elaborate example (code_os_listdir.py):
import os
from pprint import pformat

def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path

def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]

def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret

def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]

def main():
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print(ret0, len(lret0), pformat(lret0))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print(len(ret1), pformat(ret1))

if __name__ == "__main__":
    main()
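Notes:
- There are two implementations:
- One that uses generators (of course, here it seems useless, since I immediately convert the result to a list)
- The classic one (function names ending in _old)
- Recursion is used (to get into subdirectories)
- For each implementation there are two functions:
- One that starts with an underscore (_): "private" (should not be called directly); it does all the work
- The public one (a wrapper over the previous): it just strips off the initial path (if required) from the returned entries. It is an ugly implementation, but it is the only idea I could come up with at this point
- In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I did not test them in recursive functions, and I am also iterating over inner generators inside the function; I do not know how performance friendly that is
- Play with the arguments to get different results
Output: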
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
<generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
 'root_dir\\dir0\\dir00',
 'root_dir\\dir0\\dir00\\dir000',
 'root_dir\\dir0\\dir00\\dir000\\file0000',
 'root_dir\\dir0\\dir00\\file000',
 'root_dir\\dir0\\dir01',
 'root_dir\\dir0\\dir01\\file010',
 'root_dir\\dir0\\dir01\\file011',
 'root_dir\\dir0\\dir02',
 'root_dir\\dir0\\dir02\\dir020',
 'root_dir\\dir0\\dir02\\dir020\\dir0200',
 'root_dir\\dir1',
 'root_dir\\dir1\\file10',
 'root_dir\\dir1\\file11',
 'root_dir\\dir1\\file12',
 'root_dir\\dir2',
 'root_dir\\dir2\\dir20',
 'root_dir\\dir2\\dir20\\file200',
 'root_dir\\dir2\\file20',
 'root_dir\\dir3',
 'root_dir\\file0',
 'root_dir\\file1']
11 ['dir0\\dir00\\dir000\\file0000',
 'dir0\\dir00\\file000',
 'dir0\\dir01\\file010',
 'dir0\\dir01\\file011',
 'dir1\\file10',
 'dir1\\file11',
 'dir1\\file12',
 'dir2\\dir20\\file200',
 'dir2\\file20',
 'file0',
 'file1']
-
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
>>> import os
>>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
>>> root_dir
'.\\root_dir'
>>>
>>> scandir_iterator = os.scandir(root_dir)
>>> scandir_iterator
<nt.ScandirIterator object at 0x00000268CF4BC140>
>>> [item.path for item in scandir_iterator]
['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
>>>
>>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
[]
>>>
>>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
>>> for item in scandir_iterator:
...     if os.path.isfile(item.path):
...         print(item.name)
...
file0
file1
Notes:
- It's similar to os.listdir
- But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)
-
>>> import os
>>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
Notes:
- Under the scenes, it uses os.scandir (os.listdir on older versions)
- It does the heavy lifting by recurring in subfolders
-
>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from the beginning
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1
Notes:
- Uses os.listdir
- For large trees (especially if recursive is on), iglob is preferred
- Allows advanced filtering based on name (due to the wildcard)
-
>>> import pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']
Notes:
- This is one way of achieving our goal
- It's the OOP style of handling paths
- Offers lots of functionalities
-
- But, according to [GitHub]: python/cpython - (2.7) cpython/Lib/dircache.py, it's just a (thin) wrapper over os.listdir with caching
def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list
-
code_ctypes.py:
#!/usr/bin/env python3

import sys
from ctypes import Structure, \
    c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
    CDLL, POINTER, \
    create_string_buffer, get_errno, set_errno, cast

DT_DIR = 4
DT_REG = 8

char256 = c_char * 256

class LinuxDirent64(Structure):
    _fields_ = [
        ("d_ino", c_ulonglong),
        ("d_off", c_longlong),
        ("d_reclen", c_ushort),
        ("d_type", c_ubyte),
        ("d_name", char256),
    ]

LinuxDirent64Ptr = POINTER(LinuxDirent64)

libc_dll = this_process = CDLL(None, use_errno=True)
# ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
opendir = libc_dll.opendir
readdir = libc_dll.readdir
closedir = libc_dll.closedir

def get_dir_content(path):
    ret = [path, list(), list()]
    dir_stream = opendir(create_string_buffer(path.encode()))
    if (dir_stream == 0):
        print("opendir returned NULL (errno: {:d})".format(get_errno()))
        return ret
    set_errno(0)
    dirent_addr = readdir(dir_stream)
    while dirent_addr:
        dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
        dirent = dirent_ptr.contents
        name = dirent.d_name.decode()
        if dirent.d_type & DT_DIR:
            if name not in (".", ".."):
                ret[1].append(name)
        elif dirent.d_type & DT_REG:
            ret[2].append(name)
        dirent_addr = readdir(dir_stream)
    if get_errno():
        print("readdir returned NULL (errno: {:d})".format(get_errno()))
    closedir(dir_stream)
    return ret

def main():
    print("{:s} on {:s}\n".format(sys.version, sys.platform))
    root_dir = "root_dir"
    entries = get_dir_content(root_dir)
    print(entries)

if __name__ == "__main__":
    main()
Notes:
- It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python/C edge
- LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
- It returns data in the os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
- Everything is doable on Win as well, the data (libraries, functions, structs, constants, ...) differ
Output:
[cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux

['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
-
>>> import os, win32file, win32con
>>> root_dir = "root_dir"
>>> wildcard = "*"
>>> root_dir_wildcard = os.path.join(root_dir, wildcard)
>>> entry_list = win32file.FindFilesW(root_dir_wildcard)
>>> len(entry_list)  # Don't display the whole content as it's too long
8
>>> [entry[-2] for entry in entry_list]  # Only display the entry names
['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
['dir0', 'dir1', 'dir2', 'dir3']
>>>
>>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
['root_dir\\file0', 'root_dir\\file1']
Notes:
- win32file.FindFilesW is part of [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs
- The documentation link is from ActiveState, as I didn't find any pywin32 official documentation
-
- Install some (other) third-party package that does the trick
- Most likely, will rely on one (or more) of the above (maybe with slight customizations)
Notes:
-
- platform (Nix, Win, )
- Python version (2, 3, )
-
- Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
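The filter_func idea can be sketched roughly as follows. This is not the answer's code: the function name `iter_dir_content` is hypothetical, and one design choice is assumed here, namely that directories rejected by the filter are still descended into, so only the reported entries are filtered.

```python
import os
import tempfile

def iter_dir_content(path, filter_func=lambda p: True, recursive=True):
    """Yield entries under path for which filter_func(entry_path) is true."""
    for entry in sorted(os.listdir(path)):
        entry_with_path = os.path.join(path, entry)
        if filter_func(entry_with_path):
            yield entry_with_path
        # Assumption: recursion does not depend on the filter result.
        if recursive and os.path.isdir(entry_with_path):
            yield from iter_dir_content(entry_with_path, filter_func, recursive)

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway tree: <tmp>/file0.txt and <tmp>/dir0/file00.py
    os.mkdir(os.path.join(tmp, "dir0"))
    open(os.path.join(tmp, "file0.txt"), "w").close()
    open(os.path.join(tmp, "dir0", "file00.py"), "w").close()

    py_files = [
        os.path.relpath(p, tmp)
        for p in iter_dir_content(tmp, filter_func=lambda p: p.endswith(".py"))
    ]
    unfiltered_count = len(list(iter_dir_content(tmp)))

print(py_files, unfiltered_count)
```

With the default filter every entry (files and directories) is reported, which matches the "doesn't strip out anything" behaviour described above.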
Other approaches:
-
- Everything is done using another technology
- That technology is invoked from Python
-
- Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
- Some consider this a neat hack
- I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn't have anything to do with Python.
- Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
dir0
dir1
dir2
dir3
file0
file1
In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well (not to mention differences between locales).
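I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to do pattern matching with *s.
But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module as well.
As examples: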
from glob import glob
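# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\\Users\\admin\\*\\wlp')
The above is terrible - the path has been hardcoded and will only ever work on Windows, because both the drive name and the backslashes are hardcoded into the path.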
from glob import glob
from os.path import join
# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))
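The above works better, but it relies on the folder name Users, which is often found on Windows and not so often on other OSs. It also relies on the user having a specific name, admin.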
from glob import glob
from os.path import expanduser, join
# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))
This works across all platforms.
Another great example that works across platforms and does something a bit different:
from glob import glob
from os import getcwd
from os.path import join
# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))
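Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
- Extra glob fun: starting in Python 3.5, `**` works as long as you set `recursive=True`. See the docs here: https://docs.python.org/3.5/library/glob.html#glob.glob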
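A short sketch of what `recursive=True` changes, run against a temporary tree. The file names `top.txt` and `deep.txt` and the nesting `a/b` are assumptions made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway tree: <tmp>/top.txt and <tmp>/a/b/deep.txt
    nested = os.path.join(tmp, "a", "b")
    os.makedirs(nested)
    for path in (os.path.join(tmp, "top.txt"), os.path.join(nested, "deep.txt")):
        open(path, "w").close()

    pattern = os.path.join(tmp, "**", "*.txt")
    # Without recursive=True, "**" behaves like a single "*" path component.
    shallow = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern))
    # With recursive=True, "**" matches zero or more directory levels.
    deep = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern, recursive=True))

print(shallow)
print(deep)
```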
import os
def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files
If you are looking for a Python implementation of find, this is a recipe I use rather frequently:
from findtools.find_files import (find_files, Match)
# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)
for found_file in found_files:
    print(found_file)
So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds it useful.
Returns a list of absolute file paths; does not recurse into subdirectories
L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]
- Note: `os.path.abspath(f)` would be a somewhat cheaper substitute for `os.path.join(os.getcwd(), f)`.
Here is some code:
import os
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file
for file in files("."):
    print(file)
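The listdir() method returns the list of entries for the given directory. The os.path.isfile() method returns True if the given entry is a file. The yield statement suspends the function while keeping its current state, and hands back only the name of each entry detected as a file. Together, these let us loop over the generator function.
Hope this helps.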
import os
import os.path
def get_files(target_dir):
    item_list = os.listdir(target_dir)
    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir, item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list
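Here I use a recursive structure.
A wise teacher told me once that:
Thus I will add one more solution for a subset of the problem: quite often, we only want to check whether a file name matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like: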
filenames = dir_filter('foo/baz', radical='radical', extension='.txt')
If you are willing to first declare two functions, this can be done:
def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return (filename.startswith(radical) and filename.endswith(extension))
def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
            if file_filter(filename, radical, extension)]
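This solution could easily be generalized with regular expressions (and you might want to add a pattern argument if you do not want your patterns to always stick to the start or end of the filename).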
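One possible shape for that regular-expression generalization is sketched below. The function name `dir_filter_regex` and its `pattern` parameter are hypothetical, not part of the answer; like the original `dir_filter`, it does not distinguish files from directories.

```python
import os
import re
import tempfile

def dir_filter_regex(dirname=".", pattern=""):
    "Return the sorted names in dirname whose name matches the regex pattern"
    regex = re.compile(pattern)
    # search() lets the pattern match anywhere; anchor with ^ and $ if needed.
    return sorted(name for name in os.listdir(dirname) if regex.search(name))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("radical_01.txt", "radical_02.csv", "other.txt"):
        open(os.path.join(tmp, name), "w").close()

    anchored = dir_filter_regex(tmp, r"^radical.*\.txt$")
    anywhere = dir_filter_regex(tmp, r"_0\d")

print(anchored, anywhere)
```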
Using generators
import os
def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)
Another very readable variant for Python 3.4+ is using pathlib.Path.glob:
from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]
It is simple to be more specific, e.g. to look only for Python source files that are not symbolic links, in all subdirectories as well:
[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]