How can I use os.posix_fadvise to prevent file caching on Linux?

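I have a script that typically operates on entire block devices. If every block it reads gets cached, it will evict data that other applications are using. To prevent this, I added support for using mmap(2) together with posix_fadvise(2), with the following logic: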

Function that indicates a block is no longer needed:

def advise_dont_need(fd, offset, length):
    """
    Announce that data in a particular location is no longer needed.

    Arguments:
    - fd (int): File descriptor.
    - offset (int): Beginning of the unneeded data.
    - length (int): Length of the unneeded data.
    """
    # TODO: macOS support
    if hasattr(os, "posix_fadvise"):
        # posix_fadvise(2) states that "If the application requires that data
        # be considered for discarding, then offset and len must be
        # page-aligned." When this code aligns the offset and length, the
        # advised area is widened under the presumption it is better to discard
        # more memory than needed than to leak it which could cause resource
        # issues.

        # If the offset is unaligned, extend it toward 0 to align it and adjust
        # the length to compensate for the change.
        aligned_offset = offset - offset % PAGE_SIZE
        length += offset - aligned_offset
        offset = aligned_offset

        # If the length is unaligned, widen it to align it.
        length -= length % -PAGE_SIZE

        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
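
To illustrate the alignment step described in the comments above, here is a minimal sketch that isolates the arithmetic so it can be checked without touching a file descriptor. The helper name align_range and the value PAGE_SIZE = 4096 are assumptions made for this example; they are not taken from the script.

```python
PAGE_SIZE = 4096  # assumed for illustration; the real page size may differ


def align_range(offset, length, page_size=PAGE_SIZE):
    """Widen (offset, length) so that both ends fall on page boundaries."""
    # Move the start down to the previous page boundary and grow the
    # length by the same amount so the end of the range does not move.
    aligned_offset = offset - offset % page_size
    length += offset - aligned_offset

    # In Python, length % -page_size lies in (-page_size, 0], so
    # subtracting it rounds the length up to a multiple of page_size.
    length -= length % -page_size
    return aligned_offset, length


print(align_range(5000, 100))   # (4096, 4096)
print(align_range(4096, 8192))  # (4096, 8192)
print(align_range(4095, 2))     # (0, 8192)
```

The last call shows the widening: a two-byte range that straddles a page boundary becomes two full pages.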

The logic that reads the file:

            with open(path, "rb", buffering=0) as file, \
              ProgressBar("Reading file") as progress, timer() as read_loop:
                size = file_size(file)

                if mmap_file:
                    # At the time of this writing, mmap.mmap in CPython uses
                    # st_size to determine the size of a file which will not
                    # work with every file type which is why file size
                    # autodetection (size=0) cannot be used here.
                    fd = file.fileno()
                    view = mmap.mmap(fd, size, prot=mmap.PROT_READ)

                try:
                    while writer.error is None and hash_queue.error is None:
                        # Skip offsets that are already in the block map.
                        if offset in blocks:
                            while offset in blocks:
                                if mmap_file:
                                    advise_dont_need(fd, offset, block_size)

                                offset += block_size

                            if not mmap_file:
                                file.seek(offset)

                        if mmap_file:
                            block = view[offset:offset + block_size]
                            advise_dont_need(fd, offset, len(block))
                        else:
                            block = file.read(block_size)

                        if not block:
                            break

                        bytes_read += len(block)

                        while hash_queue.error is None:
                            try:
                                hash_queue.put((offset, block), timeout=0.1)
                                offset += len(block)
                                progress.update(offset / size)
                                break
                            except queue.Full:
                                pass
                finally:
                    if mmap_file:
                        view.close()

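When I run the script and monitor the output of free -h, I can see buffer cache usage increase despite this logic. Is my logic incorrect, or is this a consequence of posix_fadvise(2) being only advice and not a mandate?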
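
As an alternative to watching free -h by eye, the same counters can be sampled from /proc/meminfo before and after a run. The following is a rough sketch, assuming a Linux system; the function names parse_meminfo and page_cache_kib are hypothetical and not part of the script. On Linux, reads from a block device are generally accounted under "Buffers" and reads from regular files under "Cached", so both are returned.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in KiB; a few (such as HugePages_Total)
    are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_kib(path="/proc/meminfo"):
    """Return the (Buffers, Cached) values in KiB. Linux only."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Buffers"), info.get("Cached")
```

Calling page_cache_kib() once before and once after the read loop and comparing the two results gives a number to reason about, although other activity on the machine will also move these counters.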

Here are some logs showing the length and offset values at the end of a run of the script with block_size set to 1048576:

offset=107296587776; length=1048576
offset=107297636352; length=1048576
offset=107298684928; length=1048576
offset=107299733504; length=1048576
offset=107300782080; length=1048576
offset=107301830656; length=1048576
offset=107302879232; length=1048576
offset=107303927808; length=1048576
offset=107304976384; length=0
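
One detail in the last log line: according to posix_fadvise(2), a len of 0 means "until the end of the file", so a call with length=0 applies the advice from that offset to the end of the file, not to an empty range. At end of file this is probably harmless, but a guard makes the intent explicit. The sketch below assumes a hypothetical wrapper name, advise_dont_need_checked, and omits the alignment step for brevity.

```python
import os


def advise_dont_need_checked(fd, offset, length):
    """Advise DONTNEED for a non-empty range; return True if advice was sent.

    Hypothetical wrapper for illustration. Empty ranges are skipped
    because posix_fadvise treats a length of 0 as "to the end of the file".
    """
    if length <= 0:
        return False
    if not hasattr(os, "posix_fadvise"):
        return False
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
    return True
```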
