Why doesn't clang replace this loop with memmove?

html5 • September 21, 2022, 2:36 pm • Q&A

Consider this memcpy-like function:

void copy(unsigned *restrict const dst, unsigned const *restrict const src, unsigned long n)
{
    for (unsigned long x = 0; x < n; ++x)
    {
        dst[x] = src[x];
    }
}


This code is optimized nicely into a call to memcpy:

copy:
        cbz     x2, .L1
        lsl     x2, x2, 2
        b       memcpy
.L1:
        ret

However, when I remove restrict, clang applies loop vectorization and does not replace the loop with memmove. Why is that?

I tried compiling it with optimization reports enabled:

clang-10 main.c -c -O3 -fsave-optimization-record -S && cat ./main.opt.yaml

This is what I get with restrict:

--- !Passed
Pass:            loop-idiom
Name:            ProcessLoopStoreOfLoopLoad
DebugLoc:        { File: main.c, Line: 4, Column: 12 }
Function:        copy
Args:
  - String:          'Formed a call to '
  - NewFunction:     llvm.memcpy.p0i8.p0i8.i64
  - String:          '() function'
...

And without restrict:

--- !Passed
Pass:            loop-vectorize
Name:            Vectorized
DebugLoc:        { File: main.c, Line: 3, Column: 3 }
Function:        copy
Args:
  - String:          'vectorized loop (vectorization width: '
  - VectorizationFactor: '4'
  - String:          ', interleaved count: '
  - InterleaveCount: '2'
  - String:          ')'
...

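The optimizer skips ProcessLoopStoreOfLoopLoad and goes straight to loop vectorization, without printing any message. Why is that? Why can't this code be replaced with memmove?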

Answer

It comes down to the observable effect of the operation when the two arrays overlap.
For example, take this array:

1 2 3 4

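If src points to the 1 and dst points to the 2, the loop copies front to back, so each element it reads has already been overwritten and the result must be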

1 1 1 1

memmove, on the other hand, is specified to behave as follows when the regions overlap:

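The memory areas may overlap: copying takes place as though the bytes in src are first copied into a temporary array that does not overlap src or dest, and the bytes are then copied from the temporary array to dest.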

That is, the result of such a copy would be:

1 1 2 3

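which is observably different from what the original loop produces, so the compiler is not allowed to make the substitution.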
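The difference can be reproduced with a small sketch. The names copy_forward and copy_memmove are made up for this illustration and are not from the original post; copy_forward is the loop from the question without restrict, and copy_memmove simply forwards to the library memmove.

```c
#include <string.h>

/* Illustrative name: the loop from the question without restrict.
   Elements are copied one at a time, front to back. */
void copy_forward(unsigned *dst, const unsigned *src, unsigned long n)
{
    for (unsigned long x = 0; x < n; ++x)
    {
        dst[x] = src[x];
    }
}

/* Illustrative name: same signature, but with memmove semantics. */
void copy_memmove(unsigned *dst, const unsigned *src, unsigned long n)
{
    memmove(dst, src, n * sizeof(*dst));
}
```

With unsigned a[4] = {1, 2, 3, 4}, calling copy_forward(a + 1, a, 3) leaves 1 1 1 1 in the array, while copy_memmove(a + 1, a, 3) on a fresh copy leaves 1 1 2 3. Both calls are well defined because neither pointer is restrict-qualified.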

Also, if you write this memmove-like code instead:

#include <stdlib.h>

void copy(unsigned *const dst, unsigned const *const src, unsigned long n)
{
    unsigned *tmp = malloc(n * sizeof(*tmp));
    for (unsigned long x = 0; x < n; ++x)
    {
        tmp[x] = src[x];
    }

    for (unsigned long x = 0; x < n; ++x)
    {
        dst[x] = tmp[x];
    }

    free(tmp);
}

clang nicely replaces it with a call to memmove:

copy:                                   # @copy
        testq   %rdx, %rdx
        je      .LBB0_2
        pushq   %rax
        shlq    $2, %rdx
        callq   memmove@PLT
        addq    $8, %rsp
.LBB0_2:
        retq
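If overlap has to be tolerated without a temporary buffer, one common approach is to pick the copy direction at run time. The following is only a sketch under stated assumptions: copy_overlap_safe is a hypothetical helper, not something from the original post, and it assumes a flat address space where converting pointers to uintptr_t preserves their ordering. No claim is made here about how clang compiles it.

```c
#include <stdint.h>

/* Hypothetical helper: copies n elements with memmove-like semantics
   by choosing the direction based on the relative position of dst and src. */
void copy_overlap_safe(unsigned *dst, const unsigned *src, unsigned long n)
{
    /* Compare the addresses as integers, because relational comparison of
       pointers into different objects is undefined in C. */
    if ((uintptr_t)dst <= (uintptr_t)src)
    {
        /* dst starts at or before src: a front-to-back copy never reads
           an element that has already been overwritten. */
        for (unsigned long x = 0; x < n; ++x)
        {
            dst[x] = src[x];
        }
    }
    else
    {
        /* dst starts after src: copy back to front for the same reason. */
        for (unsigned long x = n; x > 0; --x)
        {
            dst[x - 1] = src[x - 1];
        }
    }
}
```

On the array 1 2 3 4 with src at the 1 and dst at the 2, this yields 1 1 2 3, matching memmove rather than the original forward loop. In practice, calling memmove directly is the simpler choice.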

