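Is using C++20's std::popcount with vector optimization equivalent to the popcnt intrinsic?
C++20 introduces many new functions such as std::popcount. I currently get the same functionality through an Intel intrinsic.
I compiled both options (the code can be seen on Compiler Explorer):
- Using Intel's AVX2 intrinsic
- Using std::popcount with the GCC compiler flag "-mavx2"
Apart from the type checks used in the std template, the generated assembly looks identical.
For OS-agnostic code that keeps the same optimizations: is it correct to assume that using std::popcount with the appropriate compiler vector optimization flags is better than using the intrinsic directly?
Thanks.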
Answer
Technically, no (but practically, yes). The C++ standard only specifies the behavior of popcount, not its implementation (refer to [bit.count]).
Implementers are allowed to do whatever they want to achieve this behavior, including using the popcnt intrinsic, but they could just as well write a while loop:
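// Hypothetical name; a conforming but naive implementation that tests one bit per iteration.
template<class T>
constexpr int naive_popcount(T x) noexcept
{
    int set_bits = 0;
    while (x)
    {
        if (x & 1)
            ++set_bits;
        x >>= 1;   // x is unsigned, so this shifts in zeros and the loop terminates
    }
    return set_bits;
}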
This is the entire wording in the standard at [bit.count]:
template<class T>
constexpr int popcount(T x) noexcept;
Constraints: T is an unsigned integer type ([basic.fundamental]).
Returns: The number of 1 bits in the value of x.
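Realistically? Compiler and standard library writers are very smart and generally implement std::popcount on top of compiler builtins, so it typically compiles to the same popcnt instruction as the intrinsic when the target supports it. For example, GCC's implementation appears to be fairly heavily optimized.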