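How to best emulate the logical meaning of `_mm_slli_si128` (128-bit bit shift), not `_mm_bslli_si128`
Looking through the Intel Intrinsics Guide, I saw this instruction. Going by the naming pattern, the meaning should be clear: "shift a 128-bit register left by a fixed number of bits", but it is not. In reality it shifts by a fixed number of bytes, which makes it exactly the same as `_mm_bslli_si128`.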
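- Is this an oversight? Shouldn't it shift by bits, like `_mm_slli_epi32` or `_mm_slli_epi64`?
- If not, in which situation am I supposed to use it over `_mm_bslli_si128`?
- Is there an assembly instruction that does this properly?
- What is the best way to emulate this using smaller shifts?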
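Answer
1 This is not an oversight. The instruction really does shift by bytes, i.e. by multiples of 8 bits.
2 It doesn't matter: `_mm_slli_si128` and `_mm_bslli_si128` are equivalent, and both compile to the `pslldq` SSE2 instruction.
As for the emulation, assuming you have C++/17, I'd do it like this. If you're writing C++/14, replace `if constexpr` with a normal `if`, and also add a message to the `static_assert`.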
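To make the byte-versus-bit distinction concrete, here is a minimal sketch that compares `_mm_slli_si128` with the per-lane bit shift `_mm_slli_epi64` on the same input. The helper `lowLane` and the function `demoByteShift` are hypothetical names introduced for this illustration only.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the lowest 64 bits of a vector
inline uint64_t lowLane( __m128i v )
{
    uint64_t res;
    _mm_storel_epi64( ( __m128i* )&res, v );
    return res;
}

// Hypothetical demo of byte shift versus bit shift
inline void demoByteShift()
{
    // Low 64-bit lane holds 1, high lane holds 0
    const __m128i one = _mm_set_epi64x( 0, 1 );
    // The immediate counts bytes: shifting by 1 moves the value 8 bits
    assert( lowLane( _mm_slli_si128( one, 1 ) ) == 0x100 );
    // The 64-bit lane shift counts bits: shifting by 1 doubles the value
    assert( lowLane( _mm_slli_epi64( one, 1 ) ) == 2 );
}
```

Replacing `_mm_slli_si128` with `_mm_bslli_si128` in the sketch should give the same result, since both names map to `pslldq`.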
    #include <emmintrin.h>  // SSE2 intrinsics

    // Shift the whole 128-bit vector left by `i` bits, shifting in zeros
    template<int i>
    inline __m128i shiftLeftBits( __m128i vec )
    {
        static_assert( i >= 0 && i < 128 );
        // Handle a couple of trivial cases
        if constexpr( 0 == i )
            return vec;
        if constexpr( 0 == ( i % 8 ) )
            return _mm_slli_si128( vec, i / 8 );
        if constexpr( i > 64 )
        {
            // Shifting by more than 8 bytes, the lowest half will be all zeros
            vec = _mm_slli_si128( vec, 8 );
            return _mm_slli_epi64( vec, i - 64 );
        }
        else
        {
            // Shifting by less than 8 bytes.
            // Need to propagate a few bits across 64-bit lanes.
            __m128i low = _mm_slli_si128( vec, 8 );
            __m128i high = _mm_slli_epi64( vec, i );
            low = _mm_srli_epi64( low, 64 - i );
            return _mm_or_si128( low, high );
        }
    }
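One way to check an emulation like this is to compare it against a plain scalar model of a 128-bit left shift. The sketch below is such a model. The `U128` struct and the functions `shiftLeftBitsScalar`, `toVector` and `fromVector` are hypothetical names made up for this example, not part of any library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit integer stored as two 64-bit halves
struct U128 { uint64_t lo, hi; };

// Scalar reference: shift the 128-bit value left by n bits, n in [ 0 .. 127 ]
inline U128 shiftLeftBitsScalar( U128 v, unsigned n )
{
    assert( n < 128 );
    if( n == 0 )
        return v;
    if( n >= 64 )
        return { 0, v.lo << ( n - 64 ) };  // low half becomes all zeros
    // Bits leaving the low half enter the bottom of the high half
    return { v.lo << n, ( v.hi << n ) | ( v.lo >> ( 64 - n ) ) };
}

// Conversions between the scalar model and an SSE2 vector
inline __m128i toVector( U128 v )
{
    return _mm_set_epi64x( ( long long )v.hi, ( long long )v.lo );
}
inline U128 fromVector( __m128i v )
{
    uint64_t a[ 2 ];
    _mm_storeu_si128( ( __m128i* )a, v );
    return { a[ 0 ], a[ 1 ] };
}
```

With these helpers, `fromVector( shiftLeftBits<5>( toVector( x ) ) )` can be compared field by field with `shiftLeftBitsScalar( x, 5 )` for a handful of inputs and shift amounts.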
- I'd recommend `_mm_bslli_si128` - the newer name more clearly implies that it's a byte shift. Note that you don't really need `if constexpr`. That does maybe help your compiler make more efficient code in debug mode, if your compiler doesn't remove `if(false)` blocks in that case (*cough* MSVC), but that's all. With `i` as a template parameter, it's definitely a compile-time constant and even MSVC does dead-code removal with optimization enabled.
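For illustration, a C++/14-style variant with plain `if` and a `static_assert` message might look like the sketch below. The name `shiftLeftBits14` and the check function are hypothetical. The sketch assumes the compiler accepts out-of-range shift counts in the branches that are never taken for a given `i`, which the count-taking intrinsics normally allow because the count is an ordinary `int` argument.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical name; same logic as the answer's function, with plain `if`
template<int i>
inline __m128i shiftLeftBits14( __m128i vec )
{
    static_assert( i >= 0 && i < 128, "shift amount must be in [ 0 .. 127 ]" );
    if( 0 == i )
        return vec;
    if( 0 == ( i % 8 ) )
        return _mm_slli_si128( vec, i / 8 );
    if( i > 64 )
    {
        // The lowest half will be all zeros
        vec = _mm_slli_si128( vec, 8 );
        return _mm_slli_epi64( vec, i - 64 );
    }
    // Propagate the top bits of the low lane into the high lane
    __m128i low = _mm_slli_si128( vec, 8 );
    __m128i high = _mm_slli_epi64( vec, i );
    low = _mm_srli_epi64( low, 64 - i );
    return _mm_or_si128( low, high );
}

// Hypothetical usage check on a value with the top and bottom bit of the low lane set
inline void checkShiftLeftBits14()
{
    uint64_t r[ 2 ];
    const __m128i src = _mm_set_epi64x( 0, ( long long )0x8000000000000001ull );
    _mm_storeu_si128( ( __m128i* )r, shiftLeftBits14<1>( src ) );
    assert( r[ 0 ] == 2 && r[ 1 ] == 1 );
    _mm_storeu_si128( ( __m128i* )r, shiftLeftBits14<68>( src ) );
    assert( r[ 0 ] == 0 && r[ 1 ] == 0x10 );
}
```

Because `i` is a template parameter, every condition is a compile-time constant, so an optimizing build should be left with only the taken branch.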