How best to emulate the logical meaning of `_mm_slli_si128` (a 128-bit bit shift), not `_mm_bslli_si128`

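Looking through the Intel intrinsics guide, I came across this instruction. Going by the naming pattern, the meaning should be clear: "shift a 128-bit register left by a fixed number of bits". But it isn't. It actually shifts by a fixed number of bytes, which makes it exactly the same as `_mm_bslli_si128`.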

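  • Is this an oversight? Shouldn't it shift by bits, like `_mm_slli_epi32` or `_mm_slli_epi64`?
  • If not, in which case should I use it over `_mm_bslli_si128`?
  • Is there an assembly instruction that does this properly?
  • What would be the best way to emulate this with smaller shifts?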

Answer

1. It's not an oversight. That instruction really does shift by bytes, i.e. by multiples of 8 bits.

2. It doesn't matter. `_mm_slli_si128` and `_mm_bslli_si128` are equivalent; both compile to the `pslldq` SSE2 instruction.
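To make the byte granularity concrete, here is a minimal sketch that shifts by an immediate of 1 using both names. The helpers `Lanes`, `lanes`, `shiftBy1Old`, `shiftBy1New` and `demoByteShift` are made up for illustration, and the code assumes an x86 target with SSE2 and a compiler whose `<immintrin.h>` declares both intrinsics.

```cpp
#include <immintrin.h> // declares _mm_slli_si128 and _mm_bslli_si128
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 64-bit lanes of a vector.
struct Lanes { uint64_t lo, hi; };

// Copy the lanes out of a vector (x86 is little-endian, so element 0 is the low lane).
inline Lanes lanes( __m128i v )
{
    uint64_t tmp[ 2 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( tmp ), v );
    return Lanes{ tmp[ 0 ], tmp[ 1 ] };
}

// Shift by an immediate of 1 using the older name.
inline Lanes shiftBy1Old( uint64_t hi, uint64_t lo )
{
    const __m128i v = _mm_set_epi64x( static_cast<long long>( hi ), static_cast<long long>( lo ) );
    return lanes( _mm_slli_si128( v, 1 ) );
}

// Same shift using the newer name.
inline Lanes shiftBy1New( uint64_t hi, uint64_t lo )
{
    const __m128i v = _mm_set_epi64x( static_cast<long long>( hi ), static_cast<long long>( lo ) );
    return lanes( _mm_bslli_si128( v, 1 ) );
}

inline void demoByteShift()
{
    // 0x80 moves by a whole byte: the result is 0x8000, not 0x100.
    assert( shiftBy1Old( 0, 0x80 ).lo == 0x8000 );
    assert( shiftBy1New( 0, 0x80 ).lo == 0x8000 );
    // Unlike _mm_slli_epi64, the top byte of the low lane carries into the high lane.
    assert( shiftBy1Old( 0, 0xFF00000000000000ull ).hi == 0xFF );
    assert( shiftBy1New( 0, 0xFF00000000000000ull ).hi == 0xFF );
}
```

Both names give the same lanes for the same input, which is what you'd expect if they map to the same instruction.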

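As for the emulation, here's how I would do it, assuming you have C++17. If you're writing C++14, replace `if constexpr` with a normal `if`, and also add a message to the `static_assert`.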

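#include <emmintrin.h> // SSE2 intrinsics

// Shift a 128-bit vector left by i bits, where i is a compile-time constant
template<int i>
inline __m128i shiftLeftBits( __m128i vec )
{
    static_assert( i >= 0 && i < 128 );
    // Handle a couple of trivial cases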
    if constexpr( 0 == i )
        return vec;
    if constexpr( 0 == ( i % 8 ) )
        return _mm_slli_si128( vec, i / 8 );

    if constexpr( i > 64 )
    {
        // Shifting by more than 8 bytes, the lowest half will be all zeros
        vec = _mm_slli_si128( vec, 8 );
        return _mm_slli_epi64( vec, i - 64 );
    }
    else
    {
        // Shifting by less than 8 bytes.
        // Need to propagate a few bits across 64-bit lanes.
        __m128i low = _mm_slli_si128( vec, 8 );
        __m128i high = _mm_slli_epi64( vec, i );
        low = _mm_srli_epi64( low, 64 - i );
        return _mm_or_si128( low, high );
    }
}
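One way to sanity-check the template is to compare it against a plain scalar shift on two 64-bit halves. The sketch below is not part of the answer's method; `U128`, `shiftLeftScalar`, `toVec`, `fromVec` and `same` are hypothetical names introduced just for this check, and it again assumes an x86 target with SSE2.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct U128 { uint64_t lo, hi; };

// Scalar reference: shift the 128-bit value left by n bits, 0 <= n < 128.
inline U128 shiftLeftScalar( U128 v, int n )
{
    assert( n >= 0 && n < 128 );
    if( n == 0 )
        return v;
    if( n >= 64 )
        return U128{ 0, v.lo << ( n - 64 ) }; // low half becomes all zeros
    // Bits leaving the top of the low half enter the bottom of the high half.
    return U128{ v.lo << n, ( v.hi << n ) | ( v.lo >> ( 64 - n ) ) };
}

// Conversions between the scalar pair and an SSE2 vector.
inline __m128i toVec( U128 v )
{
    return _mm_set_epi64x( static_cast<long long>( v.hi ), static_cast<long long>( v.lo ) );
}

inline U128 fromVec( __m128i v )
{
    uint64_t tmp[ 2 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( tmp ), v );
    return U128{ tmp[ 0 ], tmp[ 1 ] };
}

inline bool same( U128 a, U128 b )
{
    return a.lo == b.lo && a.hi == b.hi;
}
```

With the template above in scope, `same( fromVec( shiftLeftBits<13>( toVec( v ) ) ), shiftLeftScalar( v, 13 ) )` should hold for any `v`, and likewise for other shift counts.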

  • I'd recommend `_mm_bslli_si128` - the newer name more clearly implies that it's a byte shift. Note that you don't really need `if constexpr`. That does maybe help your compiler make more efficient code in debug mode, if your compiler doesn't remove `if(false)` blocks in that case (*cough* MSVC), but that's all. With `i` as a template parameter, it's definitely a compile-time constant and even MSVC does dead-code removal with optimization enabled.
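For reference, the plain-`if` variant suggested in the answer and in this comment could look roughly like the sketch below. The name `shiftLeftBits14`, the assertion message and the `demoShiftLeftBits14` check are made up for illustration. Note that with a plain `if` every branch is still compiled for every `i`, so this relies on the compiler accepting out-of-range shift counts in branches that are never taken.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// C++14-friendly sketch: same logic as the answer, with ordinary `if`.
template<int i>
inline __m128i shiftLeftBits14( __m128i vec )
{
    static_assert( i >= 0 && i < 128, "shift count must be in [0, 128)" );
    if( 0 == i )
        return vec;
    if( 0 == ( i % 8 ) )
        return _mm_slli_si128( vec, i / 8 ); // whole bytes: one pslldq
    if( i > 64 )
    {
        // Low half ends up all zeros; shift the moved half by the remainder.
        vec = _mm_slli_si128( vec, 8 );
        return _mm_slli_epi64( vec, i - 64 );
    }
    // Less than 64 bits: carry the top bits of the low lane into the high lane.
    __m128i carry = _mm_slli_si128( vec, 8 );
    carry = _mm_srli_epi64( carry, 64 - i );
    return _mm_or_si128( carry, _mm_slli_epi64( vec, i ) );
}

// Hypothetical helper: read one 64-bit lane (0 = low, 1 = high).
inline uint64_t lane14( __m128i v, int index )
{
    uint64_t tmp[ 2 ];
    _mm_storeu_si128( reinterpret_cast<__m128i*>( tmp ), v );
    return tmp[ index ];
}

inline void demoShiftLeftBits14()
{
    const __m128i v = _mm_set_epi64x( 0, static_cast<long long>( 0x8000000000000001ull ) );
    const __m128i r = shiftLeftBits14<1>( v );
    assert( lane14( r, 0 ) == 2 ); // low bit moved up by one
    assert( lane14( r, 1 ) == 1 ); // top bit carried into the high lane
}
```

With optimization enabled the untaken branches should disappear, as the comment says; in debug builds some compilers may keep them.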
