Are logical/arithmetic shifts by fewer bits faster?
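Is `x >> 2` faster than `x >> 31`? In other words, is `sar x, 2` faster than `sar x, 31`? I ran some simple tests, and they seem to run at the same speed. I would appreciate any conclusive evidence.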
Answer
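This depends on the hardware implementation. For common operations that shift by a constant (e.g. pointer arithmetic), there may be a faster path; for example, the shift may be fused with the associated add, as in an x86 addressing mode with a scaled index. For variable shifts, a barrel shifter circuit is typically used, in which every shift amount has the same latency.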
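One common barrel-shifter design uses log2(width) stages of multiplexers: stage k shifts by 2^k if bit k of the count is set. The sketch below is a hypothetical software model of that idea for a 32-bit arithmetic right shift like `sar` (`barrel_sar32` and `STAGES` are illustrative names, not any real CPU's documented circuit). Every count passes through the same five stages, so a shift by 2 and a shift by 31 do the same amount of work.

```c
#include <stdint.h>

#define STAGES 5 /* log2(32): one mux stage per bit of the shift count */

/* Models a 32-bit arithmetic right shift (like x86 `sar`) built from fixed
 * stages. The count is masked to 0..31, as x86 does for 32-bit operands.
 * *stages_used reports how many stages the value passed through. */
static uint32_t barrel_sar32(uint32_t x, unsigned count, int *stages_used)
{
    count &= 31u;
    *stages_used = 0;
    for (int k = 0; k < STAGES; k++) {
        unsigned amount = 1u << k; /* 1, 2, 4, 8, 16 */
        /* Ones to shift in from the top if the sign bit is set. */
        uint32_t sign_fill = (x & 0x80000000u) ? ~(UINT32_MAX >> amount) : 0u;
        uint32_t shifted = (x >> amount) | sign_fill;
        /* In hardware this row of 2:1 muxes exists whether or not bit k of
         * the count is set; the count bit only selects its output. */
        x = (count & amount) ? shifted : x;
        (*stages_used)++;
    }
    return x;
}
```

For example, `barrel_sar32(0x80000000u, 2, &s)` returns `0xE0000000` and `barrel_sar32(0x80000000u, 31, &s)` returns `0xFFFFFFFF`, with `s` equal to 5 in both cases.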
- https://uops.info/ and https://agner.org/optimize/ have measured numbers for real x86 CPU instructions. The Pentium 4 notoriously had slow shifts, but even there latency/throughput was fixed (not data-dependent). Most CPUs have 1-cycle latency for any shift count. (On modern Intel CPUs, shifts by a compile-time constant are cheap, but when the count is a runtime variable, `shr reg, cl` decodes to 3 uops [because of x86 legacy baggage: FLAGS must be left unmodified if the count is 0](https://stackoverflow.com/a/36510865/224132). That extra cost goes away if you let the compiler use BMI2 `shlx` / `shrx` / `sarx`. Either way, the latency is still only 1 cycle.)
- Intel had a barrel shifter as early as the 386SX: https://media.digikey.com/pdf/Data%20Sheets/Intel%20PDFs/Intel386%20SX.pdf#page=84 lists shift/rotate of a register as 3 cycles for shift-by-1, shift-by-CL, or shift-by-immediate (vs. 2 cycles for an instruction like `add reg,reg`). The last Intel x86 whose shift performance depended on the count appears to be the 286: https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_i.html has a table for the 8088 through the Pentium. **The 8088 took 8 + 4n cycles, the 186 and 286 took 5 + n, and the 386 took a fixed 3 cycles**.
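Plugging the question's two counts into the formulas from that table makes the difference concrete. This is a small sketch in which `shift_cycles` and the `cpu` enum are hypothetical names; it assumes n is the shift count held in CL for a register operand (my reading of the table) and ignores memory operands and other forms.

```c
/* Cycle counts for shifting a register by CL, per the formulas quoted above.
 * Assumes 0 <= n <= 31; other operand forms are not modelled. */
enum cpu { CPU_8088, CPU_186_286, CPU_386 };

static unsigned shift_cycles(enum cpu cpu, unsigned n)
{
    switch (cpu) {
    case CPU_8088:    return 8u + 4u * n; /* grows with the count */
    case CPU_186_286: return 5u + n;      /* still grows with the count */
    case CPU_386:     return 3u;          /* barrel shifter: fixed cost */
    }
    return 0u;
}
```

For n = 2 vs. n = 31 this gives 16 vs. 132 cycles on the 8088, 7 vs. 36 on the 186/286, and 3 cycles either way on the 386, which is why the count no longer matters from the 386 onward.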