Is a C compiler allowed to coalesce sequential assignments to volatile variables?

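I have a theoretical hardware issue reported by the hardware vendor (non-deterministic, hard to test, never observed in practice): a double-word write to certain memory ranges may corrupt any future bus transfers.

While I don't explicitly write any double words in the C code, I'm worried that the compiler is allowed (in current or future implementations) to coalesce multiple adjacent word assignments into a single double-word assignment.

The compiler is not allowed to reorder assignments to volatiles, but it is unclear (to me) whether coalescing counts as reordering. My gut says it does, but I've been corrected by language lawyers before!

Example: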

typedef struct
{
   volatile unsigned reg0;
   volatile unsigned reg1;
} Module;

volatile Module* module = (volatile Module*)0xFF000000u;

void example(void)
{
   // two word stores, or one double-word store?
   module->reg0 = 1;
   module->reg1 = 2;
}

(I'll ask my compiler vendor separately, but I'm curious what the canonical/community interpretation of the standard is.)
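A minimal sketch, assuming a hosted C11 compiler: the question's register block is rewritten with fixed-width uint32_t members (an assumption, since the original uses unsigned) and passed in as a pointer, so the same code can run against ordinary RAM as well as 0xFF000000u. The static assertions only document the "two adjacent 32-bit words" layout; they do not by themselves prevent coalescing. module_configure is a hypothetical name.

```c
#include <assert.h>  /* for the usage checks */
#include <stddef.h>
#include <stdint.h>

/* Same layout as the question, with fixed-width members so the
 * "two adjacent 32-bit words" assumption is explicit. */
typedef struct
{
    volatile uint32_t reg0;
    volatile uint32_t reg1;
} Module;

/* Fail the build if padding ever separates or follows the two registers. */
_Static_assert(offsetof(Module, reg1) == 4, "reg1 must directly follow reg0");
_Static_assert(sizeof(Module) == 8, "Module must be exactly two 32-bit words");

/* Hypothetical helper: taking the block as a parameter lets the same code
 * target the real register address or a plain RAM copy in a unit test. */
void module_configure(volatile Module *m)
{
    m->reg0 = 1;  /* first volatile store */
    m->reg1 = 2;  /* second volatile store */
}
```

On the target you would call module_configure((volatile Module *)0xFF000000u); in a unit test you can pass the address of a local Module.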

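Answer

No, the compiler is absolutely not allowed to optimize those two writes into a single double-word write. It's somewhat hard to quote the standard, since the parts regarding optimizations and side effects are so fuzzily written. The relevant parts are found in C17 5.1.2.3:

The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

When you access part of a struct, that in itself is a side effect, which may have consequences that the compiler can't determine. Suppose, for example, that your struct is a hardware register map and those registers need to be written in a certain order. Some microcontroller documentation might say something like: "reg0 enables the hardware peripheral and must be written before you can configure the details in reg1".

A compiler that merged the volatile object writes into a single one would be non-conforming and plain broken.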
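To make the ordering argument concrete, here is a toy model (entirely hypothetical, not any real peripheral) in which writes to reg1 are ignored until reg0 has enabled the device, mirroring the datasheet rule quoted in this answer. Each call to peripheral_write stands in for one 32-bit volatile store; a merged or reordered sequence would leave the device unconfigured.

```c
#include <assert.h>  /* for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical peripheral model: reg1 writes only take effect once
 * reg0 has enabled the device. */
typedef struct
{
    bool     enabled;     /* set by writing 1 to reg0 */
    uint32_t config;      /* latched from reg1 only while enabled */
    unsigned write_count; /* number of individual register writes seen */
} Peripheral;

/* Models one bus write of a single 32-bit register (0 = reg0, 1 = reg1). */
void peripheral_write(Peripheral *p, unsigned reg, uint32_t value)
{
    p->write_count++;
    if (reg == 0)
        p->enabled = (value & 1u) != 0;
    else if (reg == 1 && p->enabled)
        p->config = value;  /* silently dropped if not yet enabled */
}

/* Enable first, then configure. In real driver code these would be the
 * two volatile stores m->reg0 = 1; m->reg1 = 2; which a conforming
 * compiler must perform separately and in this order. */
void driver_init(Peripheral *p)
{
    peripheral_write(p, 0, 1);
    peripheral_write(p, 1, 2);
}
```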

  • @Andreas If the struct access is volatile, then member access will be volatile even if the members are not declared volatile. Same as for "const".
  • Ohhh didn't think of the struct access. The pointer in this case should not be volatile then, leaving only the members volatile (and down the nested volatile rabbit hole we go). Damn, C is hard. Happy to see you were able to look past that. The "real" code in question does not have that aspect, but it was too gnarly to make a good example from.

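Answer

The compiler is not allowed to turn two such assignments into a single memory write. There must be two independent writes from the core. The answer from @Lundin gives the relevant references to the C standard.

However, be aware that a cache, if present, may trick you. The keyword volatile does not imply "uncached" memory. So besides using volatile, you also need to make sure that the address 0xFF000000 is mapped as uncached. If the address is mapped as cached, the cache hardware may turn the two assignments into a single memory write. In other words, for cached memory, two core write operations may end up as a single write operation on the system memory interface.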
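The difference between the two mappings can be sketched with a toy model of a single 8-byte write-back cache line. This is an illustrative simulation only: the Bus type and its functions are hypothetical, and real cache lines are usually larger (often 32 or 64 bytes), with write policies that vary by CPU.

```c
#include <assert.h>  /* for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 8-byte write-back cache line in front of a memory bus. */
typedef struct
{
    uint32_t line[2];    /* cached copy of two adjacent words */
    bool     dirty;      /* line modified since last write-back */
    unsigned bus_writes; /* transactions seen on the memory interface */
    unsigned bus_bytes;  /* size of the last bus transaction, in bytes */
} Bus;

/* Uncached mapping: every core store becomes its own 4-byte bus write. */
void store_uncached(Bus *b, unsigned word, uint32_t value)
{
    b->line[word] = value;
    b->bus_writes++;
    b->bus_bytes = 4;
}

/* Cached (write-back) mapping: the store only updates the cache line. */
void store_cached(Bus *b, unsigned word, uint32_t value)
{
    b->line[word] = value;
    b->dirty = true;
}

/* Eviction or explicit clean: the whole line goes out as one 8-byte write. */
void write_back(Bus *b)
{
    if (b->dirty) {
        b->bus_writes++;
        b->bus_bytes = 8;
        b->dirty = false;
    }
}
```

In this model the compiler has emitted two separate word stores in both cases; only the cached path turns them into a single double-word transaction on the bus.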

  • @Lundin: *C has never allowed speculative or OoO execution of volatile access* - that's different from "uncacheable". You seem to be talking about not hoisting loads/sinking stores out of loops in asm. But that's totally different from *hardware* prefetch on write-back cacheable memory regions. You can look at it as C guaranteeing that loads/stores *to the cache coherency domain* are a visible side-effect, not the true contents of DRAM. SW can't observe DRAM (except possibly via another mapping of the same physical address, or on a hypothetical system with non-coherent shared memory)
  • @Lundin: If you want MMIO accesses to work properly, you need to make sure the address range including the MMIO address is mapped uncacheable even if you're writing asm by hand; it's implausible and impractical for a C compiler to do this for you for global `volatile int foo;`.
  • @Lundin You can qualify automatic variables as `volatile`. Does that mean then that the compiler has to emit code to turn off caching for that section of the stack? Never seen that before and seems absurd. (an automatic variable qualified `volatile` is quite useful e.g. if you single-step through the program and want to change it from a debugger).
  • `volatile` absolutely means uncached memory. A system that does pre-fetch reads of `volatile` qualified variables is not compliant. `volatile` access has to be performed according to the sequence points placed around the variables. As CPUs have evolved, there's been attempts by hardware and/or compiler vendors to push this burden of memory barrier-like behavior onto the application programmers. But C has never allowed speculative or out of order execution of `volatile` access. It's not the application programmer's fault if someone has released hardware which can't execute compliant C.
  • @Lundin I'd like to see some reference for that claim, as I disagree. Also, this little example https://ideone.com/U8Sq9n shows that the compiler doesn't map volatile variables any differently than ordinary variables.
  • @Lundin: You seem to have decided that DRAM itself, not the cache-coherent view of memory that all cores share, is what the C standard means by "the execution environment". Yes, your argument would follow from that premise. But I don't see a good reason to choose that, and it makes very little sense to me in a C implementation for a system with coherent cache. Bypassing cache would make `volatile` unusably slow overkill for a lot of things, and make users look for some mechanism that wasn't horrible, e.g. for stuff like `volatile sig_atomic_t`, or for making sure stores to mmapped files happen.
  • @Lundin: Linkers, and software to control memory-type attributes like making some range uncacheable, give you the tools to set up some uncacheable memory you can read from if that's what you want, when programming for a system that does have cache. I don't buy that timing argument at all. If you want something extra slow for a delay, do a volatile read *from uncacheable memory*, not just from any arbitrary variable. Having every volatile necessarily be slow sounds like a worse design that I wouldn't want.
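As a reference point for this debate: the C standard explicitly permits a signal handler to assign to an object of type volatile sig_atomic_t, and such a flag normally lives in ordinary memory with no special uncacheable mapping. A minimal sketch, assuming a hosted implementation where SIGINT can be raised; trigger_and_check is a hypothetical helper.

```c
#include <assert.h>  /* for the usage checks */
#include <signal.h>

/* Flag shared with the signal handler; an ordinary static object. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;  /* one of the few things a handler may portably do */
}

/* Install the handler and deliver SIGINT to this process.
 * raise() does not return until the handler has returned. */
int trigger_and_check(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    return got_signal;
}
```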

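Answer

The behavior of volatile seems to be up to the implementation, partly because of a curious sentence which says: "What constitutes an access to an object that has volatile-qualified type is implementation-defined".

In ISO C99, section 5.1.2.3, there is also:

3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

So although volatile objects must be treated in accordance with the abstract semantics (i.e. not optimized), curiously, the abstract semantics itself allows the elimination of dead code and data flows, which are examples of optimizations!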
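The contrast between ordinary and volatile objects can be sketched as follows. Both functions leave the same final value, but for fill_plain the compiler may deduce that only the last store matters and emit a single store, while fill_volatile must perform every store in order. Whether a particular compiler actually collapses the plain loop depends on the compiler and flags; inspecting the assembly (for example with gcc -O2 -S) is how to check. The function names are illustrative.

```c
#include <assert.h>  /* for the usage checks */

/* Plain object: only the final value is observable, so the compiler
 * may replace the loop with a single store of n - 1 (when n > 0). */
void fill_plain(int *dst, int n)
{
    for (int i = 0; i < n; i++)
        *dst = i;  /* intermediate stores may be eliminated */
}

/* Volatile object: each store is a side effect and must be performed,
 * one per iteration, in program order. */
void fill_volatile(volatile int *dst, int n)
{
    for (int i = 0; i < n; i++)
        *dst = i;  /* n separate stores required */
}
```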

I'm afraid that to know what volatile will and will not do, you have to consult your compiler's documentation.


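Answer

The C Standard is agnostic about any relationship between operations on volatile objects and operations on the actual machine. While most implementations would specify that a construct like *(char volatile*)0x1234 = 0x56; generates a byte store of value 0x56 to hardware address 0x1234, an implementation could, at its leisure, allocate space for e.g. an 8192-byte array and specify that *(char volatile*)0x1234 = 0x56; immediately stores 0x56 to element 0x1234 of that array, without ever doing anything with hardware address 0x1234. Alternatively, an implementation might include some process that periodically stores whatever happens to be in element 0x1234 of that array to hardware address 0x56.

All that is required for conformance is that all operations on volatile objects within a single thread are, from the standpoint of the abstract machine, regarded as absolutely sequenced. From the Standard's point of view, implementations can convert such accesses into real machine operations in whatever fashion they see fit.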
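The thought experiment in this answer can be modelled in ordinary C. This is a hypothetical sketch, not a real compiler feature: Impl, volatile_store and sync_to_hardware are invented names, and the background process is simplified to copy each shadow element to the same-numbered "hardware" slot.

```c
#include <assert.h>  /* for the usage checks */
#include <stdint.h>

enum { SHADOW_SIZE = 8192 };

/* Hypothetical implementation state. */
typedef struct
{
    uint8_t shadow[SHADOW_SIZE];   /* what the abstract machine sees */
    uint8_t hardware[SHADOW_SIZE]; /* what the physical bus eventually sees */
} Impl;

/* What *(char volatile *)addr = value; could mean on such an
 * implementation: the store is complete once the shadow array changes. */
void volatile_store(Impl *impl, uint16_t addr, uint8_t value)
{
    impl->shadow[addr % SHADOW_SIZE] = value;
}

/* The periodic background process: propagate shadow contents to hardware. */
void sync_to_hardware(Impl *impl)
{
    for (unsigned i = 0; i < SHADOW_SIZE; i++)
        impl->hardware[i] = impl->shadow[i];
}
```

From the abstract machine's point of view every volatile store happened immediately and in order; only the later copy touches "hardware", and nothing in the Standard forbids that timing.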

  • Moreover what constitutes a volatile access is implementation-defined.

Answer

Changing it would change the observable behavior of the program, so the compiler is not allowed to do so.

  • The sequence of actual hardware memory operations is only "observable" if an implementation chooses to specify it as such. Nothing would forbid an implementation from including its own virtual machine where volatile stores update the virtual machine state immediately, but such updates take a while to be translated into operations on real machine hardware.
