Why don't instance fields need to be final or effectively final to be used in lambda expressions?

I'm practicing lambda expressions in Java. From the Oracle documentation for the Java SE 16 Lambda Body, I know that local variables need to be final or effectively final:

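Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be final or effectively final (§4.12.4), as specified in §6.5.6.1.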
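A minimal sketch of what that rule means in practice. The class and method names are illustrative and do not come from the documentation; the commented-out line is the one the compiler would reject.

```java
import java.util.function.IntSupplier;

class EffectivelyFinalDemo {
    static int capture() {
        int base = 10;                    // assigned exactly once: effectively final
        IntSupplier s = () -> base + 1;   // legal capture of a local variable
        // base++;                        // uncommenting this makes `base` no longer
        //                                // effectively final, so the lambda above
        //                                // would fail to compile
        return s.getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(capture()); // prints 11
    }
}
```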

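But it doesn't say why. Searching, I found the similar question "Why do variables in lambdas have to be final or effectively final?", where StackOverflow user "snr" answered with the following quote: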

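Local variables in Java have until now been immune to race conditions and visibility problems because they are accessible only to the thread executing the method in which they are declared. But a lambda can be passed from the thread that created it to a different thread, and that immunity would therefore be lost if the lambda, evaluated by the second thread, were given the ability to mutate local variables.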

  • Source: Why the restriction on local variable capture?

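This is what I understand: a method can only be executed by one thread (say thread_1) at a time. This ensures that the local variables of that particular method are modified only by thread_1. On the other hand, a lambda can be passed to a different thread (thread_2), so if thread_1 finishes with the lambda expression and keeps executing the rest of the method, it could change the values of the local variables while, at the same time, thread_2 could be changing the same variables inside the lambda expression. That is why this restriction exists (local variables need to be final or effectively final).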

Sorry for the long explanation. Am I getting this right?

But the next questions would be:

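  • Why doesn't this reasoning apply to instance variables?
  • What happens if thread_1 changes an instance variable at the same time as thread_2 (even if they are not executing a lambda expression)?
  • Are instance variables protected in some other way?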

I don't have much experience with Java. Sorry if my questions have obvious answers.

Answer

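This question really isn't about thread safety. There is a simple, direct answer to why instance variables can always be captured: this is always effectively final. That is, there is always one known, fixed object at the time a lambda that accesses an instance variable is created. Remember that naming an instance variable foo is always effectively equivalent to this.foo.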

So

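class MyClass {
  private int foo;
  public void doThingWithLambda() {
    // foo here is shorthand for this.foo
    doThing(() -> { System.out.println(foo); });
  }
  // Assumed helper, not shown in the original: just runs the lambda.
  private void doThing(Runnable r) { r.run(); }
}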

can have the lambda rewritten as doThing(() -> { System.out.println(this.foo); }), and is therefore equivalent to

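class MyClass {
  private int foo;
  public void doThingWithLambda() {
    final MyClass me = this;  // explicit copy of the reference
    doThing(() -> { System.out.println(me.foo); });
  }
  // Assumed helper, not shown in the original: just runs the lambda.
  private void doThing(Runnable r) { r.run(); }
}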

...except that this is already final and doesn't need to be copied into another local variable (though the lambda will still capture the reference).
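One way to see that the lambda holds a reference to this rather than a copy of the field is to change the field after the lambda has been created. The sketch below uses made-up names (CaptureThisDemo, reader, setFoo) and assumes nothing beyond the standard library.

```java
import java.util.function.IntSupplier;

class CaptureThisDemo {
    private int foo = 1;

    void setFoo(int value) {
        foo = value;
    }

    IntSupplier reader() {
        // The lambda captures `this`; the field is read each time it runs.
        return () -> foo;   // same as () -> this.foo
    }

    public static void main(String[] args) {
        CaptureThisDemo obj = new CaptureThisDemo();
        IntSupplier r = obj.reader();
        obj.setFoo(42);                     // mutate the field after the lambda exists
        System.out.println(r.getAsInt());   // prints 42
    }
}
```

Had the lambda copied the value of foo at creation time, the way it does for a captured local, it would have returned 1.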

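All the normal thread-safety caveats apply, of course. If your lambdas get passed to multiple threads and modify variables, exactly the same things will happen as if you weren't using lambdas, and there is no additional safety beyond the thread safety of the variables themselves (e.g. if they are volatile) or whatever other mechanisms your lambda uses to access the variables safely. Lambdas don't do anything special about thread safety, and they don't do anything special about instance variables either; they just capture a reference to this instead of a reference to an instance variable.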
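As a sketch of "the thread safety of the variables themselves", here the same lambda runs on two threads and updates a field through the captured this. The field is an AtomicInteger, so the safety comes from that type and not from the lambda. The class name and the loop count are arbitrary choices for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedFieldDemo {
    // Thread-safe by its own design; the lambda adds no protection.
    private final AtomicInteger hits = new AtomicInteger();

    int runTwoThreads() {
        Runnable work = () -> {
            for (int i = 0; i < 1_000; i++) {
                hits.incrementAndGet();   // reached through the captured `this`
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(new SharedFieldDemo().runTwoThreads()); // prints 2000
    }
}
```

With a plain int field and hits++ instead, the increments could interleave and the total could come out below 2000, exactly as it could without any lambda involved.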

  • @cat: It can be used as a reference, and it never changes. It's not literally a variable, but it behaves like a final variable as far as everything in this post discusses.
  • I think this answer could be better if you first explain that lambdas capture local variables by _copying_ their value -- hence why they must be effectively final; otherwise a caller could observe that their value didn't match. So, since `this` is "effectively effectively final," the lambda can capture it by copying the reference.
  • This answer is at least partially incorrect. The JLS itself notes the issue of concurrency. And `this` is a keyword representing a value, not a variable that can be effectively final. From the JLS: *"Similar rules on variable use apply in the body of an inner class (§8.1.3). The restriction to effectively final variables prohibits access to **dynamically-changing local variables**, whose capture would likely introduce **concurrency problems**."*

Answer

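Other answers have already provided great context on why this is a restriction in Java. I wanted to provide some background on how other languages handle this when they don't require local variables to be treated as immutable (i.e. final).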

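The gist put forward is that "heap" values (i.e. fields) are inherently accessible from other threads, whereas "stack" values (i.e. local variables) are inherently accessible only from within the method that declares them. This is true. So, since fields are stored on the heap, they can be mutated after the method completes. Conversely, stack values vanish once the method completes.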

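Java chose to abide by these semantics, so a local variable must never be modified after the method has completed. This is a fair design decision. However, some languages do choose to allow local variables to be mutated after the method has exited. How is that possible?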

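In C# (the language I'm most familiar with, though other languages such as JavaScript also allow these constructs), when you reference a local variable inside a lambda, the compiler detects this and, behind the scenes, actually generates a whole new class to store the local variable. So instead of the variable being declared on the stack, the compiler detects that it has been referenced inside a lambda and instantiates that class to store the value. This behavior (behind the scenes) converts a stack value into a heap value. (You can actually decompile such code and see these compiler-generated classes.)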
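To make the hoisting concrete, here is a hand-written Java analogue. It is not what the C# compiler emits; Box is a hypothetical stand-in for the generated class, written out manually because Java does not do this for you.

```java
import java.util.function.IntSupplier;

class HoistedLocalDemo {
    // Hand-written stand-in for a compiler-generated closure class.
    static final class Box {
        int value;
    }

    static IntSupplier makeCounter() {
        final Box count = new Box();   // the reference is final; the object lives on the heap
        return () -> ++count.value;    // the state outlives the call to makeCounter()
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter();
        next.getAsInt();                       // returns 1
        System.out.println(next.getAsInt());   // prints 2
    }
}
```

The local count is still effectively final, which is why Java accepts the capture; only the object it points to is mutated.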

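This decision isn't without cost. Instantiating a class just to house a single integer is obviously more expensive. In Java, you are guaranteed that this will never happen. In languages like C#, it takes careful reasoning to know whether your variable has been "lifted" into such a generated class.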

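So the ultimate justification comes down to a design decision. In Java, you can't shoot yourself in the foot. In C#, they decided that in most cases the performance consequences are not a big deal.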

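That said, C#'s decision has often been a source of confusion and bugs, particularly when a loop iterator variable in a for loop (where the loop variable i can, and must, be mutated) is passed to a lambda, as described in Eric Lippert's blog post. The problem was serious enough that they decided to introduce a (rare) breaking change to the compiler for the foreach variant.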
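For comparison, a sketch of how the same loop situation plays out in Java, where the compiler rejects capturing the mutated loop variable and a per-iteration copy is needed. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class LoopCaptureDemo {
    static List<IntSupplier> build() {
        List<IntSupplier> out = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // out.add(() -> i);   // rejected by javac: `i` is mutated by the loop
            int copy = i;          // fresh, effectively final variable per iteration
            out.add(() -> copy);
        }
        return out;
    }

    public static void main(String[] args) {
        for (IntSupplier s : build()) {
            System.out.println(s.getAsInt()); // prints 0, 1, 2 on separate lines
        }
    }
}
```

Each lambda keeps the value its own copy had, so none of them can observe the loop variable changing afterwards.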

On the other hand, I enjoy the freedom of mutating local variables inside lambdas in C#. But neither decision is without cost.

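This answer is definitely not meant to advocate for either decision, but I thought it worthwhile to elaborate on some of these design choices.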

  • Small correction: in C#, a variable is lifted to the closure class even if it is not mutated in the lambda. Suffices if it is just referenced.
  • This behavior has more consequences than performance. It implies that suddenly, the programmer is responsible for ensuring the thread safety of local variables. In Java, the local variables are immune to data races, which doesn’t apply when you can turn local variables into shared mutable variables. But you can’t, for example, declare a local variable as `volatile` in Java. That’s not possible, as it was never needed. Since you also can’t synchronize on the instance of the synthetic class, ensuring thread safe local variables suddenly becomes more complicated than ensuring thread safe fields.
  • @ach: When I was on the C# team -- almost ten years ago now -- we considered doing some optimizations which would distinguish between mutated vs merely read outer variables of a lambda, and capture the latter "by value" rather than capturing the variable. I never ended up making such optimizations; it sounds like the team has not done so in the years since, but I would not be surprised at all if they do so someday.
  • @Holger: Your points are correct and well taken, but it's worthwhile to note that since C# 2.0 it has always been the case that local variables can be modified in unexpected ways and unexpected orders, and you need neither anonymous methods nor lambdas nor even multithreading to fall victim to such races. Coroutines -- iterator blocks in C# 2.0 and async methods in C# 6.0 -- also have the property that they hoist locals to the heap and extend their lifetimes because coroutine activations do not form a stack.
  • @supercat: Re: as many closure classes as there are subsets of outer vars, yes, that was exactly the optimization we considered.
  • @supercat: Re: indicating in the language: when we were designing lambdas for C# 3.0 Herb Sutter randomly stopped by my office and we had a very entertaining conversation about how C++ was doing exactly that, and what the pros and cons were. Obviously the C# team decided on not adding a syntax for indicating desired closure semantics. In retrospect, I kinda wish that we had made it easier to statically detect and disallow LINQ query comprehensions that closed over modifiable variables, as that turned out to be a rich source of user errors.
  • @Joker_vD: I note that the same tradeoff exists for the "transparent identifiers" introduced by query comprehension rewriting. They desugar into types where you can end up drilling down through several levels of dereferencing to get to a variable. But if you're building a big query with a lot of SelectMany clauses, odds are pretty good that time spent accessing the range variables is going to be the least of your performance worries.
  • @EricLippert yes, for C#, the road has been taken which makes it easier to decide to repeat it for other features. For Java, this does not apply, which makes staying with only sharing (effectively) final local variables attractive. Which means that supercat’s idea of letting the programmer decide exists in Java. Just use a local variable for “by value” semantic or explicitly create the class holding the variable for “by reference” semantic.

Answer

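Instance variables are stored in heap space, whereas local variables are stored in stack space. Each thread maintains its own stack, so local variables are not shared across threads. Heap space, on the other hand, is shared by all threads, so multiple threads can modify an instance variable. There are various mechanisms for making data thread-safe, and you can find many related discussions on this platform. For the sake of completeness, I've quoted below an excerpt from http://web.mit.edu/6.005/www/fa14/classes/18-thread-safety/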

There are basically four ways to make variable access safe in shared-memory concurrency:

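  • Confinement. Don't share the variable between threads. This idea is called confinement, and we'll explore it today.
  • Immutability. Make the shared data immutable. We've talked a lot about immutability already, but there are some additional constraints for concurrent programming that we'll talk about in this reading.
  • Thread-safe data type. Encapsulate the shared data in an existing thread-safe data type that does the coordination for you. We'll talk about that today.
  • Synchronization. Use synchronization to keep the threads from accessing the variable at the same time. Synchronization is what you need to build your own thread-safe data type.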
  • How does this answer the question?
  • @akuzminykh - What else are you looking for?
  • You talk about heap and stack but you don't talk about why local variables need to be `final` and fields don't. The question is rather theoretical and I think it deserves a more extensive and instructive explanation than this.
  • @akuzminykh - The OP has already done good research and this answer is focused on the questions listed at the end. The rest of the explanation is there in the referred links.
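
The four strategies quoted in this answer could look roughly like this in Java. This is only a sketch: every name in it is made up for illustration, and none of it is taken from the MIT reading.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FourStrategiesDemo {
    // Immutability: an unmodifiable list can be shared freely.
    static final List<String> NAMES = List.of("a", "b");

    // Thread-safe data type: the map does its own coordination.
    final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Synchronization: the object's monitor guards a plain field.
    private int total;
    synchronized void add(int n) { total += n; }
    synchronized int total() { return total; }

    // Confinement: `sum` is a local that no other thread can reach.
    static int confinedSum(int[] data) {
        int sum = 0;
        for (int d : data) {
            sum += d;
        }
        return sum;
    }

    void runTwoThreads() {
        Runnable work = () -> {
            for (String name : NAMES) {
                counts.merge(name, 1, Integer::sum);     // atomic per-key update
                add(confinedSum(new int[] {1, 2, 3}));   // adds 6 under the lock
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FourStrategiesDemo d = new FourStrategiesDemo();
        d.runTwoThreads();
        System.out.println(d.total()); // prints 24 (2 threads x 2 names x 6)
    }
}
```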
