Scala: found: org.apache.spark.sql.Dataset[(Double, Double)], required: org.apache.spark.rdd.RDD[(Double, Double)]
I'm getting the following error:
```
found   : org.apache.spark.sql.Dataset[(Double, Double)]
required: org.apache.spark.rdd.RDD[(Double, Double)]
       val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
```
for this code:
```scala
val testScoreAndLabel = testResults.
  select("Label", "ModelProbability").
  map { case Row(l: Double, p: Vector) => (p(1), l) }
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
```
Judging from the error, testScoreAndLabel is a sql.Dataset, but BinaryClassificationMetrics expects an RDD (in Spark 2.x, mapping over a DataFrame yields a Dataset rather than an RDD, hence the mismatch).
How do I convert a sql.Dataset to an RDD?
I would do something like this:
```scala
val testScoreAndLabel = testResults.
  select("Label", "ModelProbability").
  map { case Row(l: Double, p: Vector) => (p(1), l) }
```
Now just call testScoreAndLabel.rdd to convert testScoreAndLabel into an RDD:
```scala
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
```
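Putting it all together, here is a minimal end-to-end sketch, assuming Spark 2.x. The tiny testResults DataFrame below is a stand-in for the question's data (normally it would come from model.transform(testData)); the column names "Label" and "ModelProbability" are taken from the question, not from any fixed API.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder()
  .appName("metrics-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // supplies the Encoder for (Double, Double)

// Stand-in for the question's testResults: a Double label column
// plus a probability Vector column, as a model's transform would produce.
val testResults = Seq(
  (1.0, Vectors.dense(0.2, 0.8)),
  (0.0, Vectors.dense(0.7, 0.3))
).toDF("Label", "ModelProbability")

// Extract (score, label) pairs; map on a DataFrame returns a
// Dataset[(Double, Double)], not an RDD, in Spark 2.x.
val testScoreAndLabel = testResults
  .select("Label", "ModelProbability")
  .map { case Row(l: Double, p: Vector) => (p(1), l) }

// BinaryClassificationMetrics (spark.mllib) expects RDD[(score, label)],
// so drop down to the Dataset's underlying RDD with .rdd.
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
println(s"Area under ROC = ${testMetrics.areaUnderROC()}")
```

Note that .rdd is lazy: it simply represents the Dataset's contents as an RDD[(Double, Double)], deserializing rows only when the RDD is actually computed.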
API documentation