Rolling weighted mean with data.table
I want to compute a rolling weighted mean per group for a data.table like this one:
library(data.table)
DT <- data.table(group = rep(c(1, 2), each = 5), value = 1:10, weight = 11:20)
group value weight
1: 1 1 11
2: 1 2 12
3: 1 3 13
4: 1 4 14
5: 1 5 15
6: 2 6 16
7: 2 7 17
8: 2 8 18
9: 2 9 19
10: 2 10 20
I found a working solution using the runner package in this question, "Rolling over function with 2 vector arguments":
my_weighted_mean <- function(data) {
  weighted.mean(data[, 1], w = data[, 2])
}
DT[, weighted_mean := runner::runner(x = .SD, f = my_weighted_mean, k = 3, na_pad = TRUE),
   .SDcols = c("value", "weight"), by = list(group)]
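But this code is slow.
I suppose frollapply should be able to do this, but the following does not work, because I don't understand how to use frollapply with a function that takes two columns: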
DT[, weighted_mean := frollapply(value, FUN = weighted.mean, n = 3, w = weights), by = list(group)]
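I am looking for better performance (and a solution that does not depend on runner).
Answer
"Rolling apply with a function of two columns": roll over the row indices instead of the values. The inner function then receives the indices of the current window and can subset as many columns as it needs.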
DT[, weighted_mean := frollapply(seq_len(.N),
                                 FUN = function(ind) weighted.mean(value[ind], w = weight[ind]),
                                 n = 3),
   by = .(group)]
# group value weight weighted_mean
# <num> <int> <int> <num>
# 1: 1 1 11 NA
# 2: 1 2 12 NA
# 3: 1 3 13 2.055556
# 4: 1 4 14 3.051282
# 5: 1 5 15 4.047619
# 6: 2 6 16 NA
# 7: 2 7 17 NA
# 8: 2 8 18 7.039216
# 9: 2 9 19 8.037037
# 10: 2 10 20 9.035088
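Since the question is about speed, here is an alternative that is not part of the answer above. A weighted mean over a window is the sum of value * weight divided by the sum of weight, so it can be written as a ratio of two rolling sums. This is a minimal sketch, assuming a data.table version that provides frollsum, no NA values in either column, and weights that never sum to zero within a window. The column name weighted_mean_fast is made up for illustration.

```r
library(data.table)

DT <- data.table(group = rep(c(1, 2), each = 5), value = 1:10, weight = 11:20)

# Rolling weighted mean = rolling sum(value * weight) / rolling sum(weight).
# as.numeric() avoids integer overflow in the product for large inputs.
# The first n - 1 rows of each group are NA, matching the frollapply result.
DT[, weighted_mean_fast := frollsum(as.numeric(value) * weight, n = 3) /
                           frollsum(weight, n = 3),
   by = .(group)]
```

Both rolling sums run in data.table's compiled code rather than calling an R function once per window, so this would be expected to scale better on large tables, although it has not been benchmarked here. If the data can contain NA, weighted.mean and this ratio may behave differently, so the index-based frollapply version is the safer reference.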