基于滞后行的条件总和
我有一个数据框测量每月的登录次数。我正在尝试创建一个计数器测量months_since_zero_login,它仅在一个月的登录次数为零时添加。第一个月,每个客户的柜台将从零开始。
这是数据:
library(tidyverse)
obs <- seq(as.Date('2020-01-01'),
as.Date('2020-05-01'),
by = "month")
table <- tibble(customer = seq(1:3))
#output
table <- table %>%
crossing(obs) %>%
mutate(login = c(3, 0, 0, 0, 2,
0, 1, 5, 0, 0,
1, 3, 1, 5, 0))
这是预期的结果:
customer obs login months_since_zero_login
<int> <date> <dbl> <dbl>
1 1 2020-01-01 3 0
2 1 2020-02-01 0 0
3 1 2020-03-01 0 1
4 1 2020-04-01 0 2
5 1 2020-05-01 2 0
6 2 2020-01-01 0 0
7 2 2020-02-01 1 0
8 2 2020-03-01 5 0
9 2 2020-04-01 0 0
10 2 2020-05-01 0 1
11 3 2020-01-01 1 0
12 3 2020-02-01 3 0
13 3 2020-03-01 1 0
14 3 2020-04-01 5 0
15 3 2020-05-01 0 0
到目前为止,这是我的代码,但是当有连续的零时(在客户 1 的情况下),我坚持如何将计数器增加 1
table %>%
group_by(customer) %>%
mutate(months_since_zero_login = case_when(
row_number() == 1 ~ 0,
lag(login) == 0 & login == 0 ~ 1,
TRUE ~ 0
))
#does not increase counter when there are consecutive zeroes
回答
这可以通过rleid. 基于'login'中'0'值的出现创建一个临时分组列,然后按'customer'、'grp'分组,同时指定i'login == 0'的行,创建'months_since_zero_login'作为序列减去的行数 1. 将NA同一列中的元素替换为 0(如果需要)
library(data.table)
setDT(table)[, grp := rleid(login == 0), .(customer)]
table[login == 0, months_since_zero_login := seq_len(.N) - 1,
.(customer, grp)][, grp := NULL]
table[is.na(months_since_zero_login), months_since_zero_login := 0]
-输出
table
# customer obs login months_since_zero_login
# 1: 1 2020-01-01 3 0
# 2: 1 2020-02-01 0 0
# 3: 1 2020-03-01 0 1
# 4: 1 2020-04-01 0 2
# 5: 1 2020-05-01 2 0
# 6: 2 2020-01-01 0 0
# 7: 2 2020-02-01 1 0
# 8: 2 2020-03-01 5 0
# 9: 2 2020-04-01 0 0
#10: 2 2020-05-01 0 1
#11: 3 2020-01-01 1 0
#12: 3 2020-02-01 3 0
#13: 3 2020-03-01 1 0
#14: 3 2020-04-01 5 0
#15: 3 2020-05-01 0 0
有了dplyr,我们仍然可以使用rleid
library(dplyr)
table %>%
group_by(grp = rleid(customer, login == 0), customer) %>%
mutate(months_since_zero_login = if(all(login == 0))
row_number() - 1 else 0) %>%
ungroup %>%
select(-grp)
-输出
# A tibble: 15 x 4
# customer obs login months_since_zero_login
# <int> <date> <dbl> <dbl>
# 1 1 2020-01-01 3 0
# 2 1 2020-02-01 0 0
# 3 1 2020-03-01 0 1
# 4 1 2020-04-01 0 2
# 5 1 2020-05-01 2 0
# 7 2 2020-02-01 1 0
# 8 2 2020-03-01 5 0
# 9 2 2020-04-01 0 0
#10 2 2020-05-01 0 1
#11 3 2020-01-01 1 0
#12 3 2020-02-01 3 0
#13 3 2020-03-01 1 0
#14 3 2020-04-01 5 0
#15 3 2020-05-01 0 0
或使用rle从base R
f1 <- function(x) {
with(rle(x == 0), rep(values, lengths) * (sequence(lengths) - 1))
}
table$months_since_zero_login <- with(table, ave(login, customer, FUN = f1))