基于滞后行的条件总和

我有一个数据框测量每月的登录次数。我正在尝试创建一个计数器测量months_since_zero_login,它仅在一个月的登录次数为零时添加。第一个月,每个客户的柜台将从零开始。

这是数据:

library(tidyverse)

obs <- seq(as.Date('2020-01-01'),
           as.Date('2020-05-01'),
           by = "month")
table <- tibble(customer = seq(1:3))
#output
table <- table %>% 
  crossing(obs) %>% 
  mutate(login = c(3, 0, 0, 0, 2,
                   0, 1, 5, 0, 0,
                   1, 3, 1, 5, 0)) 

这是预期的结果:

   customer obs        login months_since_zero_login
      <int> <date>     <dbl>                   <dbl>
 1        1 2020-01-01     3                       0
 2        1 2020-02-01     0                       0
 3        1 2020-03-01     0                       1
 4        1 2020-04-01     0                       2
 5        1 2020-05-01     2                       0
 6        2 2020-01-01     0                       0
 7        2 2020-02-01     1                       0
 8        2 2020-03-01     5                       0
 9        2 2020-04-01     0                       0
10        2 2020-05-01     0                       1
11        3 2020-01-01     1                       0
12        3 2020-02-01     3                       0
13        3 2020-03-01     1                       0
14        3 2020-04-01     5                       0
15        3 2020-05-01     0                       0

到目前为止,这是我的代码,但是当有连续的​​零时(在客户 1 的情况下),我坚持如何将计数器增加 1

table %>% 
  group_by(customer) %>% 
  mutate(months_since_zero_login = case_when(
    row_number() == 1 ~ 0, 
    lag(login) == 0 & login == 0 ~ 1,
    TRUE ~ 0
  ))
#does not increase counter when there are consecutive zeroes

回答

这可以通过rleid. 基于'login'中'0'值的出现创建一个临时分组列,然后按'customer'、'grp'分组,同时指定i'login == 0'的行,创建'months_since_zero_login'作为序列减去的行数 1. 将NA同一列中的元素替换为 0(如果需要)

library(data.table)
setDT(table)[,  grp := rleid(login == 0), .(customer)]
table[login == 0, months_since_zero_login := seq_len(.N) - 1, 
         .(customer, grp)][, grp := NULL]
table[is.na(months_since_zero_login), months_since_zero_login := 0]

-输出

table
#    customer        obs login months_since_zero_login
# 1:        1 2020-01-01     3                       0
# 2:        1 2020-02-01     0                       0
# 3:        1 2020-03-01     0                       1
# 4:        1 2020-04-01     0                       2
# 5:        1 2020-05-01     2                       0
# 6:        2 2020-01-01     0                       0
# 7:        2 2020-02-01     1                       0
# 8:        2 2020-03-01     5                       0
# 9:        2 2020-04-01     0                       0
#10:        2 2020-05-01     0                       1
#11:        3 2020-01-01     1                       0
#12:        3 2020-02-01     3                       0
#13:        3 2020-03-01     1                       0
#14:        3 2020-04-01     5                       0
#15:        3 2020-05-01     0                       0

有了dplyr,我们仍然可以使用rleid

library(dplyr)
table %>% 
   group_by(grp = rleid(customer, login == 0), customer) %>% 
   mutate(months_since_zero_login = if(all(login == 0)) 
         row_number() - 1 else 0) %>% 
   ungroup %>%
   select(-grp)

-输出

# A tibble: 15 x 4
#   customer obs        login months_since_zero_login
#      <int> <date>     <dbl>                   <dbl>
# 1        1 2020-01-01     3                       0
# 2        1 2020-02-01     0                       0
# 3        1 2020-03-01     0                       1
# 4        1 2020-04-01     0                       2
# 5        1 2020-05-01     2                       0
# 7        2 2020-02-01     1                       0
# 8        2 2020-03-01     5                       0
# 9        2 2020-04-01     0                       0
#10        2 2020-05-01     0                       1
#11        3 2020-01-01     1                       0
#12        3 2020-02-01     3                       0
#13        3 2020-03-01     1                       0
#14        3 2020-04-01     5                       0
#15        3 2020-05-01     0                       0

或使用rlebase R

f1 <- function(x) {
   with(rle(x == 0), rep(values, lengths) * (sequence(lengths) - 1))     
}

table$months_since_zero_login <- with(table, ave(login, customer, FUN = f1))


以上是基于滞后行的条件总和的全部内容。
THE END
分享
二维码
< <上一篇
下一篇>>