group_by and keep all groups that do not contain a specific value, filtering where the value is present
I have the following data frame:
df <- data.frame(
  Code = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  Inst = c("Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No"),
  Date = c(
    "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
    "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-09", "2021-01-10"
  )
)
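I want to apply dplyr::group_by to the variable Code and filter on the specific value "Yes" and the minimum Date, but I want to keep all observations for groups that contain no "Yes" value. I tried filter(any(Inst == "Yes")), but that does not work.
I would like a result like this: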
Code Inst Date
a Yes 2021-01-01
b No 2021-01-06
b No 2021-01-06
b No 2021-01-06
Answer
If there can be multiple "Yes" values:
library(dplyr)

df %>%
  group_by(Code) %>%
  # no "Yes" in the group: keep every row; otherwise keep only the "Yes" rows
  slice(if (all(Inst != "Yes")) 1:n() else which(Inst == "Yes"))
Code Inst
<chr> <chr>
1 a Yes
2 b No
3 b No
4 b No
5 b No
6 b No
Taking the updated question into account:
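library(dplyr)

df %>%
  # convert the character column so min() compares real dates
  mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  group_by(Code) %>%
  # no "Yes" in the group: keep every row; otherwise keep only the "Yes" rows
  slice(if (all(Inst != "Yes")) 1:n() else which(Inst == "Yes")) %>%
  # within each group, keep the row(s) with the earliest date
  filter(Date == min(Date))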
Code Inst Date
<chr> <chr> <date>
1 a Yes 2021-01-01
2 b No 2021-01-06
3 b No 2021-01-06
4 b No 2021-01-06
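As a possible alternative, the same selection can probably be written with filter() alone instead of slice(). This is only a sketch and not part of the original answer: it assumes the df defined in the question, and the name result is introduced here purely for illustration. It also shows why filter(any(Inst == "Yes")) on its own falls short: that condition is a single TRUE or FALSE per group, so it keeps every row of groups that contain a "Yes" and drops the groups that do not.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  Code = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  Inst = c("Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No"),
  Date = c(
    "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
    "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-09", "2021-01-10"
  )
)

# `result` is a hypothetical name used only in this sketch
result <- df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Code) %>%
  # keep a row if it is "Yes", or if its group contains no "Yes" at all
  filter(Inst == "Yes" | !any(Inst == "Yes")) %>%
  # among the remaining rows of each group, keep the earliest date
  filter(Date == min(Date)) %>%
  ungroup()

print(result)
```

The row-wise test Inst == "Yes" is combined with the group-wise test !any(Inst == "Yes"), so the second term switches the filter off for groups such as b that have no "Yes". The final ungroup() is optional and only returns a plain tibble.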