Combine: rowwise(), mutate(), across(), for multiple functions
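This is somewhat related to this question. In principle, I am trying to understand how rowwise operations with mutate work across multiple columns when applying more than one function (mean(), sum(), min(), etc.).
I have learned that across does this job and not c_across. I have also learned that mean() differs from min(): mean() does not work on data frames, so the input has to be turned into a vector first, which can be done with unlist() or as.matrix() (learned from Ronak Shah here: Understanding rowwise() and c_across()).
Now to my actual case: I was able to get this done, but I lose column d. How can I avoid losing column d in this setup?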
My df:
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Not working:
df %>%
rowwise() %>%
mutate(across(a:e),
avg = mean(unlist(cur_data()), na.rm = TRUE),
min = min(unlist(cur_data()), na.rm = TRUE),
max = max(unlist(cur_data()), na.rm = TRUE)
)
# Output:
a b c d e avg min max
<int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1 1 6 11 a 1 NA 1 a
2 2 7 12 b 2 NA 12 b
3 3 8 13 c 3 NA 13 c
4 4 9 14 d 4 NA 14 d
5 5 10 15 e 5 NA 10 e
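A small base R illustration (no packages needed) of why this attempt fails: once the character column d is part of the row, unlist() coerces every value to character. row1 and row2 below are hand-typed copies of rows 1 and 2 of df, used as a stand-in for what unlist(cur_data()) sees; the sort order in the comments assumes a typical locale where digits sort before letters.

```r
# Row 1 of df as a list, as cur_data() sees it (d is character)
row1 <- list(a = 1L, b = 6L, c = 11L, d = "a", e = 1L)
flat <- unlist(row1)
print(flat)    # named character vector: "1" "6" "11" "a" "1"

# mean() rejects non-numeric input: it warns and returns NA
avg <- suppressWarnings(mean(flat, na.rm = TRUE))

# min()/max() now compare strings lexicographically, not as numbers
min_val <- min(flat)   # "1"
max_val <- max(flat)   # "a", because letters sort after digits

# Row 2 makes the effect obvious: "12" sorts before "2"
row2 <- unlist(list(a = 2L, b = 7L, c = 12L, d = "b", e = 2L))
min_row2 <- min(row2)  # "12", matching the min column in the output above
```

This is why avg is NA and min/max are <chr> columns in the output.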
Works, but I lose column d:
df %>%
select(-d) %>%
rowwise() %>%
mutate(across(a:e),
avg = mean(unlist(cur_data()), na.rm = TRUE),
min = min(unlist(cur_data()), na.rm = TRUE),
max = max(unlist(cur_data()), na.rm = TRUE)
)
a b c e avg min max
<int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1 6 11 1 4.75 1 11
2 2 7 12 2 5.75 2 12
3 3 8 13 3 6.75 3 13
4 4 9 14 4 7.75 4 14
5 5 10 15 5 8.75 5 15
Answer
Using pmap() from purrr may be preferable, since you only need to select the data once and can use select helpers:
library(dplyr)
library(purrr)

df %>%
mutate(pmap_dfr(across(where(is.numeric)),
~ data.frame(max = max(c(...)),
min = min(c(...)),
avg = mean(c(...)))))
a b c d e max min avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 11 1 4.75
2 2 7 12 b 2 12 2 5.75
3 3 8 13 c 3 13 3 6.75
4 4 9 14 d 4 14 4 7.75
5 5 10 15 e 5 15 5 8.75
Or, additionally using tidyr:
library(tidyr)  # unnest_wider(); pmap() still comes from purrr

df %>%
mutate(res = pmap(across(where(is.numeric)),
~ list(max = max(c(...)),
min = min(c(...)),
avg = mean(c(...))))) %>%
unnest_wider(res)
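For comparison, a rough base R analogue of the same idea (this is a swapped-in technique using apply(), not purrr): select the numeric columns once, apply a function to each row, and bind the results back onto the full data, so d is kept. The variable names num, stats and res are just for illustration.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15,
                 d = c("a", "b", "c", "d", "e"), e = 1:5,
                 stringsAsFactors = FALSE)

# Select the numeric columns once (a, b, c, e). d must be left out,
# otherwise apply()'s internal as.matrix() would coerce everything to character
num <- df[vapply(df, is.numeric, logical(1))]

# MARGIN = 1 calls the function once per row; apply() returns one column
# per row, so t() flips the result back to one row per observation
stats <- t(apply(num, 1, function(r) c(max = max(r), min = min(r), avg = mean(r))))

# Bind the new columns onto the original data frame, keeping d
res <- cbind(df, stats)
print(res)
```

The design choice is the same as with pmap(): the column selection happens in one place instead of being repeated for every summary function.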
Answer
Edit:
Best approach:
df %>%
rowwise() %>%
mutate(min = min(c_across(a:e & where(is.numeric)), na.rm = TRUE),
max = max(c_across(a:e & where(is.numeric)), na.rm = TRUE),
avg = mean(c_across(a:e & where(is.numeric)), na.rm = TRUE)
)
# A tibble: 5 x 8
# Rowwise:
a b c d e min max avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 1 11 4.75
2 2 7 12 b 2 2 12 5.75
3 3 8 13 c 3 3 13 6.75
4 4 9 14 d 4 4 14 7.75
5 5 10 15 e 5 5 15 8.75
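To make the selection a:e & where(is.numeric) less magical, here is a base R emulation of which columns it resolves to for this df. It only illustrates the logic (position range intersected with a type predicate); it is not how tidyselect is implemented internally.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15,
                 d = c("a", "b", "c", "d", "e"), e = 1:5,
                 stringsAsFactors = FALSE)
nm  <- names(df)
pos <- seq_along(nm)

# a:e -> every column positioned from "a" through "e"
in_range <- pos >= match("a", nm) & pos <= match("e", nm)

# where(is.numeric) -> columns whose values are numeric
is_num <- vapply(df, is.numeric, logical(1))

# & keeps only the columns that satisfy both conditions
selected <- nm[in_range & is_num]
print(selected)   # "a" "b" "c" "e"
```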
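Earlier answer
Your "this will work" version doesn't even work correctly if you change the order of the outputs. cur_data() also picks up the columns created earlier in the same mutate() call, so when avg comes last it is computed over min and max as well. (In your original order avg came first, and min/max were only unaffected because a mean always lies between the minimum and the maximum.) See: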
df %>%
select(-d) %>%
rowwise() %>%
mutate(across(a:e),
min = min(unlist(cur_data()), na.rm = TRUE),
max = max(unlist(cur_data()), na.rm = TRUE),
avg = mean(unlist(cur_data()), na.rm = TRUE)
)
# A tibble: 5 x 7
# Rowwise:
a b c e min max avg
<int> <int> <int> <int> <int> <int> <dbl>
1 1 6 11 1 1 11 5.17
2 2 7 12 2 2 12 6.17
3 3 8 13 3 3 13 7.17
4 4 9 14 4 4 14 8.17
5 5 10 15 5 5 15 9.17
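The wrong avg values can be checked by hand. For row 1, the intended mean is over a, b, c, e only; by the time avg is computed in the reordered call, the row also holds min = 1 and max = 11 (values taken from the output above).

```r
# Row 1 of the reordered example
row1 <- c(a = 1, b = 6, c = 11, e = 1)
intended <- mean(row1)                            # 4.75

# cur_data() also contains the freshly created min and max columns
contaminated <- mean(c(row1, min = 1, max = 11))  # 31 / 6
round(contaminated, 2)                            # 5.17
```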
So I'd suggest this instead, where c_across() only looks at the listed columns:
df %>%
select(-d) %>%
rowwise() %>%
mutate(min = min(c_across(a:e), na.rm = TRUE),
max = max(c_across(a:e), na.rm = TRUE),
avg = mean(c_across(a:e), na.rm = TRUE)
)
# A tibble: 5 x 7
# Rowwise:
a b c e min max avg
<int> <int> <int> <int> <int> <int> <dbl>
1 1 6 11 1 1 11 4.75
2 2 7 12 2 2 12 5.75
3 3 8 13 3 3 13 6.75
4 4 9 14 4 4 14 7.75
5 5 10 15 5 5 15 8.75
Another option, which keeps column d by listing the numeric columns explicitly:
cols <- c('a', 'b', 'c', 'e')
df %>%
rowwise() %>%
mutate(min = min(c_across(all_of(cols)), na.rm = TRUE),
max = max(c_across(all_of(cols)), na.rm = TRUE),
avg = mean(c_across(all_of(cols)), na.rm = TRUE)
)
# A tibble: 5 x 8
# Rowwise:
a b c d e min max avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 1 11 4.75
2 2 7 12 b 2 2 12 5.75
3 3 8 13 c 3 3 13 6.75
4 4 9 14 d 4 4 14 7.75
5 5 10 15 e 5 5 15 8.75
In these cases, even the group_by approach suggested by @Sinh won't work correctly.
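If rowwise() is not required at all, a vectorised base R sketch is another option (a swapped-in technique, not one proposed above): pmin() and pmax() compare their arguments element by element, i.e. across the columns of each row, and rowMeans() averages each row. It assumes every column in cols is numeric.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15,
                 d = c("a", "b", "c", "d", "e"), e = 1:5,
                 stringsAsFactors = FALSE)
cols <- c("a", "b", "c", "e")   # the numeric columns, listed explicitly

# do.call() passes each selected column as a separate argument to pmin()/pmax()
col_list <- unname(as.list(df[cols]))
df$min <- do.call(pmin, c(col_list, na.rm = TRUE))
df$max <- do.call(pmax, c(col_list, na.rm = TRUE))

# rowMeans() converts the selected columns to a matrix and averages each row
df$avg <- rowMeans(df[cols], na.rm = TRUE)
print(df)   # d is untouched
```

Because nothing is computed from "the current row", newly created columns cannot leak into later calculations.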