Is there a tidy way to get per-group "summary" output?
My code frequently uses tapply and summary, like this:
library(tidyverse)  # provides tibble, purrr, and %>%
data <- tibble(
  year = rep(2018:2021, 3),
  x = runif(length(year))
)
tapply(data$x, data$year, summary)
The output looks like:
$`2018`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3914 0.5696 0.7477 0.6668 0.8045 0.8614
$`2019`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1910 0.2863 0.3816 0.4179 0.5313 0.6809
(etc.)
Is there a way to get this kind of summary output in a tibble?
The desired output, produced with ugly code:
# A tibble: 4 x 7
year Min. `1st Qu.` Median Mean `3rd Qu.` Max.
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018 0.39 0.570 0.75 0.67 0.8 0.86
2 2019 0.19 0.290 0.38 0.42 0.53 0.68
3 2020 0.01 0.35 0.7 0.55 0.82 0.93
4 2021 0.06 0.15 0.24 0.32 0.45 0.66
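I'm hoping there is a nice combination of dplyr functions that does this better; my code for getting the desired output is clunky.
And of course I'd rather not have to rewrite base R's summary function. My current code: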
tapply(data$x, data$year, summary) %>%
  map(~ as.numeric(round(.x, 2))) %>%
  map_dfr(set_names, names(summary(1))) %>%
  add_column(year = 2018:2021, .before = 1)
Answer
Here is a concise tidyverse way.
library(dplyr)
library(purrr)
library(tidyr)
data %>%
  nest_by(year) %>%
  mutate(data = map(data, summary)) %>%
  unnest_wider(data)
# # A tibble: 4 x 7
# year Min. `1st Qu.` Median Mean `3rd Qu.` Max.
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2018 0.105 0.256 0.407 0.307 0.407 0.407
# 2 2019 0.0354 0.205 0.375 0.313 0.452 0.529
# 3 2020 0.272 0.467 0.662 0.546 0.684 0.705
# 4 2021 0.00564 0.107 0.208 0.252 0.375 0.542
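As a possible alternative to nesting (this is a different technique from the one above, offered only as a sketch), summarise() can unpack a one-row tibble into columns when the expression is left unnamed. The seed and the name by_year are assumptions added for illustration; the example data mirrors the question.

```r
library(dplyr)
library(tibble)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
data <- tibble(
  year = rep(2018:2021, 3),
  x = runif(length(year))
)

by_year <- data %>%
  group_by(year) %>%
  # c() drops the "summaryDefault" class and keeps the named numeric vector;
  # as_tibble_row() turns it into a one-row tibble, which summarise()
  # spreads into separate columns because the expression is unnamed.
  summarise(as_tibble_row(c(summary(x))))

by_year
```

This keeps year as an integer column and avoids list-columns entirely, at the cost of relying on summarise() auto-unpacking data-frame results.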
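You can also just convert the tabular output of your original line. Note that this turns year into a character column, so you may want to change it back.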
library(purrr)
tapply(data$x, data$year, summary) %>%
  map_dfr(c, .id = "year")
# # A tibble: 4 x 7
# year Min. `1st Qu.` Median Mean `3rd Qu.` Max.
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2018 0.105 0.256 0.407 0.307 0.407 0.407
# 2 2019 0.0354 0.205 0.375 0.313 0.452 0.529
# 3 2020 0.272 0.467 0.662 0.546 0.684 0.705
# 4 2021 0.00564 0.107 0.208 0.252 0.375 0.542
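The note about year becoming a character column can be handled with one extra step. A minimal sketch, assuming the year labels are always whole numbers so as.integer() is a safe conversion; the seed and the name summary_tbl are added here for illustration only.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
data <- tibble(
  year = rep(2018:2021, 3),
  x = runif(length(year))
)

summary_tbl <- tapply(data$x, data$year, summary) %>%
  # The group labels become the "year" column, stored as character
  map_dfr(c, .id = "year") %>%
  # Convert the labels back to integers to match the original column type
  mutate(year = as.integer(year))

summary_tbl
```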