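Showing a geom_smooth() trend line from a specified x value onward

Suppose a dataset contains count data for multiple time periods and multiple groups, in the following format: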

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

    group week rate
1       1    1  604
2       1    2  598
3       1    3  578
4       1    4  591
5       1    5  589
6       1    6  571
7       1    7  581
8       1    8  597
9       1    9  589
10      1   10  584

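I am interested in fitting a model-based trend line for each group, but I want to display that trend line only from a certain x value onward. To visualize the trend line using all data points (requires ggplot2; the %>% pipe also needs magrittr or dplyr loaded):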

df %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE) 

Or, to fit the model on only a specific range of values (requires ggplot2 and dplyr):

df %>%
 group_by(group) %>%
 mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(aes(y = rate2),
             method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE)

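However, I could not find a way to fit the model using all the data while displaying the trend line only from a specific x value (say, 35 and above). So I essentially want the trend line computed for the first plot, but displayed as in the second plot, using ggplot2 and ideally in a single pipeline.

Answer

I would look into the after_stat functionality mentioned by @tjebo. See whether the following works for you: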

df %>%
  ggplot(aes(x = week,
             y = rate,
             lty = group)) + 
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm", 
              aes(group = after_stat(interaction(group, x > 35)),
                  colour = after_scale(alpha(colour, as.numeric(x > 35)))),
              method.args = list(family = "quasipoisson"),
              se = FALSE)

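This works by splitting the points associated with each line into two groups, those in the x <= 35 region and those in the x > 35 region (because the colour cannot vary within a single line), and defining a separate colour transparency for each new group. As a result, only the lines in the x > 35 region are visible.

When run, the code triggers a warning that the after_scale modification is not applied to the legend. I don't think that is a problem, though, since we don't need it to appear in the legend anyway.


Answer

If you can tolerate a warning, you can solve this with a one-line change to the example code, using stage():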
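To make the transparency step above concrete, here is a minimal sketch of what the expression inside after_scale() evaluates to. The x positions and the colour value are illustrative assumptions, not values taken from the plot; alpha() comes from the scales package, which is installed alongside ggplot2.

```r
library(scales)  # provides alpha()

x <- c(10, 35, 36, 50)        # illustrative x positions along a fitted line
smooth_colour <- "#3366FF"    # assumed line colour, for illustration only

# TRUE/FALSE becomes 1/0, which alpha() treats as opaque / fully transparent
opacity <- as.numeric(x > 35)
faded <- alpha(smooth_colour, opacity)

# Inspect the alpha channel of each colour (0 = invisible, 255 = opaque)
alpha_channel <- grDevices::col2rgb(faded, alpha = TRUE)["alpha", ]
```

Positions at or below 35 end up with a fully transparent colour, so that part of the line is still drawn but cannot be seen.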

library(tidyverse)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

df %>%
  ggplot(aes(x = week,
             y = rate,
             group = group,
             lty = group)) + 
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm", 
              method.args = list(family = "quasipoisson"),
              aes(x = stage(week, after_stat = ifelse(x > 35, x, NA))),
              se = FALSE) 
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 165 rows containing missing values (geom_smooth).


Answer

One approach is to build the fitted values outside of ggplot so that you can control them:

library(dplyr)
library(ggplot2)

# Fit one quasi-Poisson GLM on all the data and store its fitted values
df$fit <- glm(rate ~ week + group, data = df, family = "quasipoisson")$fitted.values

ggplot(df, aes(x = week, group = group, lty = group)) + 
  geom_line(aes(y = rate)) +
  geom_point(aes(y = rate)) +
  # Draw the fitted line only for week >= 35
  geom_line(data = df %>% filter(week >= 35), aes(y = fit), color = "blue", size = 1.25)
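Note that rate ~ week + group gives every group the same slope, whereas geom_smooth() fits a separate y ~ x model within each group. If the goal is to reproduce the per-group trend lines from the question, a sketch along the following lines may be closer. It assumes an interaction model (rate ~ week * group), which should give essentially the same fitted means as separate per-group fits; the name trend is introduced here purely for illustration.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# Interaction model: a separate intercept and slope for each group,
# estimated from all 50 weeks of data
model <- glm(rate ~ week * group, data = df, family = quasipoisson)
df$fit <- fitted(model)

# Keep the fitted values only where the trend line should be visible
trend <- subset(df, week >= 35)

p <- ggplot(df, aes(x = week, y = rate, group = group, lty = group)) +
  geom_line() +
  geom_point() +
  geom_line(data = trend, aes(y = fit), colour = "blue")
```

Printing p draws the plot. Because the model is fitted before the rows are filtered, the displayed segment is still based on all of the data.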

