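Display a geom_smooth() trend line only from a specified x value onward
Suppose a dataset contains count data for several time periods and several groups, in the following format: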
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
   group week rate
1      1    1  604
2      1    2  598
3      1    3  578
4      1    4  591
5      1    5  589
6      1    6  571
7      1    7  581
8      1    8  597
9      1    9  589
10     1   10  584
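I'm interested in fitting a model-based trend line for each group; however, I want this trend line displayed only from a certain x value onward. To visualize the trend line using all data points (requires ggplot2; the %>% pipe comes from dplyr or magrittr):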
df %>%
  ggplot(aes(x = week,
             y = rate,
             group = group,
             lty = group)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm",
              method.args = list(family = "quasipoisson"),
              se = FALSE)
Or, to fit the model on a specific range of values only (requires ggplot2 and dplyr):
df %>%
  group_by(group) %>%
  mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
  ggplot(aes(x = week,
             y = rate,
             group = group,
             lty = group)) +
  geom_line() +
  geom_point() +
  geom_smooth(aes(y = rate2),
              method = "glm",
              method.args = list(family = "quasipoisson"),
              se = FALSE)
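However, I can't find a way to fit the model using all the data while displaying the trend line only from a specific x value (say, 35+). So I essentially want the trend line as calculated for the first plot, but displayed as in the second plot, using ggplot2 and ideally just one pipeline.
Answer
I'd look into the after_stat() function that @tjebo mentioned. See whether the following works for you: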
df %>%
  ggplot(aes(x = week,
             y = rate,
             lty = group)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm",
              aes(group = after_stat(interaction(group, x > 35)),
                  colour = after_scale(alpha(colour, as.numeric(x > 35)))),
              method.args = list(family = "quasipoisson"),
              se = FALSE)
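This works by splitting the points associated with each line into two groups: those in the x <= 35 region and those in the x > 35 region. Because a line's colour should not vary within a single group, each new group is given its own colour transparency. As a result, only the lines in the x > 35 region are visible.
When run, the code triggers a warning that the after_scale modification is not applied to the legend. I don't think that's a problem, though, since we don't need it to appear in the legend anyway.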
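To make the grouping described above concrete, here is a small base-R sketch. The vectors `group` and `x` are toy stand-ins I made up for the values the smoother would compute, not real output from the plot:

```r
# Toy stand-ins for the computed group and x values (illustrative only)
group <- factor(rep(1:3, each = 4))
x <- rep(c(34, 35, 36, 37), times = 3)

# Splitting each group at x > 35 yields two sub-groups per original group
split_group <- interaction(group, x > 35)
levels(split_group)

# Alpha is 0 (fully transparent) up to x = 35 and 1 (opaque) beyond it
line_alpha <- as.numeric(x > 35)
data.frame(group, x, split_group, line_alpha)
```

Each of the three original groups ends up as a `.FALSE` and a `.TRUE` sub-group, and only the `.TRUE` ones get a non-zero alpha.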
Answer
If you can tolerate a warning, you can solve this with stage(), changing just one line of your example code.
library(tidyverse)
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
df %>%
  ggplot(aes(x = week,
             y = rate,
             group = group,
             lty = group)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm",
              method.args = list(family = "quasipoisson"),
              aes(x = stage(week, after_stat = ifelse(x > 35, x, NA))),
              se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 165 rows containing missing values (geom_smooth).
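As a rough sanity check on where the 165 in that warning comes from, the sketch below assumes the smoother evaluates each fitted line at 80 evenly spaced x values over the data range (the default `n` of stat_smooth(), as far as I know). The names `x_grid` and `x_shown` are just illustrative:

```r
# Assumption: each group's fitted line is evaluated on 80 evenly spaced x values
x_grid <- seq(1, 50, length.out = 80)

# Same masking rule as in the stage() call: x values up to 35 become NA
x_shown <- ifelse(x_grid > 35, x_grid, NA)

sum(is.na(x_shown))      # points dropped per group: 55
sum(is.na(x_shown)) * 3  # across the three groups: 165
```

So the model is still fitted on all weeks, and the warning only reflects the masked part of each fitted line being dropped before drawing.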
Answer
One approach is to build the fitted values outside of ggplot so that you can control them:
df$fit <- glm(rate ~ week + group, data = df, family = "quasipoisson")$fitted.values
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = week, group = group, lty = group)) +
  geom_line(aes(y = rate)) +
  geom_point(aes(y = rate)) +
  geom_line(data = df %>% filter(week >= 35), aes(y = fit), color = "blue", size = 1.25)
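Note that the model above uses a common slope for week with group-specific intercepts, whereas geom_smooth() fits each group separately. If you want externally built fits that should be closer to the geom_smooth() lines, a minimal base-R sketch could look like this. The column name `fit_by_group` and the object `fit_shown` are my own hypothetical names:

```r
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# Fit one quasi-Poisson GLM per group (rate ~ week) on all weeks of that group
df$fit_by_group <- NA_real_  # hypothetical column name
for (g in levels(df$group)) {
  rows <- df$group == g
  model <- glm(rate ~ week, data = df[rows, ], family = quasipoisson())
  # type = "response" returns fitted values on the scale of rate
  df$fit_by_group[rows] <- predict(model, newdata = df[rows, ], type = "response")
}

# Keep only the part of each fitted line that should be displayed
fit_shown <- df[df$week >= 35, ]
```

Passing `fit_shown` as the data of the last geom_line() call, with aes(y = fit_by_group), would then draw the per-group lines from week 35 onward.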