Selecting values from specific columns while skipping NA values in R

丹青 • August 16, 2022, 1:36 pm • Q&A

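I am working with cancer registry data. In the example data below (ex_data), the variables id and diagnosis_yr hold the patient ID and the year of cancer diagnosis, respectively. Columns x_2005 to x_2010 and y_2005 to y_2010 hold the status of x and y for each year from 2005 to 2010. My actual working data has many more columns covering more years (2005 to 2020).

I would like to extract the x and y values from the earliest available year, from the latest available year, and at the diagnosis year, skipping NAs. These are the variables x_earliest, y_latest, x_at_diagnosis and y_at_diagnosis in the "wanted" table. For example, for id 1 I would like the x value from the earliest year that has one and the y value from the latest year that has one. For the x and y values at the diagnosis year, if the diagnosis year itself is NA, I would like to skip it and take the most recent available value from an earlier year. How can I get the wanted variables in R?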

library(tidyverse)

#example data
ex_data <- tribble(
~id,~diagnosis_yr,~x_2005,~x_2006,~x_2007,~x_2008,~x_2009,~x_2010,~y_2005,~y_2006,~y_2007,~y_2008,~y_2009,~y_2010,
1,  2007,   NA, NA, 1,  2,  2,  3,  "a",    "b",    "c",    "d",    "e",    NA, 
2,  2008,   1,  3,  1,  NA, 1,  2,   NA,    "b",    "b",    "e",    "d", "d",
3,  2010,   NA, 2,  2,  2,  3,  NA, "a",    "b",    "c",     NA,     NA,    NA,
4,  2009, 1,    3,  1,  NA, 1,  2,   NA,     NA,     NA,     NA,     NA,    NA,
5,  2005, NA,   1,  1,  2,  2,  3,  "a",    "b",    "c",    "d",    "e",    "e"
)

#wanted variables
wanted <- tribble(
  ~id,~diagnosis_yr,~x_earliest,~y_latest,~x_at_diagnosis,~y_at_diagnosis,
  1,    2007,   1,  "e",    1,  "c",
  2,    2008,   1,  "d",    1,  "e",
  3,    2010,   2,  "c",    3,  "c",
  4,  2009, 1,   NA,  1,  NA,
  5,  2005, 1,  "e", NA,  "a"
)

Answer

I'm not entirely sure this is correct:

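library(dplyr)
library(tidyr)

ex_data %>%
  # reshape to one row per id and year, with x and y as value columns
  pivot_longer(-c(id, diagnosis_yr),
               names_to = c(".value", "year"),
               names_pattern = "(.*)_(\\d+)",
               # store year as a number so it compares cleanly with diagnosis_yr
               names_transform = list(year = as.integer)) %>%
  group_by(id) %>%
  # rows stay in column (year) order within each id, so first()/last() follow time
  mutate(x_earliest     = first(na.omit(x)),
         x_at_diagnosis = last(na.omit(x[diagnosis_yr >= year])),
         y_latest       = last(na.omit(y)),
         y_at_diagnosis = last(na.omit(y[diagnosis_yr >= year]))) %>%
  select(id, diagnosis_yr, x_earliest, y_latest, x_at_diagnosis, y_at_diagnosis) %>%
  distinct() %>%
  ungroup()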

This returns (the output below shows ids 1 to 3 only):

# A tibble: 3 x 6
     id diagnosis_yr x_earliest y_latest x_at_diagnosis y_at_diagnosis
  <dbl>        <dbl>      <dbl> <chr>             <dbl> <chr>
1     1         2007          1 e                     1 c
2     2         2008          1 d                     1 e
3     3         2010          2 c                     3 c
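
The fallback behind last(na.omit(x[diagnosis_yr >= year])) may be easier to follow in isolation. Below is a minimal base-R sketch of the same idea for a single id. The helper name last_available is hypothetical and not part of the answer above; the input vectors are the x values for ids 3 and 5 copied from ex_data.

```r
# Hypothetical helper (not from the answer): the last non-NA value
# recorded in a year on or before `up_to`, or NA if there is none.
last_available <- function(values, years, up_to) {
  candidates <- values[years <= up_to]          # drop years after the cutoff
  candidates <- candidates[!is.na(candidates)]  # skip missing values
  if (length(candidates) == 0) {
    return(NA)                                  # nothing recorded yet
  }
  candidates[length(candidates)]
}

years <- 2005:2010

# id 3: diagnosed in 2010, x_2010 is NA, so the 2009 value is used
last_available(c(NA, 2, 2, 2, 3, NA), years, 2010)  # 3

# id 5: diagnosed in 2005, x_2005 is NA and no earlier year exists
last_available(c(NA, 1, 1, 2, 2, 3), years, 2005)   # NA
```

Calling the same helper with up_to = max(years) would give the latest available value, which is the rule used for y_latest.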
