Using the %>% operator from dplyr in R
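I have seen code examples that use the %>% operator from the dplyr (or tidyverse) package in R to perform a series of operations on the same object or data frame. However, I have never been able to get it to work in my own code. For example, in the code below I try to remove the "1:2=" part from every cell of a column and then convert that column to numeric. This works fine if I run each step separately, but piping one command into the next produces an error.
Can anyone help me understand what I am doing wrong here?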
> df <- as.data.frame(vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD")))
Rows: 79 Columns: 8
-- Column specification ---------------------------------------------------------------------------------------------------
Delimiter: " "
chr (4): chromosome, AvsDD, AvsWD, DDvsWD
dbl (4): position, num_SNPs, prop_SNPs_coverage, min_coverage
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
> str(df)
'data.frame': 79 obs. of 8 variables:
$ chromosome : chr "A01" "A01" "A01" "A01" ...
$ position : num 139 149 384 544 547 552 558 615 686 693 ...
$ num_SNPs : num 1 1 1 1 1 1 1 1 1 1 ...
$ prop_SNPs_coverage: num 1 1 1 1 1 1 1 1 1 1 ...
$ min_coverage : num 104 32 79 46 48 52 60 30 98 94 ...
$ AvsDD : chr "1:2=0.00000000" "1:2=0.08624012" "1:2=0.13233606" "1:2=0.00000000" ...
$ AvsWD : chr "1:3=0.10843987" "1:3=0.00000000" "1:3=0.12724615" "1:3=0.23923465" ...
$ DDvsWD : chr "2:3=0.33696506" "2:3=0.38416539" "2:3=0.00000000" "2:3=0.26549660" ...
- attr(*, "spec")=
.. cols(
.. chromosome = col_character(),
.. position = col_double(),
.. num_SNPs = col_double(),
.. prop_SNPs_coverage = col_double(),
.. min_coverage = col_double(),
.. AvsDD = col_character(),
.. AvsWD = col_character(),
.. DDvsWD = col_character()
.. )
- attr(*, "problems")=<externalptr>
> df <- df %>% gsub("1:2=","",as.character(AvsDD)) %>% as.numeric(AvsDD)
Error in gsub("1:2=", "", as.character(AvsDD)) : object 'AvsDD' not found
However, when I run each step separately, it works fine and the AvsDD column is converted to numeric:
> df$AvsDD <- gsub("1:2=","",as.character(df$AvsDD))
> df$AvsDD <- as.numeric(df$AvsDD)
> str(df)
'data.frame': 79 obs. of 8 variables:
$ chromosome : chr "A01" "A01" "A01" "A01" ...
$ position : num 139 149 384 544 547 552 558 615 686 693 ...
$ num_SNPs : num 1 1 1 1 1 1 1 1 1 1 ...
$ prop_SNPs_coverage: num 1 1 1 1 1 1 1 1 1 1 ...
$ min_coverage : num 104 32 79 46 48 52 60 30 98 94 ...
$ AvsDD : num 0 0.0862 0.1323 0 0 ...
$ AvsWD : chr "1:3=0.10843987" "1:3=0.00000000" "1:3=0.12724615" "1:3=0.23923465" ...
$ DDvsWD : chr "2:3=0.33696506" "2:3=0.38416539" "2:3=0.00000000" "2:3=0.26549660" ...
- attr(*, "spec")=
.. cols(
.. chromosome = col_character(),
.. position = col_double(),
.. num_SNPs = col_double(),
.. prop_SNPs_coverage = col_double(),
.. min_coverage = col_double(),
.. AvsDD = col_character(),
.. AvsWD = col_character(),
.. DDvsWD = col_character()
.. )
- attr(*, "problems")=<externalptr>
Answer
With dplyr, column transformations are done inside mutate():
library(dplyr)
df <- df %>%
  mutate(AvsDD = as.numeric(gsub("1:2=", "", as.character(AvsDD), fixed = TRUE)))
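The reason the original pipeline failed is that `x %>% f(a, b)` is evaluated as `f(x, a, b)`. The first step therefore became `gsub(df, "1:2=", "", as.character(AvsDD))`, and R looked for `AvsDD` as a standalone object rather than as a column of `df`. The sketch below illustrates this on a made-up two-row data frame; the name `toy` and its values are assumptions used only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the real file
toy <- data.frame(AvsDD = c("1:2=0.00000000", "1:2=0.08624012"))

# The pipe only passes its left-hand side as the first argument:
# toy %>% nrow() is the same call as nrow(toy)
n_rows <- toy %>% nrow()

# mutate() evaluates its arguments inside the data frame,
# which is what makes the bare column name AvsDD resolvable
toy_fixed <- toy %>%
  mutate(AvsDD = as.numeric(gsub("1:2=", "", AvsDD, fixed = TRUE)))

str(toy_fixed)
```

The result is still a data frame, so further dplyr verbs can be chained after the `mutate()` call.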
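It is also possible to do what the OP attempted by extracting the column (.$) and wrapping the expression in {}, but that is not a good approach.
Since there are several columns to convert, we can use across():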
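For completeness, here is a rough sketch of what the `.$` plus `{}` variant might look like. The data frame `toy` and the name `cleaned` are hypothetical. Wrapping the right-hand side in braces stops `%>%` from inserting the left-hand side as the first argument, so `.` can be used freely inside.

```r
library(dplyr)  # dplyr re-exports %>% from magrittr

# Hypothetical toy data standing in for the real file
toy <- data.frame(AvsDD = c("1:2=0.00000000", "1:2=0.08624012"))

# Inside the braces, . is the whole data frame and .$AvsDD is the column
cleaned <- toy %>%
  { gsub("1:2=", "", .$AvsDD, fixed = TRUE) } %>%
  as.numeric()

cleaned
```

Note that `cleaned` is a plain numeric vector, not a data frame, so it would still have to be assigned back with something like `toy$AvsDD <- cleaned`. That extra step is one reason `mutate()` is usually the more convenient choice.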
library(stringr)
df <- df %>%
  mutate(across(c(AvsDD, AvsWD, DDvsWD), ~ as.numeric(str_remove(., ".*\\="))))
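As a possible variation, the columns could be picked with a tidyselect helper instead of being listed by name. The sketch below assumes that every column to convert, and no other column, contains "vs" in its name. That holds for the column names shown in the question but is an assumption about the data in general. The data frame `toy` is hypothetical and reuses a few values from the `str()` output above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with one untouched column and three prefixed columns
toy <- data.frame(
  position = c(139, 149),
  AvsDD    = c("1:2=0.00000000", "1:2=0.08624012"),
  AvsWD    = c("1:3=0.10843987", "1:3=0.00000000"),
  DDvsWD   = c("2:3=0.33696506", "2:3=0.38416539")
)

# contains("vs") selects AvsDD, AvsWD and DDvsWD; the regex removes
# everything up to and including the "=" sign before conversion
toy_num <- toy %>%
  mutate(across(contains("vs"), ~ as.numeric(str_remove(.x, "^.*="))))

str(toy_num)
```

Because the pattern strips any prefix ending in "=", the same call handles "1:2=", "1:3=" and "2:3=" without naming each prefix.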