Split a semicolon- and equals-sign-delimited string into separate columns in R

I have a data frame in which one column contains strings delimited by semicolons and equals signs.

data$col1
  [1] "ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45"  
  [2] "ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05"  
  [3] "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"
  [4] "DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03" 
  [5] "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37"    
  [6] "ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29" 

I want to extract the numbers that follow NLOD= and TLOD= and put them into two separate columns. This is the output I want.

data
                                                        col1     TLOD      NLOD
    "ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45"     5.45     38.78
    "ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05"     4.05     36.58
  "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"     5.82     20.42
   "DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03"     8.03     30.70
      "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37"     5.37     41.48
      "ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29"     5.29     40.59

Any help is appreciated. Thank you.

Reproducible sample data

structure(list(col1 = c("ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45", 
"ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05", "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82", 
"DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03", "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37", 
"ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Answer

In base R, you can use strcapture to capture the data into separate columns.

cbind(df, strcapture('NLOD=(.*?);TLOD=(.*)', df$col1, 
           proto = list(NLOD = numeric(), TLOD = numeric())))

#                                                      col1  NLOD TLOD
#1   ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45 38.78 5.45
#2   ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05 36.58 4.05
#3 DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82 20.42 5.82
#4  DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03 30.70 8.03
#5     ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37 41.48 5.37
#6     ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29 40.59 5.29
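
The call above assumes the sample data is stored in an object named `df`. A minimal self-contained sketch, using only two rows of the question's data for brevity (that shortening is an assumption made here, not part of the original answer), which also confirms that the captured columns come back as numeric:

```r
# Two rows copied from the question's sample data; `df` is the name the answer assumes.
df <- data.frame(
  col1 = c("ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45",
           "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"),
  stringsAsFactors = FALSE
)

# Each parenthesised group in the pattern fills one element of `proto`, in order,
# and is converted to that element's type (numeric here).
captured <- strcapture("NLOD=(.*?);TLOD=(.*)", df$col1,
                       proto = list(NLOD = numeric(), TLOD = numeric()))

# Bind the new columns to the original data frame and inspect the types.
result <- cbind(df, captured)
str(result)
```

Because `proto` declares both columns as `numeric()`, no separate `as.numeric()` step is needed afterwards.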

To match numbers specifically, you can do the following:

cbind(df, strcapture('NLOD=(\\d+\\.\\d+);TLOD=(\\d+\\.\\d+)', df$col1, 
           proto = list(NLOD = numeric(), TLOD = numeric())))
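
One consequence of the stricter pattern is worth checking: when a row does not match the whole pattern, strcapture returns NA for every captured column of that row. The sketch below uses two hypothetical strings (the second, with `NLOD=.`, does not occur in the question's data) to illustrate this:

```r
# Hypothetical inputs: the second string has a "." placeholder instead of a number.
x <- c("NLOD=38.78;TLOD=5.45",
       "NLOD=.;TLOD=5.82")

# \\d+\\.\\d+ requires digits, a literal dot, then digits.
res <- strcapture("NLOD=(\\d+\\.\\d+);TLOD=(\\d+\\.\\d+)", x,
                  proto = list(NLOD = numeric(), TLOD = numeric()))

# The second row fails to match as a whole, so both of its columns are NA,
# even though its TLOD value on its own is a valid number.
print(res)
```

Note that this pattern also rejects values without a decimal part, such as `NLOD=38`, so it is only appropriate if every value is written with a decimal point.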

The same regular expression can also be used with tidyr::extract:

tidyr::extract(df, col1, c('NLOD', 'TLOD'), 'NLOD=(.*?);TLOD=(.*)', remove = FALSE)
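
Unlike the strcapture call with a numeric `proto`, tidyr::extract returns the captured groups as character columns by default. If numeric columns are wanted, its `convert` argument can be set to TRUE. A minimal sketch, assuming the tidyr package is installed and using two rows of the question's data:

```r
library(tidyr)

# Two rows copied from the question's sample data.
df <- data.frame(
  col1 = c("ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45",
           "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps col1; convert = TRUE runs type conversion on the new columns.
out <- tidyr::extract(df, col1, c("NLOD", "TLOD"),
                      "NLOD=(.*?);TLOD=(.*)",
                      remove = FALSE, convert = TRUE)
str(out)
```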

  • The `?` in `(.*?)` makes the match non-greedy. It makes no difference for this data, but a greedy `(.*)` extends as far as it can: if `;TLOD=` occurred again later in the string, the first capture group would capture everything up to the last occurrence.
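
The difference can be seen on a made-up string in which `;TLOD=` appears twice. This is only an illustrative sketch: the input is hypothetical, and `perl = TRUE` is passed here (the answer itself uses the default engine) so that the behaviour shown is that of PCRE.

```r
# Hypothetical string with two ";TLOD=" occurrences; not from the question's data.
x <- "NLOD=38.78;TLOD=5.45;NOTE=x;TLOD=9.99"

# Character prototypes so the raw captured text can be inspected.
proto <- list(NLOD = character(), TLOD = character())

# Greedy first group: runs up to the LAST ";TLOD=".
greedy <- strcapture("NLOD=(.*);TLOD=(.*)", x, proto = proto, perl = TRUE)

# Non-greedy first group: stops at the FIRST ";TLOD=".
lazy <- strcapture("NLOD=(.*?);TLOD=(.*)", x, proto = proto, perl = TRUE)

print(greedy)
print(lazy)
```

In the non-greedy case the second group, which is still greedy, picks up the rest of the string. If trailing fields are possible, a pattern such as `NLOD=([^;]*);TLOD=([^;]*)` keeps each value from crossing a semicolon.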
