Regex to replace non-digit characters inside parentheses within a string in a dplyr workflow

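My question is somewhat related to an already answered question, "Need to extract individual characters from a string column using R".

I tried to solve that problem with what I know, and now need to know how to remove the non-digit characters inside parentheses in a string.

Here is the data frame with the column x: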

  team     linescore     ondate                                     x
1  NYM     010000000 2020-08-01             0, 1, 0, 0, 0, 0, 0, 0, 0
2  NYM (10)1140006x) 2020-08-02 (, 1, 0, ), 1, 1, 4, 0, 0, 0, 6, x, )
3  BOS     002200010 2020-08-13             0, 0, 2, 2, 0, 0, 0, 1, 0
4  NYM  00000(11)01x 2020-08-15    0, 0, 0, 0, 0, (, 1, 1, ), 0, 1, x
5  BOS        311200 2020-08-20                      3, 1, 1, 2, 0, 0

df1 <- structure(list(team = c("NYM", "NYM", "BOS", "NYM", "BOS"), linescore = c("010000000", 
"(10)1140006x)", "002200010", "00000(11)01x", "311200"), ondate = structure(c(18475, 
18476, 18487, 18489, 18494), class = "Date"), x = list(c("0", 
"1", "0", "0", "0", "0", "0", "0", "0"), c("(", "1", "0", ")", 
"1", "1", "4", "0", "0", "0", "6", "x", ")"), c("0", "0", "2", 
"2", "0", "0", "0", "1", "0"), c("0", "0", "0", "0", "0", "(", 
"1", "1", ")", "0", "1", "x"), c("3", "1", "1", "2", "0", "0"
))), class = "data.frame", row.names = c(NA, -5L))

Desired output:

  team     linescore     ondate                             x
1  NYM     010000000 2020-08-01     0, 1, 0, 0, 0, 0, 0, 0, 0
2  NYM (10)1140006x) 2020-08-02 10, 1, 1, 4, 0, 0, 0, 6, x, )
3  BOS     002200010 2020-08-13     0, 0, 2, 2, 0, 0, 0, 1, 0
4  NYM  00000(11)01x 2020-08-15    0, 0, 0, 0, 0, 11, 0, 1, x
5  BOS        311200 2020-08-20              3, 1, 1, 2, 0, 0

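How can I change (, 1, 0, ) to 10 and (, 1, 1, ) to 11, leaving the rest as is?

I have had some help so far:

  1. "Regex for replacing certain characters outside of brackets only", thanks to AnilGoyal

  2. gsub("\\D+", "", str1), thanks to akrun

  3. gsub("[(,) ]", "", "(, 1, 0, )"), thanks to Anoushirvan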
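For reference, here is a minimal sketch of what hints 2 and 3 do on their own. It assumes a parenthesized run has already been pasted into a single string; the names elem and tail_elem are illustrative only and do not appear in the data above.

```r
elem      <- "(, 1, 0, )"  # a parenthesized run, as it prints in column x
tail_elem <- "6, x, )"     # a run that holds a non-digit worth keeping

# Hint 2: drop every run of non-digit characters
digits_only <- gsub("\\D+", "", elem)
# Hint 3: drop only parentheses, commas and spaces
class_strip <- gsub("[(,) ]", "", elem)

# The same two patterns applied to a run that contains "x"
tail_digits <- gsub("\\D+", "", tail_elem)    # also removes the x
tail_class  <- gsub("[(,) ]", "", tail_elem)  # keeps the x, removes the stray )

print(c(digits_only, class_strip, tail_digits, tail_class))
```

Both patterns turn the parenthesized run into 10, but applied to a whole row they would also strip or merge characters outside the parentheses, which is why neither is enough by itself.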

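Thanks!

Answer

We can do this in base R. One option is to insert a delimiter after each character outside the (...) using (*SKIP)(*FAIL), then remove the paired () while keeping the characters inside by capturing them as a group, and finally return a list with strsplit by splitting at ,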

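# inner gsub: skip "(...)" groups, append "," to every other character
# outer gsub: turn "(digits)" into "digits,"; strsplit then splits at ","
df1$x <-  strsplit(gsub("\\((\\d+)\\)", "\\1,",
    gsub("\\([^)]+\\)(*SKIP)(*FAIL)|(.)", "\\1,", 
      df1$linescore, perl = TRUE)),",")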

-output

df1$x
[[1]]
[1] "0" "1" "0" "0" "0" "0" "0" "0" "0"

[[2]]
 [1] "10" "1"  "1"  "4"  "0"  "0"  "0"  "6"  "x"  ")" 

[[3]]
[1] "0" "0" "2" "2" "0" "0" "0" "1" "0"

[[4]]
[1] "0"  "0"  "0"  "0"  "0"  "11" "0"  "1"  "x" 

[[5]]
[1] "3" "1" "1" "2" "0" "0"
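
To make the three steps easier to follow, here is a sketch that runs them one at a time on a single linescore taken from the question's data. The variable names s, step1, step2 and step3 are illustrative only.

```r
s <- "00000(11)01x"

# Step 1: (*SKIP)(*FAIL) leaves each "(...)" group untouched;
# every other character is captured and gets a comma appended
step1 <- gsub("\\([^)]+\\)(*SKIP)(*FAIL)|(.)", "\\1,", s, perl = TRUE)
print(step1)
# [1] "0,0,0,0,0,(11)0,1,x,"

# Step 2: replace "(digits)" with the captured digits plus a comma
step2 <- gsub("\\((\\d+)\\)", "\\1,", step1)
print(step2)
# [1] "0,0,0,0,0,11,0,1,x,"

# Step 3: split at the commas; strsplit() does not return an empty
# piece for the trailing comma
step3 <- strsplit(step2, ",")[[1]]
print(step3)
# [1] "0"  "0"  "0"  "0"  "0"  "11" "0"  "1"  "x"
```

Because strsplit() returns a list, the same nested expression should also work inside dplyr::mutate() to create x as a list column, assuming dplyr is loaded.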

