Applying a regular expression in R only from the nth word onward

html5 • December 2, 2022, 5:10 am • Q&A

With regular expressions in R, how can a pattern be evaluated not from the beginning of the target string but only from its nth word onward?

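For example, suppose one wants to replace every number in a string with the symbol @. One can then use gsub("\\d+", "@", string), for example:

gsub("\\d+", "@", "words before 879 then more words then 1001 again")

The result would be: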

> "words before @ then more words then @ again"

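Now, continuing that example: using a regular expression, how can only the numbers that occur from the 4th word of the string onward be replaced? The example above would then return "words before 879 then more words then @ again", because 879 is the third word of the target string.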

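FWIW, I found many questions about extracting and locating words, some matching from the beginning or from the end, some matching at the nth word or starting from it. But none covers how to ignore the first n words of a string when looking for a pattern, using regex only.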

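Answer

We can pass a proto object to gsubfn to count the words and replace conditionally. gsubfn keeps a count variable in the proto object that holds the index of the current match, so here it is the position of the current word: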

library(gsubfn)
# match each word; replace its digits only when the word index exceeds 3
gsubfn("\\w+", proto(fun = function(this, x) if (count > 3)
          sub("\\d+", "@", x) else x), str1)
#[1] "words before 879 then more words then @ again"

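One advantage is that the replacement can be applied at any word position, or across several ranges of positions. Suppose we only want to replace within words 4 through 6:

gsubfn("\\w+", proto(fun = function(this, x) if (count %in% 4:6)
         sub("\\d+", "@", x) else x), str1)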

Or a more complex case:

gsubfn("\\w+", proto(fun = function(this, x) if (count %in% c(4:6, 12:15))
      sub("\\d+", "@", x) else x), str2)
#[1] "words before 879 then @ replace not 1001 again and replace @ and @"
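
The word-position idea above can also be sketched in base R, without gsubfn. This is only an illustrative alternative, not the answer's method: the helper name replace_in_words and its arguments are hypothetical, and it assumes a single input string whose words are runs of \w characters.

```r
# Hypothetical helper (not part of gsubfn): replace `pattern` only inside the
# words whose 1-based position is listed in `positions`.
replace_in_words <- function(x, positions, pattern = "\\d+", replacement = "@") {
  stopifnot(length(x) == 1)
  m <- gregexpr("\\w+", x)                 # locate every word
  words <- regmatches(x, m)[[1]]           # extract the words in order
  hit <- seq_along(words) %in% positions   # which words may be modified
  words[hit] <- sub(pattern, replacement, words[hit])
  regmatches(x, m) <- list(words)          # write the words back in place
  x
}

s1 <- "words before 879 then more words then 1001 again"
s2 <- "words before 879 then 50 replace not 1001 again and replace 1003 and 1005"

r1 <- replace_in_words(s1, 4:9)
r2 <- replace_in_words(s2, c(4:6, 12:15))
print(r1)  # "words before 879 then more words then @ again"
print(r2)  # "words before 879 then @ replace not 1001 again and replace @ and @"
```

Because positions is matched with %in%, a generous upper bound such as 4:1000 can stand in for "from the 4th word onward" when the word count is not known in advance.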

Data

str1 <- "words before 879 then more words then 1001 again"
str2 <- "words before 879 then 50 replace not 1001 again and replace 1003 and 1005"
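
Since the question asks for a regex-only solution, one possible sketch uses the PCRE backtracking verbs (*SKIP)(*F) with perl = TRUE. It is offered as an assumption-laden alternative rather than as part of the answer above: the first alternative consumes the leading n words and is then discarded, so \d+ can only match in the remainder of the string. The variable names x, n, pat and res are illustrative.

```r
x <- "words before 879 then more words then 1001 again"
n <- 3  # number of leading words to leave untouched

# Alternative 1: the first n words, matched and then thrown away via (*SKIP)(*F).
# Alternative 2: a run of digits anywhere after the skipped prefix.
# \\b keeps each repetition from ending in the middle of a word.
pat <- sprintf("^(?:\\W*\\w+\\b){%d}(*SKIP)(*F)|\\d+", n)

res <- gsub(pat, "@", x, perl = TRUE)
print(res)
```

With n set to 3 this is expected to leave 879 alone and replace only 1001. The verbs are a PCRE feature, so the pattern will not work with the default (TRE) engine or with fixed = TRUE.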
