Filter a dataset with a regex based on the end of the string
I am using REGEXP to filter a dataset with 10 rows, shown below:
ID Product
1 "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)"
2 "MINOXIDIL POWDER"
3 "MENTHOL LOZENGE 10 MG"
4 "ZINC CHLORIDE GRANULES"
5 "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
10 "ZONACORT 7 DAY"
and I would like the result to look like this:
ID Product
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
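Essentially, I want to filter the dataset based on whether the string ends with a number enclosed in parentheses. I have tried, to no avail: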
ID Product
1 "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)"
2 "MINOXIDIL POWDER"
3 "MENTHOL LOZENGE 10 MG"
4 "ZINC CHLORIDE GRANULES"
5 "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
10 "ZONACORT 7 DAY"
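Answer
In base R, we can use grepl with the pattern \\(\\d+\\)$: an escaped opening parenthesis \\(, then one or more digits \\d+, then an escaped closing parenthesis \\), anchored to the end of the string by $.
subset(df1, grepl("\\(\\d+\\)$", Product))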
# ID Product
#6 6 METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)
#7 7 DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)
#8 8 METHYLPREDNISOLONE DOSE P (16)
#9 9 MILLIPRED DP (13)
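The backslashes in the pattern matter: without them, ( and ) form a capture group instead of matching literal parentheses, so the pattern only asks whether the string ends in digits. A minimal sketch comparing the two on a few strings; "ZONACORT 7" is a hypothetical value added to show the difference, the others come from the question's data.

```r
# Sample strings; "ZONACORT 7" is hypothetical, the rest are from the question
x <- c(
  "MILLIPRED DP (13)",
  "ZONACORT 7",
  "MINOXIDIL POWDER",
  "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"
)

# Escaped: literal "(", one or more digits, literal ")", then end of string
escaped <- grepl("\\(\\d+\\)$", x)

# Unescaped: ( ) is only a capture group, so this matches any string ending in digits
unescaped <- grepl("(\\d+)$", x)

print(data.frame(x, escaped, unescaped))
```

Only the escaped pattern keeps "MILLIPRED DP (13)" and rejects "ZONACORT 7"; neither pattern matches "(BASE EQUIV)", because it contains no digits.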
Data
df1 <- structure(list(ID = 1:10, Product = c("VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)",
"MINOXIDIL POWDER", "MENTHOL LOZENGE 10 MG", "ZINC CHLORIDE GRANULES",
"CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)", "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)",
"DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)", "METHYLPREDNISOLONE DOSE P (16)",
"MILLIPRED DP (13)", "ZONACORT 7 DAY")), class = "data.frame", row.names = c(NA,
-10L))
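The ?subset help page describes subset() as a convenience function intended for interactive use and recommends the standard [ indexing for programming. A sketch of the same filter written with [ indexing; the data frame is rebuilt here with data.frame() so the block runs on its own, and the name ends_with_count is purely illustrative.

```r
# Rebuild the question's data so this block is self-contained
df1 <- data.frame(
  ID = 1:10,
  Product = c(
    "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)",
    "MINOXIDIL POWDER",
    "MENTHOL LOZENGE 10 MG",
    "ZINC CHLORIDE GRANULES",
    "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)",
    "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)",
    "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)",
    "METHYLPREDNISOLONE DOSE P (16)",
    "MILLIPRED DP (13)",
    "ZONACORT 7 DAY"
  ),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where Product ends with a number in parentheses
ends_with_count <- grepl("\\(\\d+\\)$", df1$Product)

# Standard row indexing with the logical vector; all columns are kept
result <- df1[ends_with_count, ]
print(result)
```

The row names of result keep the original positions (6 to 9), as in the subset() output above.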