Delete lines matching a specific pattern from another file

html5 • November 4, 2022, 1:32 am • Q&A

I have two files (I'm only showing the beginning of each):

patterns.txt

m64071_201130_104452/13
m64071_201130_104452/26
m64071_201130_104452/46
m64071_201130_104452/49
m64071_201130_104452/113
m64071_201130_104452/147

myfile.txt

>m64071_201130_104452/13/ccs
ACAGTCGAGCG
>m64071_201130_104452/16/ccs
ACAGTCGAGCG
>m64071_201130_104452/20/ccs
CAGTCGAGCGC
>m64071_201130_104452/22/ccs
CACACATCTCG
>m64071_201130_104452/26/ccs
TAGACAATGTA

I should get output like this:

>m64071_201130_104452/13/ccs
ACAGTCGAGCG
>m64071_201130_104452/26/ccs
TAGACAATGTA

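I want to create a new file containing the records of myfile.txt whose ID matches a line in patterns.txt. I need to keep the ACTG sequence line that belongs to each matching pattern. I use: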

for i in $(cat patterns.txt); do 
     grep -A 1 $i myfile.txt; done > my_newfile.txt

It works, but filling the new file is very slow... The files I'm working with are fairly large (patterns.txt is 14M and myfile.txt is 700M).
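The loop is slow because it starts one grep process per line of patterns.txt, and each of those processes rereads all of myfile.txt, so the total work grows roughly as (number of patterns) × (size of myfile.txt). Because $i is used unanchored as a regular expression, it also matches as a substring: the pattern m64071_201130_104452/13 would select the header .../130/ccs as well. Below is a minimal sketch of a single-pass alternative. It assumes GNU grep (for --no-group-separator), that every header has the form >ID/ccs, and that each sequence sits on one line as in the sample. headers.txt is a hypothetical intermediate file.

```shell
#!/usr/bin/env bash
# Sketch: one grep pass over myfile.txt instead of one grep per pattern.
# Assumptions: GNU grep, headers look like ">ID/ccs", one sequence line per record.
set -eu

# Recreate the sample inputs from the question in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' m64071_201130_104452/13 m64071_201130_104452/26 \
  m64071_201130_104452/46 m64071_201130_104452/49 \
  m64071_201130_104452/113 m64071_201130_104452/147 > patterns.txt
printf '%s\n' \
  '>m64071_201130_104452/13/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/16/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/20/ccs' CAGTCGAGCGC \
  '>m64071_201130_104452/22/ccs' CACACATCTCG \
  '>m64071_201130_104452/26/ccs' TAGACAATGTA > myfile.txt

# Turn each ID into the exact header line it should match: ">ID/ccs".
sed 's|^|>|; s|$|/ccs|' patterns.txt > headers.txt

# -F: fixed strings, no regex; -x: whole-line match (so .../13 cannot hit .../130);
# -f: load all patterns at once; -A 1: also print the sequence line after each hit;
# --no-group-separator: suppress the "--" lines grep puts between context groups.
grep -F -x -A 1 --no-group-separator -f headers.txt myfile.txt > my_newfile.txt
cat my_newfile.txt
```

Because -f loads every pattern into a single matcher, myfile.txt is read only once. If sequences can span several lines, the awk approach in the answer is the safer choice, since it keeps printing until the next header rather than exactly one line.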

I also tried grep -v, because I have another file containing the other patterns of myfile.txt that are not in patterns.txt. But I run into the same slow file-filling problem.
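The opposite selection (records of myfile.txt whose ID does not appear in patterns.txt) can also be done in one pass by storing the IDs in an awk array and negating the lookup. This is a rough sketch that assumes the ID is the header text between the leading > and the last /. not_in_patterns.txt is a hypothetical output name.

```shell
#!/usr/bin/env bash
# Sketch: keep only the records whose ID is NOT listed in patterns.txt.
# Assumption: the ID is the header text between ">" and the last "/".
set -eu

# Recreate the sample inputs from the question in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' m64071_201130_104452/13 m64071_201130_104452/26 \
  m64071_201130_104452/46 m64071_201130_104452/49 \
  m64071_201130_104452/113 m64071_201130_104452/147 > patterns.txt
printf '%s\n' \
  '>m64071_201130_104452/13/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/16/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/20/ccs' CAGTCGAGCGC \
  '>m64071_201130_104452/22/ccs' CACACATCTCG \
  '>m64071_201130_104452/26/ccs' TAGACAATGTA > myfile.txt

awk '
FNR == NR { ids[$0]; next }     # first file (patterns.txt): store each ID as an array key
/^>/ {                          # header line of a record in myfile.txt
  id = substr($0, 2)            # drop the leading ">"
  sub(/\/[^\/]*$/, "", id)      # drop the last "/..." component, e.g. "/ccs"
  keep = !(id in ids)           # 1 if the ID is absent from patterns.txt, else 0
}
keep                            # print headers and sequence lines while keep is 1
' patterns.txt myfile.txt > not_in_patterns.txt
cat not_in_patterns.txt
```

Each line of both files is touched once, and the hash lookup does not get slower as patterns.txt grows, so this avoids the per-pattern rescans of a grep -v loop.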

If you see a solution, please let me know.

Answer

With your shown samples, please try the following. Written and tested in GNU awk.

awk '
FNR==NR{
  arr[$0]
  next
}
/^>/{
  found=0
  match($0,/.*\//)
  if((substr($0,RSTART+1,RLENGTH-2)) in arr){
    print
    found=1
  }
  next
}
found
'  patterns.txt myfile.txt

Explanation: a detailed explanation of the above.

awk '                         ##Starting awk program from here.
FNR==NR{                      ##Checking condition which will be TRUE when patterns.txt is being read.
  arr[$0]                     ##Creating array with index of current line.
  next                        ##next will skip all further statements from here.
}
/^>/{                         ##Checking condition if line starts with > then do following.
  found=0                     ##Resetting found to 0 here.
  match($0,/.*\//)            ##Using match to match everything up to the last / in current line.
  if((substr($0,RSTART+1,RLENGTH-2)) in arr){  ##Checking condition if matched text minus the leading > and trailing / is present in arr then do following.
    print                     ##Printing current line here.
    found=1                   ##Setting found to 1 here.
  }
  next                        ##next will skip all further statements from here.
}
found                         ##Printing the line if found is set.
'  patterns.txt myfile.txt    ##Mentioning Input_file names here.
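To try the answer's script end to end, the sketch below recreates the sample files from the question in a scratch directory and writes the result to my_newfile.txt. The regex is written /.*\// here: inside an awk regex literal the slash must be escaped, otherwise the literal ends early and awk reports a syntax error.

```shell
#!/usr/bin/env bash
# Sketch: run the answer's awk program on the sample data from the question.
set -eu

# Recreate the sample inputs in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' m64071_201130_104452/13 m64071_201130_104452/26 \
  m64071_201130_104452/46 m64071_201130_104452/49 \
  m64071_201130_104452/113 m64071_201130_104452/147 > patterns.txt
printf '%s\n' \
  '>m64071_201130_104452/13/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/16/ccs' ACAGTCGAGCG \
  '>m64071_201130_104452/20/ccs' CAGTCGAGCGC \
  '>m64071_201130_104452/22/ccs' CACACATCTCG \
  '>m64071_201130_104452/26/ccs' TAGACAATGTA > myfile.txt

awk '
FNR==NR{                       # reading patterns.txt: remember each ID
  arr[$0]
  next
}
/^>/{                          # header line in myfile.txt
  found=0
  match($0,/.*\//)             # greedy: everything up to the last "/"
  if((substr($0,RSTART+1,RLENGTH-2)) in arr){   # strip ">" and trailing "/"
    print
    found=1
  }
  next
}
found                          # sequence lines of a matching record
' patterns.txt myfile.txt > my_newfile.txt

cat my_newfile.txt
```

On the sample this should print the records for .../13 and .../26, which is the expected output shown in the question.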
