Merging files and removing lines that are redundant across files

html5 • September 8, 2022, 1:29 PM • Q&A

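I need to merge several files, removing lines that are redundant across files while keeping lines that are redundant within a file. A schematic of my files is as follows:

File1.txt

File2.txt

File3.txt

The desired output is:

I would prefer a solution in awk, bash, or R. I searched the web, and although there are many solutions* (some examples are listed below), they all remove duplicate lines regardless of whether the duplicates occur within a file or across files.

Thanks in advance. Arturo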

* Examples of earlier solutions that remove redundant lines both within and across files:
https://unix.stackexchange.com/questions/50103/merge-two-lists-while-removing-duplicates
https://unix.stackexchange.com/questions/457320/combine-text-files-and-delete-duplicate-lines
https://unix.stackexchange.com/questions/350520/awk-combine-two-big-files-and-remove-duplicated-lines
https://unix.stackexchange.com/questions/257467/merging-2-files-and-keeping-the-one-duplicate

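Answer

Using the samples you showed, could you try the following? It does not remove redundant lines within a file, but it does remove lines that already appeared in an earlier file.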

awk '
FNR==1{
  for(key in current){
    total[key]
  }
  delete current
}
!($0 in total){
  print
}
{
  current[$0]
}
' file1.txt file2.txt file3.txt

Explanation: a detailed explanation of the above.

awk '                                ## Start of the awk program.
FNR==1{                              ## On the first line of each input file, do the following.
  for(key in current){               ## Loop over the keys of current (the lines of the previous file).
    total[key]                       ## Copy each key into total, which holds the lines of all earlier files.
  }
  delete current                     ## Empty the current array for the new file.
}
!($0 in total){                      ## If no earlier file contained this line,
  print                              ## print it (duplicates within the same file are kept).
}
{
  current[$0]                        ## Record every line of the current file in current.
}
' file1.txt file2.txt file3.txt      ## Input file names.
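
As a quick check of the logic above, here is a minimal sketch that runs the same three rules on made-up data. The demo1.txt, demo2.txt and demo3.txt files and their contents are assumptions for illustration only; they are not the asker's original files.

```shell
# Hypothetical sample data, not the asker's original files.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 3 > "$workdir/demo1.txt"   # 3 repeats within one file
printf '%s\n' 3 4 5 5 > "$workdir/demo2.txt"   # 3 also occurs in demo1.txt
printf '%s\n' 1 5 6   > "$workdir/demo3.txt"   # 1 and 5 occur in earlier files

merged=$(awk '
FNR==1{ for(key in current) total[key]; delete current }  # fold previous file into total
!($0 in total){ print }                                   # keep lines unseen in earlier files
{ current[$0] }                                           # remember every line of this file
' "$workdir/demo1.txt" "$workdir/demo2.txt" "$workdir/demo3.txt")

printf '%s\n' "$merged"
rm -rf "$workdir"
```

With this data the merged output is 1, 2, 3, 3, 4, 5, 5, 6 (one per line): the repeated 3 and 5 survive because each pair sits inside a single file, while the 3 in demo2.txt and the 1 and 5 in demo3.txt are dropped because an earlier file already contained them.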

Answer

This is a trick added to /sf/answers/1076955631/, using diff and its output formats. It may assume that the input is sorted; that is untested.

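#!/usr/bin/env bash
# Accumulate the merged result in a temporary file, one input file at a time.
out=$(mktemp -p .)
tmpout=$(mktemp -p .)
trap 'rm -f "${out}" "${tmpout}"' EXIT
for F in "$@" ; do
    # Keep everything merged so far, then append the lines that diff
    # reports as present only in the new file.
    { cat "${out}" ;
      diff --changed-group-format='%>' --unchanged-group-format='' "${out}" "${F}" ;
    } > "${tmpout}"
    mv "${tmpout}" "${out}"
done
cat "${out}"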

Output:

$ ./question.sh F*
1
2
3
3
4
5
6
7
8
8
9
10
10
11

$ diff <(./question.sh F*) Output.txt

(Per markp-fuso's comment: if File3.txt has two 9s, this keeps both.)
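
The "sorted" caveat can be probed with a small experiment. The sketch below is only that: it assumes GNU diff (for the group-format options), and the u1.txt and u2.txt files are hypothetical inputs that hold the same two lines in a different order. It runs one round of the loop's diff call and compares it with an awk lookup of lines that are genuinely absent from the first file.

```shell
# Hypothetical unsorted inputs; assumes GNU diff for the *-group-format options.
workdir=$(mktemp -d)
printf '%s\n' b a > "$workdir/u1.txt"
printf '%s\n' a b > "$workdir/u2.txt"   # same lines as u1.txt, different order

# One round of the loop: the lines diff reports as coming only from u2.txt.
# diff exits with status 1 when the files differ, hence the "|| true".
added=$(diff --changed-group-format='%>' --unchanged-group-format='' \
             "$workdir/u1.txt" "$workdir/u2.txt" || true)

# For comparison: lines of u2.txt that occur nowhere in u1.txt.
truly_new=$(awk 'NR==FNR{ seen[$0]; next } !($0 in seen)' \
                "$workdir/u1.txt" "$workdir/u2.txt")

printf 'diff would append: [%s]\n' "$added"
printf 'actually new:      [%s]\n' "$truly_new"
rm -rf "$workdir"
```

Since diff aligns lines in order, only one of the two lines can be matched here, so I would expect the first bracket to contain a single line while the second stays empty. If that holds on your system, the diff-based loop can re-add a line that an earlier file already contained whenever the inputs are not in a consistent order, which the awk answer does not do.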
