Extract the domain from one column while keeping the other columns

html5 • September 2, 2022, 1:29 PM • Q&A

I have a file with three columns, like this:

0       1612291061      http://www.staropolska.pl/
0       1612450417      http://m.kerygma.pl/
6831926761338023936     1612171787      http://www.kerygma.pl/hermeneutyka-biblijna/377-ksiegi-starego-testamentu-mini-streszczenie
6867871457052077056     1612534199      http://www.kerygma.pl/katechizm-kkk/kkk-iv-modlitwa/538-kkk-2558-2565

I want to extract the domain from the third column while keeping the first two columns, so that the file ends up looking like this:

0       1612291061      http://www.staropolska.pl
0       1612450417      http://m.kerygma.pl
6831926761338023936     1612171787      http://www.kerygma.pl
6867871457052077056     1612534199      http://www.kerygma.pl

So far I have been able to extract the domains with grep:

cat file.txt | grep -Eo '(http|https)://[^/"]+'

But this gives me only the domains from the third column:

http://www.staropolska.pl
http://m.kerygma.pl
http://www.kerygma.pl
http://www.kerygma.pl

The first two columns are not printed.

Answer

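Another option is cut, using / as the delimiter. This works because the first two columns never contain a /, so fields 1 to 3 end right after the host name:

$ cut -d '/' -f 1-3 file.txt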
0       1612291061      http://www.staropolska.pl
0       1612450417      http://m.kerygma.pl
6831926761338023936     1612171787      http://www.kerygma.pl
6867871457052077056     1612534199      http://www.kerygma.pl
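To try the cut approach without creating a file, the sketch below feeds the question's four sample lines to cut through a pipe. Tab separators between the columns are an assumption; cut does not care, since it splits only on "/".

```shell
# The question's sample lines; tab separators are an assumption.
input=$(printf '%s\t%s\t%s\n' \
  0 1612291061 'http://www.staropolska.pl/' \
  0 1612450417 'http://m.kerygma.pl/' \
  6831926761338023936 1612171787 'http://www.kerygma.pl/hermeneutyka-biblijna/377-ksiegi-starego-testamentu-mini-streszczenie' \
  6867871457052077056 1612534199 'http://www.kerygma.pl/katechizm-kkk/kkk-iv-modlitwa/538-kkk-2558-2565')

# Splitting on "/": field 1 is "<col1><tab><col2><tab>http:", field 2 is the
# empty string between the two slashes, field 3 is the host. Joining fields
# 1-3 back together with "/" yields "<col1><tab><col2><tab>http://host".
result=$(printf '%s\n' "$input" | cut -d '/' -f 1-3)
printf '%s\n' "$result"
```

Lines that contain no "/" at all are printed unchanged by cut, so a line without a URL would pass through as-is.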

Answer

You just need to let the grep regex also match everything that comes before https?:// on the line:

grep -Eo '.*[[:blank:]]https?://[^/"]+' file

0       1612291061      http://www.staropolska.pl
0       1612450417      http://m.kerygma.pl
6831926761338023936     1612171787      http://www.kerygma.pl
6867871457052077056     1612534199      http://www.kerygma.pl

Regex explanation:

.*: matches 0 or more of any character
[[:blank:]]: matches a single space or tab
https?: matches https or http
://: matches ://
[^/"]+: matches 1 or more characters that are neither / nor "
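The pieces above can be checked on two hypothetical extra lines (invented test data, not from the question): one with an https URL and one whose URL has no path at all. The sketch runs the same grep command on them.

```shell
# Hypothetical input, not from the question: an https URL with a path,
# and an http URL with no path.
input=$(printf '%s\t%s\t%s\n' \
  1 1612000000 'https://example.com/some/page' \
  2 1612000001 'http://example.org')

# .* is greedy, but the match has to end with [^/"]+, which cannot cross a
# "/", so it stops right after the host. -o prints only the matched text.
result=$(printf '%s\n' "$input" | grep -Eo '.*[[:blank:]]https?://[^/"]+')
printf '%s\n' "$result"
```

The second line matches in full, because [^/"]+ simply runs to the end of the line when no "/" follows the host.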

Alternatively, you could try this sed:

sed -E 's~([[:blank:]]https?://[^/"]+).*~\1~' file
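A self-contained check of the sed approach on two of the question's sample lines (tab separators are an assumption). The match runs from the blank just before the URL to the end of the line and is replaced by capture group \1, the blank plus scheme and host; the text before the match is left untouched.

```shell
# Two of the question's sample lines; tab separators are an assumption.
input=$(printf '%s\t%s\t%s\n' \
  0 1612450417 'http://m.kerygma.pl/' \
  6867871457052077056 1612534199 'http://www.kerygma.pl/katechizm-kkk/kkk-iv-modlitwa/538-kkk-2558-2565')

# "~" is the delimiter of the s command, so the "/" characters need no escaping.
# \1 is the captured blank + "http(s)://host"; the trailing .* drops the rest.
result=$(printf '%s\n' "$input" | sed -E 's~([[:blank:]]https?://[^/"]+).*~\1~')
printf '%s\n' "$result"
```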

Answer

Using awk with the samples you've shown, try the following:

awk 'match($0,/.*http[s]?:\/\/[^\/]*/){print substr($0,RSTART,RLENGTH)}' Input_file

Explanation: here is the same command with a comment on each part.

awk '                                   ##Start of the awk program.
match($0,/.*http[s]?:\/\/[^\/]*/){      ##match() finds the text from the start of the line through http:// or https:// and the host, stopping before the next /.
  print substr($0,RSTART,RLENGTH)       ##Print only the matched part: RSTART is where the match starts, RLENGTH is its length.
}
' Input_file                            ##Name of the input file.
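If only the third column should be touched, one possible variant (not part of the answer above) matches against $3 alone and rewrites that field. Assigning to a field makes awk rebuild the whole line with OFS, so this sketch sets OFS to a tab, which assumes tab-separated output is acceptable.

```shell
# Two of the question's sample lines; tab separators are an assumption.
input=$(printf '%s\t%s\t%s\n' \
  0 1612291061 'http://www.staropolska.pl/' \
  6831926761338023936 1612171787 'http://www.kerygma.pl/hermeneutyka-biblijna/377-ksiegi-starego-testamentu-mini-streszczenie')

# Variant, not from the answer: keep only the "scheme://host" prefix of field 3.
# The final pattern "1" is always true, so every line is printed, and lines
# whose third field is not a URL are left unchanged.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { OFS = "\t" }
       match($3, /^https?:\/\/[^\/]+/) { $3 = substr($3, 1, RLENGTH) }
       1')
printf '%s\n' "$result"
```

Unlike the match($0, ...) version, this keeps lines whose third column is not a URL instead of silently dropping them.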
