Print the first N lines of a CSV whose quoted fields can contain newlines

A CSV file can contain data with embedded newlines, in any column. Other rows have no embedded newlines at all, so the solution has to work in every case.

Sample input:

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect

Thanks for your time!
With Joy.
Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect

Thanks for your time!
With Joy.
Test",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2

I am using the following command to read the first 5 records of the CSV:

awk -v RS='("[^"]*")?\r?\n' 'NF{ORS = gensub(/\r?\n(.)/, "\\n\\1", "g", RT); ++n; print} n==5{exit}' file.csv

Actual output:

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2

Desired output:

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2

Answer

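With your shown samples only, could you please try the following awk code. Written and tested in GNU awk. It sets RS (the record separator) to a regex that matches a double-quoted field, globally replaces the newlines in RT (the text that RS matched) with a literal \n, sets ORS to the result, and prints each record. RT and a multi-character regex RS are GNU awk extensions, so this needs gawk.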

awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file
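If GNU awk is not available, the same flattening can be sketched in portable POSIX awk. This swaps in a different technique from the answer's: instead of a regex RS and RT, it reads physical lines and counts double quotes on each one, treating an odd count as "entering or leaving a quoted field". The function name flatten_csv is hypothetical, and the sketch assumes quotes are balanced and that literal quotes inside a field are doubled ("").

```shell
#!/bin/sh
# Hypothetical helper (POSIX awk, no GNU extensions): join the physical lines
# that belong to one CSV record, writing each embedded newline as a literal \n.
# Assumes balanced quotes and "" for a literal quote inside a field.
flatten_csv() {
    awk '
    {
        # Append this physical line to the record being built.
        rec = (open ? rec "\\n" : "") $0
        # Count double quotes on this line; an odd count toggles whether we
        # are inside a quoted field. A doubled "" adds two, so it cancels out.
        n = gsub(/"/, "&")
        if (n % 2) open = !open
        # Outside any quoted field: the record is complete.
        if (!open) { print rec; rec = "" }
    }
    END { if (open) print rec }   # unterminated quote at EOF: emit what we have
    ' "$@"
}
```

For example, `flatten_csv file.csv | head -5` should print one line per record, header first. Counting quote parity per line means a single pass with no regex RS, at the cost of trusting the file's quoting to be well-formed.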

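To get only the first 10 records, try the following. Because every embedded newline is now written as a literal \n, each output line is one CSV record, so head counts records (the header counts as one):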

awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file | head -10
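As an alternative to piping into head, awk itself can stop once N records have been printed, so the rest of the file is never read. The sketch below tracks quote parity line by line (an odd number of double quotes on a line toggles "inside a quoted field") rather than using gawk's regex RS and RT. The function name first_n_records is hypothetical; it assumes balanced quotes with "" for literal quotes, and it strips a trailing \r from each line as an added assumption for CRLF files (the question's own RS allowed for \r?\n).

```shell
#!/bin/sh
# Hypothetical helper: print the first N CSV records (header included),
# writing embedded newlines as a literal \n, then exit early.
# Usage: first_n_records N file...
first_n_records() {
    _max=$1
    shift
    awk -v max="$_max" '
    max <= 0 { exit }                        # nothing requested
    {
        sub(/\r$/, "")                       # tolerate CRLF line endings
        rec = (open ? rec "\\n" : "") $0
        if (gsub(/"/, "&") % 2) open = !open # odd quote count toggles state
        if (!open) {
            print rec; rec = ""
            if (++count >= max) exit         # stop reading: no head needed
        }
    }
    ' "$@"
}
```

With the question's data, `first_n_records 5 file.csv` would print the header plus the next four records, the same count as the desired output.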

