Add extra fields based on the number of fields
I have data in a file in the following format:
"123","XYZ","M","N","P,Q"
"345",
"987","MNO","A,B,C"
I always want 5 entries per line, so if a line has only 2 fields, 3 extra empty fields ("") need to be added.
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""
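I looked at the solution on this page:
Add extra strings based on the number of fields - Sed/Awk
It has a very similar requirement, but it failed when I tried it because my fields also contain commas (,).
Thanks.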
Answer
With your shown samples, please try the following code in GNU awk.
awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' '
BEGIN{ OFS="," }
FNR==NR{
nof=(NF>nof?NF:nof)
next
}
NF<nof{
val=""
i=($0~/,$/?NF:NF+1)
for(;i<=nof;i++){
val=(val?val OFS:"")s1 s1
}
sub(/,$/,"")
$0=$0 OFS val
}
1
' Input_file Input_file
Explanation: Adding a detailed explanation for the above.
awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' ' ##Starting awk program from here setting FPAT to csv file parsing here.
BEGIN{ OFS="," } ##Starting BEGIN section of this program setting OFS to comma here.
FNR==NR{ ##Checking condition FNR==NR here, which will be true for first time file reading.
nof=(NF>nof?NF:nof) ##Create nof to get highest NF value here.
next ##next will skip all further statements from here.
}
NF<nof{ ##checking if NF is lesser than nof then do following.
val="" ##Nullify val here.
i=($0~/,$/?NF:NF+1) ##Setting value of i as per condition here.
for(;i<=nof;i++){ ##Running loop till value of nof matches i here.
val=(val?val OFS:"")s1 s1 ##Creating val which has value of "" in it.
}
sub(/,$/,"") ##Removing ending , here.
$0=$0 OFS val ##Concatenate val here.
}
1 ##Printing current line here.
' Input_file Input_file ##Mentioning Input_file names here.
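Input_file is named twice because the program reads it in two passes: while FNR==NR (the first read) it only records the largest field count, and on the second read it pads the shorter lines. A minimal sketch of that idiom follows, assuming a hypothetical sample file with plain comma-separated values and no quoted commas, so that it runs in any awk; the variable names max, line, tmp and result are illustrative only.

```shell
# Hypothetical sample data: plain comma-separated values, no quoted commas.
tmp=$(mktemp)
printf '%s\n' 'a,b,c' 'd' 'e,f' > "$tmp"

# The same file is passed twice.
# Pass 1 (FNR==NR is true): remember the largest NF seen, print nothing.
# Pass 2: append one comma per missing field, then print the line.
result=$(awk -F',' '
FNR==NR { if (NF > max) max = NF; next }
{
  line = $0
  for (i = NF; i < max; i++) line = line ","
  print line
}' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Here the first pass finds a maximum of 3 fields, so the second pass prints a,b,c unchanged and pads the other two lines to d,, and e,f, respectively.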
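EDIT: Adding this code here as well. It takes a variable named nof, in which we can set the minimum number of fields every line should have. If any line has more fields than that minimum, the larger count is used when padding the lines with missing fields.
awk -v s1="\"" -v nof="5" -v FPAT='[^,]*|"[^"]+"' '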
BEGIN{ OFS="," }
FNR==NR{
nof=(NF>nof?NF:nof)
next
}
NF<nof{
val=""
i=($0~/,$/?NF:NF+1)
for(;i<=nof;i++){
val=(val?val OFS:"")s1 s1
}
sub(/,$/,"")
$0=$0 OFS val
}
1
' Input_file Input_file
- It works .. Thank you .. Any reason why there are entries for input file?
- Thank you for your detailed explanation.
Answer
Here is one for GNU awk that uses FPAT, for when [you] always want 5 entries per line:
$ awk '
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")"
OFS=","
}
{
NF=5 # set NF to limit too long records
for(i=1;i<=NF;i++) # iterate to NF and set empties to ""
if($i=="")
$i="\"\""
}1' file
Output:
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""
- Since you need to loop anyway, doing `NF=5` first isn't actually doing anything for you in this case except adding some more work/time to the execution. Just get rid of `NF=5` and make the loop `for(i=1;i<=5;i++)` and it'll produce the same output but run a bit faster.
- Thank you .. This solution also works ..
- I hadn't considered that there might be more than 5 fields. I wonder if the desired behavior in that case really is to remove data or if we should add null fields up to the number present - I'd think it'd be the latter as @RavinderSingh13 is doing but idk. `NF=5` would add fields if there were less than 5 but what it'd do if there were more than 5 is undefined behavior, though you are already using FPAT and it'd do what you want in gawk.
Answer
Here is an awk command that works in any version of awk:
awk -v n=5 -v ef=',""' -F '","' '
{
sub(/,+$/, "")
for (i=NF; i<n; ++i)
$0 = $0 ef
} 1' file
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""