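Finding matching strings within a paragraph
I have a TXT file containing LaTeX math equations, where each inline equation is delimited by a single $ before and after.
I want to find every equation in a paragraph and replace the delimiters with XML opening and closing tags...
For example,
the following paragraph: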
This is the beginning of a paragraph $first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
should become:
This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>
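I have tried sed and perl commands such as:
perl -p -e 's/(\$)(.*[^\$])(\$)/<equation>$2<\/equation>/'
but these commands convert only the opening delimiter of the first equation and the closing delimiter of the last one, and leave the equations in between untouched: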
This is the beginning of a paragraph <equation>first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>
I would also like a robust solution that accounts for single $ characters that are not used as LaTeX delimiters. For example,
This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
should not become:
This is the beginning of a paragraph <equation>first equation$ ...and here is some text that includes a single dollar sign: He paid <equation>2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>
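Note: I am writing this in Bash.
Answer
Note: the first part of this answer focuses only on replacing paired $'s; for the OP's request not to replace standalone $'s, see the second part of the answer.
Replacing pairs of $'s
Sample data: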
$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
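One sed idea:
sed -E 's|\$([^$]*)\$|<equation>\1</equation>|g' latex.txt
Where:
-E - enable extended regex support
\$ - match a literal $
([^$]*) - [capture group #1] - match everything that is not a literal $ (in this case, everything between a pair of $'s)
\$ - match a literal $
<equation>\1</equation> - replace the matched string with <equation> + contents of capture group + </equation>
g (trailing flag) - repeat the search/replace as often as necessary
This produces: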
... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
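Dealing with standalone $
If a standalone $ can be escaped (for example, \$), one idea is to have sed replace it with a nonsense literal, perform the <equation> / </equation> replacement, and then change the nonsense literal back to \$.
Sample data: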
$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
... $first equation$ ... \$3.50 cup of coffee ... $third equation$
The original sed solution with the new substitutions:
sed -E 's|\\\$|LITDOL|g;s|\$([^$]*)\$|<equation>\1</equation>|g;s|LITDOL|\\$|g' latex.txt
Here we replace \$ with LITDOL (LITeral DOLlar), perform our original replacement, and then switch LITDOL back to \$.
This produces:
... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
... <equation>first equation</equation> ... \$3.50 cup of coffee ... <equation>third equation</equation>
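If the standalone dollar signs cannot be escaped in the input, a heuristic is the only option. The sketch below borrows part of the rule Pandoc documents for its tex_math_dollars extension: the opening $ must be followed by a non-space character and the closing $ must be preceded by one. This is an assumption about how the text is written, not a general solution, and the function name tag_inline_math is made up for illustration. Pandoc's additional rule (the closing $ must not be followed by a digit) is left out because sed has no lookahead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads text on stdin, writes tagged text on stdout.
tag_inline_math() {
  # \$                      opening delimiter
  # [^$[:space:]]           first character of the equation: not $ and not whitespace
  # ([^$]*[^$[:space:]])?   optional rest of the equation, ending in a non-space character
  # \$                      closing delimiter
  sed -E 's|\$([^$[:space:]]([^$]*[^$[:space:]])?)\$|<equation>\1</equation>|g'
}

# Example: the price is left alone because the next $ on the line is preceded by a space.
printf '%s\n' '$first equation$ ... He paid $2.50 for a pack of cigarettes... $second equation$' | tag_inline_math
```

With this rule, $2.50 is not treated as an opening delimiter in the question's example, since the next $ has a space to its left. It can still misfire on text such as "between $5 and 10$", so escaping the literal dollar signs remains the more reliable approach when the input can be changed.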