Windows Bat Find and Replace Line Breaks in Specific Lines
Question
I am not a professional developer and need a simple solution. I have tried using fart.exe in a Windows batch file to accomplish this, but I am having trouble targeting only the lines whose line breaks need replacing. Here is what I am trying to do in an XML file.
I need to go from this (a few lines in the middle of a larger file):
<meta name="xyz:moreinfohere" content="some content"/>
<meta name="abc:evenmoreinfo" content="more content
and here is where
the problem lies"/>
<meta name="abc:infoagain" content="this is confusing"/>
<meta name="xyz:blahblah" content="please help"/>
to this:
<meta name="xyz:moreinfohere" content="some content"/>
<meta name="abc:evenmoreinfo" content="more content&#xa;and here is where&#xa;the problem lies"/>
<meta name="abc:infoagain" content="this is confusing"/>
<meta name="xyz:blahblah" content="please help"/>
The data in these fields will vary; this is a fictitious example. Basically, I am trying to replace line breaks with the &#xa; character reference, but only on certain lines, as you can see. I have managed to use fart.exe to replace all instances of \r\n, but I can't figure out how to replace only the ones that need it. Not every line starts with "meta...". However, every line in the files is supposed to end with ">"; it is the only fixed character on every line. Please help! I am open to anything that works in a standard Windows batch file (fart, java, etc.).
Answer 1
Score: 0
As you found out, a standards-compliant XML parser will replace a line feed in an attribute's value with a space unless the line feed is encoded using a character reference (e.g. &#xA;). This is the attribute-value normalization defined in the XML 1.0 specification.
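The normalization is easy to observe. The following is a minimal Python sketch (Python is not part of the original answer, and the shortened sample values are made up for illustration) that parses the same attribute twice with the standard library's ElementTree, once with a literal line feed and once with the character reference:

```python
import xml.etree.ElementTree as ET

# Illustrative inputs: the same attribute, with a literal line feed
# versus the &#xA; character reference.
raw = '<meta name="abc:evenmoreinfo" content="more content\nand here"/>'
encoded = '<meta name="abc:evenmoreinfo" content="more content&#xA;and here"/>'

raw_value = ET.fromstring(raw).get("content")
encoded_value = ET.fromstring(encoded).get("content")

# The literal line feed comes back as a space; the character
# reference comes back as a real line feed.
print(repr(raw_value))
print(repr(encoded_value))
```

This is why the file has to be repaired at the text level before any XML parser reads it.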
So while I would normally recommend using a proper XML parser, that won't work here because we're trying to fix broken XML (i.e. XML that means something different than what we want it to mean).
We could write a proper XML parser that simply doesn't perform the line feed to space substitution and use that to fix the file, but that's a lot of work. The following is probably sufficient.
Assumptions:
- All attribute values that need fixing use double-quotes (not single-quotes).
- Double-quotes are always found in pairs in the documents to be fixed.
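fix.pl:
use strict;
use warnings;

# Slurp mode: read each input file as a single string.
local $/;

while (<>) {
    while (1) {
        # Text outside double quotes is copied through unchanged.
        /\G ( [^"]+ ) /xgc
            and print $1;

        # End of input: done with this file.
        /\G \z /xgc
            and last;

        # A double-quoted span: encode its line feeds as &#xA;.
        # (The /r flag requires Perl 5.14 or later.)
        /\G ( " [^"]* " ) /xgc
            and do {
                print $1 =~ s/\n/&#xA;/rg;
                next;
            };

        die("Unbalanced quotes");
    }
}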
Usage:
perl fix.pl file_to_fix.xml >fixed_file.xml
or
perl -i.bak fix.pl file_to_fix.xml
The latter modifies the file in-place after making a backup.
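If Perl is not available, the same idea can be sketched in Python. This is a rough port under the same two assumptions listed above, not the script from this answer; the function name `encode_line_breaks_in_quotes` and the sample string are made up for illustration.

```python
import re

# A double-quoted span, assuming quotes are always balanced.
QUOTED = re.compile(r'"[^"]*"')

def encode_line_breaks_in_quotes(text: str) -> str:
    """Replace line breaks inside double-quoted spans with &#xA;."""
    if text.count('"') % 2:
        raise ValueError("Unbalanced quotes")
    # Text outside the quotes is left untouched; CRLF and LF are both handled.
    return QUOTED.sub(
        lambda m: re.sub(r"\r?\n", "&#xA;", m.group(0)),
        text,
    )

# Sample based on the question's example.
sample = (
    '<meta name="xyz:moreinfohere" content="some content"/>\n'
    '<meta name="abc:evenmoreinfo" content="more content\n'
    'and here is where\n'
    'the problem lies"/>\n'
    '<meta name="abc:infoagain" content="this is confusing"/>\n'
)
print(encode_line_breaks_in_quotes(sample))
```

Line breaks between tags are kept because they fall outside any quoted span. As with the Perl script, a stray double quote anywhere in the file (for example in text content or a comment) would break the pairing.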
After you use this tool, use a file comparison tool (e.g. Beyond Compare) to make sure the fix was properly applied.
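A scripted check can complement the visual comparison. The sketch below is a hypothetical Python helper (`only_line_breaks_changed` is not part of any tool mentioned here) that reports whether two texts are identical once every line break in both is rewritten as &#xA;, which means they differ only in how line breaks are written:

```python
def only_line_breaks_changed(original: str, fixed: str) -> bool:
    """True if the texts differ only in line breaks versus &#xA;."""
    def canonical(text: str) -> str:
        # Treat CRLF and LF alike, then encode every line break.
        return text.replace("\r\n", "\n").replace("\n", "&#xA;")
    return canonical(original) == canonical(fixed)

before = '<meta content="more content\nand here"/>\n<meta content="ok"/>\n'
after = '<meta content="more content&#xA;and here"/>\n<meta content="ok"/>\n'
print(only_line_breaks_changed(before, after))
```

It does not confirm that the right line breaks were encoded, only that nothing else was altered, so a diff tool is still worth using.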