Windows Bat Find and Replace Line Breaks in Specific Lines

Question

I am not a pro developer and need a simple solution. I have tried using fart.exe within a Windows Bat file to accomplish this, but I am having trouble targeting only the lines whose line breaks need replacing. In an XML file, here is what I am trying to do.

I need to go from this (a few lines in the middle of a larger file):

			<meta name="xyz:moreinfohere" content="some content"/>
			<meta name="abc:evenmoreinfo" content="more content
and here is where
the problem lies"/>
			<meta name="abc:infoagain" content="this is confusing"/>
			<meta name="xyz:blahblah" content="please help"/>

to this:

			<meta name="xyz:moreinfohere" content="some content"/>
			<meta name="abc:evenmoreinfo" content="more content&#xA;and here is where&#xA;the problem lies"/>
			<meta name="abc:infoagain" content="this is confusing"/>
			<meta name="xyz:blahblah" content="please help"/>

The data filled in these fields will be variable, and this is a fictitious example. Basically, I am trying to replace the line breaks with the &#xA; code, but only certain ones, as you can see. I have managed to use fart.exe to replace all instances of \r\n, but I can't figure out how to replace only the needed ones. Not every line starts with "meta...". However, every line in the files is supposed to end with ">"; it's the only constant character on every line in the files. Please help! I am open to anything that works in a standard Windows Bat file (fart, Java, etc.)

Answer 1

Score: 0

As you found out, a standards-compliant XML parser will replace a line feed in an attribute's value with a space unless the line feed is encoded using a character reference (e.g. &#xA;). (Reference)

So while I would normally recommend using a proper XML parser, that won't work here because we're trying to fix broken XML (i.e. XML that means something different than what we want it to mean).

We could write a proper XML parser that simply doesn't perform the line feed to space substitution and use that to fix the file, but that's a lot of work. The following is probably sufficient.

Assumptions:

  • All attribute values that need fixing use double quotes (not single quotes).
  • Double-quotes are always found in pairs in the documents to be fixed.

fix.pl:

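use strict;
use warnings;

local $/;    # slurp mode: read each input file as one string
while (<>) {
   while (1) {
      # Copy everything up to the next double quote unchanged.
      /\G ( [^"]+ ) /xgc
         and print $1;

      # Stop at the end of the input.
      /\G \z /xgc
         and last;

      # A complete "..." span: encode its line feeds as &#xA;.
      # (The s///r modifier needs Perl 5.14 or later.)
      /\G ( " [^"]* " ) /xgc
         and do {
            print $1 =~ s/\n/&#xA;/rg;
            next;
         };

      # An opening quote with no matching closing quote.
      die("Unbalanced quotes");
   }
}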

Usage:

perl fix.pl file_to_fix.xml >fixed_file.xml

or

perl -i.bak fix.pl file_to_fix.xml

The latter modifies the file in-place after making a backup.
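Before running fix.pl on a real file, the same idea can be tried on the sample from the question. This is a minimal sketch, not the answer's code: the subroutine name fix_quoted_newlines is hypothetical, and it swaps the \G loop for a single s///ge substitution plus an up-front count of double quotes. It relies on the same two assumptions listed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: encode every line feed inside a double-quoted
# span as the character reference &#xA;. Same assumptions as fix.pl:
# values use double quotes, and double quotes always come in pairs.
sub fix_quoted_newlines {
    my ($text) = @_;

    # An odd number of double quotes means some value is unterminated.
    my $quotes = ($text =~ tr/"//);
    die "Unbalanced quotes\n" if $quotes % 2;

    # Rewrite each "..." span; text outside the quotes is left untouched.
    $text =~ s{"([^"]*)"}{
        my $value = $1;
        $value =~ s/\n/&#xA;/g;
        qq("$value");
    }ge;

    return $text;
}

my $sample = <<'XML';
<meta name="xyz:moreinfohere" content="some content"/>
<meta name="abc:evenmoreinfo" content="more content
and here is where
the problem lies"/>
<meta name="abc:infoagain" content="this is confusing"/>
XML

print fix_quoted_newlines($sample);
```

Run with perl and no arguments, it prints the sample with the second <meta> element on a single line, ending in content="more content&#xA;and here is where&#xA;the problem lies"/>.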

After you use this tool, use a file comparison tool (e.g. Beyond Compare) to make sure the fix was properly applied.
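As an optional automated companion to the diff check, here is a minimal sketch that assumes the same paired-quotes rule: once every quoted value sits on one line, every line holds an even number of double quotes, so any line with an odd count points at a value that still spans a line break. The file name check.pl and the subroutine odd_quote_lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check.pl: report lines with an odd number of double
# quotes. With paired quotes, an odd count means a quoted value still
# starts on one line and ends on another.
sub odd_quote_lines {
    my (@lines) = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        my $quotes = ($lines[$i] =~ tr/"//);
        push @bad, $i + 1 if $quotes % 2;    # 1-based line numbers
    }
    return @bad;
}

# Only scan when a file name is given on the command line.
if (@ARGV) {
    my @lines = <>;
    my @bad   = odd_quote_lines(@lines);
    if (@bad) {
        print "Line $_ still has a quoted value spanning a line break\n" for @bad;
        exit 1;
    }
    print "No quoted values span a line break\n";
}
```

Usage would be perl check.pl fixed_file.xml. Line numbers are counted across all files passed in, so it reads best when run on one file at a time.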

huangapple
  • Posted on April 7, 2020 03:03:26
  • When reposting, please keep the link to this article: https://java.coder-hub.com/61067108.html