Regex that matches spaces not inside {…} in Java

Question

I am trying to find a regular expression that matches every space that is not between { }, in any combination. I have already come up with `\s(?![^{]*})+`, but it's not working correctly!
Link to image: https://i.stack.imgur.com/5NYW0.png
As you can see in the image, it matches the spaces marked with red rectangles even though they are inside { }. They are just not directly between a { and its }, but between an opening { and a nested {, and I don't want those matched either.
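A small Java check shows why the pattern misbehaves. The negative lookahead `(?![^{]*})` only scans ahead to the next `{`, so a space that is followed by a nested `{` before any `}` passes the test even when it sits inside an outer pair of braces. The class name `NaiveRegexDemo` and the sample string are made up for illustration, and the trailing `+` is dropped because repeating a zero-width lookahead adds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveRegexDemo {
    // The question's pattern, minus the trailing '+'. The negative lookahead
    // rejects a space only if a '}' appears before the next '{'.
    static final Pattern NAIVE = Pattern.compile("\\s(?![^{]*\\})");

    // Returns the index of every space the naive pattern matches.
    static List<Integer> matchedSpaceIndices(String s) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = NAIVE.matcher(s);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        // Spaces sit at indices 1, 4, 8 and 11; only 1 and 11 are outside { }.
        // Index 4 ("{a {b") is matched anyway because the next brace is '{'.
        System.out.println(matchedSpaceIndices("x {a {b} c} y")); // [1, 4, 11]
    }
}
```

The space at index 4 is the same kind of false positive as the ones in the red rectangles.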
Here are some examples of what I want the regex to do, since it's a little complex:

  1. `lol.something.xd 1 F 1.99 #adasdaops {something.awesome 4 8 9} null T;`
     Match every space here that is not inside { }.
  2. `lol.something.xd-1-F-1.99-#adasdpopu-{something.awesome 4 8 9 5 {#adsadasdasd} null F T {something.awesome.Foo 7 8 9 4545 T} null null F T}-45.5F-null-{alpha.beta.gama 45}-45-null;`
     Match only the spaces that are shown here as "-".

I need this regex to split Strings in Java, if it matters.

Note: Unfortunately, Java does not support recursive regex, so keep that in mind in future answers!

Side note: The web page is not showing the code fences in my question for some reason. Maybe only I can't see them, but if you can't either and you know why, please fix it.

Answer 1

Score: 0

The general idea for "match X, but not when it's inside Y" is to match Y, discard it, and then match X.

Matching arbitrarily nested braces is in itself no easy task, as it requires a recursive regex.

Here's a regex that should work for your needs, and some explanation:

/(\{(?:(?>[^{}]+)|(?1))*\})(*SKIP)(*FAIL)| /g

What a mess!

/
  ( # first subpattern
    \{ # match an opening brace
    (?: # non-capturing group
      (?> # atomic group (no backtracking into it; avoids catastrophic backtracking)
        [^{}]+ # match one or more non-brace characters
      )
    | # or...
      (?1) # recursively match the first subpattern (i.e. nested braces)
    )* # zero or more times
    \} # match a closing brace
  )
  (*SKIP) # skip this part of the subject string
  (*FAIL) # always fails, which excludes the contents of the braces from the match
| # or (i.e. if no brace group starts at this position)
[ ] # match a space (a bare space would be ignored in free-spacing mode)
/xg # x: free-spacing mode, so whitespace and # comments are ignored; g: find all matches

However, not all flavours of regex support verbs like (*SKIP)(*FAIL), and others don't support recursive regexes. In these cases, you will need some kind of parser.

That's going to be more work, but the general idea is to go through your string character by character. If it's a {, increment a counter; if it's a }, decrement it. If it's a space, do something with it only if the counter is zero.
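A minimal sketch of that counter-based scan in Java, assuming the goal is to report the index of every space at brace depth 0. The class and method names are hypothetical, and throwing on unbalanced braces is one possible choice rather than something this answer prescribes.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSpaces {
    // Counter-based scan: returns the index of every space at brace depth 0.
    static List<Integer> indicesOf(String s) {
        List<Integer> result = new ArrayList<>();
        int depth = 0; // number of '{' currently open
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    throw new IllegalArgumentException("Unmatched } at index " + i);
                }
                depth--;
            } else if (c == ' ' && depth == 0) {
                result.add(i); // only spaces outside every { } are reported
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed { in input");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indicesOf("x {a {b} c} y")); // [1, 11]
    }
}
```

Because the counter tracks depth explicitly, arbitrarily deep nesting is handled without any recursion in the pattern language.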

Answer 2

Score: 0

Finally I managed to solve this problem myself!
Based on the posted answers, I realized that regex is not going to work in this case, so I decided to solve it with plain Java, and I came up with this freak:

import java.util.ArrayList;
import java.util.List;

public static String[] splitValues(String s)
{
    List<String> result = new ArrayList<>();
    int brackets = 0; // current { } nesting depth
    int start = 0;    // index where the current value begins

    for (int i = 0; i < s.length(); i++)
    {
        char c = s.charAt(i);
        if (c == ' ' && brackets == 0)
        {
            // top-level space: everything since start is one value
            result.add(s.substring(start, i));
            start = i + 1;
        }
        else if (c == '{')
            brackets++;
        else if (c == '}')
        {
            if (brackets > 0)
                brackets--;
            else
                throw new IllegalArgumentException("Unclosed or missing: { or } at index " + i);
        }
    }

    if (brackets > 0)
        throw new IllegalArgumentException("Unclosed or missing: { or }");
    if (!s.isEmpty())
        result.add(s.substring(start)); // last value after the final top-level space

    return result.toArray(new String[0]);
}

It is ugly and long, and it doesn't solve the regex question I originally asked, but it solves my main problem, and that is the point. If somebody eventually finds a regex solution, feel free to post it! That said, I don't think it is possible, because remember: Java has no recursive regex!
Also, the function above is now part of SerialX, one of my libs!
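A regex-only workaround does exist in Java if the nesting depth has a known upper bound: instead of splitting on spaces, match the values themselves with `Matcher.find()`, where a value is a run of non-space characters and/or brace groups. The sketch below assumes at most two levels of nesting, which covers both examples in the question; the class name `BoundedDepthTokens` is hypothetical. Deeper nesting or unbalanced braces silently produce wrong tokens unless the pattern is extended, and unlike `split()`, consecutive spaces yield no empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedDepthTokens {
    // A { } group whose contents may hold further { } groups one level deep
    // (total depth <= 2). Each extra level needs another copy of the inner pattern.
    static final String GROUP = "\\{(?:[^{}]|\\{[^{}]*\\})*\\}";

    // A token is a run of non-space, non-brace characters and/or brace groups.
    static final Pattern TOKEN = Pattern.compile("(?:[^ {}]|" + GROUP + ")+");

    // Collects tokens with find() rather than splitting on the spaces.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String ex1 = "lol.something.xd 1 F 1.99 #adasdaops {something.awesome 4 8 9} null T;";
        // [lol.something.xd, 1, F, 1.99, #adasdaops, {something.awesome 4 8 9}, null, T;]
        System.out.println(tokens(ex1));
    }
}
```

Each alternative in the pattern starts with a different kind of character, so the engine has little to backtrack over, but the counter-based method above remains the more general solution.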

huangapple
  • Posted on July 25, 2020 00:42:54
  • When reposting, please keep the link to this article: https://java.coder-hub.com/63077923.html