Efficiently split large strings in Java

Question

I have a large string that should be split at a certain character, but only where that character is not preceded by another specific character.

What is the most efficient way to do this?

An example: Split this string at ':', but not at "?:":

part1:part2:https?:example.com:anotherstring

What I have tried so far:

  1. Regex (?<!\?):. Very slow.

  2. First collecting the indices at which to split the string, then splitting it there. Only efficient if there are not many split characters in the string.

  3. Iterating over the string character by character. Efficient if there are not many protecting characters (e.g. '?').
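
For reference, here is a minimal sketch of what approach 3 might look like. It is only one possible reading of the description above, not the asker's actual code. The class name `CharScanSplitter` and the `separator` and `guard` parameters are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CharScanSplitter {

    // Splits text at every separator that is not directly preceded by guard.
    static List<String> split(String text, char separator, char guard) {
        List<String> parts = new ArrayList<>();
        int start = 0;         // index where the current part begins
        char previous = '\0';  // character seen in the previous iteration
        for (int i = 0; i < text.length(); i++) {
            char current = text.charAt(i);
            if (current == separator && (i == 0 || previous != guard)) {
                parts.add(text.substring(start, i));
                start = i + 1;
            }
            previous = current;
        }
        parts.add(text.substring(start)); // trailing part, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        // prints [part1, part2, https?:example.com, anotherstring]
        System.out.println(split("part1:part2:https?:example.com:anotherstring", ':', '?'));
    }
}
```

This version makes a single pass and allocates only the resulting substrings.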

Answer 1

Score: 0

I'm afraid you will have to walk through the string and check whether each ":" is preceded by a "?":

int lastIndex = 0;
// Resume each search just past the previous hit. Searching from lastIndex again
// would find the same "?:" over and over and never terminate.
for (int index = string.indexOf(':'); index >= 0; index = string.indexOf(':', index + 1)) {
    if (index == 0 || string.charAt(index - 1) != '?') {
        String splitString = string.substring(lastIndex, index);
        // add splitString to list or array
        lastIndex = index + 1;
    }
}
// add string.substring(lastIndex) to list or array
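
A self-contained sketch of the scan described in this answer, assuming the parts should be collected into a `List<String>`. The class name `IndexOfSplitter` and the `separator` and `guard` parameters are hypothetical names introduced for illustration. Each search resumes at `index + 1`, so a skipped "?:" is never found twice.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOfSplitter {

    // Splits text at every separator that is not directly preceded by guard.
    static List<String> split(String text, char separator, char guard) {
        List<String> parts = new ArrayList<>();
        int partStart = 0; // index where the current part begins
        int index = text.indexOf(separator);
        while (index >= 0) {
            if (index == 0 || text.charAt(index - 1) != guard) {
                parts.add(text.substring(partStart, index));
                partStart = index + 1;
            }
            // Always continue after the current hit, even when it was skipped.
            index = text.indexOf(separator, index + 1);
        }
        parts.add(text.substring(partStart)); // trailing part, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        // prints [part1, part2, https?:example.com, anotherstring]
        System.out.println(split("part1:part2:https?:example.com:anotherstring", ':', '?'));
    }
}
```

Because `indexOf` jumps straight from one separator to the next, the loop body runs once per ":" rather than once per character.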

Answer 2

Score: 0

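You will have to test this very carefully (since I didn't), but using a regular expression in `split()` might produce the results you want:

```Java
public static void main(String[] args) {
	String s = "Start.Teststring.Teststring1?.Teststring2.?Teststring3.?.End";
	// Split at a dot that is neither preceded by '?' nor followed by another dot.
	String[] result = s.split("(?<!\\?)\\.(?!\\.)");
	System.out.println(String.join("|", result));
}
```

Output:

Start|Teststring|Teststring1?.Teststring2|?Teststring3|?.End

Note:
This only covers the example of splitting at a dot when the dot is not preceded by a question mark.

I don't think you will get a much more performant solution than the regex...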
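
As a hedged sketch, the same lookbehind idea can be applied to the ':' and "?:" case from the question. `String.split` with a pattern like this generally compiles the regex on each call, so when many strings are split it may help to compile the `Pattern` once and reuse it. The class name `RegexSplitter` and the constant `UNGUARDED_COLON` are hypothetical names, and no timing claims are implied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexSplitter {

    // Matches a ':' that is not directly preceded by '?'. Compiled once and reused.
    private static final Pattern UNGUARDED_COLON = Pattern.compile("(?<!\\?):");

    static List<String> split(String text) {
        // A limit of -1 keeps trailing empty strings instead of dropping them.
        return Arrays.asList(UNGUARDED_COLON.split(text, -1));
    }

    public static void main(String[] args) {
        // prints part1|part2|https?:example.com|anotherstring
        System.out.println(String.join("|", split("part1:part2:https?:example.com:anotherstring")));
    }
}
```

Whether this is fast enough depends on the input, so it is worth measuring against a plain `indexOf` loop on realistic data.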

Posted by huangapple on 2020-05-19 19:00:15
When reposting, please keep the link to this article: https://java.coder-hub.com/61889418.html