XMLStreamException on valid XML

Question

I'm currently facing a strange issue that occurs only occasionally. My application unmarshals an XML file with several million rows using StAX with JAXB and Java streams (XMLStreamReader), and imports the resulting objects into a database on startup if the XML has changed. So far, this is working correctly, except on some devices (approximately 5% of over 1000 devices). On these devices, I get a javax.xml.stream.XMLStreamException. Sometimes a restart helps and the XML is then processed successfully. The XML itself always has the same content on all devices, so XML and XSD are both valid.

The exception also does not always occur in the same place. For example:
>Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[2650616,17]
Message: Element type "XX" must be followed by either attribute specifications, ">" or "/>".

Later:
>[javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3272359,14]
Message: Element type "XY" must be followed by either attribute specifications, ">" or "/>".]

The entire application is running in a microservice architecture, but there are no dependencies on other services. During startup, a lot happens as each microservice initializes its own state. It seems to me that there might be some memory issues, as it's not reproducible and the microservices on the devices don't differ in their versions.

Before optimizing the unmarshalling process, I would like to be able to reproduce the issue first to ensure that any improvements are effective. When I try to reduce Xmx and Xms, I might get an OutOfMemoryError but never an XMLStreamException.

Right now, I'm asking myself:

  • When and why may an XMLStreamException occur, and how can I reproduce this behavior?
  • Why does this happen infrequently, as all devices should be the same?
  • Should I switch to SAX, which is more memory-efficient?

Thanks for all the help in advance.

Answer 1

Score: 0

There's not enough information in your question to allow a definitive answer, but we can help you home in on the problem.

  1. The variations you're seeing are almost certainly due to input variations, not device failures.

  2. The errors indicate that the stream is not well-formed XML. (The textual data is technically not even XML; it's causing a pre-validation parsing error.)

  3. Here is a simple example of not-well-formed XML that would generate such errors:

    <r a='''/>
    

    Notice that there's an unescaped ' within an attribute value. This can easily happen when code pulls data from a source, fails to escape it, and writes it into an attribute value. The variability would arise from data variability. For example, most names do not have ' in them, but O'Toole does.
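
To see this kind of failure in isolation, here is a minimal sketch that feeds the example above to the JDK's StAX parser and returns whatever message the parser reports. The class and method names (`WellFormednessDemo`, `firstParseError`) are made up for illustration, and the exact wording of the message depends on which StAX implementation is on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the asker's application.
class WellFormednessDemo {

    /** Pulls every event from the document; returns the parser's message, or null if it parsed cleanly. */
    static String firstParseError(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // a well-formedness violation surfaces here as XMLStreamException
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Unescaped ' inside a '-delimited attribute value: not well-formed.
        System.out.println(firstParseError("<r a='''/>"));
        // The same value with the quote escaped: well-formed, so no error is reported.
        System.out.println(firstParseError("<r a='&apos;'/>"));
    }
}
```

Running the parser over a suspect fragment like this is a quick way to check whether a particular piece of data can trigger the exception, independently of JAXB and the database import.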

Log the exact XML that's failing as a next step to debug the problem, as mentioned by @vanje in comments.
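
One possible way to capture the failing XML is to read the row from the exception's `Location` and dump the surrounding lines of the file. The sketch below assumes the XML is read from a local file; `ParseFailureLogger`, `linesAround`, and `parseAndReport` are hypothetical names, and the line numbers counted by `BufferedReader` may differ slightly from the parser's if the file mixes line-ending styles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper, not part of the asker's application.
class ParseFailureLogger {

    /** Returns the lines from (row - radius) to (row + radius), each prefixed with its 1-based line number. */
    static List<String> linesAround(Path file, int row, int radius) throws IOException {
        List<String> context = new ArrayList<>();
        // ISO-8859-1 decodes any byte sequence without throwing, so damaged bytes
        // stay visible in the log (non-ASCII characters will look mangled).
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            int current = 0;
            while ((line = in.readLine()) != null && ++current <= row + radius) {
                if (current >= row - radius) {
                    context.add(current + ": " + line);
                }
            }
        }
        return context;
    }

    /** Parses the file; on failure returns the context around the reported row, otherwise an empty list. */
    static List<String> parseAndReport(Path file, int radius) throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (InputStream in = Files.newInputStream(file)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return List.of();
        } catch (XMLStreamException e) {
            Location where = e.getLocation();
            int row = (where != null) ? where.getLineNumber() : -1;
            System.err.println("Parse failed at row " + row + ": " + e.getMessage());
            return (row > 0) ? linesAround(file, row, radius) : List.of();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ParseFailureLogger.java <path-to-xml>
        parseAndReport(Path.of(args[0]), 2).forEach(System.err::println);
    }
}
```

Comparing the logged lines from a failing device with the same rows of a known-good copy should show whether the bytes the parser saw really match the expected file.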

huangapple
  • Posted on July 27, 2020, 19:09:20
  • Source link: https://java.coder-hub.com/63114061.html