XMLStreamException on valid XML
Question
I'm currently facing a strange issue that occurs only occasionally. My application unmarshals an XML file with several million rows using StAX with JAXB and Java Streams (XMLStreamReader) and imports the resulting objects into a database on startup if the XML has changed. So far, this works correctly, except on some devices (approximately 5% of over 1000 devices). On these devices, I get a javax.xml.stream.XMLStreamException. Sometimes a restart helps and the XML is then processed successfully. The XML itself always has the same content on all devices, so XML and XSD are both valid.
The exception also does not always occur in the same place. For example:
>Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[2650616,17]
Message: Element type "XX" must be followed by either attribute specifications, ">" or "/>".
Later:
>[javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3272359,14]
Message: Element type "XY" must be followed by either attribute specifications, ">" or "/>".]
The entire application is running in a microservice architecture, but there are no dependencies on other services. During startup, a lot happens as each microservice initializes its own state. It seems to me that there might be some memory issues, as it's not reproducible and the microservices on the devices don't differ in their versions.
Before optimizing the unmarshalling process, I would like to be able to reproduce the issue first, to ensure that any improvements are effective. When I reduce Xmx and Xms, I may get an OutOfMemoryError but never an XMLStreamException.
Right now, I'm asking myself:
- When and why may an XMLStreamException occur, and how can I reproduce this behavior?
- Why does this happen infrequently, as all devices should be the same?
- Should I switch to SAX, which is more memory-efficient?
Thanks for all the help in advance.
Answer 1
Score: 0
There's not enough information in your question to allow a definitive answer, but we can help you home in on the problem.
- The variations you're seeing are almost certainly due to input variations, not device failures.
- The errors indicate that the stream is not well-formed XML. (The textual data is technically not even XML; it's causing a pre-validation parsing error.)
- Here is a simple example of not-well-formed XML that would generate such errors:

  <r a='''/>
Notice that there's an unescaped ' within an attribute value. This can easily happen when code pulls data from a source, fails to escape it, and writes it into an attribute value. The variability would arise from data variability. For example, most names do not have ' in them, but O'Toole does.
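To see this class of error in isolation, a minimal sketch along the following lines may help. It assumes the JDK's built-in StAX implementation. The class name MalformedAttributeDemo, the helper parseError, and the sample documents are made up for illustration, and other StAX implementations may word the message differently.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of the asker's application.
class MalformedAttributeDemo {

    /** Returns null if the document parses to the end, otherwise the parser's error message. */
    static String parseError(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            // Pull every event; a well-formedness violation surfaces as XMLStreamException.
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Apostrophe escaped as &apos;: well-formed, so this prints null.
        System.out.println(parseError("<r a='O&apos;Toole'/>"));
        // Unescaped apostrophe ends the attribute value early: prints a parse error message.
        System.out.println(parseError("<r a='O'Toole'/>"));
    }
}
```

The second call fails because the parser treats the apostrophe in O'Toole as the end of the attribute value and then finds text where it expects whitespace, another attribute, ">" or "/>".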
Log the exact XML that's failing as a next step to debug the problem, as mentioned by @vanje in comments.
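One way to capture the failing XML is to read the line number from the exception's Location and log the surrounding lines of the input. The following is only a sketch under assumptions: ParseErrorContext and linesAround are hypothetical names, the input here is an in-memory string rather than the real file, and line counting by BufferedReader may differ slightly from the parser's for unusual line endings.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of the asker's application.
class ParseErrorContext {

    /** Returns the lines from (lineNumber - radius) to (lineNumber + radius), 1-based, clipped at the start. */
    static List<String> linesAround(Reader source, int lineNumber, int radius) {
        int first = Math.max(1, lineNumber - radius);
        int count = lineNumber + radius - first + 1;
        // Streams the input line by line, so a multi-million-line file is not held in memory.
        return new BufferedReader(source).lines()
                .skip(first - 1L)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input with an unescaped apostrophe on line 3 (illustrative data only).
        String xml = "<rows>\n<row name='Smith'/>\n<row name='O'Toole'/>\n<row name='Jones'/>\n</rows>";
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
        } catch (XMLStreamException e) {
            Location where = e.getLocation(); // may be null if the parser gave no position
            int line = (where != null) ? where.getLineNumber() : -1;
            System.err.println("Parse failed at line " + line + ": " + e.getMessage());
            if (line > 0) {
                // Re-read the same input and log the lines around the reported position.
                linesAround(new StringReader(xml), line, 1).forEach(System.err::println);
            }
        }
    }
}
```

For a real file, the second reader would be opened on the same path that was being parsed, for example with Files.newBufferedReader, so the logged lines show exactly what was on disk on the affected device at the time of the failure.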