Remove some HTML tags from the HTML DOM
Question
<table>
<colgroup>
<col style="width: 20%"/>
<col style="width: 20%"/>
<col style="width: 50%"/>
<col style="width: 10%"/>
</colgroup>
<tbody>
<tr>
<th colspan="1">
<p>Header1</p>
</th>
<th colspan="2">
<p>Header2</p>
</th>
<th colspan="1">
<p><a><strong>Header3</strong></a></p>
</th>
</tr>
<tr>
<td colspan="1">
<p>Value1</p>
</td>
<td colspan="2">
<p>Value2</p>
</td>
<td colspan="1">
<p><a><strong>Value3</strong></a></p>
</td>
</tr>
</tbody>
</table>
I have the HTML table below and want to convert it to XML. My code, shown below, first parses the markup into an HTML DOM, which I will later convert to XML. My problem is that I only want to keep the <th>, <tr>, <tbody>, <table> and <p> tags; the rest of the tags should not be captured in the document. How can I do that? The goal is to turn the HTML table into an XML table, so that I can then use a list to insert the data into a class, which will in turn be converted to XML.
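// Parse the table markup held in tableInString into a DOM Document.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(tableInString));
Document document = builder.parse(is);
document.getDocumentElement().normalize();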
<table style="width: 100%;">
<colgroup>
<col style="width: 20%;"/>
<col style="width: 20%;"/>
<col style="width: 50%;"/>
<col style="width: 10%;"/>
</colgroup>
<tbody>
<tr>
<th colspan="1">
<p>Header1</p>
</th>
<th colspan="2">
<span><div>Header2</div></span>
</th>
<th colspan="1">
<p><a><strong>Header3</strong></a></p>
</th>
</tr>
<tr>
<td colspan="1">
<div>Value1</div>
</td>
<td colspan="2">
<span><div>Value2</div></span>
</td>
<td colspan="1">
<p><a><strong>Value3</strong></a></p>
</td>
</tr>
</tbody>
</table>
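One possible way to approach this, given as a rough sketch rather than a definitive answer: walk the parsed DOM and "unwrap" every element that is not on a keep-list, hoisting its children into its parent so the text survives while the tag disappears. The class name `TagFilter`, the `KEEP` set, and the helper names below are hypothetical, and `<td>` is added to the keep-list as an assumption because the sample rows use it. The sketch also assumes the markup is well-formed XML, since `DocumentBuilder` rejects anything else.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TagFilter {
    // Assumption: tags to keep. <td> is not in the question's list but the rows need it.
    static final Set<String> KEEP = Set.of("table", "tbody", "tr", "th", "td", "p");

    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first: clean the children first, then unwrap this node if it is not kept.
    public static void unwrap(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember it, child may be removed
            unwrap(child);
            child = next;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        if (KEEP.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return;
        }
        Node parent = node.getParentNode();
        if (parent == null || parent.getNodeType() == Node.DOCUMENT_NODE) {
            return; // never unwrap the root element
        }
        // Move every child in front of the unwanted element, then drop the element.
        while (node.getFirstChild() != null) {
            parent.insertBefore(node.getFirstChild(), node);
        }
        parent.removeChild(node);
    }

    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<table><colgroup><col style=\"width: 20%\"/></colgroup><tbody><tr>"
                + "<th colspan=\"1\"><p><a><strong>Header3</strong></a></p></th>"
                + "</tr></tbody></table>";
        Document doc = parse(html);
        unwrap(doc.getDocumentElement());
        System.out.println(serialize(doc));
    }
}
```

Note that unwrapping keeps the text of removed tags, so `<span><div>Header2</div></span>` would leave `Header2` directly inside its `<th>` rather than inside a `<p>`. Attributes such as `colspan` are left untouched. If the real input is not well-formed, an HTML-tolerant parser would be needed before this step.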