Jsoup missing tags when converting an HTML row
Question
I'm having a problem with jsoup. I want to get a row of data that I will later insert into another HTML document, but when I inspect the parsed result, the <tr> and <td> tags are missing. How can I solve this?
String htmlcontent = "<tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong><span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p></td><td></td><td></td><td></td><td></td></tr>";
// Note: the second argument of Jsoup.parse(String, String) is the base URI, not a charset
Document docnewinput = Jsoup.parse(htmlcontent, "UTF-8");
[<html>
<head></head>
<body>
<div class="content-wrapper">
<p><strong><span class="CLASS 1 CLASS 2 CLASS 3">123</span></strong><br><strong>DATA 1</strong></p>
</div>
</body>
</html>]
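A minimal sketch that reproduces the behaviour, assuming jsoup is on the classpath (the class name `ReproduceMissingRow` is hypothetical). Parsed as a full document, the bare `<tr>` ends up in body context, where HTML parsing rules ignore stray row and cell tags, so only the `div` content survives, as in the output above:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ReproduceMissingRow {
    // The row from the question, with no enclosing <table>
    static final String ROW = "<tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong>"
            + "<span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p>"
            + "</td><td></td><td></td><td></td><td></td></tr>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(ROW);
        // Both counts are 0: the <tr> and <td> start tags were discarded
        System.out.println("tr: " + doc.select("tr").size());
        System.out.println("td: " + doc.select("td").size());
        // The inner div is kept and placed directly in <body>
        System.out.println(doc.body().html());
    }
}
```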
Answer 1
Score: 0
You have a fragment of body HTML (e.g. a div containing a couple of p tags; as opposed to a full HTML document) that you want to parse.
Use the Jsoup.parseBodyFragment(String html) method:
String html = "<table><tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong><span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p></td><td></td><td></td><td></td><td></td></tr></table>";
Document doc = Jsoup.parseBodyFragment(html);
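Once the row is wrapped in `<table>`, it survives parsing and can be selected back out. A minimal sketch, assuming jsoup is on the classpath; `ExtractRow` and `extractRow` are hypothetical names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractRow {
    // Wrap the bare row in <table> so the parser keeps <tr>/<td>, then pull the row out
    static Element extractRow(String rowHtml) {
        Document doc = Jsoup.parseBodyFragment("<table>" + rowHtml + "</table>");
        return doc.select("tr").first(); // null if no row was parsed
    }

    public static void main(String[] args) {
        String row = "<tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong>"
                + "<span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p>"
                + "</td><td></td><td></td><td></td><td></td></tr>";
        Element tr = extractRow(row);
        System.out.println(tr.outerHtml());
    }
}
```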
The parseBodyFragment method creates an empty shell document and inserts the parsed HTML into the body element. If you used the normal Jsoup.parse(String html) method, you would generally get the same result, but explicitly treating the input as a body fragment ensures that any bozo HTML provided by the user is parsed into the body element.
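One way to see the difference between the two methods is to feed both an input that starts with a stray `<title>`. A small sketch, assuming jsoup is on the classpath (class name hypothetical): `Jsoup.parse` moves the `<title>` into `<head>`, while `parseBodyFragment` keeps it inside `<body>`:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseVsBodyFragment {
    public static void main(String[] args) {
        // Input a user might paste: a stray <title> before ordinary body content
        String input = "<title>x</title><p>y</p>";

        // Full-document parse: the <title> is placed in <head>
        Document full = Jsoup.parse(input);
        System.out.println("parse, head: " + full.head().html());

        // Body-fragment parse: everything, including the <title>, stays in <body>
        Document fragment = Jsoup.parseBodyFragment(input);
        System.out.println("fragment, body: " + fragment.body().html());
    }
}
```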
The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. It handles:
- unclosed tags (e.g. <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
- implicit tags (e.g. a <tr> inside a <table> gets an implied <tbody>); note that, as the output in the question shows, a bare <tr> or <td> with no enclosing <table> is dropped rather than wrapped, which is why the row is wrapped in <table> here
- reliably creating the document structure (html containing a head and body, and only appropriate elements within the head)
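The parser behaviours listed above can be checked with a short sketch (assuming jsoup is on the classpath; the class name is hypothetical). It also shows that a `<td>` with no enclosing `<table>` disappears when parsed as a full document, which matches what the question observed:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParserBehaviours {
    public static void main(String[] args) {
        // Unclosed tags: the second <p> implicitly closes the first
        Document unclosed = Jsoup.parse("<p>Lorem <p>Ipsum");
        System.out.println(unclosed.select("p").size()); // 2

        // Implicit tags: a <tr> inside <table> gets an implied <tbody>
        Document table = Jsoup.parse("<table><tr><td>x</td></tr></table>");
        System.out.println(table.select("table > tbody > tr").size()); // 1

        // A bare <td> outside any table is ignored; only its text is kept
        Document bare = Jsoup.parse("<td>Table data</td>");
        System.out.println(bare.select("td").size()); // 0

        // Document structure: <head> and <body> always exist
        System.out.println(bare.head() != null && bare.body() != null); // true
    }
}
```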
EDIT:
By using Jsoup.parse():
String html = "<table><tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong><span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p></td><td></td><td></td><td></td><td></td></tr></table>";
Document doc = Jsoup.parse(html);
Working Demo: https://try.jsoup.org/~EdJSrHl_biDcQkyhL2BLH5ZNnck
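The question's end goal is to insert the row into another HTML document. A minimal sketch under these assumptions: jsoup is on the classpath, the destination contains a table with the hypothetical id `target`, and `MoveRow`/`appendRow` are hypothetical names. Because the parser adds an implied `<tbody>`, the row is appended to the target's `<tbody>` rather than directly to `<table>`:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MoveRow {
    // Parses a bare <tr> (wrapped in <table> so it survives) and appends it to the
    // <tbody> of the table with id "target" in the destination document
    static void appendRow(Document target, String rowHtml) {
        Element row = Jsoup.parseBodyFragment("<table>" + rowHtml + "</table>")
                .select("tr").first();
        Element tbody = target.select("table#target > tbody").first();
        if (row == null || tbody == null) {
            throw new IllegalArgumentException("row or target table not found");
        }
        tbody.appendChild(row); // appendChild detaches the row from its source document
    }

    public static void main(String[] args) {
        Document target = Jsoup.parse("<table id=\"target\"><tr><th>Header</th></tr></table>");
        appendRow(target, "<tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong>"
                + "<span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p>"
                + "</td><td></td><td></td><td></td><td></td></tr>");
        System.out.println(target.select("table#target").outerHtml());
    }
}
```

Moving the parsed `Element` avoids re-serialising the row to a string and parsing it again in the destination document.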
Answer 2
Score: 0
Use xmlParser() so that jsoup reads the string as is, keeping the tags as written instead of applying the HTML parsing rules that drop the <tr> and <td>.
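A sketch of that approach, assuming jsoup is on the classpath (class name hypothetical). It uses the `Jsoup.parse(String html, String baseUri, Parser parser)` overload with `Parser.xmlParser()`. Since the XML parser does not apply HTML error recovery, this version closes the `div` that the question's string leaves open:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class XmlParserRow {
    public static void main(String[] args) {
        String row = "<tr><td colspan=\"2\"><div class=\"content-wrapper\"><p><strong>"
                + "<span class=\"CLASS 1 CLASS 2 CLASS 3\">123</span></strong><br /><strong>DATA 1</strong></p>"
                + "</div></td><td></td><td></td><td></td><td></td></tr>";
        // The XML parser builds the tree from the tags as written, without HTML rules,
        // so the bare <tr> and its <td> cells are kept
        Document doc = Jsoup.parse(row, "", Parser.xmlParser());
        System.out.println(doc.select("tr").size()); // 1
        System.out.println(doc.select("td").size()); // 5
        System.out.println(doc.outerHtml());
    }
}
```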