Extract text from non-selectable PDF content using Java PDFBox
Question
I can usually extract the content of a PDF file without trouble, but I have some files whose text is not selectable when I open them. My existing code, shown below, cannot extract the text from those files:
import java.io.File;
import java.io.IOException;

// Package names below assume PDFBox 2.x
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PDFBoxExample {
    public static void main(String[] args) {
        File file = new File("C:\\pdf\\pdf_result.pdf");
        // try-with-resources closes the document (and the underlying file) on exit
        try (PDDocument document = PDDocument.load(file)) {
            if (!document.isEncrypted()) {
                // PDFTextStripper reads text drawn as text; it does not OCR images
                PDFTextStripper tStripper = new PDFTextStripper();
                String content = tStripper.getText(document);
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
My PDF file is available at the following link:
https://1drv.ms/b/s!AmRKaLhGJhJphvMOUBGADveatrx0hA?e=a0seG7
Could you suggest a solution?