Question

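I can easily extract the content of most PDF files, but I have some files in which the text is not selectable when I open them. My existing code, shown below, cannot extract the text from those files: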

import java.io.File;
import java.io.IOException;

// Package names below assume PDFBox 2.x
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PDFBoxExample {
    public static void main(String[] args) {
        File file = new File("C:\\pdf\\pdf_result.pdf");
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(file)) {
            if (!document.isEncrypted()) {
                // The unused PDFTextStripperByArea and the no-op
                // document.getClass() call were dropped; output is unchanged.
                PDFTextStripper tStripper = new PDFTextStripper();

                // Returns only the text stored in the PDF's content streams
                String content = tStripper.getText(document);
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

My PDF file is available at the following link:
https://1drv.ms/b/s!AmRKaLhGJhJphvMOUBGADveatrx0hA?e=a0seG7

Can you suggest a solution for this?
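
For context, text that cannot be selected in a viewer often means the pages are scanned images with no text layer. If so, PDFTextStripper would have nothing to return and OCR would be needed instead. That is an assumption about the linked file, not something confirmed here. A minimal sketch of a check that could be run on the string returned by getText follows; the class TextLayerCheck, the method looksImageOnly and the threshold of 10 characters are all hypothetical names and values chosen for illustration, not part of PDFBox:

```java
public class TextLayerCheck {
    // Hypothetical helper: returns true when the extracted text holds fewer
    // than minChars non-whitespace characters, which suggests (but does not
    // prove) that the document has no usable text layer.
    public static boolean looksImageOnly(String extractedText, int minChars) {
        if (extractedText == null) {
            return true;
        }
        long visible = extractedText.chars()
                .filter(c -> !Character.isWhitespace(c))
                .count();
        return visible < minChars;
    }

    public static void main(String[] args) {
        // Stand-in for the result of tStripper.getText(document)
        // in the code above: only line breaks and a space.
        String content = "\r\n \r\n";
        System.out.println(looksImageOnly(content, 10)); // prints "true"
    }
}
```

If the check reports true for the whole document, the extraction code itself is probably fine and the file simply contains no extractable text.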

Extracting text from non-selectable PDF content using PDFBox in Java
