Question

Here is an excerpt of the input (PDF):

Here is my code:

    // Extracts the text of a single page; PDFTextStripper page numbers are 1-based.
    // Returns an empty string if extraction fails.
    public static String pdfPageToText(
        PDDocument docIn,
        int pageNumber
    ) {
        String pageText = "";
        try {
            PDFTextStripper stripper = new PDFTextStripper( );
            // Restrict extraction to the requested page only.
            stripper.setStartPage( pageNumber );
            stripper.setEndPage( pageNumber );
            pageText = stripper.getText( docIn );
        } catch ( Exception e ) {
            LOGGER.severe( e.getMessage( ) );
        }
        return pageText;
    }

The extracted text looks like this:

I would expect it to be more like this:

Please point me in the right direction. Thank you.

How can I make PDFTextStripper extract text line by line?
