How to resolve "Connection Reset" when using Java Apache HttpClient 4.5.12


<details>
<summary>English:</summary>

We have been discussing an issue with one of our data providers: some of our HTTP requests intermittently fail with "Connection Reset" exceptions, and we have also seen "The target server failed to respond" exceptions.

Many Stack Overflow posts point to potential causes, namely:

- It's a pooling configuration issue; try reaping idle connections
- It's an HttpClient version issue; downgrading to HttpClient 4.5.1 (often from 4.5.3) is suggested as a fix. I'm using 4.5.12: https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient
- The target server is actually failing to process the request (or CloudFront in front of the origin server is).

I'm hoping this question will help me get to the root cause.

**Context**

It's a Java web application hosted on AWS Elastic Beanstalk with 2 to 4 servers depending on load. The Java WAR file uses HttpClient 4.5.12 to communicate. Over the last few months we have seen:

45 x Connection Reset (only 3 were timeouts over 30s; the others failed within 20ms)

To put this into context, we perform in the region of 10,000 requests to this supplier, so the error rate isn't excessive, but it is very inconvenient because our customers pay for a service that then fails.
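To make the scale concrete, the figures above can be worked through as follows. This is only a small arithmetic sketch: the class and method names are illustrative, and 10,000 is the approximate request count quoted above.

```java
final class FailureRate {

    /** Percentage of failed requests, e.g. 45 failures out of 10,000 requests. */
    static double percent(int failures, int totalRequests) {
        return 100.0 * failures / totalRequests;
    }

    public static void main(String[] args) {
        int resets = 45;                      // "Connection Reset" failures observed
        int timeouts = 3;                     // of those, failures that took over 30s
        int fastFailures = resets - timeouts; // failures that returned within 20ms
        System.out.printf("error rate: %.2f%%, fast failures: %d of %d%n",
                percent(resets, 10_000), fastFailures, resets);
    }
}
```

Most of the failures are the fast kind, so the 30s timeouts are unlikely to be the main factor.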

Right now we are focusing on eliminating the "connection reset" scenarios, and we have been recommended to try the following:

1) Restart our app servers (a desperate just-in-case measure)

2) Change the DNS servers to Google's 8.8.8.8 & 8.8.4.4 (so our requests take a different path)

3) Assign a static IP to each server (so they can enable us to communicate without going through their CloudFront distribution)

We will work through those suggestions, but at the same time I want to understand where our HttpClient implementation might not be quite right.

**Typical usage** 

User Request --> Our server (JAX-RS request) --> HttpClient to 3rd party --> Response received e.g. JSON/XML --> Massaged response is sent back (Our JSON format)

**Technical details**

Tomcat 8 with Java 8 running on 64bit Amazon Linux

4.5.12 HttpClient
4.4.13 HttpCore <-- Maven dependencies show HttpClient 4.5.12 requires 4.4.13
4.5.12 HttpMime

Typically an HTTP request takes anywhere between 200ms and 10 seconds, with timeouts set to around 15-30s depending on the API we are invoking. I also use a connection pool, and given that most requests should complete within 30 seconds, I felt it was safe to evict anything older than double that period.

Any advice on whether these are sensible values is appreciated.

// max 200 connections in the connection pool
CONNECTIONS_MAX = 200;

// each 3rd party API can only use up to 50, so worst case 4 APIs can be flooded before the pool is exhausted
CONNECTIONS_MAX_PER_ROUTE = 50;

// as our timeouts are typically 30s I'm assuming it's safe to clean up connections
// that are double that

// Connection timeouts are 30s, wasn't sure whether to close at 31s or wait 2 x typical = 60s
CONNECTION_CLOSE_IDLE_MS = 60000;

// If the connection hasn't been used for 60s then we aren't busy and we can remove it from the connection pool
CONNECTION_EVICT_IDLE_MS = 60000;

// Is this per request or each packet, but all requests should finish within 30s
CONNECTION_TIME_TO_LIVE_MS = 60000;

// To ensure connections are validated if they are in the pool but haven't been used for at least 500ms
CONNECTION_VALIDATE_AFTER_INACTIVITY_MS = 500; // WAS 30000 (not tested at 500ms yet)
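The three time-based settings measure different things, which a simplified model can illustrate. This is a sketch under my reading of the HttpClient 4.5 Javadoc, not the library's actual code: idle eviction and validate-after-inactivity are counted from the last use of a connection, while time to live is counted from its creation, so it would be neither per request nor per packet. The class and its methods are hypothetical.

```java
final class PooledConnectionModel {
    final long createdAtMs;
    long lastUsedAtMs;

    PooledConnectionModel(long createdAtMs) {
        this.createdAtMs = createdAtMs;
        this.lastUsedAtMs = createdAtMs;
    }

    /** Idle eviction: the connection has been unused for at least maxIdleMs. */
    boolean isIdleLongerThan(long nowMs, long maxIdleMs) {
        return nowMs - lastUsedAtMs >= maxIdleMs;
    }

    /** Time to live: counted from creation, however recently the connection was used. */
    boolean isPastTimeToLive(long nowMs, long ttlMs) {
        return nowMs - createdAtMs >= ttlMs;
    }

    /** Validate-after-inactivity: check for staleness before reuse once idle this long. */
    boolean needsValidation(long nowMs, long validateAfterMs) {
        return nowMs - lastUsedAtMs >= validateAfterMs;
    }

    public static void main(String[] args) {
        PooledConnectionModel c = new PooledConnectionModel(0L);
        c.lastUsedAtMs = 55_000L; // last request finished 55s after creation
        long now = 61_000L;
        System.out.println("idle-evict: " + c.isIdleLongerThan(now, 60_000L));
        System.out.println("past TTL:   " + c.isPastTimeToLive(now, 60_000L));
        System.out.println("validate:   " + c.needsValidation(now, 500L));
    }
}
```

In this model a busy connection never becomes idle enough to be evicted, but it still reaches its time to live 60s after it was opened.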


Additionally we tend to set the three timeouts to 30s, but I'm sure we can fine-tune these...

// client tries to connect to the server. This denotes the time elapsed before the connection established or Server responded to connection request.
// The time to establish a connection with the remote host
.setConnectTimeout(...) // typical 30s - I guess this could be 5s (if we can't connect by then the remote server is stuffed/busy)

// Used when requesting a connection from the connection manager (pooling)
// The time to fetch a connection from the connection pool
.setConnectionRequestTimeout(...) // typical 30s - I guess only applicable if our pool is saturated, then this means how long to wait to get a connection?

// After establishing the connection, the client socket waits for response after sending the request.
// This is the time of inactivity to wait for packets to arrive
.setSocketTimeout(...) // typical 30s - I believe this is the main one that we care about, if we don't get our payload in 30s then give up
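The difference between the connect timeout and the socket timeout can be seen with plain `java.net.Socket`, without HttpClient. The sketch below assumes that `setSocketTimeout` corresponds to `SO_TIMEOUT`, as the `RequestConfig` Javadoc describes it: it limits each wait for data, not the total response time. The connection request timeout has no socket-level equivalent, since it only covers waiting for a free connection from the pool. The class name and the silent local server are made up for the demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeoutSketch {

    /**
     * Connects to a local server that never sends anything and reports
     * whether the read gave up after soTimeoutMs of inactivity.
     */
    static boolean readTimesOut(int connectTimeoutMs, int soTimeoutMs) {
        // The OS completes the TCP handshake from the listen backlog,
        // so the connect succeeds even though accept() is never called.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: limit on establishing the TCP connection.
            client.connect(
                    new InetSocketAddress(silentServer.getInetAddress(), silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Socket (read) timeout: limit on each blocking read, i.e. on
            // the gap between packets, not on the whole response.
            client.setSoTimeout(soTimeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived in time
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

One consequence is that a response which keeps trickling in can take longer than 30s overall without tripping a 30s socket timeout.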


I have copied and pasted the main code we use for all GET/POST requests, with the unimportant aspects such as our retry logic, pre-cache and post-cache stripped out.

We are using a single PoolingHttpClientConnectionManager with a single CloseableHttpClient; they're both configured as follows...

private static PoolingHttpClientConnectionManager createConnectionManager() {
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();

    cm.setMaxTotal(CONNECTIONS_MAX); // 200
    cm.setDefaultMaxPerRoute(CONNECTIONS_MAX_PER_ROUTE); // 50
    cm.setValidateAfterInactivity(CONNECTION_VALIDATE_AFTER_INACTIVITY_MS); // Was 30000 now 500

    return cm;
}

private static CloseableHttpClient createHttpClient() {

    httpClient = HttpClientBuilder.create()
            .setConnectionManager(cm)
            .disableAutomaticRetries() // our code does the retries
            .evictIdleConnections(CONNECTION_EVICT_IDLE_MS, TimeUnit.MILLISECONDS) // 60000
            .setConnectionTimeToLive(CONNECTION_TIME_TO_LIVE_MS, TimeUnit.MILLISECONDS) // 60000
            .setRedirectStrategy(LaxRedirectStrategy.INSTANCE)
            // .setKeepAliveStrategy() - The default implementation looks solely at the 'Keep-Alive' header's timeout token.
            .build();
    return httpClient;
}
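One detail that may be worth checking in the configuration above: as far as I can tell from the `HttpClientBuilder` Javadoc, the value passed to `setConnectionTimeToLive` can be overridden by `setConnectionManager`, so with a custom connection manager the 60s time to live may never take effect. A possible alternative, sketched below on that assumption, is to pass the time to live to the connection manager's constructor. The wrapper class name is hypothetical and the constants repeat the values listed earlier; please verify the behaviour against the 4.5.12 sources before relying on it.

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

final class ConnectionManagerWithTtl {

    // Values repeated from the constants in the question.
    private static final int CONNECTIONS_MAX = 200;
    private static final int CONNECTIONS_MAX_PER_ROUTE = 50;
    private static final long CONNECTION_TIME_TO_LIVE_MS = 60000;
    private static final int CONNECTION_VALIDATE_AFTER_INACTIVITY_MS = 500;

    static PoolingHttpClientConnectionManager createConnectionManager() {
        // Assumption: a manager built this way enforces the time to live itself,
        // independently of HttpClientBuilder.setConnectionTimeToLive(...).
        PoolingHttpClientConnectionManager cm =
                new PoolingHttpClientConnectionManager(CONNECTION_TIME_TO_LIVE_MS, TimeUnit.MILLISECONDS);
        cm.setMaxTotal(CONNECTIONS_MAX);
        cm.setDefaultMaxPerRoute(CONNECTIONS_MAX_PER_ROUTE);
        cm.setValidateAfterInactivity(CONNECTION_VALIDATE_AFTER_INACTIVITY_MS);
        return cm;
    }
}
```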

Every minute I have a thread that tries to reap connections

public static PoolStats performIdleConnectionReaper(Object source) {
    synchronized (source) {
        final PoolStats totalStats = cm.getTotalStats();
        Log.info(source, "max:" + totalStats.getMax() + " avail:" + totalStats.getAvailable() + " leased:" + totalStats.getLeased() + " pending:" + totalStats.getPending());
        cm.closeExpiredConnections();
        cm.closeIdleConnections(CONNECTION_CLOSE_IDLE_MS, TimeUnit.MILLISECONDS); // 60000
        return totalStats;
    }
}
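The scheduling side of such a reaper is not shown above, so here is a minimal sketch of one way to run it using only `java.util.concurrent`. The class and method names are hypothetical, and the task passed in is a stand-in for a call such as `performIdleConnectionReaper(source)`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ReaperSchedulerSketch {

    /** Runs reapTask every periodMs on a daemon thread until the returned executor is shut down. */
    static ScheduledExecutorService start(Runnable reapTask, long periodMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true); // a daemon thread does not keep the JVM alive
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reapTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would silently cancel all future runs.
                System.err.println("reaper run failed: " + e);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    /** Counts how often a stand-in task runs within waitMs; only used to exercise the scheduler. */
    static int countRuns(long periodMs, long waitMs) {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService scheduler = start(runs::incrementAndGet, periodMs);
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs in 500ms: " + countRuns(20, 500));
    }
}
```

As I understand the builder's Javadoc, `evictIdleConnections(...)` already starts a background eviction thread of its own, so a manual reaper like this may overlap with it rather than add anything.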

This is the custom method that performs all HttpClient GET/POST requests. It does stats, pre-cache, post-cache and other useful stuff, but I've stripped all of that out, and this is the typical outline performed for each request. I've tried to follow the pattern in the HttpClient docs, which tell you to consume the entity and close the response. Note that I don't close the httpClient because one instance is used for all requests.

public static HttpHelperResponse execute(HttpHelperParams params) {

    boolean abortRetries = false;

    while (!abortRetries && ret.getAttempts() <= params.getMaxRetries()) {

        // 1 Create HttpClient
        // This is done once in the static init CloseableHttpClient httpClient = createHttpClient(params);

        // 2 Create one of the methods, e.g. HttpGet / HttpPost - Note this also adds HTTP headers 
        // (see separate method below)
        HttpRequestBase request = createRequest(params);

        // 3 Tell HTTP Client to execute the command
        CloseableHttpResponse response = null;
        HttpEntity entity = null;
        boolean alreadyStreamed = false;

        try {

            response = httpClient.execute(request);
            if (response == null) {
                throw new Exception("Null response received");
            } else {

                final StatusLine statusLine = response.getStatusLine();
                ret.setStatusCode(statusLine.getStatusCode());
                ret.setReasonPhrase(statusLine.getReasonPhrase());

                if (ret.getStatusCode() == 429) {
                    try {
                        final int delay = (int) (Math.random() * params.getRetryDelayMs());
                        Thread.sleep(500 + delay); // minimum 500ms + random amount up to delay specified
                    } catch (Exception e) {
                        Log.error(false, params.getSource(), "HttpHelper Rate-limit sleep exception", e, params);
                    }
                } else {

                    // 4 Read the response
                    // 6 Deal with the response
                    // do something useful with the response body                        
                    entity = response.getEntity();

                    if (entity == null) {
                        throw new Exception("Null entity received");
                    } else {
                        ret.setRawResponseAsString(EntityUtils.toString(entity, params.getEncoding()));
                        ret.setSuccess();
                        if (response.getAllHeaders() != null) {
                            for (Header header : response.getAllHeaders()) {
                                ret.addResponseHeader(header.getName(), header.getValue());
                            }
                        }
                    }

                }
            }

        } catch (Exception ex) {
            
            if (ret.getAttempts() >= params.getMaxRetries()) {
                Log.error(false, params.getSource(), ex);
            } else {
                Log.warn(params.getSource(), ex.getMessage());
            }
            
            ret.setError(ex); // If we subsequently get a response then the error will be cleared.                
        } finally {

            ret.incrementAttempts();

            // Any HTTP 2xx is considered successful, so stop retrying, or if
            // a specific HTTP code has been passed to stop retrying
            if (ret.getStatusCode() >= 200 && ret.getStatusCode() <= 299) {
                abortRetries = true;
            } else if (params.getDoNotRetryStatusCodes().contains(ret.getStatusCode())) {
                abortRetries = true;
            }

            if (entity != null) {
                try {
                    // and ensure it is fully consumed - hand it back to the pool
                    EntityUtils.consume(entity);
                } catch (IOException ex) {
                    Log.error(false, params.getSource(), "HttpHelper Was unable to consume entity", params);
                }

            }

            if (response != null) {
                try {
                    // The underlying HTTP connection is still held by the response object
                    // to allow the response content to be streamed directly from the network socket.
                    // In order to ensure correct deallocation of system resources
                    // the user MUST call CloseableHttpResponse#close() from a finally clause.
                    // Please note that if response content is not fully consumed the underlying
                    // connection cannot be safely re-used and will be shut down and discarded
                    // by the connection manager.                     
                    response.close();
                } catch (IOException ex) {
                    Log.error(false, params.getSource(), "HttpHelper Was unable to close a response", params);
                }
            }

            // When using connection pooling we don't want to close the client, otherwise the connection
            // pool will also be closed
            //                if (httpClient != null) {
            //                    try {
            //                        httpClient.close();
            //                    } catch (IOException ex) {
            //                        Log.error(false, params.getSource(), "HttpHelper Was unable to close httpClient", params);
            //                    }
            //                }


        }
    }

    return ret;
}
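The retry decisions in the method above can be isolated into two small pure functions, which makes them easier to reason about and to test. This is a sketch that mirrors the logic shown (stop on any 2xx or on a configured status code; on a 429 wait at least 500ms plus a random share of the configured delay). The class and method names are hypothetical, and the random value is passed in so the result is deterministic.

```java
import java.util.Collections;
import java.util.Set;

final class RetryDecisionSketch {

    /** True when the loop should stop: any 2xx, or a status code the caller asked not to retry. */
    static boolean shouldStopRetrying(int statusCode, Set<Integer> doNotRetryStatusCodes) {
        boolean success = statusCode >= 200 && statusCode <= 299;
        return success || doNotRetryStatusCodes.contains(statusCode);
    }

    /** Delay after an HTTP 429: a 500ms minimum plus a random share of retryDelayMs. */
    static long rateLimitDelayMs(double random01, int retryDelayMs) {
        return 500L + (long) (random01 * retryDelayMs);
    }

    public static void main(String[] args) {
        Set<Integer> doNotRetry = Collections.singleton(404);
        System.out.println("stop on 200: " + shouldStopRetrying(200, doNotRetry));
        System.out.println("stop on 429: " + shouldStopRetrying(429, doNotRetry));
        System.out.println("stop on 404: " + shouldStopRetrying(404, doNotRetry));
        System.out.println("429 delay:   " + rateLimitDelayMs(Math.random(), 1000) + "ms");
    }
}
```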

private static HttpRequestBase createRequest(HttpHelperParams params) {

	...
    request.setConfig(RequestConfig.copy(RequestConfig.DEFAULT)
        // client tries to connect to the server. This denotes the time elapsed before the connection established or Server responded to connection request.
        // The time to establish a connection with the remote host
        .setConnectTimeout(...) // typical 30s

        // Used when requesting a connection from the connection manager (pooling)
        // The time to fetch a connection from the connection pool
        .setConnectionRequestTimeout(...) // typical 30s

        // After establishing the connection, the client socket waits for response after sending the request. 
        // This is the time of inactivity to wait for packets to arrive
        .setSocketTimeout(...) // typical 30s

        .build()
    );

    return request;
}

</details>


huangapple
  • Published on 2020-06-06 00:24:52
  • When reposting, please keep the link to this article: https://java.coder-hub.com/62219970.html