How to handle huge JSON over a network call through pagination in Java?


Question

I have an existing API that runs in a loop. On each iteration it creates a connection and fetches a huge JSON payload (1 GB) from a service through RestTemplate, as follows:

ResponseEntity<String> response = restTemplate.exchange(uri.get().toString(), HttpMethod.POST, entity, String.class);

The response is then converted into a complex Java object through Gson. The problem with this approach is that RestTemplate converts the input stream into a String via a StringBuffer, which ends up creating lots of large char[] arrays and eventually runs out of memory (OOM) when the loop runs for too long, which is usually the case. I even tried HttpClient in place of RestTemplate, but it does the same thing (keeps expanding a char array).
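For reference, a minimal sketch of consuming the response body as a stream instead of letting RestTemplate buffer it into a String, using RestTemplate.execute with a ResponseExtractor; the request callback below is only a placeholder for however the original entity would be written to the request, and what is done with the InputStream is left open:

restTemplate.execute(uri.get().toString(), HttpMethod.POST,
        request -> {
            // placeholder: write the headers/body of the original 'entity' to the request here
            request.getHeaders().setContentType(MediaType.APPLICATION_JSON);
        },
        clientResponse -> {
            try (InputStream body = clientResponse.getBody()) {
                // consume the stream incrementally here instead of materializing it as a String
            }
            return null;
        });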

To solve the OOM issue, I refactored the API to stream the data to a file, creating a temp file on each iteration and then converting it to objects as follows:

File targetFile = new File("somepath\\response.tmp");
FileUtils.copyInputStreamToFile(response.getEntity().getContent(), targetFile);
// read the temp file back and deserialize it with Gson
try (Reader reader = new BufferedReader(new FileReader(targetFile))) {
    List<Object> objects = gson.fromJson(reader, new TypeToken<List<Object>>(){}.getType());
}
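For comparison, here is a sketch of reading that temp file back incrementally with Gson's streaming JsonReader, so the whole List never has to sit in memory at once; it assumes the payload is a top-level JSON array, and SomeObject and handle(...) are hypothetical placeholders:

try (JsonReader jsonReader = new JsonReader(new BufferedReader(new FileReader(targetFile)))) {
    jsonReader.beginArray();                        // assumes the root element is a JSON array
    while (jsonReader.hasNext()) {
        SomeObject item = gson.fromJson(jsonReader, SomeObject.class);  // deserialize one element at a time
        handle(item);                               // placeholder for the per-element processing
    }
    jsonReader.endArray();
}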

Is this the way to go, or is there a more effective approach to this kind of problem? Maybe pooling connections instead of creating a new one on each iteration (would that be a significant improvement)?
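On the connection pooling question, a minimal sketch of backing RestTemplate with a pooled Apache HttpClient might look like the following; the pool limits are arbitrary placeholder values:

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(20);            // placeholder limits
connectionManager.setDefaultMaxPerRoute(10);
CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connectionManager)
        .build();
RestTemplate pooledRestTemplate = new RestTemplate(new HttpComponentsClientHttpRequestFactory(httpClient));

Note that pooling only avoids the cost of re-establishing connections; it does not by itself reduce how much of the response is held in memory.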

Also, profiling the API with a 4 GB -Xmx / 2 GB -Xms heap, JVisualVM shows the following:

[JVisualVM screenshot: bytes allocated by the running thread]

As can be seen, the running thread has been allocated a huge number of bytes.

Heap size during the API run:

[JVisualVM screenshot: heap size during the API run]

