Time series data storage

Question


I'm collecting a large number of time-dependent UDP packets coming from a service on the same network. These packets are deserialised in memory into structures containing numbers (floats and ints) and processed. We could say we are collecting time series data. However, it's not the kind of time series data you get from monitoring a service (mostly the same value for a period of time). These values vary constantly; not by much, but they do vary.

Besides this, I would like to send that data to a server in the cloud and on that server store the time series data.

My question is: what options are there to compress the data, so that smaller packets go over the wire to the server (we could send the incoming UDP packets in batches over TCP), and to store it? I'm particularly interested in not using all of the storage attached to the server. The data for one session is close to 32 MB, and I would have multiple sessions running at the same time. One session's data is not related to another session's; they are totally independent.
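One simple baseline, before reaching for a purpose-built time series codec, is to batch samples and run the batch through a general-purpose compressor before the TCP send. A minimal sketch in Go using only the standard library (the helper name `encodeBatch` is hypothetical):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"math"
)

// encodeBatch serializes a batch of float64 samples into little-endian
// bytes and gzips the result, so many small UDP payloads can travel to
// the server as one compressed TCP write.
func encodeBatch(vals []float64) ([]byte, error) {
	raw := make([]byte, 8*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint64(raw[8*i:], math.Float64bits(v))
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Identical readings here, purely for a deterministic demo; real
	// slowly-varying data will compress less but usually still well.
	vals := make([]float64, 1000)
	for i := range vals {
		vals[i] = 20.0
	}
	packed, err := encodeBatch(vals)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(packed) < 8*len(vals)) // true
}
```

How well gzip does depends heavily on how the values vary bit-wise, which is exactly the gap the specialized codecs below are designed to close.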

Answer 1

Score: 0


You can compress time series data using this library: https://github.com/dgryski/go-tsz

It's based on this paper from Facebook:
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

> We have found that about 96% of all time stamps
can be compressed to a single bit.
>
> [...]
>
> Roughly 51% of all values are compressed to a single bit since
the current and previous values are identical. About 30% of
the values are compressed with the control bits ‘10’ (case b),
with an average compressed size of 26.6 bits. The remaining
19% are compressed with control bits ‘11’, with an average
size of 36.9 bits, due to the extra 13 bits of overhead required
to encode the length of leading zero bits and meaningful bits.
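The two mechanisms the quote describes can be sketched in Go with only the standard library. The function names below are hypothetical, and this shows the sizing logic rather than a full bit-packing encoder (the go-tsz library above implements the real thing):

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// deltaOfDelta returns the second difference of three consecutive
// timestamps. A steady sampling interval yields 0, which the Gorilla
// scheme encodes as a single '0' bit -- hence ~96% of timestamps
// compressing to one bit.
func deltaOfDelta(t0, t1, t2 int64) int64 {
	return (t2 - t1) - (t1 - t0)
}

// xorCostBits estimates how many bits the value scheme spends on cur
// given the previous value prev. Identical values cost one bit; for a
// nonzero XOR we show the '11' control-bit case, which pays 5 bits for
// the leading-zero count and 6 bits for the meaningful-bit length
// (the '10' case, reusing the previous block position, is cheaper).
func xorCostBits(prev, cur float64) int {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	if x == 0 {
		return 1
	}
	lead := bits.LeadingZeros64(x)
	trail := bits.TrailingZeros64(x)
	meaningful := 64 - lead - trail
	return 2 + 5 + 6 + meaningful
}

func main() {
	fmt.Println(deltaOfDelta(1000, 1060, 1120)) // steady 60s interval: 0
	fmt.Println(xorCostBits(15.5, 15.5))        // identical values: 1 bit
	fmt.Println(xorCostBits(15.5, 15.625))      // nearby value: 14 bits
}
```

This is why the scheme suits the question's data: values that "vary, but not by much" tend to share sign, exponent, and high mantissa bits with their predecessor, leaving an XOR with long runs of zeros.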

You can use a key-value store like bolt (or, probably better, rocksdb, which supports built-in compression) and store multiple points under each key. For example, you could store one key-value pair every 10 minutes, where the value holds all of the points that occurred during that 10-minute window.
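That windowing idea can be sketched as follows (the `Point` type and key scheme are assumptions for illustration; the bolt/rocksdb wiring is omitted):

```go
package main

import "fmt"

// Point is a minimal assumed sample type: a Unix timestamp and a value.
type Point struct {
	TS  int64
	Val float64
}

// bucketKey maps a Unix timestamp to the start of its 10-minute window.
// The window start can serve directly as the key in bolt or rocksdb;
// big-endian-encoded keys would also keep range scans in time order.
func bucketKey(ts int64) int64 {
	const window = 600 // 10 minutes in seconds
	return ts - ts%window
}

// groupByWindow groups points under their window key; each group would
// then be compressed and written as one value in the key-value store.
func groupByWindow(points []Point) map[int64][]Point {
	out := make(map[int64][]Point)
	for _, p := range points {
		k := bucketKey(p.TS)
		out[k] = append(out[k], p)
	}
	return out
}

func main() {
	pts := []Point{{100, 1.0}, {550, 1.1}, {700, 1.2}}
	groups := groupByWindow(pts)
	fmt.Println(len(groups))    // 2 windows: [0,600) and [600,1200)
	fmt.Println(len(groups[0])) // 2 points land in the first window
}
```

Larger windows mean fewer keys and better compression per value, at the cost of reading a whole window to fetch one point.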

This should give you both good performance and high compression.

huangapple
  • Published on 2015-12-15 17:32:10
  • Please keep this link when reposting: https://java.coder-hub.com/34285503.html