Time series data storage

Question


I'm collecting a large number of time-dependent UDP packets coming from a service on the same network. These packets are deserialised in memory into structures containing numbers (floats and ints) and processed. We could say we are collecting time series data. However, it's not the kind of time series data you get from monitoring a service (mostly the same value for a period of time). These values vary constantly; not by much, but they do vary.

Besides this, I would like to send that data to a server in the cloud and on that server store the time series data.

My question is: what options are there to compress the data, so that smaller packets go over the wire to the server (we could send the incoming UDP packets in batches over TCP), and to store it? I'm particularly interested in not using all of the storage attached to the server. The data for one session is close to 32 MB, and I would have multiple sessions running at the same time. One session's data is not related to another session's; they are totally independent.
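One simple baseline, before reaching for a purpose-built time series codec, is to batch samples and run the batch through a general-purpose compressor before the TCP send. A minimal sketch in Go using only the standard library (the helper name `encodeBatch` is hypothetical):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"math"
)

// encodeBatch serializes a batch of float64 samples into little-endian
// bytes and gzips the result, so many small UDP payloads can travel to
// the server as one compressed TCP write.
func encodeBatch(vals []float64) ([]byte, error) {
	raw := make([]byte, 8*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint64(raw[8*i:], math.Float64bits(v))
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Identical readings here, purely for a deterministic demo; real
	// slowly-varying data will compress less but usually still well.
	vals := make([]float64, 1000)
	for i := range vals {
		vals[i] = 20.0
	}
	packed, err := encodeBatch(vals)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(packed) < 8*len(vals)) // true
}
```

How well gzip does depends heavily on how the values vary bit-wise, which is exactly the gap the specialized codecs below are designed to close.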

Answer 1

Score: 0


You can compress time series data using this library: https://github.com/dgryski/go-tsz

It's based on this paper from Facebook:
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

> We have found that about 96% of all time stamps
can be compressed to a single bit.
>
> [...]
>
> Roughly 51% of all values are compressed to a single bit since
the current and previous values are identical. About 30% of
the values are compressed with the control bits ‘10’ (case b),
with an average compressed size of 26.6 bits. The remaining
19% are compressed with control bits ‘11’, with an average
size of 36.9 bits, due to the extra 13 bits of overhead required
to encode the length of leading zero bits and meaningful bits.
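The two mechanisms the quote describes can be sketched in Go with only the standard library. The function names below are hypothetical, and this shows the sizing logic rather than a full bit-packing encoder (the go-tsz library above implements the real thing):

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// deltaOfDelta returns the second difference of three consecutive
// timestamps. A steady sampling interval yields 0, which the Gorilla
// scheme encodes as a single '0' bit -- hence ~96% of timestamps
// compressing to one bit.
func deltaOfDelta(t0, t1, t2 int64) int64 {
	return (t2 - t1) - (t1 - t0)
}

// xorCostBits estimates how many bits the value scheme spends on cur
// given the previous value prev. Identical values cost one bit; for a
// nonzero XOR we show the '11' control-bit case, which pays 5 bits for
// the leading-zero count and 6 bits for the meaningful-bit length
// (the '10' case, reusing the previous block position, is cheaper).
func xorCostBits(prev, cur float64) int {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	if x == 0 {
		return 1
	}
	lead := bits.LeadingZeros64(x)
	trail := bits.TrailingZeros64(x)
	meaningful := 64 - lead - trail
	return 2 + 5 + 6 + meaningful
}

func main() {
	fmt.Println(deltaOfDelta(1000, 1060, 1120)) // steady 60s interval: 0
	fmt.Println(xorCostBits(15.5, 15.5))        // identical values: 1 bit
	fmt.Println(xorCostBits(15.5, 15.625))      // nearby value: 14 bits
}
```

This is why the scheme suits the question's data: values that "vary, but not by much" tend to share sign, exponent, and high mantissa bits with their predecessor, leaving an XOR with long runs of zeros.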

You can use a key-value store like bolt (or, probably better, rocksdb, which supports built-in compression) and store multiple points under each key. For example, you could store one key-value pair every 10 minutes, where the value holds all of the points that occurred during that 10-minute window.
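That windowing idea can be sketched as follows (the `Point` type and key scheme are assumptions for illustration; the bolt/rocksdb wiring is omitted):

```go
package main

import "fmt"

// Point is a minimal assumed sample type: a Unix timestamp and a value.
type Point struct {
	TS  int64
	Val float64
}

// bucketKey maps a Unix timestamp to the start of its 10-minute window.
// The window start can serve directly as the key in bolt or rocksdb;
// big-endian-encoded keys would also keep range scans in time order.
func bucketKey(ts int64) int64 {
	const window = 600 // 10 minutes in seconds
	return ts - ts%window
}

// groupByWindow groups points under their window key; each group would
// then be compressed and written as one value in the key-value store.
func groupByWindow(points []Point) map[int64][]Point {
	out := make(map[int64][]Point)
	for _, p := range points {
		k := bucketKey(p.TS)
		out[k] = append(out[k], p)
	}
	return out
}

func main() {
	pts := []Point{{100, 1.0}, {550, 1.1}, {700, 1.2}}
	groups := groupByWindow(pts)
	fmt.Println(len(groups))    // 2 windows: [0,600) and [600,1200)
	fmt.Println(len(groups[0])) // 2 points land in the first window
}
```

Larger windows mean fewer keys and better compression per value, at the cost of reading a whole window to fetch one point.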

This should give you both good performance and high compression.

huangapple
  • Published on 2015-12-15 17:32:10
  • Please keep this link when reposting: https://java.coder-hub.com/34285503.html