

Rules regarding auto encoding of messages into base64 while submitting to SQS









I am developing an application in which clients (written in multiple languages - Go, C++, Python, C#, Java, Perl and possibly more in the future) submit protobuf (and in some cases, JSON) messages to SQS. At the other end, the messages are read and decoded by Python and Go clients - depending on the message type. Boto seems to automatically encode the messages into base64, but other language libraries don't seem to do so. Or maybe there are some other rules?

Boto does have an option to submit raw messages.

What is the expected behavior here? Am I supposed to encode messages into base64 on my own - which makes boto an odd case - or am I missing something?

This has caused some subtle bugs in my application because of an extra layer of base64 encoding or decoding. As far as I know, there is no idiomatic way to detect whether a message is base64 encoded or not. The best option is to try to decode it and see if it throws an exception, which is something I don't really like.

I tried to look for some documentation, but couldn't find anything with clear guidelines. Maybe I was looking in the wrong places?

Thanks in advance for any pointers.


Score: 6









You probably want to encode your messages in some text-safe form, because the SQS API does not accept every possible byte sequence in a message payload. The body has to be valid UTF-8 text, and the only control characters allowed are tab, newline, and carriage return.


>The following list shows the characters (in Unicode) allowed in your message, according to the W3C XML specification. For more information, go to http://www.w3.org/TR/REC-xml/#charsets. If you send any characters not included in the list, your request will be rejected.

>#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
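To make the quoted ranges concrete, here is a minimal sketch of a client-side check you could run before sending. The function name `is_sqs_safe` is made up for illustration; it only mirrors the list quoted above and is not part of any SDK.

```python
def is_sqs_safe(text: str) -> bool:
    """Return True if every code point falls in the ranges quoted above."""
    def allowed(cp: int) -> bool:
        return (
            cp in (0x9, 0xA, 0xD)            # tab, newline, carriage return
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    return all(allowed(ord(ch)) for ch in text)


print(is_sqs_safe("plain text\twith a tab\n"))  # True
print(is_sqs_safe("binary-ish \x00\x01 data"))   # False: NUL and SOH are not allowed
```

Serialized protobuf routinely contains bytes like `\x00`, which is why it cannot be sent as-is and needs some text-safe encoding.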


The base64 alphabet clearly falls in this range, making it impossible for a message with base64 encoding to be rejected as invalid. Of course, it also bloats your payload, since base64 expands every 3 bytes of the original message into 4 bytes of output (a 64-symbol alphabet limits each output byte to carrying 6 bits of usable information, so 3 x 8 bits become 4 x 6 bits).
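The 3-to-4 expansion is easy to verify with the standard library. This is only a sketch of the arithmetic; `base64_size` and `max_raw_bytes` are names I made up, and the `limit` argument stands for whatever body size limit applies to your queue.

```python
import base64
import math


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def max_raw_bytes(limit: int) -> int:
    """Largest raw payload whose base64 form still fits in `limit` characters."""
    return (limit // 4) * 3


for n in (1, 3, 300, 1000):
    actual = len(base64.b64encode(bytes(n)))
    # The formula and the real encoder agree for every n.
    print(n, actual, base64_size(n))
```

In other words, you give up roughly a quarter of the usable message size in exchange for a payload that can never be rejected for its characters.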

Presumably boto automatically base64-encodes and decodes messages for you in order to be "helpful."

But there is no reason why base64 has to be used at all.

An example that comes to mind... valid JSON would also comply with the restricted character ranges supported by SQS payloads. (Theoretically, I guess, JSON could be argued not to be an "encoding," but that would be a bit pedantic).
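As a quick illustration of the JSON case, assuming Python's `json` module with its default settings (`ensure_ascii=True`): control characters and non-ASCII characters come out as `\uXXXX` escapes, so the resulting text is plain printable ASCII.

```python
import json

# A value containing a control character (BEL) and a non-ASCII character.
doc = {"id": 42, "note": "bell\x07 and snowman \u2603"}

body = json.dumps(doc)  # default ensure_ascii=True escapes both characters
print(body)

# Every character of the serialized body is printable ASCII.
print(all(0x20 <= ord(ch) < 0x7F for ch in body))  # True
```

Other JSON libraries, or `ensure_ascii=False`, may emit non-ASCII characters directly, which is still fine as long as they stay inside the allowed ranges.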

There is no clean way to determine whether a message needs to be decoded more than once, other than the sketchy one you proposed, but the argument could be made that if you are in a situation where the need to decode is ambiguous, then that should be eliminated.
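To show why the try-to-decode heuristic is unreliable, and what removing the ambiguity could look like, here is a rough sketch. Everything named here is hypothetical: the attribute name `payload-encoding` and the helpers `build_message` and `read_message` are made up, and the dict layout merely imitates the message-attribute shape that AWS SDKs accept, so adapt it to whichever client library you use.

```python
import base64
import binascii

ENCODING_ATTR = "payload-encoding"  # hypothetical attribute name


def looks_like_base64(text: str) -> bool:
    """The 'try it and see' heuristic from the question."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


print(looks_like_base64("data"))  # True, although it is just an ordinary word


def build_message(payload: bytes, is_text: bool) -> dict:
    """Producer side: always state how the body was encoded."""
    if is_text:
        body, encoding = payload.decode("utf-8"), "utf-8"
    else:
        body, encoding = base64.b64encode(payload).decode("ascii"), "base64"
    return {
        "MessageBody": body,
        "MessageAttributes": {
            ENCODING_ATTR: {"DataType": "String", "StringValue": encoding}
        },
    }


def read_message(body: str, attributes: dict) -> bytes:
    """Consumer side: decode exactly once, as the tag says."""
    if attributes[ENCODING_ATTR]["StringValue"] == "base64":
        return base64.b64decode(body, validate=True)
    return body.encode("utf-8")
```

With a tag like this, every producer in every language declares what it did, and the Python and Go consumers never have to guess. A simpler alternative is to agree on one rule for all producers (for example, always base64 for protobuf and never for JSON) and switch off any automatic encoding in libraries that apply it.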

If boto's behavior weren't documented and there were no way to make it behave otherwise, I'd say it is wrong behavior. But, as it is, I'll have to relent a bit and say it's just unusual.

  • Posted on October 8, 2015 at 23:00:06



