Difference between character-stream and byte-stream translation to characters, for characters that can be represented in 1 byte

Question

The Java documentation says:
> The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes.

How does this happen, when the character stream classes read() and write() (2 bytes at a time) just as the byte stream classes do (1 byte at a time), and both return an int that can be mapped to a char?

Assume only characters that can be encoded in 1 byte.

What translation takes place here, apart from the difference in the number of bytes? How does it differ from converting a byte stream's output to characters, and what handles the translation between characters and bytes for a byte stream?

-

Answer 1

Score: 0

  • Binary data: byte, InputStream/OutputStream.
  • Text (Unicode): String, char (a 2-byte UTF-16 code unit), Reader/Writer.

Moving between the two therefore always involves a conversion, using the Charset in which the binary data is encoded.
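
As a minimal sketch of that conversion (the class and method names here are made up for illustration, and UTF-8 is assumed as the encoding of the data), the same bytes can be read once through an InputStream and once through a Reader layered on top of it. For an ASCII character both return the same int, which is why the two look alike for 1-byte characters; for a non-ASCII character the byte stream returns the raw bytes while the Reader returns the decoded char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class ByteVsCharStreamDemo {

    // Reads from a byte stream: every int returned is one raw byte (0..255).
    static String readBytes(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                sb.append(Integer.toHexString(b)).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    // Reads from a character stream wrapped around the same bytes:
    // every int returned is one decoded UTF-16 code unit (0..65535).
    static String readChars(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append(Integer.toHexString(c)).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] ascii = "A".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_8); // e with acute accent
        System.out.println(readBytes(ascii) + " | " + readChars(ascii));       // 41 | 41
        System.out.println(readBytes(accented) + " | " + readChars(accented)); // c3 a9 | e9
    }
}
```

The InputStreamReader is the piece that "handles translation": it pulls bytes from the underlying byte stream and applies the Charset it was given. A plain InputStream performs no such step, so casting its int to char is only correct by coincidence for ASCII (and ISO-8859-1) data.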

What is irritating:

  • InputStream.read() returns an int: a byte value (0 to 255), or -1 for end-of-file.

The single-byte encodings found on Windows share the same first 128 characters, which are plain 7-bit ASCII (StandardCharsets.US_ASCII), but they differ in the remaining 128 byte values (Greek, Russian, Slovak, Norwegian, etc.).
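
To make the code-page point concrete, here is a small sketch that decodes the same single byte under three Windows code pages. The class name is hypothetical, and it assumes the JDK in use ships the windows-1252, windows-1253 and windows-1251 charsets (common, but only the StandardCharsets constants are guaranteed by the specification).

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original answer.
class CodePageDemo {

    // Decodes one byte under the named charset and returns the resulting text.
    static String decode(int byteValue, String charsetName) {
        return new String(new byte[] {(byte) byteValue}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0x41 lies in the shared 7-bit ASCII range.
        // 0xE4 lies in the upper half, where the code pages disagree.
        String[] names = {"windows-1252", "windows-1253", "windows-1251"};
        for (String name : names) {
            String lower = decode(0x41, name);
            String upper = decode(0xE4, name);
            System.out.println(name + ": " + lower + " U+"
                    + Integer.toHexString(upper.charAt(0)).toUpperCase());
        }
    }
}
```

Byte 0x41 decodes to "A" in all three, while byte 0xE4 becomes U+00E4 (Western European), U+03B4 (Greek) and U+0434 (Cyrillic) respectively. So even for characters that fit in 1 byte, which character a byte stands for depends on the charset, and that mapping is exactly what a Reader applies and a raw InputStream does not.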

Java made the decision to use Unicode internally, so that all scripts can be combined in one String.

> Hence there is always a conversion, and String should never be used for non-text binary data.
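
A rough illustration of why String is a poor container for binary data (hypothetical class name, and UTF-8 chosen arbitrarily as the charset): bytes that are not valid text in the chosen encoding are replaced during decoding, so they do not survive a round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class BinaryInStringDemo {

    // Pushes raw bytes through a String and back, treating them as UTF-8 text.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in well-formed UTF-8.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] back = roundTrip(raw);
        System.out.println(Arrays.toString(raw));
        System.out.println(Arrays.toString(back));
        System.out.println(Arrays.equals(raw, back)); // false: the data was altered
    }
}
```

The String constructor silently substitutes a replacement character for each malformed byte, so the bytes that come back differ from the ones that went in. Binary data is safer kept as byte[] and moved with InputStream/OutputStream.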

huangapple
  • Posted on April 9, 2020, 18:11:54
  • Source: https://java.coder-hub.com/61118790.html
