UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes), with the single octet encoding used only for the 128 US-ASCII characters.
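(For what it's worth, here is what Python 3's built-in str.encode does with a few characters I picked; it produces UTF-8 when you ask for it, so this is just a sketch of the "1 to 4 octets" claim, not anything official. The ASCII letter really does come out as one byte, and the others take two, three, and four.)

    # How many bytes UTF-8 uses for a few sample characters.
    # bytes.hex(" ") needs Python 3.8 or newer.
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        print(f"{ch!r} -> {len(encoded)} byte(s): {encoded.hex(' ')}")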
I have no idea what they're talking about. And the more detailed description is even more baffling:
The UTF-8 encoding is variable-width, ranging from 1 to 4 bytes. Each byte has 0-4 leading 1 bits followed by a zero bit to indicate its type. N leading 1 bits indicate the first byte of an N-byte sequence, with the exception that zero 1 bits indicate a one-byte sequence while a single 1 bit indicates a continuation byte in a multi-byte sequence (this was done for ASCII compatibility). The scalar value of the Unicode code point is the concatenation of the non-control bits.
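Here is that bit-twiddling spelled out in Python, for whatever it's worth. It's only a sketch of the layout described above, assuming a valid code point (it skips the error checking a real encoder needs, like rejecting surrogates or anything past U+10FFFF), so in practice you would just call str.encode.

    def utf8_encode(code_point: int) -> bytes:
        """Encode one Unicode code point using the byte layout quoted above."""
        if code_point < 0x80:
            # One byte: 0xxxxxxx (plain ASCII, no leading 1 bits).
            return bytes([code_point])
        if code_point < 0x800:
            # Two bytes: 110xxxxx 10xxxxxx
            return bytes([0xC0 | (code_point >> 6),
                          0x80 | (code_point & 0x3F)])
        if code_point < 0x10000:
            # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return bytes([0xE0 | (code_point >> 12),
                          0x80 | ((code_point >> 6) & 0x3F),
                          0x80 | (code_point & 0x3F)])
        # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])

    # Sanity checks against Python's own encoder.
    assert utf8_encode(ord("€")) == "€".encode("utf-8")
    assert utf8_encode(ord("😀")) == "😀".encode("utf-8")

So the leading bits of the first byte tell you how long the sequence is, every continuation byte starts with 10, and the x bits strung together are the code point. That much I can almost follow.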
Well, here's something that's comprehensible, if your screen can display it all: a web page that has been "encoded directly in UTF-8", which may be why you can't see some of the languages if your fonts don't cover them.
Anyway, it's pretty cool that people are able to program in a universal computer language. Too bad my brain isn't big enough to understand it :(