A BRC-20 inscription's ticker is supposed to be four letters, so why does the symbol "𝛑" comply with the rules when it is just one character?

First, the BRC-20 protocol document: domo-2.gitbook.io/brc-20-experim…

This link is the BRC-20 protocol document, which requires the ticker field to be a 4-letter identifier, such as "ordi". So why can "𝛑" be just one character? Because BRC-20 is only a protocol standard. When the standard is implemented in software, "4-letter identifier" is treated as "4 bytes". So the actual test for whether a BRC-20 inscription's name (tick) is valid is that the name must encode to exactly four bytes. Understanding this requires some knowledge of character encoding.
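The 4-byte rule can be sketched in a few lines of Python. This is only an illustration: is_valid_tick is a hypothetical helper name, and it assumes the ticker is measured as UTF-8 bytes. Real indexers may apply additional rules.

```python
def is_valid_tick(tick: str) -> bool:
    """Hypothetical check: accept a ticker whose UTF-8 encoding is exactly 4 bytes."""
    return len(tick.encode("utf-8")) == 4


PI = "\U0001D6D1"  # the character 𝛑 (U+1D6D1), written as an escape

print(is_valid_tick("ordi"))  # True: four ASCII letters, one byte each
print(is_valid_tick(PI))      # True: one character, four bytes
print(is_valid_tick("abc"))   # False: only three bytes
```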

Next, we need one basic concept: character encoding, that is, how computers represent letters. See the following link:

blog.csdn.net/zhusongziye/ar…

This article covers information encoding, including two key topics: ASCII and the Unicode standard.

The Unicode standard encodes most of the world's writing systems. Simply put, it is a numbering system that assigns each character a number called a code point. For example, the code point of 𝛑 is U+1D6D1 (in hexadecimal), which is 120529 in everyday decimal. In other words, "𝛑" sits at position 120529 in the Unicode code space.
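The hexadecimal-to-decimal conversion is easy to check with Python's built-in ord, which returns a character's code point. The variable names here are only for illustration.

```python
import unicodedata

PI = "\U0001D6D1"  # the character 𝛑

code_point = ord(PI)
print(hex(code_point))       # 0x1d6d1
print(code_point)            # 120529
print(int("1D6D1", 16))      # 120529, the same conversion done from the hex string
print(unicodedata.name(PI))  # the character's official Unicode name
```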

Given Unicode, how does a code point number map to a binary representation? There are several schemes, mainly UTF-8, UTF-16, and UTF-32.
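To see how the three schemes differ, the sketch below counts the bytes each one produces for "ordi" and for "𝛑". encoded_sizes is a hypothetical helper. The "-le" codec variants are used so that Python does not prepend a byte-order mark to the output.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of `text` under each scheme."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


PI = "\U0001D6D1"  # the character 𝛑

print(encoded_sizes("ordi"))  # {'utf-8': 4, 'utf-16-le': 8, 'utf-32-le': 16}
print(encoded_sizes(PI))      # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

Only under UTF-8 do both names come out at four bytes, which is why the choice of encoding matters for the rule.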

UTF-8 is the most commonly used today. Encoded in UTF-8, "ordi" is 01101111 (o), 01110010 (r), 01100100 (d), 01101001 (i). Each group of 8 bits represents one character, for exactly four bytes.

The UTF-8 binary representation of "𝛑" is 11110000 10011101 10011011 10010001, which is also four bytes.
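Both bit patterns can be reproduced directly. A minimal sketch, where utf8_bits is a hypothetical helper name:

```python
def utf8_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit binary strings."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


PI = "\U0001D6D1"  # the character 𝛑

print(utf8_bits("ordi"))  # 01101111 01110010 01100100 01101001
print(utf8_bits(PI))      # 11110000 10011101 10011011 10010001
```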

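How do you tell how many bytes a character occupies in UTF-8? Count the leading "1" bits of its first byte. If the first byte starts with 0, the character is a single byte; otherwise the number of leading 1s (two, three, or four) is the number of bytes. The first byte of "𝛑" is 11110000, with four leading 1s, so it takes four bytes. The following bytes each start with 10, which marks them as continuation bytes.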
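The leading-ones rule can be written out as a small function. This is a sketch, not a full UTF-8 validator: utf8_sequence_length is a hypothetical name, and it inspects only the first byte of a sequence.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Infer the length of a UTF-8 sequence from the leading 1 bits of its first byte."""
    ones = 0
    mask = 0x80  # start at the most significant bit
    while mask and first_byte & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1      # 0xxxxxxx: single-byte (ASCII) character
    if 2 <= ones <= 4:
        return ones   # 110xxxxx, 1110xxxx, 11110xxx
    # A single leading 1 (10xxxxxx) is a continuation byte, not a first byte;
    # five or more leading 1s never occur in valid UTF-8.
    raise ValueError(f"not a valid UTF-8 first byte: {first_byte:08b}")


PI = "\U0001D6D1"  # the character 𝛑

print(utf8_sequence_length("o".encode("utf-8")[0]))  # 1
print(utf8_sequence_length(PI.encode("utf-8")[0]))   # 4
```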

UTF-8 is a variable-length character encoding method for the Unicode character set. A maximum of 4 bytes can be used for a character.

A character requires 4 bytes in UTF-8 exactly when its Unicode code point lies in the range U+10000 to U+10FFFF.
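The relationship between code point range and byte count can be sketched as a lookup. utf8_length is a hypothetical helper; the thresholds are the standard UTF-8 range boundaries. The sketch ignores the surrogate range U+D800 to U+DFFF, which cannot be encoded on its own.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for the given code point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        return 1  # U+0000 to U+007F
    if code_point <= 0x7FF:
        return 2  # U+0080 to U+07FF
    if code_point <= 0xFFFF:
        return 3  # U+0800 to U+FFFF
    if code_point <= 0x10FFFF:
        return 4  # U+10000 to U+10FFFF
    raise ValueError("beyond the Unicode code space")


print(utf8_length(ord("o")))  # 1
print(utf8_length(0x1D6D1))   # 4, the code point of 𝛑
```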

Within this range, “𝛑” is arguably the most distinctive symbol, and the one with the most universally recognized meaning.