A Unicode character can be up to 4 bytes, so 2^32 or 4,294,967,296 potential unique characters. And it’d be easy enough to adjust the standard to allow for an extra byte(s) if necessary – it’s been done before.
This is incorrect. A code point (what is loosely called a character) takes exactly 4 bytes in UTF-32 and up to 4 bytes in UTF-8, but the Unicode standard itself is limited to 17*2^16 = 1,114,112 code points. (edit: apparently because that is the limit of UTF-16. 4-byte UTF-8 has room for 2^21 code points, and the original UTF-8 design was not limited to four bytes: with sequences of up to six bytes it could encode 2^31 code points. The current spec, RFC 3629, caps it at four bytes and U+10FFFF.)
Unicode is the standard that says “the thing we call capital A is number 65”, literally defining a mapping from numbers to concepts.
UTF-8 and UTF-32 are ways to encode a list of those numbers as bytes, in a more (UTF-8) or less (UTF-32) space-efficient way.
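To make the code point vs. encoding split concrete, here is a rough Python sketch. It is only an illustration, not anything from the standard's text: the helper names `encoded_sizes` and `is_valid_code_point` are made up, and the sample characters are just picked from this thread. It assumes Python 3, where `ord`, `chr`, and `str.encode` work on code points.

```python
# Code point vs. encoded bytes, using only built-in str methods.
# Helper names here are hypothetical, chosen for this example.

samples = ["A", "\u00e9", "ꙮ", "𒀀"]  # Latin A, e-acute, multiocular O, cuneiform sign A


def encoded_sizes(ch):
    """Return (code point, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes) for one character."""
    return (
        ord(ch),
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" variant so no byte-order mark is counted
        len(ch.encode("utf-32-le")),
    )


for ch in samples:
    cp, u8, u16, u32 = encoded_sizes(ch)
    print(f"U+{cp:04X} ({cp}): UTF-8={u8}, UTF-16={u16}, UTF-32={u32} bytes")

# The code space is 17 planes of 2^16 code points each.
CODE_SPACE = 17 * 2**16          # 1,114,112
MAX_CODE_POINT = CODE_SPACE - 1  # 0x10FFFF


def is_valid_code_point(n):
    """True if Python accepts n as a code point (chr raises ValueError otherwise)."""
    try:
        chr(n)
        return True
    except ValueError:
        return False
```

The number 65 for "A" is the same in every row of the output; only the byte counts change with the encoding. Asking for `chr(0x110000)` fails because that number is past the end of the code space, no matter how many bytes an encoding could spare.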
I wasn’t even aware those were characters you could like put into text
Unicode is great, isn't it?
It supports almost all writing systems.
For example, here is the transcript of one of the complaints about Ea-nasir's shitty copper:
𒀀 𒈾 𒂍 𒀀 𒈾 𒍢 𒅕
𒀀 𒈾 𒂍 𒀀 𒈾 𒍢 𒅕
𒆠 𒉈 𒈠
𒌝 𒈠 𒈾 𒀭 𒉌 𒈠
𒀀 𒉡 𒌑 𒈠 𒋫 𒀠 𒇷 𒆪
𒆠 𒀀 𒄠 𒋫 𒀝 𒁉 𒄠
𒌝 𒈠 𒀜 𒋫 𒀀 𒈠
𒄖 𒁀 𒊑 𒁕 𒄠 𒆪 𒁴
𒀀 𒈾 𒄀 𒅖 𒀭 𒂗𒍪 𒀀 𒈾 𒀜 𒁲 𒅔
𒋫 𒀠 𒇷 𒅅 𒈠 𒋫 𒀝 𒁉 𒀀 𒄠
𒌑 𒆷 𒋼 𒁍 𒍑
𒄖 𒁀 𒊑 𒆷 𒁕 𒄠 𒆪 𒁴
𒀀 𒈾 𒈠 𒅈 𒅆 𒅁 𒊑 𒅀
𒋫 𒀸 𒆪 𒌦 𒈠 𒌝 𒈠 𒀜 𒋫 𒈠
𒋳 𒈠 𒋼 𒇷 𒆠 𒀀 𒇷 𒆠 𒀀
𒋳 𒈠 [𒆷] 𒋼 𒇷 𒆠 𒀀 𒀜 𒆷 𒅗
𒅀 𒋾 𒀀 𒈾 𒆠 𒈠 𒈠 𒀭 𒉌 𒅎
𒌅 𒅆 𒅎 𒈠 𒉌 𒈠
𒆠 𒀀 𒄠 𒋼 𒈨 𒊭 𒀭 𒉌
𒈠 𒊑 𒀀 𒉿 𒇷 𒀀 𒈾 𒆠 𒈠 𒅗 𒋾
𒀀 𒈾 𒆠 𒋛 𒅀 𒈠 𒄩 𒊑 𒅎
𒀸 𒁍 𒊏 𒄠 𒈠
𒌅 𒈨 𒄿 𒊭 𒄠 𒈠
𒄿 𒈾 𒂵 𒂵 𒅈 𒈾 𒀝 𒊑 𒅎
𒅖 𒋾 𒅖 𒋗 𒅇 𒅆 𒉌 𒋗
𒊑 𒆪 𒋢 𒉡 𒌅 𒋼 𒅕 𒊏 𒄠
𒄿 𒈾 𒀀 𒇷 𒅅 𒋼 𒂖 𒈬 𒌦
𒈠 𒀭 𒉡 𒌝 𒊭 𒆠 𒀀 𒄠
𒄿 𒁍 𒊭 𒀭 𒉌 𒄿 𒈠
𒀜 𒋫 𒈠 𒅈 𒅆 𒅁 𒊑 𒅀 𒌅 𒈨 𒂊 𒅖
𒀀 𒈾 𒈠 𒆷 𒅗 𒊍 𒉿 𒅎
𒊭 𒄿 𒈾 𒂵 𒋾 𒅀 𒌅 𒊺 𒍪 𒌑
𒆠 𒀀 𒄠 𒋫 𒁕 𒁍 𒌒
𒅇 𒀸 𒋳 𒄿 𒅗
𒀀 𒈾 𒂍 𒃲 𒇷
𒌋 𒐍 𒄘 𒍏 𒀀 𒈾 𒆪 𒀜 𒁲 𒅔
𒅇 𒋗 𒈪 𒀀 𒁍 𒌝
𒌋 𒐍 𒄘 𒍏 𒄿 𒁲 𒅔
𒂊 𒍣 𒅁 𒊭 𒀀 𒈾 𒂍 𒀭 𒌓
𒆪 𒉡 𒊌 𒅗 𒄠 𒉌 𒍣 𒁍
𒀀 𒈾 𒉿 𒊑 𒅎 𒊭 𒀀 𒋾
𒆠 𒄿 𒋼 𒁍 𒊭 𒀭 𒉌
𒆠 𒋛 𒄿 𒈾 𒂵 𒂵 𒅈 𒈾 𒀝 𒊑
𒌅 𒊌 𒋾 𒅋
𒆠 𒋛 𒀀 𒈾 𒂵 𒋾 𒅀
𒋗 𒇻 𒈠 𒄠 𒂊 𒇷 𒅗 𒄿 𒋗
𒆠 𒈠 𒀭 𒉌 𒆠 𒀀 𒄠
𒉿 𒊑 𒀀 𒄠 𒆷 𒁺 𒈬 𒂵 𒄠
𒆷 𒀀 𒈠 𒄩 𒊒 𒅗 𒋫 𒆷 𒈠 𒀜
𒄿 𒈾 𒆠 𒊓 𒇷 𒅀
𒅖 𒋾 𒈾 𒀀 𒌑 𒈾 𒍝 𒀝 𒈠
𒂊 𒇷 𒆠
𒅇 𒀀 𒈾 𒊭 𒌅 𒈨 𒄿 𒊭 𒀭 𒉌
𒈾 𒋛 𒄴 𒋫 𒄠 𒂊 𒁍 𒍑 𒅗
in the original cuneiform as a copypasta
I wonder if Nanni would be satisfied to know that over 3,700 years later, people still know that Ea-Nasir was an asshole?
How do I save comments on kbin?
Uninstall kbin and install something better
Screenshot it
Thanks to Unicode we have the many-eyes seraphim: ꙮ
Biblically accurate typography,
quite literally.
How many unicode characters could you add to the standard until it becomes unreliable?
Apparently Unicode supports about 1.1 million characters, and we currently only use 96,382 as of version 4.0.
EDIT: I just read that Unicode 4.0 is very outdated; the current version is Unicode 15.1, with 149,813 characters.
I am developing a language consisting of only communicating in different versions of zip-archive bombs