Monday 16 December 2013

Unicode


Unicode is a system for encoding most of the world’s writing systems in use today. Unlike earlier encoding systems, Unicode encodes graphemes (for example, letters of the alphabet, numbers and punctuation marks) rather than glyphs (the individual visual marks used to render them) [1]. Unicode assigns each character a unique number called a “code point.” It ignores size, font and shape, leaving those details to other software. Its latest version, Unicode 6.0, includes more than 109,000 characters covering 93 scripts. It also supports right-to-left writing systems.
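The idea of a code point can be seen directly in Python, whose built-in `ord()` and `chr()` functions convert between characters and their Unicode numbers (a small illustrative sketch, not part of the original post):

```python
# ord() returns a character's code point; chr() maps a code point back
# to its character. The number is the same regardless of font or size.
print(ord("A"))             # 65, i.e. U+0041
print(ord("€"))             # 8364, i.e. U+20AC
print(chr(0x05D0))          # Hebrew letter Alef, a right-to-left character

# Code points are conventionally written as U+ followed by hex digits:
print(f"U+{ord('€'):04X}")  # U+20AC
```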

Before Unicode was invented, there were hundreds of different encoding systems for assigning numbers to characters. No single encoding could contain enough characters: the European Union alone, for example, required several different encodings to cover all its languages. Even for a single language like English, no one encoding was enough for all the letters, punctuation and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, it runs the risk of being misinterpreted. Unicode avoids this by providing a unique number for every character, regardless of platform, program or language.
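The conflict between legacy encodings is easy to demonstrate: the very same byte decodes to different characters under different encodings, while Unicode gives each character its own unambiguous code point (an illustrative sketch, not from the original post):

```python
# One byte, two meanings: 0xE4 is a different letter in each legacy encoding.
raw = bytes([0xE4])
print(raw.decode("latin-1"))   # 'ä' in Western European ISO 8859-1
print(raw.decode("cp1251"))    # 'д' in Cyrillic Windows-1251

# Under Unicode, each character has its own unique code point,
# so the two can never be confused:
print(f"U+{ord('ä'):04X}")     # U+00E4
print(f"U+{ord('д'):04X}")     # U+0434
```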

