Unicode encoding: character encoding standard

Almost every Internet user who has tried to configure one feature or another has, at some point, seen the word "Unicode" on the screen. This article explains what it is.

Definition

Unicode is a character encoding standard. It was introduced in 1991 by the non-profit organization Unicode, Inc. (the Unicode Consortium). The standard is designed to cover as many different kinds of characters as possible so that they can be combined in one document. A page built on it can contain letters and characters from many languages (from Russian to Korean) as well as mathematical signs, and all of them coexist in a single encoding without the conflicts that arise when several legacy encodings are mixed.

Why Unicode was created

Long before Unicode appeared, the encoding of a document was chosen according to its author's preferences. As a result, reading a single document often required several different code tables, sometimes more than once, which made life considerably harder for an ordinary user. As mentioned above, in 1991 the non-profit organization Unicode, Inc. proposed a solution: a new kind of character encoding intended to replace the many diverse and outdated standards with one. Unicode achieved something unthinkable at the time: a single tool that supports a huge number of characters. The result exceeded many expectations: documents appeared that contained English and Russian text, Latin, and mathematical expressions all at once.

However, creating a unified encoding first required solving a number of problems caused by the huge variety of standards that already existed at the time. The most common ones were:

  • Garbled text, jokingly called "Elvish script" or "krakozyabry" (mojibake);
  • A limited character set;
  • Conversion between encodings;
  • Duplicated fonts.
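The first and third problems in this list are easy to reproduce today. The minimal Python sketch below is our own illustration, not historical software: it assumes a writer who saves Russian text in the cp1251 code page and a reader whose system assumes koi8_r (both are Python's built-in codec names). Reading with the wrong table produces "krakozyabry"; converting correctly requires knowing the writer's code page.

```python
# Two 8-bit Cyrillic code pages that assign different byte values to the same letters.
WRITER_CODEPAGE = "cp1251"  # assumption: the author uses Windows Cyrillic
READER_CODEPAGE = "koi8_r"  # assumption: the recipient's system expects KOI8-R

original = "Привет, мир"  # "Hello, world" in Russian

# The author saves the text using their own code page...
raw_bytes = original.encode(WRITER_CODEPAGE)

# ...and the recipient opens the file assuming a different code page.
garbled = raw_bytes.decode(READER_CODEPAGE)

# A correct conversion decodes with the writer's table,
# then re-encodes with the reader's table.
converted = raw_bytes.decode(WRITER_CODEPAGE).encode(READER_CODEPAGE)

print(garbled)                            # meaningless Cyrillic-looking text
print(converted.decode(READER_CODEPAGE))  # Привет, мир
```

Note that the same bytes are "correct" or "garbage" depending only on which table the reader assumes; nothing in the bytes themselves says which one was meant.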

A short historical digression

Imagine it is the 1980s. Computers are not yet widespread and look very different from today's. Every operating system is unique in its own way, and enthusiasts modify each one for their specific needs, so exchanging information means reworking almost everything. Opening a document created under another OS often fills the screen with meaningless characters, and the games with encodings begin. This cannot always be done quickly: sometimes a needed document can only be opened six months later, or even later.

People who exchange information regularly build their own conversion tables, and working with them reveals an interesting detail: each table has to be built in two directions, "from mine to yours" and back. The machine cannot simply run the table in reverse, because it lists the source code in one column and the result in the other, and it only works in that one direction. If a document needed any special characters, they first had to be added, and then the partner had to be told what to do so that these characters would not turn into "krakozyabry". And let's not forget that each encoding required its own fonts to be developed or installed, which led to a huge number of duplicate fonts in the OS.
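A conversion table of that era can be modelled as a lookup array indexed by the source byte. Such an array cannot be read backwards, so the reverse direction needs a second table. The Python sketch below is a hypothetical illustration (all names are ours): it builds both tables between cp866, the DOS Cyrillic code page, and cp1251 with the standard `bytes.maketrans`. As an assumption for brevity, it covers only the 64 basic Cyrillic letters (А..я, without Ё/ё); any other byte is copied unchanged, so a character missing from the table would come out wrong on the other side.

```python
# Basic Cyrillic letters U+0410..U+044F (А..я); Ё/ё are left out to keep the sketch short.
LETTERS = "".join(chr(cp) for cp in range(0x0410, 0x0450))

DOS_BYTES = LETTERS.encode("cp866")   # how each letter is stored under DOS
WIN_BYTES = LETTERS.encode("cp1251")  # how the same letter is stored under Windows

# One table per direction: "from mine to yours" and back.
DOS_TO_WIN = bytes.maketrans(DOS_BYTES, WIN_BYTES)
WIN_TO_DOS = bytes.maketrans(WIN_BYTES, DOS_BYTES)


def dos_to_win(data: bytes) -> bytes:
    """Convert cp866 bytes to cp1251; bytes outside the table are copied unchanged."""
    return data.translate(DOS_TO_WIN)


def win_to_dos(data: bytes) -> bytes:
    """Convert cp1251 bytes back to cp866 using the separate reverse table."""
    return data.translate(WIN_TO_DOS)


message = "Привет, мир".encode("cp866")
as_windows = dos_to_win(message)
print(as_windows.decode("cp1251"))        # Привет, мир
print(win_to_dos(as_windows) == message)  # True
```

Every additional pair of code pages would need another two tables like these.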

Imagine, too, a font list showing ten identical copies of Times New Roman, each with a small note naming the code page it is meant for. Is it clear now why a universal standard was an urgent necessity?

The founders

The origins of Unicode go back to 1987, when Joe Becker of Xerox, together with Lee Collins and Mark Davis of Apple, began investigating the practicalities of creating a universal character set. In August 1988, Joe Becker published a draft proposal for a 16-bit international multilingual character encoding system.

A few months later, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, specialists from RLG, Glenn Wright of Sun Microsystems, and several others, which made it possible to complete the preliminary draft of a single encoding standard.

General description

Unicode is built on the concept of a character: an abstract unit that exists in a particular writing system and is rendered through glyphs (its "portraits"). Each character is assigned a unique code that belongs to a specific block of the standard. For example, the shape B occurs in both the English and the Russian alphabets, but in Unicode it corresponds to two different characters: Latin B and Cyrillic В. They even convert to different lowercase letters (b and в), because each character is described by its own code, a set of properties, and a full name.
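The Latin/Cyrillic example above can be checked with Python's standard `unicodedata` module, which exposes the character database. This is a minimal sketch of our own; the variable names are illustrative.

```python
import unicodedata

# Two look-alike capital letters that Unicode treats as separate characters.
latin_b = "B"           # U+0042
cyrillic_ve = "\u0412"  # U+0412, displayed as "В"

for ch in (latin_b, cyrillic_ve):
    # Code point, full name, general category, and lowercase mapping.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.lower())

print(latin_b == cyrillic_ve)  # False: same shape, different characters
```

The loop prints `U+0042 LATIN CAPITAL LETTER B Lu b` and `U+0412 CYRILLIC CAPITAL LETTER VE Lu в`: both are uppercase letters (category `Lu`), but their codes, names, and lowercase forms differ.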

Advantages of Unicode

Compared with its contemporaries, Unicode stood out for its huge reserve of code positions for encoding characters. Its predecessors were 8-bit, so each supported at most 2^8 = 256 characters, whereas the new standard offered 2^16 = 65,536 positions, a giant step forward. This made it possible to encode almost all alphabets in active use.
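The arithmetic behind this paragraph, and its practical consequence, can be shown in a short Python sketch (our own illustration; `code_positions` and `fits` are hypothetical helper names). A single 8-bit code page cannot hold two scripts such as Cyrillic and Greek at once, while a Unicode encoding covers both.

```python
def code_positions(bits: int) -> int:
    """Number of distinct values an n-bit code can represent."""
    return 2 ** bits


print(code_positions(8))   # 256: one 8-bit code page
print(code_positions(16))  # 65536: the original 16-bit Unicode design


def fits(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


mixed = "Ж = α"  # a Cyrillic letter and a Greek letter in one string

print(fits(mixed, "cp1251"))  # False: Windows Cyrillic has no Greek letters
print(fits(mixed, "cp1253"))  # False: Windows Greek has no Cyrillic letters
print(fits(mixed, "utf-16"))  # True: one Unicode encoding covers both
```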

With the advent of Unicode, conversion tables were no longer needed: a single standard made them unnecessary. Likewise, a single standard largely eliminated "krakozyabry" and removed the need to create duplicate fonts.

Unicode development

Of course, progress does not stand still, and 25 years have passed since the first presentation. Yet Unicode stubbornly holds its position worldwide. This is largely because it proved easy to implement and spread quickly, and it was adopted by developers of both proprietary (paid) and open-source software.

At the same time, we should not assume that today's Unicode is the same as it was a quarter of a century ago. The standard has gone through many versions, and its code space has outgrown the original 16 bits: it now comprises 1,114,112 code points (U+0000 to U+10FFFF), of which only a part has been assigned so far. The strict limit of 2^16 = 65,536 characters was abandoned as early as version 2.0, which introduced surrogate pairs so that the 16-bit form, UTF-16, could reach the additional code points. The number of characters grew considerably by version 2.0 and kept growing in the following years; by version 4.0 the standard itself had to be expanded, and Unicode acquired the form in which we know it today.
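How a character beyond the old 16-bit limit is still stored in 16-bit units can be sketched in Python. The function `utf16_surrogates` is a hypothetical helper written for illustration; it applies the UTF-16 surrogate-pair arithmetic and is checked against Python's built-in `utf-16-be` codec. The emoji U+1F600 is just an arbitrary example above U+FFFF.

```python
import sys

# Unicode code points run from U+0000 to U+10FFFF (1,114,112 values).
MAX_CODE_POINT = 0x10FFFF


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


smiley = 0x1F600  # an emoji outside the original 16-bit range
high, low = utf16_surrogates(smiley)
print(f"U+{smiley:X} -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00

# Python's own UTF-16 codec (big-endian, no BOM) produces the same pair.
print(chr(smiley).encode("utf-16-be").hex())    # d83dde00
print(sys.maxunicode == MAX_CODE_POINT)         # True
```

Two 16-bit units carry 20 bits of offset, which is exactly what is needed for the 16 supplementary planes above the first 65,536 code points.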

What else is there in Unicode?

In addition to its huge and constantly growing number of characters, Unicode has one more useful feature: normalization. The same visible text can often be stored in more than one way. For example, "é" can be a single precomposed character (U+00E9) or the letter "e" followed by a combining acute accent (U+0301). What does normalization do about this?

Instead of leaving every program to work out on its own, character by character, that such sequences mean the same thing, the standard defines normalization algorithms that convert text into one consistent form. Once two strings have been normalized in the same way, they can be compared, searched, or sorted with a simple code-by-code comparison.
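A minimal Python sketch of this idea, using the standard `unicodedata.normalize` function; `same_text` is a hypothetical helper name of our own.

```python
import unicodedata

precomposed = "\u00e9"  # "é" as a single character
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)  # False: different code point sequences


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_text(precomposed, decomposed))  # True
```

The choice of NFC as the default here is only a convenience; any form works as long as both strings are normalized the same way.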

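Four such algorithms, called normalization forms, are defined: NFC, NFD, NFKC, and NFKD. Each transforms text according to its own strictly defined rules: the "C" forms compose characters where possible, the "D" forms decompose them, and the "K" forms additionally replace compatibility characters (such as ligatures) with their plain equivalents. For this reason, none of them can be called the most effective: each was developed for specific needs and is used successfully for them.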
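The difference between the four forms can be seen by normalizing two sample strings with Python's `unicodedata.normalize`. The samples (a precomposed "é" and the "ﬁ" ligature) are our own choice for illustration.

```python
import unicodedata

samples = {
    "precomposed e-acute": "\u00e9",  # é
    "fi ligature": "\ufb01",          # ﬁ, a compatibility character
}

for label, text in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        result = unicodedata.normalize(form, text)
        # Show the code points that each form produces.
        print(label, form, " ".join(f"U+{ord(c):04X}" for c in result))
```

The canonical forms (NFC, NFD) only compose or decompose "é" (U+00E9 versus U+0065 U+0301) and leave the ligature alone, while the compatibility forms (NFKC, NFKD) also replace "ﬁ" with the two letters "f" and "i".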

The spread of the standard

Over its 25-year history, Unicode has arguably become the most widely used character encoding standard in the world. Programs and web pages are built to support it, and the breadth of its use is shown by the fact that more than 60% of Internet resources use Unicode today.

Now you know when the Unicode standard appeared and what it is, and you can appreciate the full value of the invention made by a group of specialists from Unicode, Inc. more than 25 years ago.

Copyright © 2018 en.birmiss.com.