
ASCII (American Standard Code for Information Interchange): the basic text encoding for the Latin alphabet

According to the International Telecommunication Union, in 2016 some three and a half billion people used the Internet with varying regularity. Most of them never stop to think that every message they send from a PC or mobile device, and every text displayed on any kind of screen, is actually a combination of 0s and 1s. This representation of information is called coding (encoding). It makes storing, processing and transmitting information possible and much easier. In 1963 the American encoding ASCII, the subject of this article, was developed.

Representation of information in a computer

From the point of view of any electronic computer, text is a collection of individual symbols. These include not only letters, both lowercase and capital, but also punctuation marks and digits, as well as special symbols such as "=", "&", "(" and the space.

The set of symbols that make up a text is called the alphabet, and the number of symbols in it is the size, or power, of the alphabet (denoted N). It is determined by the formula N = 2^b, where b is the number of bits used to encode one symbol, that is, the information weight of a single symbol.

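In practice, an alphabet of 256 characters has proved sufficient to represent all the symbols commonly needed for text in a language such as English or Russian: Latin letters, one national alphabet, digits and punctuation.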

Since 256 is the 8th power of two, the weight of each symbol is 8 bits.

A unit of 8 bits is called a byte, so it is customary to say that the binary code of each character of a text stored in a computer occupies one byte of memory.
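The formula N = 2^b can be turned around to find how many bits one symbol needs for a given alphabet size. The following is a minimal Python sketch; `bits_per_symbol` is a hypothetical helper written for this illustration, not part of any library.

```python
def bits_per_symbol(n: int) -> int:
    """Smallest b such that 2**b >= n, i.e. the information weight of one symbol."""
    if n < 1:
        raise ValueError("an alphabet must contain at least one symbol")
    # The largest code in an alphabet of n symbols is n - 1;
    # its bit length is the number of bits every code needs.
    return max(1, (n - 1).bit_length())

print(bits_per_symbol(256))  # 8: one byte per symbol
print(bits_per_symbol(128))  # 7: enough for codes 0-127
```

Going the other way, 2 ** 8 gives back the 256 symbols of an 8-bit alphabet.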

How coding works

Text is entered into a personal computer's memory using the keyboard, whose keys carry digits, letters, punctuation marks and other symbols. In memory it is stored in binary code: each character is assigned a familiar decimal code from 0 to 255, which corresponds to a binary code from 00000000 to 11111111.
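To see the decimal and binary codes described above, a short Python sketch can look up each character with the built-in `ord` and format the result as eight binary digits. The helper name `to_codes` is hypothetical; note that `ord` returns the Unicode code point, which coincides with the ASCII code for characters 0 to 127.

```python
def to_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, decimal code, 8-bit binary code) for each character."""
    rows = []
    for ch in text:
        code = ord(ch)  # equals the ASCII code for characters 0-127
        if code > 255:
            raise ValueError(f"{ch!r} does not fit in one byte")
        rows.append((ch, code, format(code, "08b")))  # zero-padded to 8 bits
    return rows

for ch, dec, binary in to_codes("Hi!"):
    print(ch, dec, binary)
# H 72 01001000
# i 105 01101001
# ! 33 00100001
```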

Encoding one character per byte allows the processor that handles the text to access each character individually. At the same time, 256 codes are enough for the characters of English plus one other alphabetic language.
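Because each character takes exactly one byte, the position of a character in memory is simply its index, so no scanning is needed to reach it. A small Python illustration, treating a `bytes` object as the stored text (the helper name `char_at` is hypothetical):

```python
def char_at(data: bytes, i: int) -> str:
    """Return the i-th character of one-byte-per-character text."""
    # With one byte per character, character i is exactly byte i.
    return chr(data[i])

stored = "ASCII text".encode("ascii")
print(len(stored))         # 10: one byte per character
print(char_at(stored, 6))  # t
```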

Character encoding ASCII

The abbreviation stands for American Standard Code for Information Interchange.

Even at the dawn of computing it became clear that information could be encoded in many different ways. To transfer information from one computer to another, however, a single standard was needed. So in 1963 the ASCII encoding table appeared in the United States. It assigns every symbol of the computer alphabet a serial number in binary representation. ASCII was initially used only in the United States and later became the international standard for personal computers.

Contents of the table

The ASCII code table is divided into two halves. Only the first half is the international standard. It contains the symbols with ordinal numbers from 0 (encoded as 00000000) to 127 (code 01111111).

The full 8-bit table is organized by code number N as follows:

  • N = 0-31 (binary 0000 0000 - 0001 1111): control characters. Their job is to "steer" the output of text to a monitor or printer, sound a beep, and so on.
  • N = 32-127 (binary 0010 0000 - 0111 1111): the standard part of the table, containing the upper- and lowercase letters of the Latin alphabet, the 10 digits, punctuation marks, various brackets, and commercial and other symbols. Code 32 is the space. The last code, 127 (DEL), is a control character rather than a printable one.
  • N = 128-255 (binary 1000 0000 - 1111 1111): the alternative part of the table, or code page. It comes in several variants, each with its own number. Code pages are used to add national alphabets other than Latin; in particular, this is how Russian letters are added to ASCII.
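The ranges above can be expressed as a small classification function. This is a sketch with a hypothetical name, `ascii_region`; it treats 127 (DEL) as a control character, which is how the ASCII standard defines it.

```python
def ascii_region(code: int) -> str:
    """Classify a byte value by the part of the 8-bit table it falls in."""
    if not 0 <= code <= 255:
        raise ValueError("a byte holds values 0-255 only")
    if code < 32 or code == 127:
        return "control"    # 0-31 plus DEL (127): non-printing commands
    if code < 128:
        return "printable"  # 32-126: space, letters, digits, punctuation
    return "code page"      # 128-255: meaning depends on the code page

for c in (7, 32, 65, 127, 200):
    print(c, ascii_region(c))
```

Code 7, for example, is BEL, the character that makes a terminal beep.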

In the encoding table, uppercase letters and lowercase letters each run consecutively in alphabetical order, and the digits run in ascending order of value. This principle is also largely preserved for the Russian alphabet, depending on the code page.
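Because digits and letters occupy consecutive codes, simple subtraction turns a character into its value or its position in the alphabet. A minimal Python sketch; the function names are hypothetical.

```python
def digit_value(ch: str) -> int:
    """Numeric value of an ASCII digit: '0'-'9' occupy consecutive codes 48-57."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return ord(ch) - ord("0")

def letter_position(ch: str) -> int:
    """1-based position in the Latin alphabet: 'A'-'Z' are 65-90, 'a'-'z' are 97-122."""
    if len(ch) == 1 and "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    if len(ch) == 1 and "a" <= ch <= "z":
        return ord(ch) - ord("a") + 1
    raise ValueError(f"{ch!r} is not a Latin letter")

print(digit_value("7"), letter_position("C"), letter_position("z"))  # 7 3 26
```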

Control characters

The ASCII table was originally designed for sending and receiving information through devices such as the teletype, which has long since fallen out of use. For this reason the character set includes non-printing characters that serve as commands for controlling such a device. Similar commands existed in earlier, pre-computer messaging methods such as Morse code.

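The best-known of these "teletype" characters is NUL (code 0, "zero"). It is still used in a number of programming languages, notably C, to mark the end of a string (the end of a line, by contrast, is marked by LF, code 10, or the CR LF pair).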
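In C, a string is a run of bytes terminated by a NUL byte, and anything after it is ignored. The Python sketch below mimics reading such a buffer; the function name `read_c_string` is hypothetical.

```python
def read_c_string(buffer: bytes) -> str:
    """Decode text up to (not including) the first NUL byte, as C does."""
    end = buffer.find(b"\x00")  # position of the terminating NUL (code 0)
    if end == -1:
        raise ValueError("no NUL terminator found")
    return buffer[:end].decode("ascii")

raw = b"Hello\x00leftover bytes"
print(read_c_string(raw))  # Hello
```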

Where the ASCII encoding is used

ASCII is needed not only for typing text on the keyboard; it is also used in graphics. For example, the ASCII Art Maker program represents images in various file formats as arrangements of ASCII characters.

Such programs come in two types: those that act as graphics editors, converting images into text, and those that convert "drawings" into ASCII graphics. The familiar smiley :-) is a simple example of a picture built from encoding symbols.
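A common way to turn an image into text is to map the darkness of each pixel to a character with more or less "ink". The sketch below illustrates that idea only; it is not how ASCII Art Maker itself works, and the 10-character ramp and the function name are assumptions made for this example.

```python
# Characters ordered from lightest to darkest (an arbitrary choice for this sketch).
RAMP = " .:-=+*#%@"

def to_ascii_art(pixels: list[list[int]]) -> str:
    """Convert rows of darkness values (0 = white, 255 = black) into ASCII text."""
    lines = []
    for row in pixels:
        # Scale each 0-255 value to an index 0..len(RAMP)-1.
        lines.append("".join(RAMP[v * (len(RAMP) - 1) // 255] for v in row))
    return "\n".join(lines)

image = [
    [0, 128, 255],
    [255, 128, 0],
]
print(to_ascii_art(image))
```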

ASCII can also be used when creating an HTML document. In this case you can type a character's numeric code (for example, &#65;), and when the page is viewed, the browser displays the symbol that corresponds to this code.
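In HTML such a code is written as a numeric character reference: &#NNN; in decimal or &#xHH; in hexadecimal. Python's standard `html.unescape` performs the same substitution a browser does, which makes it a convenient way to check references:

```python
import html

# Decimal (&#65;) and hexadecimal (&#x42;) numeric character references.
print(html.unescape("&#65;&#x42;&#67;"))  # ABC
print(html.unescape("5 &#60; 7"))         # 5 < 7
```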

ASCII is also useful for multilingual sites: characters that are not part of the page's national code table can be written as numeric codes made entirely of ASCII characters.
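Python's `str.encode` can perform this replacement with the `xmlcharrefreplace` error handler: every character the target encoding lacks is written as a numeric reference built only from ASCII characters. The sample text is chosen for illustration.

```python
import html

text = "Café Привет"
# Characters outside ASCII become &#NNN; references.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_html)
# b'Caf&#233; &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;'

# A browser (or html.unescape) turns the references back into the original text.
print(html.unescape(ascii_html.decode("ascii")) == text)  # True
```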

Some features

ASCII originally used 7 bits per character (the eighth bit of a byte was left unused or, often, used for parity checking), but today it is normally handled as an 8-bit code.
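Because genuine ASCII codes never exceed 127, the highest (eighth) bit of every byte of pure ASCII text is 0. A sketch of that check follows; the function name is hypothetical, and Python 3.7+ also provides the built-in `bytes.isascii()` for the same test.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte has its high bit clear, i.e. all codes are 0-127."""
    return all((b & 0b1000_0000) == 0 for b in data)

print(is_seven_bit(b"plain ASCII"))            # True
print(is_seven_bit("Café".encode("latin-1")))  # False: é is 233 in Latin-1
```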

The uppercase and lowercase forms of each letter differ from each other by only a single bit. This greatly simplifies checking and converting letter case.
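In ASCII, "A" is 65 (0100 0001) and "a" is 97 (0110 0001): the two codes differ only in the bit worth 32. Flipping that bit with XOR switches the case, as this sketch shows (the helper name `toggle_case` is hypothetical):

```python
CASE_BIT = 0b0010_0000  # 32: the only bit that differs between 'A' and 'a'

def toggle_case(ch: str) -> str:
    """Switch an ASCII letter between upper and lower case by flipping one bit."""
    if len(ch) != 1 or not ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        raise ValueError(f"{ch!r} is not an ASCII letter")
    return chr(ord(ch) ^ CASE_BIT)

print(toggle_case("A"), toggle_case("q"))  # a Q
```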

Using ASCII in Microsoft Office

If necessary, text can be saved in this encoding in Microsoft programs such as Notepad and Word. In that case, however, some features are unavailable. For example, text cannot be made bold, because ASCII preserves only the content of the information and ignores its appearance and formatting.

Standardization

ISO adopted the ISO 8859 family of standards, which define eight-bit encodings for different language groups. In particular, ISO 8859-1 is an extended ASCII table for the United States and Western Europe, and ISO 8859-5 is a table for Cyrillic, including Russian.
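The practical consequence is that one and the same byte means different characters depending on which ISO 8859 table is assumed, while the lower half stays plain ASCII. A short Python check using the standard codecs (the byte 0xC0 is chosen arbitrarily):

```python
import unicodedata

byte = bytes([0xC0])                  # one and the same byte value
latin = byte.decode("iso-8859-1")     # ISO 8859-1: Western European table
cyrillic = byte.decode("iso-8859-5")  # ISO 8859-5: Cyrillic table

print(hex(ord(latin)), unicodedata.name(latin))        # 0xc0 LATIN CAPITAL LETTER A WITH GRAVE
print(hex(ord(cyrillic)), unicodedata.name(cyrillic))  # 0x420 CYRILLIC CAPITAL LETTER ER

# Bytes 0-127 are plain ASCII in both tables.
ascii_part = bytes(range(128))
print(ascii_part.decode("iso-8859-1") == ascii_part.decode("iso-8859-5"))  # True
```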

For a number of historical reasons, ISO 8859-5 did not remain in use for very long.

For the Russian language, the following encodings are actually in use today:

  • CP866 (Code Page 866), or the DOS encoding, often called the "alternative GOST" encoding. It was in active use until the mid-1990s and is now almost never used.
  • KOI-8. This encoding was developed in the 1970s and 1980s and is currently the standard for email messages in RuNet. It is widely used in Unix-family operating systems, including Linux. The "Russian" version of KOI-8 is called KOI8-R; there are also versions for other Cyrillic languages, for example Ukrainian.
  • Code Page 1251 (CP1251, Windows-1251). It was developed by Microsoft Corporation to support the Russian language in Windows.
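The three encodings place Cyrillic letters at different byte values, which is why text opened with the wrong one turns into gibberish. Python ships codecs for all three (`cp866`, `koi8_r`, `cp1251`), so the difference can be seen directly; the sample word is chosen for illustration.

```python
word = "Привет"  # "Hello" in Russian

for codec in ("cp866", "koi8_r", "cp1251"):
    encoded = word.encode(codec)
    # Each table maps the same letters to different byte values...
    print(f"{codec:7} {encoded.hex(' ')}")
    # ...but decoding with the same table restores the original text.
    assert encoded.decode(codec) == word

# The capital letter А alone already lands on three different bytes.
print([hex("А".encode(c)[0]) for c in ("cp866", "koi8_r", "cp1251")])
```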

The main advantage of CP866 was that it kept the pseudographic (box-drawing) characters in the same positions as in the IBM PC's extended ASCII (code page 437). This allowed foreign text-mode programs, such as the famous Norton Commander, to run unmodified. Today CP866 is used by Windows programs that run in full-screen text mode or in text (console) windows, including FAR Manager.
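This compatibility can be checked with Python's `cp437` and `cp866` codecs: the pseudographic block, bytes 0xB0 to 0xDF, decodes to identical characters in both tables, while other upper-half positions differ.

```python
# Bytes 0xB0-0xDF hold the pseudographic (box-drawing) block.
block = bytes(range(0xB0, 0xE0))
us_dos = block.decode("cp437")       # original IBM PC table
russian_dos = block.decode("cp866")  # Russian DOS table

print(us_dos == russian_dos)  # True: frames drawn by foreign programs look the same
print(us_dos[:16])

# Outside that block the tables differ, e.g. byte 0x80.
print(bytes([0x80]).decode("cp437"), bytes([0x80]).decode("cp866"))  # Ç А
```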

Texts written in CP866 have become quite rare, but the encoding is still used for Russian file names in some parts of Windows.

Unicode

Unicode is currently the most widespread encoding standard. Its codes are divided into regions. The first region, from U+0000 to U+007F, contains the ASCII characters with the same codes they have in ASCII. It is followed by regions for the characters of various national scripts, as well as punctuation marks and technical symbols. In addition, part of the Unicode code space is reserved in case new symbols need to be added in the future.
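The compatibility of the first region is easy to verify: every ASCII character keeps its number in Unicode, and in UTF-8 it is stored as the very same single byte. UTF-8 is used here as an assumption, being the most common way of storing Unicode text; the paragraph above does not name a specific storage format.

```python
text = "ASCII"
# Code points U+0000-U+007F coincide with ASCII codes 0-127...
print([hex(ord(ch)) for ch in text])  # ['0x41', '0x53', '0x43', '0x49', '0x49']
# ...and UTF-8 stores them as exactly the same bytes as ASCII does.
print(text.encode("utf-8") == text.encode("ascii"))  # True
# Characters outside the first region take more than one byte in UTF-8.
print(len("Я".encode("utf-8")))  # 2
```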

Now you know that in ASCII encoding each character is stored as a byte, a combination of eight zeros and ones. To non-specialists this may seem unnecessary and uninteresting, but don't you want to know what is happening "in the brain" of your PC?


Copyright © 2018 en.birmiss.com.