Bielefeld University - Faculty of Technology

Networks and Distributed Systems

Research group of Prof. Peter Bernard Ladkin, Ph.D.

This is an archived page and is no longer updated.

Current content of Prof. Ladkin's research group is available at https://rvs-bi.de/

Computers `*store*' numbers in
hardware as digits. Consider first how numbers are ordinarily written: the number `12493' consists of
five digits, each of which is between 0 and 9. We call it a `decimal
representation' (decimal = based on 10) because there are ten such
digits, and because this number represents

1×10^4 + 2×10^3 + 4×10^2 + 9×10^1 + 3×10^0 = 10000 + 2000 + 400 + 90 + 3.

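Computer hardware is built to store numbers in `binary'.
There are two digits, 0 and 1, called `bits' (binary digits). So computer hardware works
mostly in *base two*. In base 2, the number 10010 represents

1×2^4 + 0×2^3 + 0×2^2 + 1×2^1 + 0×2^0 = 16 + 2, which is 18 in decimal.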
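The positional rule is the same in both bases: each digit is multiplied by a power of the base, counted from the right. A minimal sketch in Python follows; the function name `positional_value` is illustrative and not part of the original text.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum digit * base**position, with the rightmost digit at position 0."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total


print(positional_value("12493", 10))  # 12493
print(positional_value("10010", 2))   # 18
print(int("10010", 2))                # 18, Python's built-in conversion agrees
```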

Computer hardware must be able to represent not only numbers, but also characters (for text) and all kinds of other `data'. To do this, the other kinds of data are `coded' as numbers, as I shall now explain.

First, I must describe `bytes'.
The bits are grouped together
physically into `bytes'. A byte has a fixed number of bits. Typical
sizes for bytes are 8 bits and 16 bits. An example
of an 8-bit byte is **10011101** and of a 16-bit byte is
**1001001101001001**. There are 256 (2^8) different 8-bit bytes and
65,536 (2^16) possible 16-bit bytes. The size of a byte is determined when
the computer hardware (the chip) is designed, and is physically fixed.
It cannot be altered. But of course different chips can have different
byte sizes. It is a convenience, and not a mathematical necessity, that
bytes are mostly designed to have 8 bits.
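The counts of possible bytes follow from the fact that each extra bit doubles the number of patterns. A small sketch, with the hypothetical helper `count_patterns`, checks the figures and reads the two example bytes as numbers:

```python
def count_patterns(bits: int) -> int:
    """Number of distinct bit patterns that fit in the given number of bits."""
    return 2 ** bits


print(count_patterns(8))            # 256
print(count_patterns(16))           # 65536
print(int("10011101", 2))           # 157
print(int("1001001101001001", 2))   # 37705
```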

Bytes are used as codes in the following way. A character between
`a' and `z' is assigned a certain specific fixed binary number, say
between 0 and 255, as an identifier.
Say, a lower case `*k*' is assigned the number 107
(01101011 as an 8-bit byte), and an
upper case `*K*' the number 75 (01001011).
If a computer program or hardware `knows' that it should be seeing a
character, it can read the number and interpret it as the intended character.
A code between 0 and 255 can be represented in an 8-bit byte.
Examples of codes are EBCDIC (an IBM code from the 1960s), ASCII
(the `American standard' character set, from which the example with `k'
is taken) and ISO-8859 (the international standard family of 8-bit character sets).
ASCII actually uses numbers only between 0 and 127
to represent the rather restricted range of characters used in the US
(no ü, é, ç or ß). Because numbers between 0 and 127
can be represented using 7 bits only, this is often referred to as a
`7-bit code'. The ISO-8859 standard is an 8-bit code, using the
numbers between 0 and 255 as codes.
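The codes quoted above can be checked directly. The following is a sketch only: the helper `to_bits` is illustrative, and it assumes that Python's `latin-1` codec (its name for ISO 8859-1) stands in for the 8-bit international code.

```python
def to_bits(code: int, width: int = 8) -> str:
    """Render a character code as a fixed-width string of bits."""
    return format(code, f"0{width}b")


print(ord("k"), to_bits(ord("k")))  # 107 01101011
print(ord("K"), to_bits(ord("K")))  # 75 01001011

U_UMLAUT = "\u00fc"  # the character ü
code = U_UMLAUT.encode("latin-1")[0]
print(code, to_bits(code))          # 252 11111100

# The same character has no 7-bit ASCII code at all.
try:
    U_UMLAUT.encode("ascii")
except UnicodeEncodeError:
    print("no ASCII code for U+00FC")
```

The code 252 needs the eighth bit, which is why a 7-bit code cannot hold it.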

With Email, one often hears of `7-bit' and `8-bit' transmission problems: in many older computer communications, only 7 bits of a byte were used to code a character, because ASCII required only 7 bits. So if one sends a message using ISO-8859 (as is likely if one is communicating in a language other than English), the eighth bit of each byte will be lost, which means that a character different from the one that was sent could be read, and the message thereby garbled. More recent Email communication standards (`MIME') also specify a way to code 8-bit data as 7 bits, so that it may be decoded by the receiver and correctly read. This is indeed a complicated but necessary fix to a design problem (deciding that 7 bits were enough) that could have been avoided. But we're stuck with it now. It's somewhat like the Year 2000 problem.
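Both the garbling and the fix can be sketched briefly. The helper `strip_high_bit` is a hypothetical model of a 7-bit channel, and quoted-printable (via Python's standard `quopri` module) is used here as one of the transfer encodings that MIME defines; the name `Müller` is only sample data.

```python
import quopri


def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit channel: keep only the low 7 bits of every byte."""
    return bytes(b & 0x7F for b in data)


original = "M\u00fcller".encode("latin-1")  # b'M\xfcller', with ü as byte 252

# Sent raw over the 7-bit channel, ü (11111100) arrives as | (01111100).
print(strip_high_bit(original))  # b'M|ller'

# Quoted-printable rewrites the 8-bit byte as three 7-bit characters.
encoded = quopri.encodestring(original)
print(encoded)                                   # b'M=FCller'
print(strip_high_bit(encoded) == encoded)        # True, nothing is lost in transit
print(quopri.decodestring(encoded) == original)  # True
```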

Bytes themselves are grouped into `*words*', usually consisting
of two, four, or eight bytes. Words are used for storing computer
instructions and other data, such as numbers, for which more than
the size of a byte is required for coding.
One hears of `16-bit', `32-bit' and `64-bit' computers (meaning
processor chips). This means the word size is 16, 32 or 64 bits respectively.
One also hears, in communications, of 8-bit or 16-bit (or 32-bit or...)
data paths. This means that data is transferred in units of this size. If you have
a 32-bit processor with 8-bit data paths, it means that every word in the
processor must be broken up into four 8-bit bytes to be transferred to, say,
disk, and each of these four bytes is sent separately.
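Breaking a word into bytes for a narrower data path is a matter of shifting and masking. The sketch below uses the illustrative names `split_word` and `join_word`, and assumes that the most significant byte is sent first; real hardware may use the opposite order.

```python
def split_word(word: int, word_bits: int = 32, path_bits: int = 8) -> list:
    """Split a word into path-sized chunks, most significant chunk first."""
    mask = (1 << path_bits) - 1
    chunks = []
    for shift in range(word_bits - path_bits, -1, -path_bits):
        chunks.append((word >> shift) & mask)
    return chunks


def join_word(chunks: list, path_bits: int = 8) -> int:
    """Reassemble the word on the receiving side, in the same order."""
    word = 0
    for chunk in chunks:
        word = (word << path_bits) | chunk
    return word


word = 0x12345678  # an arbitrary 32-bit example value
parts = split_word(word)
print([format(p, "08b") for p in parts])
# ['00010010', '00110100', '01010110', '01111000']
print(join_word(parts) == word)  # True
```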

The byte size and word size of computer hardware are fixed. They cannot be altered. That means that any data which does not naturally fit into this format must be coded so that it does. This is usually done by programs (that is, software). One way in which this can be done for dates is to code each decimal digit separately in binary. One usually uses a single 8-bit byte for two decimal digits, using 4 bits for each digit. Here are the codings:

- 0 is 0000
- 1 is 0001
- 2 is 0010
- 3 is 0011
- 4 is 0100
- 5 is 0101
- 6 is 0110
- 7 is 0111
- 8 is 1000
- 9 is 1001

Using this method,
a date represented as DD-MM-YY, say 30-10-98, can be coded in
three bytes, as *00110000, 00010000, 10011000*. A date with a
four-digit year, written as DD-MM-YYYY or, better, as YYYY-MM-DD conforming to the International
Standard for dates, ISO-8601 (also EN 28601 and DIN 5008), needs four bytes:
1998-10-30 is coded as *00011001, 10011000, 00010000, 00110000*.
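This two-digits-per-byte scheme is commonly called packed binary-coded decimal (BCD). A minimal sketch of it follows; the function names `pack_bcd`, `unpack_bcd` and `show` are illustrative, and the separators of the date are assumed to be removed before coding.

```python
def pack_bcd(digits: str) -> bytes:
    """Pack pairs of decimal digits into bytes, 4 bits per digit."""
    if len(digits) % 2 != 0 or not all(c in "0123456789" for c in digits):
        raise ValueError("expected an even number of decimal digits")
    packed = bytearray()
    for i in range(0, len(digits), 2):
        high, low = int(digits[i]), int(digits[i + 1])
        packed.append((high << 4) | low)
    return bytes(packed)


def unpack_bcd(data: bytes) -> str:
    """Recover the decimal digits from packed bytes."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in data)


def show(data: bytes) -> str:
    """Render bytes as comma-separated 8-bit strings."""
    return ", ".join(format(b, "08b") for b in data)


print(show(pack_bcd("301098")))          # 00110000, 00010000, 10011000
print(show(pack_bcd("19981030")))        # 00011001, 10011000, 00010000, 00110000
print(unpack_bcd(pack_bcd("19981030")))  # 19981030
```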