Encoding

# 1. Objectives

• To define, bits, bytes, and encoding.
• To show several ways that binary data can be stored.
• To show how to encoding data.

# 2. Motivation

• Nearly all scientific data is stored electronically as a sequence of bits - a list of ones and zeros.
• Data storage is a critical element in how computers work. Before data is stored, it must be encoded.

# 3. Priming questions

• Give examples of an object being represented by a sequence of numbers.
• What are some motivations for representing an object by a sequence of numbers?

# 4. Notes

## 4.1. Bit Definition

• Bits are the individual zeros and ones that are stored by computers.
• A Bit can have one of two states.
• Any system that has two states can be thought of as a bit system: on or off, high or low, open or closed.

## 4.2. Byte Definition

A byte is a group of eight bits.

## 4.3. Methods of storing bits

• Placing holes and bumps on a piece of paper,
• charging a set of objects negative or positive,
• aligning magnets in the north or south direction, or
• creating a pit into a piece of plastic or metal.

## 4.4. Primitive memory storage

A simple memory stick with small round magnets attached to a piece of wood. If the person holding the large magnet sees the stick more toward him, he records a 1. If it moves away from him, he records a 0.

A CD or a DVD has small pits in it. In locations where there is a pit, the sensor stops receiving a reflected signal and this is interpreted as "zero". In locations where there is not a pit (a "land"), there is a reflected signal and this is interpreted as "one". What factors do you think control how many bits can be stored per unit area? What factors do you think control how quickly the bits can be read/written?

From img.tfd.com on June 24 2017 14:18:01.

## 4.6. Encoding Motivation

You are a "forensic computer scientist" and are given a DVD. You inspect it and find that it contains a list of 1s and 0s. How do you translate the list of bits into something useful?

 From upload.wikimedia.org on June 09 2012 14:33:50. = 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111

## 4.7. Encoding Definition

• A pattern of bits is a list of ones and zeros.
• Encoding is the translation of a character into a pattern of bits.
• More generally, encoding is the translation of a symbol or a sequence of characters into a pattern of bits.

## 4.8. Encoding Table

Encoding requires the use of an encoding table - a table that associates a bit pattern with a character (or sequence of characters).

Encoding Table 1

 bit pattern characters 00 red 01 green 10 blue 11 black

In this case the bit pattern 01 is associated with the sequence of characters that form the word "green".

If you and I agree to use Encoding Table 1 and I hand you a primitive memory stick with the pattern 00110011 you would decode this bit pattern to mean "red, black, red, black".

## 4.9. Encoding Table

Encoding Table 2

 bit pattern characters 00 zero 01 one 10 two 11 three

If you and I agree to use the Encoding Table 2 and I hand you a primitive memory stick with the pattern 00110011, you would decode bit pattern to mean "zero, three, zero, three".

## 4.10. Encoding Table

There is a special encoding table called the 7-bit ASCII table; an excerpt is given below

 bit pattern character 1100001 a

If a "forensic computer scientist" is analyzing your hard drive, knows that your computer encodes information using the 7-bit ASCII Table, and sees the bit pattern 1100001, he will know that you wrote the letter a.

In contrast to encoding Tables 1 and 2 in which a long sequence of characters were associated with a short bit pattern, the 7-bit ASCII table associates a single character with a long bit pattern (7 bits). Why?

## 4.11. Encoding

Instead of listing character/bit pattern associations, encoding tables often list only character/decimal value associations. To determine the bit pattern, the decimal number must be converted to binary.

 bit pattern character decimal value 1100001 a 97

If a forensic computer scientist tells you that he read "ASCII value 97" on a hard drive, he means that he read the bit pattern 1100001.

This is sometimes more convenient than the alternative of telling you "I read the binary value 1100001" on a hard drive.

## 4.12. ASCII Encoding

• The following is part of the "7-bit ASCII Decimal Encoding Table". (The table is also referred to as the 7-bit ASCII character set.)
• If I speak ASCII Decimal to you and say "72 73", you would know that I meant HI after looking at the table.
• If I speak 7-bit ASCII Binary to you and say 1001000 1001001, you would need to convert the binary numbers to their decimal representations of "72 73" before using the table.

7-bit ASCII Decimal Encoding Table

32 =   33 = ! 34 = " 35 = # 36 = \$ 37 = % 38 = & 39 = '
40 = ( 41 = ) 42 = * 43 = + 44 = , 45 = - 46 = . 47 = /
48 = 0 49 = 1 50 = 2 51 = 3 52 = 4 53 = 5 54 = 6 55 = 7
56 = 8 57 = 9 58 = : 59 = ; 60 = < 61 = = 62 = > 63 = ?
64 = @ 65 = A 66 = B 67 = C 68 = D 69 = E 70 = F 71 = G
72 = H 73 = I 74 = J 75 = K 76 = L 77 = M 78 = N 79 = O
80 = P 81 = Q 82 = R 83 = S 84 = T 85 = U 86 = V 87 = W
88 = X 89 = Y 90 = Z 91 = [ 92 = \ 93 = ] 94 = ^ 95 = _
96 =  97 = a 98 = b 99 = c 100 = d 101 = e 102 = f 103 = g
104 = h 105 = i 106 = j 107 = k 108 = l 109 = m 110 = n 111 = o
112 = p 113 = q 114 = r 115 = s 116 = t 117 = u 118 = v 119 = w
120 = x 121 = y 122 = z 123 = } 124 = | 125 = { 126 = ~

There is a shortcut when converting hexadecimal to binary: 3 1 = 0011 0001.

## 4.13. Encoding Method

To encode a sequence of characters, use a table-based algorithm:

1. In Table A., each cell in the table is a character.
2. In Table B, each cell value is the ASCII decimal encoding of the character in Table A obtained by referring to the ASCII decimal encoding table.
3. In Table C., replace the decimal values in Table B. with their binary representation. If the binary representation is shorter than seven bits, add zeros to the left side of the binary representation.

## 4.14. Encoding Example

Consider the string Hello CDS 130.

1. In Table A., each cell in the table is a character.

Table A
H e l l
o   C D
S   1 3
0

Note that spaces between characters have been given a location in the table.

## 4.15. Encoding Example

2. In the Table B., each cell value is the ASCII decimal encoding of the character in Table A obtained by referring to the ASCII decimal encoding table.

Table B
72 101 108 118
111 32 67 68
83 32 49 51
48

The empty spaces in the string Hello CDS 130 are themselves characters that are represented as the decimal number 32.

## 4.16. Encoding example

3. In Table C., replace the decimal values with their binary representation. If the binary representation is shorter than seven bits, add zeros to the left side of the binary number.

Table C
1000111 1110101 1110110 1110110
1101111 0100000 1000011 1000100
1010011 0100000 0110001 0110011
0110000

What a forensic computer scientist would see on the hard drive (spaces added for readability):

1000111 1110101 1110110 1110110 1101111 0100000 1000011 1000100 1010011 0100000 0110001 0110011 0110000 0100000 0100000 0100000

## 4.17. Unique combinations

• The question of "given N bits, how many unique bit patterns can you create?" will come up many times in this course.
• The reason is that we will want to assign a character (or sequence of characters) to each bit pattern. With two bits, I could create four associations.
• Given seven bits, 128 unique bit patterns can be created; this is about the same as the number of characters that you can create using a standard U.S. keyboard.

## 4.18. Unique combinations

Given N bits, how many unique bit patterns can you create?

Npatterns = 2Nbits

To check this formula, write all possible patterns for Nbits = 2. The formula predicts Npatterns = 4, which is equal to the number of possible unique patterns of length two:

00

01

10

11

# 5. Questions

## 5.1. Bits and Bytes

How many bytes are contained in 16 bits?

 A. 4 bytes B. 3 bytes C. 2 bytes D. 1 bytes E. None of the above

## 5.2. Bits and Bytes

How many bits are contained in 16 bytes?

 A. 96 bits B. 128 bits C. 256 bits D. 512 bits E. None of the above

## 5.3. Bits and Bytes

One byte equals how many bits?

 A. 16 B. 32 C. 8 D. 4

## 5.4. Bits and bytes

How many bits are contained in 47 bytes?

 A. 47 B. 37 C. 376 D. 188

## 5.5. Bits and Bytes

How many bytes are represented by 176 bits?

 A. 176 B. 44 C. 22 D. 32

## 5.6. Bits and Bytes

When we say that computers have "32 bit memory", we mean that each memory slot in that computer is composed of 32 bits. How many bytes of memory are represented by each 32 bit memory slot?

 A. 4 B. 2 C. 8 D. 16

## 5.7. Encoding

Using the following table, decode the message 00111011.

 bit pattern symbol 00 ☃ 01 ☠ 10 ⋙ 11 ❤

## 5.8. Encoding

Write "mom" in binary using 7-bit ASCII binary encoding. (Spaces added to improve readability.)

 A. 1101101 1101111 1101101 B. 1101100 0110011 0110011 1110100 C. 1101110 1101111 1110100 1101101 1100101 D. 1101000 1101001 E. 1100010 1111001 1100101

## 5.9. Encoding

A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 01001100 01001111 01001100. Assuming that the information is encoded as 7-bit ASCII, what does this series of numbers represent?

 A. :-) B. :-( C. LOL D. Woof E. oNaMoNaPiA

## 5.10. Decoding

Translate each of the following 7-bit binary patterns into their corresponding ASCII characters:

1000011
1010011
1001001
0101101
0110111
0110000
0110001
0100001


## 5.11. Unique Combinations

How many unique combinations of 1s and 0s are possible with 22 bits?

 A. 4,194,304 B. 4,194,303 C. 2,097,152 D. 2,097,151 E. None of the above

## 5.12. Unique Combinations

How many unique sounds would you have heard if this video was extended so that the last sound was that for all 1s?

## 5.13. Unique Combinations

1. How many unique combinations four zeros and ones can you create?
2. How many unique combinations six zeros and ones can you create?
3. How many unique combinations eight zeros and ones can you create?
4. How many unique combinations thirty-two zeros and ones can you create?

## 5.14. Bit range

How many 1s and 0s are required to represent any arbitrary 16 bit binary number?

 A. 17 B. 15 C. 16 D. 8 E. None of the above

If an arbitrary 16 bit binary number is multiplied by 2, what is the maximum number of bits required to write that product as a binary number?

 A. 18 bits B. 17 bits C. 16 bits D. 15 bits E. None of the above

## 5.15. Bit range

How many bits are required to write the binary representation of the decimal number 512?

 A. 8 B. 9 C. 10 D. 11 E. None of the above

## 5.16. Bit range

Using 16 bits, we can write positive binary numbers in the range 0 to LARGE. What is the decimal value of LARGE?

 A. 32,767 B. 32,768 C. 65,536 D. 65,535 E. None of the above

## 5.17. Writing Storage

Approximately how many sheets of notebook paper would you need to store the same number of characters that can be stored on a 4.7 GB DVD? Assume that (1) the characters on the DVD are only from the 7-bit ASCII character set, (2) that you could write 5,000 characters on one side of a sheet of notebook paper, and (3) you use both sides of the notebook paper.

 A. 470,000 B. 5,000 C. 470 D. 37 E. 470,000,000

## 5.18. Estimate

The 1984 science fiction novel Neuromancer by William Gibson contains 271 pages of text. Each page contains, on average, approximately 400 words. Each word, is on average, five ASCII characters long. Knowing that each ASCII character requires 7 bits of computer memory storage, how many bytes of computer memory storage are required to store all the words from Gibson's Neuromancer novel?

 A. 271,000 bytes B. 34,688,000 bytes C. 4,336,000 bytes D. 542,000 bytes

Personal computers in 2010 can come equipped with hard disk drives having 1 terabyte of storage capacity (a terabyte is one trillion bytes, or 1,000,000,000,000 bytes). Approximately how many copies of William Gibson's Neuromancer novel could you store on your 1 terabyte hard disk drive, assuming the entire disk is available for storage?

## 5.19. Estimate

If a single ASCII character (from the extended set) can be represented by 7 bits, and we have a 500 gigabyte hard drive available for storage, about how many ASCII characters can be stored on this hard drive? (NOTE: One gigabyte is about equal to one billion bytes)

 A. about 500 million ASCII characters B. about 8 billion ASCII characters C. about 64 billion ASCII characters D. about 500 billion ASCII characters E. None of the above

## 5.20. Estimate

A U.S. one dollar bill measures about 6 centimeters wide by 11 centimeters long. If there are 330 ASCII characters printed on one side of it, then what is the approximate data density (in bytes per square centimeter) of one side of a U.S. one dollar bill? (Assume each ASCII character is encoded with a 7 bit pattern.)

 A. about 4.4 bytes per square centimeter B. about 5.5 bytes per square centimeter C. about 6.6 bytes per square centimeter D. about 7.7 bytes per square centimeter E. None of the above

## 5.21. Estimate

A page from a book contains 500 words, and each word contains on average 4 ASCII characters. Considering ONLY the words on the page, how many bits of information does the page contain? (Assume that the ASCII characters are encoded using bit patterns of length 7.)

 A. 500 bits B. 2,000 bits C. 8,000 bits D. 6,000 bits E. None of the above

# 6. Activities

## 6.1. Make your own encoding

An encoding table is a table that associates a bit pattern with a character or object. For example, and ASCII table associates the bit pattern1001000 with the character H.

Create an encoding table with bit patterns each with three bits. Write a binary encoded message. Hand the message to your partner along with the decoding table and see if they determine what your binary encoded message means.

## 6.2. Discussion Question

How would a forensic computer scientist be able to figure out that the yellow numbers correspond to an Excel Spreadsheet document?

 From upload.wikimedia.org on June 24 2017 16:44:40. = 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111

## 6.3. Discussion Question

 From www.wiilovemario.com on June 24 2017 16:44:41. Explain in basic terms the meaning of the following: "The old monitor only supports 8-bit color, my monitor supports 24-bit color". (Related link: 8-bit art:[1])

## 6.4. Other Encoding

The 7-bit ASCII character set only has 128 possible characters. How would you encode the greek letter "beta" in binary on a computer? That is, suppose you were asked to come up with a scheme for encoding the greek letter "beta" in binary so that anyone who saw the particular list of zeros and ones would immediately know you meant the greek letter "beta".

## 6.5. Experiment

ASCII is one of many ways to encode numbers and characters.

Question: Compare this search: CCCP with this search: CCCP. Why are the results different? The text in the search box looks the same!

Experiment:

• Click the first link, copy the text CCCP from the search box, paste it into notepad, and save. How big is the saved file?
• Do the same for the second link. What happens when you choose ANSI as the encoding versus UTF-8 when saving? How big is the file when you chose ANSI? UTF-8? When you choose ANSI, exit and re-open the file you saved. What do you see?

## 6.6. Estimates

Note: In the problems that request an estimate, your answer can be in a fairly wide range of values. Your explanation is more important than your actual number. For example, if I asked to you to compare the area of a DVD to the area of a sheet of paper, a correct answer could be "I can lay four DVDs on a sheet of paper and it covers up most of the paper. Therefore the area of a DVD is about four times that of a sheet of paper." Another correct answer could be to compute the area of the sheet of paper and the area of the DVD using the formulas for the area of a rectangle and the area of a circle and then take the ratio of these areas. The first answer is less accurate, but the approach is more in the spirit of an estimate.

1. Estimate the number of characters (the numbers 0 through 9, letters a-z upper and lowercase) that you could write on a piece of paper by hand using a pen or pencil. Explain how you arrived at your estimate.
2. Estimate the number of bits of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
3. Estimate the number of bytes of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
4. How many sheets of notebook paper would you need to store the same number of characters that can be stored on a DVD? (Assume that the characters on the DVD are only from the 8-bit ASCII character set.)
5. Estimate the density of data stored on your sheet of notebook paper (in bytes per square centimeter).
6. Estimate the density of data stored on a DVD (in bytes per square centimeter).

## 6.7. Encoding Chinese

7 bits are required to represent the most commonly used written English language characters. The 7-bit ASCII character set relates 128 binary numbers to 128 commonly used characters in the English language.

For example, the bit pattern 1100001 corresponds to the character "a" and 1000001` corresponds to the character "A".

The Chinese character set is composed of unique characters that taken together comprise the written Chinese language. A college-educated Chinese adult is fluent with 6,000 to 7,000 unique Chinese characters.

How many bits are required to represent the entire set of written Chinese characters for a college-educated Chinese adult?

# 7. Resources

• Articles about how computer memory works:
• Using paper instead of computer memory:
• An advanced discussion of encoding [3].
• The binary alphabet is superior to the "best" alphabet (Korean) [4].