Encoding
From CDS 130
1. Objectives
 To define, bits, bytes, and encoding.
 To show several ways that binary data can be stored.
 To show how to encoding data.
2. Motivation
 Nearly all scientific data is stored electronically as a sequence of bits  a list of ones and zeros.
 Data storage is a critical element in how computers work. Before data is stored, it must be encoded.
3. Priming questions
 Give examples of an object being represented by a sequence of numbers.
 What are some motivations for representing an object by a sequence of numbers?
4. Notes
4.1. Bit Definition
 Bits are the individual zeros and ones that are stored by computers.
 A Bit can have one of two states.
 Any system that has two states can be thought of as a bit system: on or off, high or low, open or closed.
4.2. Byte Definition
A byte is a group of eight bits.
Joke 

Two bytes walk into a bar. The bartender says "Can I get you anything?" The bytes say "sure, make us a double." (This is funny if you are a heavy drinker and also happen to know that a sequence of sixteen bits is called a double.) 
4.3. Methods of storing bits
 Placing holes and bumps on a piece of paper,
 charging a set of objects negative or positive,
 aligning magnets in the north or south direction, or
 creating a pit into a piece of plastic or metal.
4.4. Primitive memory storage
4.5. Advanced Memory Storage
A CD or a DVD has small pits in it. In locations where there is a pit, the sensor stops receiving a reflected signal and this is interpreted as "zero". In locations where there is not a pit (a "land"), there is a reflected signal and this is interpreted as "one". What factors do you think control how many bits can be stored per unit area? What factors do you think control how quickly the bits can be read/written?
4.6. Encoding Motivation
You are a "forensic computer scientist" and are given a DVD. You inspect it and find that it contains a list of 1s and 0s. How do you translate the list of bits into something useful?
From upload.wikimedia.org on June 09 2012 14:33:50.

= 
001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 
4.7. Encoding Definition
 A pattern of bits is a list of ones and zeros.
 Encoding is the translation of a character into a pattern of bits.
 More generally, encoding is the translation of a symbol or a sequence of characters into a pattern of bits.
4.8. Encoding Table
Encoding requires the use of an encoding table  a table that associates a bit pattern with a character (or sequence of characters).
Encoding Table 1
bit pattern characters 00
red
01
green
10
blue
11
black
In this case the bit pattern 01
is associated with the sequence of characters that form the word "green".
If you and I agree to use Encoding Table 1 and I hand you a primitive memory stick with the pattern 00110011
you would decode this bit pattern to mean "red, black, red, black".
4.9. Encoding Table
Encoding Table 2
bit pattern characters 00
zero
01
one
10
two
11
three
If you and I agree to use the Encoding Table 2 and I hand you a primitive memory stick with the pattern 00110011
, you would decode bit pattern to mean "zero, three, zero, three".
4.10. Encoding Table
There is a special encoding table called the 7bit ASCII table; an excerpt is given below
bit pattern character 1100001
a
If a "forensic computer scientist" is analyzing your hard drive, knows that your computer encodes information using the 7bit ASCII Table, and sees the bit pattern 1100001
, he will know that you wrote the letter a
.
In contrast to encoding Tables 1 and 2 in which a long sequence of characters were associated with a short bit pattern, the 7bit ASCII table associates a single character with a long bit pattern (7 bits). Why?
4.11. Encoding
Instead of listing character/bit pattern associations, encoding tables often list only character/decimal value associations. To determine the bit pattern, the decimal number must be converted to binary.
bit pattern  character  decimal value 
1100001  a  97 
If a forensic computer scientist tells you that he read "ASCII value 97" on a hard drive, he means that he read the bit pattern 1100001
.
This is sometimes more convenient than the alternative of telling you "I read the binary value 1100001
" on a hard drive.
4.12. ASCII Encoding
 The following is part of the "7bit ASCII Decimal Encoding Table". (The table is also referred to as the 7bit ASCII character set.)
 If I speak ASCII Decimal to you and say "72 73", you would know that I meant
HI
after looking at the table.  If I speak 7bit ASCII Binary to you and say
1001000 1001001
, you would need to convert the binary numbers to their decimal representations of "72 73" before using the table.
7bit ASCII Decimal Encoding Table
32 =  33 = !  34 = "  35 = #  36 = $  37 = %  38 = &  39 = '

40 = (  41 = )  42 = *  43 = +  44 = ,  45 =   46 = .  47 = /

48 = 0  49 = 1  50 = 2  51 = 3  52 = 4  53 = 5  54 = 6  55 = 7

56 = 8  57 = 9  58 = :  59 = ;  60 = <  61 = =  62 = >  63 = ?

64 = @  65 = A  66 = B  67 = C  68 = D  69 = E  70 = F  71 = G

72 = H  73 = I  74 = J  75 = K  76 = L  77 = M  78 = N  79 = O

80 = P  81 = Q  82 = R  83 = S  84 = T  85 = U  86 = V  87 = W

88 = X  89 = Y  90 = Z  91 = [  92 = \  93 = ]  94 = ^  95 = _

96 = `  97 = a  98 = b  99 = c  100 = d  101 = e  102 = f  103 = g

104 = h  105 = i  106 = j  107 = k  108 = l  109 = m  110 = n  111 = o

112 = p  113 = q  114 = r  115 = s  116 = t  117 = u  118 = v  119 = w

120 = x  121 = y  122 = z  123 = }  124 =   125 = {  126 = ~ 
There is a shortcut when converting hexadecimal to binary: 3 1 = 0011 0001.
4.13. Encoding Method
To encode a sequence of characters, use a tablebased algorithm:
 In Table A., each cell in the table is a character.
 In Table B, each cell value is the ASCII decimal encoding of the character in Table A obtained by referring to the ASCII decimal encoding table.
 In Table C., replace the decimal values in Table B. with their binary representation. If the binary representation is shorter than seven bits, add zeros to the left side of the binary representation.
4.14. Encoding Example
Consider the string Hello CDS 130
.
1. In Table A., each cell in the table is a character.
Table A  

H  e  l  l
 
o   C  D
 
S   1  3
 
0 
Note that spaces between characters have been given a location in the table.
4.15. Encoding Example
2. In the Table B., each cell value is the ASCII decimal encoding of the character in Table A obtained by referring to the ASCII decimal encoding table.
Table B  

72  101  108  118  
111  32  67  68  
83  32  49  51  
48 
The empty spaces in the string Hello CDS 130
are themselves characters that are represented as the decimal number 32.
4.16. Encoding example
3. In Table C., replace the decimal values with their binary representation. If the binary representation is shorter than seven bits, add zeros to the left side of the binary number.
Table C  

1000111  1110101  1110110  1110110
 
1101111  0100000  1000011  1000100
 
1010011  0100000  0110001  0110011
 
0110000 
What a forensic computer scientist would see on the hard drive (spaces added for readability):
1000111 1110101 1110110 1110110 1101111 0100000 1000011 1000100 1010011 0100000 0110001 0110011 0110000 0100000 0100000 0100000
4.17. Unique combinations
 The question of "given N bits, how many unique bit patterns can you create?" will come up many times in this course.
 The reason is that we will want to assign a character (or sequence of characters) to each bit pattern. With two bits, I could create four associations.
 Given seven bits, 128 unique bit patterns can be created; this is about the same as the number of characters that you can create using a standard U.S. keyboard.
4.18. Unique combinations
Given N bits, how many unique bit patterns can you create?
N_{patterns} = 2^{Nbits}
To check this formula, write all possible patterns for Nbits = 2. The formula predicts N_{patterns} = 4, which is equal to the number of possible unique patterns of length two:
00
01
10
11
5. Questions
5.1. Bits and Bytes
How many bytes are contained in 16 bits?
A. 4 bytes 
B. 3 bytes 
C. 2 bytes 
D. 1 bytes 
E. None of the above 
5.2. Bits and Bytes
How many bits are contained in 16 bytes?
A. 96 bits 
B. 128 bits 
C. 256 bits 
D. 512 bits 
E. None of the above 
5.3. Bits and Bytes
One byte equals how many bits?
A. 16 
B. 32 
C. 8 
D. 4 
5.4. Bits and bytes
How many bits are contained in 47 bytes?
A. 47 
B. 37 
C. 376 
D. 188 
5.5. Bits and Bytes
How many bytes are represented by 176 bits?
A. 176 
B. 44 
C. 22 
D. 32 
5.6. Bits and Bytes
When we say that computers have "32 bit memory", we mean that each memory slot in that computer is composed of 32 bits. How many bytes of memory are represented by each 32 bit memory slot?
A. 4 
B. 2 
C. 8 
D. 16 
5.7. Encoding
Using the following table, decode the message 00111011
.
bit pattern symbol 00
☃ 01
☠ 10
⋙ 11
❤
5.8. Encoding
Write "mom" in binary using 7bit ASCII binary encoding. (Spaces added to improve readability.)
A. 1101101 1101111 1101101

B. 1101100 0110011 0110011 1110100

C. 1101110 1101111 1110100 1101101 1100101

D. 1101000 1101001

E. 1100010 1111001 1100101

5.9. Encoding
A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 01001100 01001111 01001100
. Assuming that the information is encoded as 7bit ASCII, what does this series of numbers represent?
A. :) 
B. :( 
C. LOL 
D. Woof 
E. oNaMoNaPiA 
5.10. Decoding
Translate each of the following 7bit binary patterns into their corresponding ASCII characters:
1000011 1010011 1001001 0101101 0110111 0110000 0110001 0100001
5.11. Unique Combinations
How many unique combinations of 1s and 0s are possible with 22 bits?
A. 4,194,304 
B. 4,194,303 
C. 2,097,152 
D. 2,097,151 
E. None of the above 
5.12. Unique Combinations
How many unique sounds would you have heard if this video was extended so that the last sound was that for all 1s?
5.13. Unique Combinations
 How many unique combinations four zeros and ones can you create?
 How many unique combinations six zeros and ones can you create?
 How many unique combinations eight zeros and ones can you create?
 How many unique combinations thirtytwo zeros and ones can you create?
5.14. Bit range
How many 1s and 0s are required to represent any arbitrary 16 bit binary number?
A. 17 
B. 15 
C. 16 
D. 8 
E. None of the above 
If an arbitrary 16 bit binary number is multiplied by 2, what is the maximum number of bits required to write that product as a binary number?
A. 18 bits 
B. 17 bits 
C. 16 bits 
D. 15 bits 
E. None of the above 
5.15. Bit range
How many bits are required to write the binary representation of the decimal number 512?
A. 8 
B. 9 
C. 10 
D. 11 
E. None of the above 
5.16. Bit range
Using 16 bits, we can write positive binary numbers in the range 0 to LARGE. What is the decimal value of LARGE?
A. 32,767 
B. 32,768 
C. 65,536 
D. 65,535 
E. None of the above 
5.17. Writing Storage
Approximately how many sheets of notebook paper would you need to store the same number of characters that can be stored on a 4.7 GB DVD? Assume that (1) the characters on the DVD are only from the 7bit ASCII character set, (2) that you could write 5,000 characters on one side of a sheet of notebook paper, and (3) you use both sides of the notebook paper.
A. 470,000 
B. 5,000 
C. 470 
D. 37 
E. 470,000,000 
5.18. Estimate
The 1984 science fiction novel Neuromancer by William Gibson contains 271 pages of text. Each page contains, on average, approximately 400 words. Each word, is on average, five ASCII characters long. Knowing that each ASCII character requires 7 bits of computer memory storage, how many bytes of computer memory storage are required to store all the words from Gibson's Neuromancer novel?
A. 271,000 bytes 
B. 34,688,000 bytes 
C. 4,336,000 bytes 
D. 542,000 bytes 
Personal computers in 2010 can come equipped with hard disk drives having 1 terabyte of storage capacity (a terabyte is one trillion bytes, or 1,000,000,000,000 bytes). Approximately how many copies of William Gibson's Neuromancer novel could you store on your 1 terabyte hard disk drive, assuming the entire disk is available for storage?
A. About 1,845,000 copies 
B. About 230,627 copies 
C. About 28,828 copies 
D. About 3,690,000 copies 
5.19. Estimate
If a single ASCII character (from the extended set) can be represented by 7 bits, and we have a 500 gigabyte hard drive available for storage, about how many ASCII characters can be stored on this hard drive? (NOTE: One gigabyte is about equal to one billion bytes)
A. about 500 million ASCII characters 
B. about 8 billion ASCII characters 
C. about 64 billion ASCII characters 
D. about 500 billion ASCII characters 
E. None of the above 
5.20. Estimate
A U.S. one dollar bill measures about 6 centimeters wide by 11 centimeters long. If there are 330 ASCII characters printed on one side of it, then what is the approximate data density (in bytes per square centimeter) of one side of a U.S. one dollar bill? (Assume each ASCII character is encoded with a 7 bit pattern.)
A. about 4.4 bytes per square centimeter 
B. about 5.5 bytes per square centimeter 
C. about 6.6 bytes per square centimeter 
D. about 7.7 bytes per square centimeter 
E. None of the above 
5.21. Estimate
A page from a book contains 500 words, and each word contains on average 4 ASCII characters. Considering ONLY the words on the page, how many bits of information does the page contain? (Assume that the ASCII characters are encoded using bit patterns of length 7.)
A. 500 bits 
B. 2,000 bits 
C. 8,000 bits 
D. 6,000 bits 
E. None of the above 
6. Activities
6.1. Make your own encoding
An encoding table is a table that associates a bit pattern with a character or object. For example, and ASCII table associates the bit pattern1001000
with the character H
.
Create an encoding table with bit patterns each with three bits. Write a binary encoded message. Hand the message to your partner along with the decoding table and see if they determine what your binary encoded message means.
6.2. Discussion Question
How would a forensic computer scientist be able to figure out that the yellow numbers correspond to an Excel Spreadsheet document?
From upload.wikimedia.org on June 24 2017 16:44:40.

= 
001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 
6.3. Discussion Question
From www.wiilovemario.com on June 24 2017 16:44:41.

Explain in basic terms the meaning of the following: "The old monitor only supports 8bit color, my monitor supports 24bit color". (Related link: 8bit art:[1]) 
6.4. Other Encoding
The 7bit ASCII character set only has 128 possible characters. How would you encode the greek letter "beta" in binary on a computer? That is, suppose you were asked to come up with a scheme for encoding the greek letter "beta" in binary so that anyone who saw the particular list of zeros and ones would immediately know you meant the greek letter "beta".
Comments 

Many students used the table Encoding#ASCII_Encoding and said that they would encode β as the binary representation of the decimal numbers The subtlety of this question is this: suppose someone read the * They would actually read a binary number because numbers are stored in computer memory in binary  the four numbers are the decimal versions of the 7bit binary numbers that they read. 
6.5. Experiment
ASCII is one of many ways to encode numbers and characters.
Question: Compare this search: CCCP with this search: CCCP. Why are the results different? The text in the search box looks the same!
Experiment:
 Click the first link, copy the text CCCP from the search box, paste it into notepad, and save. How big is the saved file?
 Do the same for the second link. What happens when you choose ANSI as the encoding versus UTF8 when saving? How big is the file when you chose ANSI? UTF8? When you choose ANSI, exit and reopen the file you saved. What do you see?
Partial answer  

Note that the size is listed as 1 byte (see text in the topmost window). Why do you think "size on disk" is listed at 4,096 bytes? 
6.6. Estimates
Note: In the problems that request an estimate, your answer can be in a fairly wide range of values. Your explanation is more important than your actual number. For example, if I asked to you to compare the area of a DVD to the area of a sheet of paper, a correct answer could be "I can lay four DVDs on a sheet of paper and it covers up most of the paper. Therefore the area of a DVD is about four times that of a sheet of paper." Another correct answer could be to compute the area of the sheet of paper and the area of the DVD using the formulas for the area of a rectangle and the area of a circle and then take the ratio of these areas. The first answer is less accurate, but the approach is more in the spirit of an estimate.
 Estimate the number of characters (the numbers 0 through 9, letters az upper and lowercase) that you could write on a piece of paper by hand using a pen or pencil. Explain how you arrived at your estimate.
 Estimate the number of bits of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
 Estimate the number of bytes of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
 How many sheets of notebook paper would you need to store the same number of characters that can be stored on a DVD? (Assume that the characters on the DVD are only from the 8bit ASCII character set.)
 Estimate the density of data stored on your sheet of notebook paper (in bytes per square centimeter).
 Estimate the density of data stored on a DVD (in bytes per square centimeter).
6.7. Encoding Chinese
7 bits are required to represent the most commonly used written English language characters. The 7bit ASCII character set relates 128 binary numbers to 128 commonly used characters in the English language.
For example, the bit pattern 1100001
corresponds to the character "a" and 1000001
corresponds to the character "A".
The Chinese character set is composed of unique characters that taken together comprise the written Chinese language. A collegeeducated Chinese adult is fluent with 6,000 to 7,000 unique Chinese characters.
How many bits are required to represent the entire set of written Chinese characters for a collegeeducated Chinese adult?
Answer 

2^{7} = 128, which means that with 7 bits you can create 128 unique patterns composed of seven ones and zeros. This question is asking 2^{?} = more than 7000. To answer, guess different values of 2^{11} = 2048 2^{12} = 4096 2^{13} = 8192 So the answer is that you would need Suppose that you wanted to store all of the bit patterns into memory. You would need to store 8,192 patterns, each of which are 13 bits. So you would need 106,496 bits or 13,312 bytes of memory. 
7. Resources
 Articles about how computer memory works:
 Using paper instead of computer memory: