This course will be an introduction to classical information theory, which is generally regarded as having been founded by Shannon in 1948. We will deal with the notions of data encoding and decoding; transmission of data over a communication channel; error correction; capacity of a communication channel; and data compression. One of the principal ideas is that of a measure of information, also known as entropy. Thus we will cover, at the very least, the first four chapters of Information Theory.
Depending on the interests of the audience and available time, we may get into somewhat more advanced topics. Such possibilities include: further discussion of error correcting codes; Boltzmann's formula from statistical mechanics and entropy; dynamical entropy; and quantum information theory in the context of quantum communication.
At least for the bulk of the course, the prerequisites are rather minor: students are expected to know some linear algebra (115A) and very basic probability theory, which will be reviewed at the beginning of the course (we will only deal with random variables having a finite number of possible values). Students are also expected to have some mathematical maturity and familiarity with proofs (at the level of those encountered in linear algebra 115A and analysis 131A).
Grading will be discussed at the first class meeting. The plan is to have regular homework, and a final exam.
Due date | Assignment |
---|---|
Mon, Oct. 6 | pp. 24-26: 1.1; 1.2; 1.4; 1.6; 1.7*; 1.11*. |
Mon, Oct. 13 |
p. 45: 2.3; 2.4.
Problem A: Given the letter frequencies in the English language found on Wikipedia, construct a Huffman code. Note: a related code, in which the space character is also represented, can likewise be found on Wikipedia.

Problem B: Assume that you are given 20 coins, one of which is a fake (it weighs less than the others). You have a pair of balance scales, on which you can weigh two groups of coins and decide whether a given group is lighter, heavier, or of the same weight as another. What is the minimal number of times that coins must be weighed to find the fake coin? Prove your answer. Do the same if you start with N coins to begin with.

Solution. Number the coins 1 ... N, and let X = the number of the coin that is fake. Assume that you can determine the value of X in at most S steps. Then you can encode the value of X by recording the answers to the weighings, producing a trinary number with at most S digits. Let us now choose the fake coin according to some probability distribution (so that X becomes a random variable). Then our encoding scheme always produces at most S trinary digits, and so the average wordlength for the code is at most S. From the theorem in the book (2.5.2), we conclude that H(X)/log(3) ≤ S. Now H(X) is maximal (= log N) when X is uniformly distributed, so we conclude that S ≥ log N/log(3).

Conversely, you can write down a scheme directly: split the coins into three groups A1, A2, A3, so that A1 and A2 have an equal number of coins and each has approximately 1/3 of the total number of coins. Put A1 and A2 on the scales. With one weighing you can tell whether the fake coin is in A1 (if A2 is heavier), A2 (if A1 is heavier), or A3 (if A1 and A2 are the same). You then repeat the scheme on whichever group contains the fake coin, stopping when that group has a single coin. Each weighing cuts the number of candidate coins down to about a third, so this requires log N/log 3 weighings, rounded up to the nearest integer. Since S is an integer, the lower bound above gives the same value, so the answer is log N/log 3 rounded up (for N = 20, three weighings). |
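The two bounds in this solution are easy to check numerically. The sketch below is my own illustration, not part of the assignment: `weighings_needed` simulates the worst case of the ternary-split scheme, and `lower_bound` computes the smallest S with 3^S ≥ N (i.e., log N/log 3 rounded up) using exact integer arithmetic to avoid floating-point edge cases; the two agree for every N.

```python
import math

def weighings_needed(n):
    """Worst-case number of weighings used by the ternary-split scheme on n coins."""
    steps = 0
    while n > 1:
        n = math.ceil(n / 3)  # worst case: the fake coin lands in the largest group
        steps += 1
    return steps

def lower_bound(n):
    """Smallest S with 3**S >= n, i.e. log n / log 3 rounded up (exact arithmetic)."""
    s = 0
    while 3 ** s < n:
        s += 1
    return s

for n in (2, 3, 9, 20, 100):
    assert weighings_needed(n) == lower_bound(n)
print(weighings_needed(20))  # 3 weighings suffice for 20 coins
```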
Wed, Oct. 22nd |
pp. 84-85: 3.2, 3.3, 3.4, 3.5, 3.8;
Problem C: Assume that you are given 20 coins, two of which are fakes (a fake coin weighs less than the others). You have a pair of balance scales, on which you can weigh two groups of coins and decide whether a given group is lighter, heavier, or of the same weight as another. What is the minimal number of times that coins must be weighed to find both fake coins? Prove your answer. Do the same if you start with N coins to begin with. |
Mon, Nov 3rd |
pp. 85-86: 3.7, 3.9, 3.12, 3.13.

Problem D: Assume that you are given 10 sacks of coins (each sack contains M coins, and M is very large). In each sack, either all coins are fake or all are genuine. You know that a fake coin weighs 9 grams and a real coin weighs 10 grams. You have a digital scale that tells you the actual weight in grams. (a) Show that you can determine which sacks contain fake coins by putting some coins on the scale and reading off the resulting weight once. (b) Explain why there is no contradiction with coding theory.

Solution. Weigh simultaneously: 1 coin from bag 1; 10 coins from bag 2; 100 coins from bag 3, and so on (10^{k-1} coins from bag k). If all coins were genuine and you had n bags, you would get 10g + 100g + ... + 10^{n}g, so the total weight would be the number 11...10 (as many 1's as you have bags, n+1 digits total). But if the k-th bag has fake coins in it, these coins contribute 9×10^{k-1} rather than 10×10^{k-1} = 10^{k}, so the digit of the total in the 10^{k-1} place becomes 0, indicating which bag has fake coins (this works for all bags except the first; if the first bag has fake coins in it, the least significant digit becomes 9 rather than 0; you could avoid this exception by adding a 1g weight to the scales together with all the coins). There is no contradiction with information theory, because the output of your experiment is a number with as many digits as you want, so there is no limit to the number of scenarios (locations of the bag with fake coins) that the outcome of the experiment can represent. |
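The weighing scheme above is easy to simulate. The sketch below is my own illustration (the function name and the convention that `fake_bag = 0` means "no fakes" are mine): it computes the scale reading when 10^{k-1} coins are taken from bag k, so a single reading identifies the fake bag.

```python
def scale_reading(fake_bag, n_bags=10):
    """Total weight in grams when 10**(k-1) coins are taken from bag k.
    Coins in bag number `fake_bag` weigh 9 g; all others weigh 10 g."""
    total = 0
    for k in range(1, n_bags + 1):
        per_coin = 9 if k == fake_bag else 10
        total += per_coin * 10 ** (k - 1)
    return total

print(scale_reading(fake_bag=0))  # no fakes: 11111111110
print(scale_reading(fake_bag=4))  # the 10^3 digit drops to 0: 11111110110
print(scale_reading(fake_bag=1))  # the exceptional first bag: 11111111109
```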
Wed. Nov. 12th | pp. 130-131: 4.1, 4.2, 4.3. |
Wed. Nov. 19th |
pp. 132-133: 4.4, 4.5, 4.6*, 4.7, 4.11*.

Problem E: Consider the vector space P_{n} of all polynomials of degree at most n whose coefficients are in F_{2} (the field with two elements). Let A(p) be the remainder obtained after dividing p by (x-1). Thus A is a map from P_{n} to constant polynomials. (a) Identify P_{n} with F_{2}^{n+1} using the basis consisting of monomials (i.e., powers of x). (b) Find the matrix of the map A, and give another interpretation of it.

Solution. We first claim that x^{k} = 1 mod (x-1) for every k; this is easily seen by observing that x^{k}-1 is divisible by x-1. It follows that if p = a_{0} + a_{1}x + ... + a_{n}x^{n}, then p = a_{0} + ... + a_{n} mod (x-1). In other words, the remainder of a polynomial p after division by x-1 is the sum of its coefficients. In the basis given by monomials, the matrix of this map is [ 1 1 ... 1 ]; you can also think of this map as giving you the parity (= sum of the digits) of the vector made from the coefficients of the polynomial. |
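The identity "remainder mod (x-1) = sum of the coefficients" can be checked with a few lines of code. This sketch (my own illustration, not from the text) packs a polynomial over F_{2} into an integer, with bit i holding the coefficient of x^{i}, and performs the long division by x+1 (which equals x-1 over F_{2}):

```python
def poly_mod(p, g):
    """Remainder of p modulo g, both polynomials over F_2 packed into
    integers (bit i = coefficient of x^i)."""
    while p.bit_length() >= g.bit_length():
        # cancel the leading term of p with an appropriate shift of g
        p ^= g << (p.bit_length() - g.bit_length())
    return p

X_MINUS_1 = 0b11  # over F_2, x - 1 = x + 1

for p in (0b1101001, 0b1101011, 0b1):
    parity = bin(p).count("1") % 2  # sum of the coefficients mod 2
    assert poly_mod(p, X_MINUS_1) == parity
```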
Wed. Nov. 26th |
pp. 163-164: 5.1, 5.2.

Problem F (CRC codes): Consider a file containing N binary digits (bits), a_{0}...a_{N-1}. We wish to detect (but not necessarily locate) changes in the file (you could think of them as errors when the file is transmitted, or alterations of the file). To do so we fix once and for all a polynomial g, which we'll call the generating polynomial. We then associate to the file the data polynomial p = a_{0} + a_{1}x + ... + a_{N-1}x^{N-1}. Lastly, we compute the CRC checksum of the file as the remainder r after division of p by g, i.e., r = p mod g. (a) Show that if g = x-1, then the CRC checksum is the same as the parity (= sum of all digits in the file); (b) Assume that the file contains the digits 11100010100. Let g be the polynomial x^{3}+x+1. Compute the CRC of the file for this generating polynomial. (c) Let g be as in part (b). Show that, whatever the length of the file, if we change any single bit in the file, the CRC checksum will change. Hint: how does the data polynomial change when you alter a bit in the file? How does the CRC change? (d) Show that the g of part (b) is irreducible. Find the period of g, i.e., the smallest n so that x^{2^{n}-1} has the same remainder as 1 mod g. (e) Let g be as in part (b). Show that if we change any two bits in the file in such a way that the CRC checksum does not change, then the two bits must be a multiple of 7 digits from each other. (f) Show that, in general, if g has period M and a nonzero constant coefficient, then for the CRC checksum to fail to detect a two-bit error, the two errors must be a multiple of 2^{M}-1 digits apart. (g) What is M from part (f) in the case that g = x^{5}+x^{2}+1? (The CRC for this polynomial is used by certain packets on the USB bus.)

Solution. You may find the Wikipedia article on cyclic redundancy checks useful as a reference. |
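For experimenting with Problem F, here is a minimal CRC sketch of my own, using the same integer encoding of F_{2} polynomials (bit i = coefficient of x^{i}). Note that real-world CRC implementations often use a different bit ordering and append padding zeros, so this is only meant to match the definition in the problem. It also lets you verify the key fact behind part (e): x^{7} leaves remainder 1 mod x^{3}+x+1.

```python
def poly_mod(p, g):
    """Remainder of p modulo g, polynomials over F_2 packed into integers
    (bit i = coefficient of x^i)."""
    while p.bit_length() >= g.bit_length():
        p ^= g << (p.bit_length() - g.bit_length())
    return p

def crc(bits, g):
    """CRC of the file a_0 a_1 ... a_{N-1}: the data polynomial
    sum a_i x^i reduced mod the generating polynomial g."""
    p = 0
    for i, b in enumerate(bits):
        p |= int(b) << i
    return poly_mod(p, g)

g = 0b1011  # x^3 + x + 1
print(crc("11100010100", g))  # a quick check of your answer to part (b)
assert poly_mod(1 << 7, g) == 1  # x^7 = 1 mod g, the fact used in part (e)
```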
For the last two lectures, we followed the Notes on Expander Codes by D. Spielman.
There will be a take-home final exam, handed out at Friday's lecture. The exam is due at 11am on Wednesday, Dec. 10th, and should be handed to me personally before that deadline.
Final exam solutions |
Note: * denotes optional problems; more advanced students should try them.