# Math 156: General Course Outline

## Catalog Description

**156. Machine Learning.** Lecture, three hours; discussion, one hour; laboratory, one hour. Requisites: courses 115A, 164, 170E (or 170A or Statistics 100A), and Programming in Computing 10A or Computer Science 31. Strongly recommended requisite: Program in Computing 16A or Statistics 21. Introductory course on mathematical models for pattern recognition and machine learning. Topics include parametric and nonparametric probability distributions, curse of dimensionality, correlation analysis and dimensionality reduction, and concepts of decision theory. Advanced machine learning and pattern recognition problems, including data classification and clustering, regression, kernel methods, artificial neural networks, hidden Markov models, and Markov random fields. Projects in MATLAB form part of a final project presented in class. P/NP or letter grading.

## Textbook

*Pattern Recognition and Machine Learning*, by Christopher M. Bishop, Springer, 2006 (ISBN-13: 978-0387-31073-2), plus complementary sources where necessary.

## Schedule of Lectures

Lecture | Section | Topics
---|---|---
1 | 1.2, 1.5-1.6, 2.3-2.5 | Introduction, definitions, prerequisites. Course introduction; recap of linear algebra and probability. Gaussian and exponential PDFs; learning parametric PDFs; learning nonparametric PDFs.
2 | 12.1-12.4 | Correlation analysis, dimensionality reduction, PCA. PCA: maximum variance, minimum error, high-dimensional PCA. Probabilistic PCA (ML-PCA, EM, Bayesian PCA). Nonlinear latent variable models: ICA, kernel PCA.
3 | 3.1, 3.3, 3.5 | Regression. Linear basis function models; least squares and maximum likelihood. Bayesian linear regression. Evidence approximation.
4 | 4.1, 4.3, 14.3 | Classification. Discriminant functions; least squares. Logistic regression. Mixtures of linear classifiers: boosting and bagging.
5 | 9.1-9.2 | Clustering. K-means, Gaussian mixture models, expectation-maximization, spectral clustering.
6 | 6.1-6.2, 6.4, 7.1 | Kernel methods. Dual representation, kernel trick; constructing kernels. Gaussian processes: GP regression, GP classification. Support vector machines, k-SVM.
7 | 4.1.7, 5.1-5.3 | Artificial neural networks. Biological motivation; the perceptron; feed-forward networks. Single-layer network training. Multi-layer perceptron training: backpropagation.
8 | 8.1, 8.3, 13.1-13.2 | Markov models. Bayesian networks. Markov random fields; iterated conditional modes (SA, graph cuts). Hidden Markov models; forward-backward and Viterbi algorithms.
9 | N/A | Advanced topics (optional). Reinforcement learning, Bellman optimality. Vapnik-Chervonenkis (VC) dimension; overfitting and underfitting. Probably approximately correct (PAC) learning.
10 | N/A | Leeway (to accommodate midterm and holidays in the preceding weeks). Review.
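Several schedule topics lend themselves to small worked examples. The course projects are in MATLAB, but as an illustration only (not course material), here is a minimal NumPy sketch of the maximum-variance formulation of PCA from lecture 2: center the data, eigendecompose the sample covariance, and project onto the leading eigenvectors. The function name `pca` and the toy data are this sketch's own choices, not something specified by the course.

```python
import numpy as np

def pca(X, k):
    """Project data onto the top-k principal components (maximum-variance PCA)."""
    Xc = X - X.mean(axis=0)          # center the data
    C = np.cov(Xc, rowvar=False)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]   # re-sort to descending variance
    W = vecs[:, order[:k]]           # top-k eigenvectors as columns
    return Xc @ W, W                 # projected data and the projection basis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # toy data: 200 samples, 5 features
Z, W = pca(X, 2)
print(Z.shape)                       # (200, 2)
```

The columns of `W` are orthonormal, since they are eigenvectors of a symmetric matrix; probabilistic PCA and kernel PCA (also in lecture 2) replace this eigendecomposition with an EM iteration or a kernel-matrix eigenproblem, respectively.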
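For the clustering lecture (lecture 5), K-means can be sketched as Lloyd's algorithm: alternate an assignment step (each point to its nearest center) with an update step (each center to the mean of its points). This is an illustrative NumPy sketch under those standard definitions; the helper name `kmeans` and the four-point toy set are assumptions of this example.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(iters):
        # assignment step: distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: move each center to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return labels, centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(X, 2)
```

The E-step/M-step structure here previews the Gaussian-mixture EM algorithm covered in the same lecture, where hard assignments become posterior responsibilities.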
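For lecture 7, the perceptron learning rule (Bishop §4.1.7) can be shown in a few lines: with targets in {-1, +1}, add y·x to the weights whenever a point is misclassified, and stop after an error-free pass. The function name and toy dataset are this sketch's own assumptions.

```python
import numpy as np

def perceptron(X, t, epochs=50):
    """Perceptron rule: update w on each misclassified point (t in {-1, +1})."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, y in zip(Xb, t):
            if y * (w @ x) <= 0:  # misclassified (or on the boundary)
                w += y * x
                errors += 1
        if errors == 0:           # converged: a full pass with no mistakes
            break
    return w

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
t = np.array([1, 1, -1, -1])
w = perceptron(X, t)
```

Convergence is guaranteed only for linearly separable data; the multi-layer networks and backpropagation in the same lecture remove that limitation by composing nonlinear units.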