IFT 6085: Theoretical principles for deep learning (new class)
Research in deep learning produces state-of-the-art results on a number of machine learning tasks. Most of those advances are driven by intuition and massive exploration through trial and error. As a result, theory is currently lagging behind practice. The ML community does not fully understand why the best methods work.
- Why can we reliably optimize non-convex objectives?
- How expressive are our architectures, in terms of the hypothesis class they describe?
- Why do some of our most complex models generalize to unseen examples when we use datasets orders of magnitude smaller than what the classic statistical learning theory deems sufficient?
A symptom of this lack of understanding is that deep learning methods largely lack guarantees and interpretability, two necessary properties for mission-critical applications. More importantly, a solid theoretical foundation can aid the design of a new generation of efficient methods—sans the need for blind trial-and-error-based exploration.
In this class we will go over a number of recent publications that attempt to shed light onto these questions. Before discussing the new results in each paper we will first introduce the necessary fundamental tools from optimization, statistics, information theory and statistical mechanics. The purpose of this class is to get students engaged with new research in the area. To that end, the majority of credit will be given for a class project report and presentation on a relevant topic.
Prerequisites: This is meant to be an advanced graduate class for students who want to engage in theory-driven deep learning research. We will introduce the theoretical tools necessary, but start with the assumption that students are comfortable with basic probability and linear algebra.
Lecturer: Ioannis Mitliagkas, Office: 3359, André-Aisenstadt
Winter 2018 semester:
- Wednesday 9h30-11h30
- Thursday 9h00-11h00
Note the earlier starting time on Thursday. This will allow students to also attend the Advanced RL class at McGill.
Homeworks and midterm: a small percentage of the grade will be based on homeworks and possibly a midterm exam testing knowledge of some basic tools we introduced.
Project: report handed in and presentation in April (TBA).
Tentative topics–to be updated as we go along
- Generalization: theoretical analysis and practical bounds
- Information theory and its applications in ML (information bottleneck, lower bounds etc.)
- Generative models beyond the pretty pictures: a tool for traversing the data manifold, projections, completion, substitutions etc.
- Taming adversarial objectives: Wasserstein GANs, regularization approaches and controlling the dynamics
- The expressive power of deep networks (deep information propagation, mean-field analysis of random networks etc.)