# IFT6085: Class bibliography

**This list has been maintained during previous iterations of the class. Last update: February 2020.
Papers marked with an asterisk were added in 2019.
Papers marked with two asterisks were added in 2020.
Please feel free to make paper suggestions in the class's discussion group.**

This is a non-exhaustive list of relatively recent papers that include theoretical or theory-driven work that applies to deep learning.

We will draw from this list for the seminar section of the class, including student paper presentations. Students are welcome to suggest other papers that they like as long as they are relevant to the class.

Note: some of these results are not given in a deep learning setting; however, the ideas and techniques they contain are very useful for our exploration in class.

## Assorted papers

****(Review) Monte Carlo Gradient Estimation in Machine Learning**, Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih, 2019.

****(Review) Causality for Machine Learning**, Bernhard Schölkopf, 2019.

****(Review) Computational Optimal Transport**, Gabriel Peyré, Marco Cuturi, 2019.

## Learning and Generalization

****Backward Feature Correction: How Deep Learning Performs Deep Learning**, Zeyuan Allen-Zhu, Yuanzhi Li, 2020.

****Do ImageNet Classifiers Generalize to ImageNet?**, Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar, 2019.

****Rethinking statistical learning theory: learning using statistical invariants**, Vladimir Vapnik & Rauf Izmailov, 2019.

****Memory capacity of neural networks with threshold and ReLU activations**, Roman Vershynin, 2020.

****The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks**, Jonathan Frankle, Michael Carbin

****More Data Can Hurt for Linear Regression: Sample-wise Double Descent**

****Reconciling modern machine learning practice and the bias-variance trade-off**, Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal, 2019.

***A Modern Take on the Bias-Variance Tradeoff in Neural Networks**,
Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

***Provable Bounds for Learning Some Deep Representations**,
Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

***On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning**,
Jian Li, Xuanyuan Luo, Mingda Qiao

***Data-Dependent Stability of Stochastic Gradient Descent**,
Ilja Kuzborskij, Christoph H. Lampert

***Understanding deep learning requires rethinking generalization**,
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

**Data-dependent path normalization in neural networks**,
Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

**High-dimensional dynamics of generalization error in neural networks**,
Madhu S. Advani, Andrew M. Saxe

**Robustness and Generalization**,
Huan Xu, Shie Mannor

**A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks**,
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro

**Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data**,
Gintare Karolina Dziugaite, Daniel M. Roy

**Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy**,
Gintare Karolina Dziugaite, Daniel M. Roy

**Exponential convergence of testing error for stochastic gradient methods**,
Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

**Spectrally-normalized margin bounds for neural networks**,
Peter Bartlett, Dylan J. Foster, Matus Telgarsky

**Why and When Can Deep – but Not Shallow – Networks Avoid the Curse of Dimensionality: a Review**,
Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao

## Distributional robustness, domain adaptation, etc.

****Invariant Risk Minimization**, Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz, 2019.

****Adversarial Examples Are Not Bugs, They Are Features**, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry, 2019.

**Certifiable distributional robustness with principled adversarial training**,
Aman Sinha, Hongseok Namkoong, John Duchi

## Optimization landscape of deep networks

**Deep Learning without Poor Local Minima**,
Kenji Kawaguchi

**Entropy-SGD: biasing gradient descent into wide valleys**,
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

**Identity Matters in Deep Learning**,
Moritz Hardt, Tengyu Ma

**The Loss Surfaces of Multilayer Networks**,
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

**Theoretical insights into the optimization landscape of over-parameterized shallow neural networks**,
Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee

## *(Deep) Reinforcement Learning

****Reinforcement Learning via Fenchel-Rockafellar Duality**, Ofir Nachum, Bo Dai, 2020.

***Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?**,
Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

***A Distributional Perspective on Reinforcement Learning**,
Marc G. Bellemare, Will Dabney, Rémi Munos

***Equivalence Between Policy Gradients and Soft Q-Learning**,
John Schulman, Xi Chen, Pieter Abbeel

***Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review**,
Sergey Levine

## Optimization and Games

****Poly-time universality and limitations of deep learning**, Emmanuel Abbe, Colin Sandon, 2020.

****A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets**, Nicolas L. Roux, Mark Schmidt, Francis R. Bach, 2012.

****Near Optimal Methods for Minimizing Convex Functions with Lipschitz p-th Derivatives**, Alexander Gasnikov, et al., 2019.

***Negative Momentum for Improved Game Dynamics**,
Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

***The Mechanics of n-Player Differentiable Games**,
David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

***Qualitatively characterizing neural network optimization problems**,
Ian J. Goodfellow, Oriol Vinyals, Andrew M. Saxe

***SGD Converges to Global Minimum in Deep Learning via Star-convex Path**,
Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

**A geometric alternative to Nesterov’s accelerated gradient descent**,
Sébastien Bubeck, Yin Tat Lee, Mohit Singh

**Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent**,
Chi Jin, Praneeth Netrapalli, Michael I. Jordan

**Adaptive Restart for Accelerated Gradient Schemes**,
Brendan O’Donoghue, Emmanuel Candes

**Efficient Second Order Online Learning by Sketching**,
Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

**From Averaging to Acceleration, There is Only a Step-size**,
Nicolas Flammarion and Francis Bach

**Optimal rates for zero-order convex optimization: the power of two function evaluations**,
John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono

**Optimizing Neural Networks with Kronecker-factored Approximate Curvature**,
James Martens, Roger Grosse

**The Marginal Value of Adaptive Gradient Methods in Machine Learning**,
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

**Statistical inference using SGD**,
Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis

**Stochastic Gradient Descent as Approximate Bayesian Inference**,
Stephan Mandt, Matthew D. Hoffman, David M. Blei

**Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis**,
Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky

**Trust Region Policy Optimization**,
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel

## Generative models (mostly GANs)

****Theoretical guarantees for approximate sampling from smooth and log-concave densities**, Arnak S. Dalalyan, 2014.

***Approximability of Discriminators Implies Diversity in GANs**,
Yu Bai, Tengyu Ma, Andrej Risteski

***Tighter Variational Bounds are Not Necessarily Better**,
Tom Rainforth, Adam R. Kosiorek, Tuan Anh Le, Chris J. Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh

**Stabilizing GAN Training with Multiple Random Projections**,
Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti

**Stabilizing Training of Generative Adversarial Networks through Regularization**,
Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

**Compressed Sensing using Generative Models**,
Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis

**Demystifying MMD GANs**,
Mikołaj Bińkowski, Dougal J. Sutherland, Michael Arbel, Arthur Gretton

**Generalization and Equilibrium in Generative Adversarial Nets (GANs)**,
Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

**Do GANs actually learn the distribution? An empirical study**,
Sanjeev Arora, Yi Zhang

**The Numerics of GANs**,
Lars Mescheder, Sebastian Nowozin, Andreas Geiger

**PacGAN: The power of two samples in generative adversarial networks**,
Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh

## Meta-Learning

***Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm**,
Chelsea Finn, Sergey Levine

***On First-Order Meta-Learning Algorithms**,
Alex Nichol, Joshua Achiam, John Schulman

***Understanding Short-Horizon Bias in Stochastic Meta-Optimization**,
Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

## Information Theory

****An Information-Theoretic Analysis of Thompson Sampling**, Daniel Russo, Benjamin Van Roy, 2016.

***On variational lower bounds of mutual information**,
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

***Estimating Information Flow in Neural Networks**,
Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

**Opening the Black Box of Deep Neural Networks via Information**,
Ravid Shwartz-Ziv, Naftali Tishby

**On the information bottleneck theory of deep learning**,
Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox

**Emergence of Invariance and Disentanglement in Deep Representations**,
Alessandro Achille, Stefano Soatto

**MINE: Mutual information neural estimation**,
Ishmael Belghazi, Sai Rajeswar, Aristide Baratin, R Devon Hjelm, Aaron Courville