(Winter 2018) IFT6085: Class bibliography
This list was created in early 2018 for the first edition of this class. An updated list will be available by the end of February 2019 here.
This is a (non-exhaustive) list of recent papers featuring theoretical or theory-driven work that applies to deep learning. We will draw from this list for the seminar section of the class, including student paper presentations. Students are welcome to suggest other papers they like, as long as they are relevant to the class.
Note: some of these results are not stated in a deep learning setting; however, the ideas and techniques they introduce are very useful for our exploration in class.
Generalization
Data-dependent path normalization in neural networks, Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
High-dimensional dynamics of generalization error in neural networks, Madhu S. Advani, Andrew M. Saxe
*Robustness and Generalization, Huan Xu, Shie Mannor
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
*Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, Gintare Karolina Dziugaite, Daniel M. Roy
*Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy, Gintare Karolina Dziugaite, Daniel M. Roy
Exponential convergence of testing error for stochastic gradient methods, Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
Spectrally-normalized margin bounds for neural networks, Peter Bartlett, Dylan J. Foster, Matus Telgarsky
*Data-Dependent Stability of Stochastic Gradient Descent, Ilja Kuzborskij, Christoph H. Lampert
*Why and When Can Deep – but Not Shallow – Networks Avoid the Curse of Dimensionality: a Review, Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
Optimization landscape of deep networks
Deep Learning without Poor Local Minima, Kenji Kawaguchi
Entropy-SGD: biasing gradient descent into wide valleys, Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
Identity Matters in Deep Learning, Moritz Hardt, Tengyu Ma
The Loss Surfaces of Multilayer Networks, Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee
Optimization methods
A geometric alternative to Nesterov’s accelerated gradient descent, Sébastien Bubeck, Yin Tat Lee, Mohit Singh
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Adaptive Restart for Accelerated Gradient Schemes, Brendan O’Donoghue, Emmanuel Candes
Efficient Second Order Online Learning by Sketching, Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford
From Averaging to Acceleration, There is Only a Step-size, Nicolas Flammarion, Francis Bach
Optimal rates for zero-order convex optimization: the power of two function evaluations, John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono
Optimizing Neural Networks with Kronecker-factored Approximate Curvature, James Martens, Roger Grosse
The Marginal Value of Adaptive Gradient Methods in Machine Learning, Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
*Statistical inference using SGD, Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis
*Stochastic Gradient Descent as Approximate Bayesian Inference, Stephan Mandt, Matthew D. Hoffman, David M. Blei
*Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis, Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
*Trust Region Policy Optimization, John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
Generative models
Certifiable distributional robustness with principled adversarial training, Aman Sinha, Hongseok Namkoong, John Duchi
Stabilizing GAN Training with Multiple Random Projections, Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
Stabilizing Training of Generative Adversarial Networks through Regularization, Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann
Compressed Sensing using Generative Models, Ashish Bora, Ajil Jalal, Eric Price, Alexandros G. Dimakis
Demystifying MMD GANs, Mikołaj Bińkowski, Dougal J. Sutherland, Michael Arbel, Arthur Gretton
Generalization and Equilibrium in Generative Adversarial Nets (GANs), Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
*Do GANs actually learn the distribution? An empirical study, Sanjeev Arora, Yi Zhang
The Numerics of GANs, Lars Mescheder, Sebastian Nowozin, Andreas Geiger
PacGAN: The power of two samples in generative adversarial networks, Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
Information Theory
Opening the Black Box of Deep Neural Networks via Information, Ravid Shwartz-Ziv, Naftali Tishby
On the information bottleneck theory of deep learning, Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox
Emergence of Invariance and Disentanglement in Deep Representations, Alessandro Achille, Stefano Soatto
MINE: Mutual information neural estimation, Ishmael Belghazi, Sai Rajeswar, Aristide Baratin, R Devon Hjelm, Aaron Courville