Projects
A rather comprehensive list of projects that I have worked on, grouped by area of interest.
Learning, generalization, and domain adaptation

A Modern Take on the Bias-Variance Trade-off in Neural Networks We measure prediction bias and variance in neural networks. Both bias and variance decrease as the number of parameters grows. We decompose variance into variance due to sampling and variance due to initialization.
Lead: Brady Neal 
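The sampling/initialization decomposition is an instance of the law of total variance. A minimal numpy sketch, using synthetic stand-in predictions rather than real network outputs (the array shapes and noise scales are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a model's prediction at one fixed test point:
# row d indexes the training-set draw, column i indexes the random init.
n_data_seeds, n_init_seeds = 50, 40
preds = rng.normal(scale=1.0, size=(n_data_seeds, 1)) \
      + rng.normal(scale=0.5, size=(n_data_seeds, n_init_seeds))

total_var = preds.var()                    # variance over both sources of randomness
var_sampling = preds.mean(axis=1).var()    # variance of the init-averaged prediction
var_init = preds.var(axis=1).mean()        # mean variance across inits, per dataset

print(total_var, var_sampling + var_init)  # the two terms sum exactly to the total
```

With population variances (numpy's default `ddof=0`) the identity is exact, not approximate.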
Adversarial representation learning for domain generalization We propose a process that enforces pairwise domain invariance while training a feature extractor over a diverse set of domains. We show that this process ensures invariance to any distribution that can be expressed as a mixture of the training domains.
Lead: Isabela Albuquerque, João Monteiro (INRS) 
In Support of Over-Parametrization in deep RL There is significant recent evidence in supervised learning that, in the over-parametrized setting, wider networks achieve better test error. We experiment on four OpenAI Gym tasks and provide evidence that over-parametrization is also beneficial in deep RL.
Lead: Brady Neal 
Connections between max-margin classifiers and gradient penalties Maximum-margin classifiers can be formulated as Integral Probability Metrics (IPMs) or as classifiers with some form of gradient-norm penalty. This implies a direct link to the class of generative adversarial networks (GANs) that penalize a gradient norm.
Lead: Alexia Jolicoeur-Martineau
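The link is easiest to see for a linear critic f(x) = w·x + b: its input gradient is w everywhere, so a gradient-norm penalty is exactly the SVM margin term. A toy numpy check (the function and point are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
w, b = rng.normal(size=5), 0.3

def f(x):
    # A linear "critic": its gradient with respect to x is w everywhere.
    return w @ x + b

# Central finite-difference estimate of the input gradient at a random point.
x0, eps = rng.normal(size=5), 1e-6
grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(5)])

# Penalizing the gradient norm penalizes ||w||, i.e. the inverse margin.
print(np.linalg.norm(grad), np.linalg.norm(w))
```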
Differentiable games

Accelerating Smooth Games by Manipulating Spectral Shapes We use matrix iteration theory to characterize acceleration in smooth games. The spectral shape of a family of games is the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family.
Lead: Waiss Azizian 
Linear Lower Bounds and Conditioning of Differentiable Games We approach the question of fundamental iteration complexity for smooth, differentiable games by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems.
Lead: Adam Ibrahim 
A Unified Analysis of Gradient Methods for a Whole Spectrum of Games We provide new analyses of the local and global convergence properties of the extragradient method (EG) and tighter rates for optimistic gradient and consensus optimization. Unlike in convex minimization, EG may be much faster than gradient descent.
Lead: Waiss Azizian 
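The gap between gradient descent and extragradient shows up already on the scalar bilinear game min_x max_y xy. A self-contained sketch (step size chosen arbitrarily for illustration):

```python
import numpy as np

eta = 0.5

def gda_step(x, y):
    # Simultaneous gradient descent-ascent on f(x, y) = x * y.
    return x - eta * y, y + eta * x

def eg_step(x, y):
    # Extragradient: look ahead with a gradient step, then update
    # using the gradients evaluated at the look-ahead point.
    xm, ym = x - eta * y, y + eta * x
    return x - eta * ym, y + eta * xm

x1 = y1 = x2 = y2 = 1.0
for _ in range(50):
    x1, y1 = gda_step(x1, y1)
    x2, y2 = eg_step(x2, y2)

print(np.hypot(x1, y1))  # grows: simultaneous GDA spirals outward
print(np.hypot(x2, y2))  # shrinks: EG contracts toward the equilibrium
```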
Multi-objective training of GANs with multiple discriminators We study GANs with multiple discriminators by framing them as a multi-objective optimization problem. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.
Lead: Isabela Albuquerque, João Monteiro (INRS) 
Negative Momentum for Improved Game Dynamics Alternating updates are more stable than simultaneous updates on simple games. A negative momentum term achieves convergence not only on a difficult toy adversarial problem but also on the notoriously difficult-to-train saturating GAN objective.
Lead: Gauthier Gidel, Reyhane Askari Hemmat
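A toy sketch of the idea on the bilinear game min_x max_y xy, using alternating updates with a negative momentum coefficient (the value -0.5 is one choice for this illustration, not a claim about the paper's tuning):

```python
import numpy as np

eta, beta = 0.5, -0.5   # negative momentum coefficient (illustrative choice)

# Alternating gradient updates with momentum on f(x, y) = x * y.
x, y = 1.0, 1.0
x_prev, y_prev = x, y
for _ in range(500):
    x_new = x - eta * y + beta * (x - x_prev)      # x's gradient step + momentum
    y_new = y + eta * x_new + beta * (y - y_prev)  # y sees the updated x (alternating)
    x_prev, y_prev, x, y = x, y, x_new, y_new

print(np.hypot(x, y))  # the iterates contract toward the equilibrium (0, 0)
```

Without the momentum term, alternating updates on this game only cycle; simultaneous updates diverge.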
Optimization and numerical analysis

Reducing variance in online optimization by transporting past gradients Implicit gradient transport turns past gradients into gradients evaluated at the current iterate. It reduces the variance in online optimization and can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam.
Lead: Sebastien Arnold 
YellowFin: Self-tuning optimization for deep learning Simple insights into the momentum update yield a very efficient parameter-free algorithm that performs well across networks and datasets without manual tuning.
Lead: Jian Zhang 
Accelerated stochastic power iteration Exciting recent results on how adding a momentum term to the power iteration yields a numerically stable, accelerated method.
Lead: Peng Xu 
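The noiseless version of the recursion is short enough to sketch. A numpy toy with a known diagonal spectrum, where the momentum coefficient is set from lambda_2 (in practice that value would have to be estimated; here it is assumed known):

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric matrix with a known spectrum; lambda_2 = 0.9 here.
A = np.diag([1.0, 0.9, 0.5, 0.2])
beta = 0.9 ** 2 / 4              # momentum set from (an assumed) lambda_2

x_prev = np.zeros(4)
x = rng.normal(size=4)
x /= np.linalg.norm(x)
for _ in range(100):
    x_next = A @ x - beta * x_prev   # power step with a momentum correction
    scale = np.linalg.norm(x_next)   # joint rescaling keeps the pair stable
    x, x_prev = x_next / scale, x / scale

top = np.zeros(4); top[0] = 1.0      # true leading eigenvector of this A
print(abs(x @ top))                  # alignment with the top eigenvector
```

Rescaling both iterates by the same factor leaves the direction dynamics of the linear recursion unchanged.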
Asynchrony begets momentum When training large-scale systems asynchronously, you get a momentum surprise: we prove that system dynamics "bleed" into the algorithm, introducing a momentum term even when the algorithm uses none. This theoretical result has significant implications for large-scale optimization systems.

Parallel SGD: When does averaging help? Averaging as a variance-reducing mechanism. For convex objectives, we show that the benefit of frequent averaging depends on the gradient variance envelope. For non-convex objectives, we illustrate that this benefit depends on the presence of multiple optimal points.
Lead: Jian Zhang 
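The variance-reduction effect can be seen on a one-dimensional quadratic with artificially noisy gradients (all constants here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
K, steps, eta = 8, 200, 0.1

# K workers run SGD independently on f(x) = x^2 / 2 with noisy gradients.
xs = np.full(K, 5.0)
for _ in range(steps):
    noisy_grads = xs + rng.normal(scale=1.0, size=K)  # true grad is x
    xs -= eta * noisy_grads

averaged = xs.mean()  # one-shot averaging across workers
print(averaged ** 2, np.mean(xs ** 2))  # averaged error vs. typical worker error
```

By Jensen's inequality the averaged iterate's squared error is below the workers' mean squared error whenever the workers disagree.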
Memory-Limited, Streaming PCA An algorithm that uses O(kp) memory and is able to compute the k-dimensional spike with quasi-optimal O(p log p) sample complexity, the first algorithm of its kind.
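To give the flavor of streaming PCA under an O(kp) memory budget, here is an Oja-style sketch for k = 1 on a planted-spike data model (this is a simplified stand-in, not the paper's block-based algorithm; the spike strength and step size are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, eta = 20, 5000, 0.01

u = np.zeros(p); u[0] = 1.0          # hypothetical planted spike direction

w = rng.normal(size=p)
w /= np.linalg.norm(w)
for _ in range(n):
    # One streamed sample: strong signal along u plus isotropic noise.
    x = 3.0 * rng.normal() * u + 0.1 * rng.normal(size=p)
    w += eta * (x @ w) * x           # Oja update: only O(p) state kept
    w /= np.linalg.norm(w)

print(abs(w @ u))                    # alignment with the planted spike
```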
Deep learning and applications

State-Reification Networks We model the distribution of hidden states over the training data and then project test hidden states onto this distribution. This method helps neural nets generalize better and overcome the challenge of achieving robust generalization with adversarial training.
Lead: Alex Lamb; oral presentation at ICML 2019 
Manifold Mixup: Better Representations by Interpolating Hidden States A simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
Lead: Vikas Verma, Alex Lamb 
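The mixing step itself is a few lines. A framework-free numpy sketch of interpolating a batch of hidden states and their labels (shapes, the Beta parameter, and the function name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def manifold_mixup(h, y_onehot, alpha=2.0):
    # Mix hidden states (and labels) within a batch; h is (batch, features).
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(h))
    h_mix = lam * h + (1 - lam) * h[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return h_mix, y_mix

h = rng.normal(size=(4, 8))            # hidden states at some layer
y = np.eye(3)[[0, 1, 2, 0]]            # one-hot labels
h_mix, y_mix = manifold_mixup(h, y)
print(h_mix.shape, y_mix.sum(axis=1))  # mixed label rows still sum to 1
```

In a real network the same interpolation is applied at a randomly chosen layer during the forward pass, and the loss is computed against the mixed labels.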
Deep representations and Adversarial Generation of 3D Point Clouds The first autoencoder design suited to 3D point-cloud data beats the state of the art in reconstruction accuracy. GANs trained in the AE's latent space generate realistic objects from everyday classes.
Lead: Panos Achlioptas
MCMC methods
Largescale systems

MLSys: The New Frontier of Machine Learning Systems We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities.
Whitepaper 
Deep Learning at 15 Petaflops A 15-petaFLOP deep learning system for solving scientific pattern classification problems on contemporary HPC architectures. We scale to 10,000 nodes by implementing a hybrid synchronous/asynchronous system and applying careful optimization of hyperparameters.
Collaboration with NERSC at Lawrence Berkeley Labs and Intel. 
Omnivore: Optimizer for multi-device deep learning A high-performance system prototype combining a number of much-needed algorithmic and software optimizations. Importantly, we identify the degree of asynchronous parallelization as a key factor affecting both hardware and statistical efficiency.
Lead: Stefan Hadjis 
FrogWild! Fast PageRank approximations on Graph Engines Using random walks and a simple modification of the GraphLab engine, we manage to get a 7x improvement compared to the state of the art.
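The random-walk idea behind PageRank approximation fits in a few lines: walk the graph, teleporting occasionally, and use visit frequencies as the score. A toy single-walker sketch on a hypothetical four-node graph (a real engine runs many walkers in parallel):

```python
import numpy as np

rng = np.random.default_rng(6)

# Tiny directed graph: nodes 1, 2, 3 all point to 0; node 0 points to 1.
out_links = {0: [1], 1: [0], 2: [0], 3: [0]}
n, damping, n_steps = 4, 0.85, 20000

visits = np.zeros(n)
node = 0
for _ in range(n_steps):
    if rng.random() < damping and out_links[node]:
        node = rng.choice(out_links[node])   # follow a random out-link
    else:
        node = rng.integers(n)               # teleport to a uniform node
    visits[node] += 1

print(visits / n_steps)  # empirical PageRank estimate; node 0 ranks highest
```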

Finding Dense Subgraphs via Low-Rank Bilinear Optimization Our method searches a low-dimensional space for provably dense subgraphs of graphs with billions of edges. We provide data-dependent guarantees on the quality of the solution that depend on the graph spectrum.
Lead: Dimitris Papailiopoulos
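As a simplified illustration of the low-rank idea, here is the rank-1 version on a small planted instance: take the k largest entries of the leading eigenvector as a candidate support and measure the induced density (the full method searches over a low-dimensional space spanned by several eigenvectors; sizes and probabilities below are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 10

# Sparse random graph with a planted dense subgraph on the first k nodes.
A = (rng.random((n, n)) < 0.05).astype(float)
A[:k, :k] = (rng.random((k, k)) < 0.9).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetrize, drop self-loops

# Rank-1 relaxation: candidate support from the leading eigenvector.
vals, vecs = np.linalg.eigh(A)
v = np.abs(vecs[:, -1])
support = np.argsort(v)[-k:]

density = A[np.ix_(support, support)].sum() / (k * (k - 1))
overall = A.sum() / (n * (n - 1))
print(density, overall)  # the recovered subgraph is far denser than average
```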