Asynchrony begets Momentum
In a recent note, we show that asynchrony in SGD introduces an extra momentum term. In the companion systems paper, we use this theory to understand statistical efficiency and train deep networks faster.
The theory is discussed more extensively in this blog post.
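To get a feel for the claim, here is a minimal toy sketch. It is not the model analyzed in the note: it assumes a one-dimensional quadratic, a fixed (rather than random) gradient delay, and a hypothetical helper named `run`. It compares plain gradient descent, gradient descent with stale gradients, and gradient descent with explicit heavy-ball momentum, all at the same step size.

```python
def run(alpha, steps, delay=0, mu=0.0):
    """Gradient descent on f(w) = 0.5 * w**2, whose gradient at w is w.

    alpha: step size.
    delay: how many steps old the iterate used to compute the gradient is
           (0 means synchronous; a fixed delay is a simplifying assumption).
    mu:    explicit heavy-ball momentum coefficient (0 means no momentum).
    Returns the iterates w_0, w_1, ..., w_steps.
    """
    # Pad the history with the starting point so that stale reads and the
    # previous iterate are defined at the first step.
    history = [1.0] * (delay + 2)
    for _ in range(steps):
        w, w_prev = history[-1], history[-2]
        stale_w = history[-1 - delay]   # iterate the gradient was computed at
        grad = stale_w                  # gradient of 0.5 * w**2 at stale_w
        history.append(w - alpha * grad + mu * (w - w_prev))
    return history[delay + 1:]


sync = run(alpha=0.5, steps=20)                # no delay, no momentum
stale = run(alpha=0.5, steps=20, delay=2)      # gradients two steps old
momentum = run(alpha=0.5, steps=20, mu=0.9)    # explicit momentum, no delay

for name, seq in [("sync", sync), ("stale", stale), ("momentum", momentum)]:
    print(name, [round(w, 3) for w in seq[:6]])
```

In this toy setting the synchronous run shrinks toward zero without changing sign, while both the stale-gradient run and the explicit-momentum run overshoot the minimum and oscillate. That shared qualitative behavior is the intuition behind treating asynchrony as a source of momentum; the note itself makes the statement precise under its own assumptions.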