Abstract: Stochastic Gradient Descent (SGD) is the de facto
optimization algorithm for training neural networks in modern
machine learning, thanks to its unique scalability to problem sizes
where the dimension of the data points, the number of data points, and the...