Stochastic Gradient Descent (SGD) is the de facto optimization
algorithm for training neural networks in modern machine learning,
thanks to its unique scalability to problem sizes where the
dimensionality of the data points, the number of data points, and the number of...