Theoretical Machine Learning Seminar
Your Brain on Energy-Based Models: Applying and Scaling EBMs to Problems of Interest to the Machine Learning Community Today
In this talk, I will discuss two of my recent works on Energy-Based Models (EBMs). In the first work, I show how standard classification architectures can be reinterpreted as class-conditional energy-based models and trained using recently proposed methods for large-scale EBM training. We find that adding EBM training in this way provides many benefits while negligibly affecting discriminative performance, in contrast to other hybrid generative/discriminative modeling approaches. These benefits include improved calibration, out-of-distribution detection, and robustness to adversarial examples (a sketch of the reinterpretation appears below).

While methods for training EBMs at scale have improved drastically, they still lag behind other classes of generative models such as flows and GANs. Further, there has been little work on evaluating unnormalized models and comparing them with other model classes. My second work addresses these issues with the Stein Discrepancy (SD), a measure of distance between two distributions that is defined using samples from one distribution and an unnormalized model of the other. I explore how the Stein Discrepancy can be estimated in practice at scale and demonstrate applications to goodness-of-fit testing and unnormalized model evaluation. Finally, I show how my approach to SD estimation can be turned into a GAN-like training objective for EBMs that scales to high-dimensional data more gracefully than previous approaches.
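To make the first work's reinterpretation concrete, here is a minimal PyTorch-style sketch, under my own illustrative assumptions (the classifier and tensor shapes are placeholders, not the exact model from the paper). The key observation is that the logits f(x)[y] of any classifier can be read as negative energies, giving a joint energy E(x, y) = -f(x)[y] and a marginal energy E(x) = -logsumexp_y f(x)[y], while leaving p(y|x) = softmax(f(x)) unchanged:

```python
import torch

def logits_to_energies(logits):
    """Reinterpret classifier logits f(x)[y] as negative energies.

    Joint energy:    E(x, y) = -f(x)[y]
    Marginal energy: E(x)    = -logsumexp_y f(x)[y]
    p(y | x) = softmax(f(x)) is unchanged, because the shared
    normalizing constant cancels when conditioning on x.
    """
    joint_energy = -logits                              # (batch, num_classes)
    marginal_energy = -torch.logsumexp(logits, dim=1)   # (batch,)
    return joint_energy, marginal_energy

# Hypothetical usage with any classifier that outputs logits:
# logits = classifier(x)                      # (batch, num_classes)
# E_xy, E_x = logits_to_energies(logits)
# p_y_given_x = logits.softmax(dim=1)         # discriminative head untouched
```

This is why the discriminative performance can be preserved: the classifier's softmax is untouched, and EBM training only shapes the otherwise-unused marginal energy.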
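For the second work, it may help to recall the standard Stein operator form of the Stein Discrepancy (written here from the common textbook definition, in notation of my choosing), where p is the distribution we can sample from, q is the unnormalized model, and F is a family of test functions:

```latex
\mathrm{SD}(p, q) \;=\; \sup_{f \in \mathcal{F}}\;
  \mathbb{E}_{x \sim p}\!\left[
    \nabla_x \log q(x)^{\top} f(x) \;+\; \nabla_x \cdot f(x)
  \right]
```

Because the score \nabla_x \log q(x) is unchanged by the normalizing constant, the SD can be computed from an unnormalized density alone, which is what makes it suitable for evaluating EBMs and for goodness-of-fit testing.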
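Finally, a schematic sketch of how an SD estimate with a learned critic can serve as a GAN-like training signal for an EBM. This is an illustration under my own assumptions, not the paper's exact algorithm: `energy` and `critic` stand for arbitrary networks, the divergence term uses Hutchinson's trace estimator, and `l2_penalty` is a hypothetical critic regularizer:

```python
import torch

def stein_objective(energy, critic, x):
    """Monte Carlo estimate of E_p[ s_q(x)^T f(x) + div f(x) ], where
    s_q(x) = -grad_x energy(x) is the model score and the divergence
    is approximated with Hutchinson's trace estimator.
    """
    x = x.detach().requires_grad_(True)
    # Model score: grad_x log q(x) = -grad_x E(x); no partition function needed.
    score = -torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
    f_x = critic(x)                                   # (batch, dim)
    # Hutchinson estimator: div f(x) = trace(J_f) ~= E_v[ v^T J_f v ].
    v = torch.randn_like(x)
    jvp = torch.autograd.grad((f_x * v).sum(), x, create_graph=True)[0]
    div_f = (jvp * v).sum(dim=1)
    return (score * f_x).sum(dim=1).mean() + div_f.mean()

# Schematic alternating updates (GAN-like; l2_penalty is hypothetical):
# critic_loss = -stein_objective(energy, critic, x) + l2_penalty(critic, x)
# ebm_loss    =  stein_objective(energy, critic, x)
```

The critic is trained to maximize the objective, tightening the SD estimate, while the EBM is trained to minimize it, mirroring the adversarial structure of GAN training without requiring samples from the model.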