Theoretical Machine Learning Seminar
Algorithms and Hardness for Kernel Matrices in Numerical Linear Algebra and Machine Learning
For a function K : R^d x R^d -> R and a set P = {x_1, ..., x_n} of points in R^d, the K-graph G_P of P is the complete graph on n nodes where the weight of the edge between nodes i and j is K(x_i, x_j). In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether it is possible to solve the following problems in n^{1+o(1)} time for a K-graph G_P when d < n^{o(1)} (a naive baseline is sketched after the list):
1. Multiply a given vector by the adjacency matrix or Laplacian matrix of G_P
2. Find a spectral sparsifier of G_P
3. Solve a Laplacian system in G_P's Laplacian matrix
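As a point of reference (not part of the abstract itself), the following Python/NumPy sketch shows the naive O(n^2 d) baseline for problem 1: explicitly forming the K-graph weight matrix and its Laplacian and multiplying by a vector. The function name and interface are hypothetical; the question raised above is when this quadratic cost can be avoided and n^{1+o(1)} time achieved.

    import numpy as np

    def k_graph_laplacian_matvec(X, f, y):
        # X: n x d array of points x_1, ..., x_n; f: R -> R, applied entrywise;
        # y: length-n vector. Returns L y, where L is the Laplacian of the K-graph
        # with K(u, v) = f(||u - v||_2^2). Naive cost: O(n^2 d) time, O(n^2) space.
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # n x n
        W = f(sq_dists)                  # weight (adjacency) matrix of G_P
        np.fill_diagonal(W, 0.0)         # drop self-loops
        L = np.diag(W.sum(axis=1)) - W   # Laplacian L = D - W
        return L @ y

A spectral sparsifier (problem 2) or a fast Laplacian solver (problem 3) for such graphs would likewise need to avoid writing down W explicitly in order to run in subquadratic time.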
For each of these problems, we consider all functions of the form K(u,v) = f( || u - v ||_2^2 ) for a function f : R -> R. We provide algorithms and comparable hardness results for many such K, including the Gaussian kernel, neural tangent kernels, and more. For example, in dimension d = Omega( log n ), for each of these problems we show that there is a parameter associated with the function f for which low parameter values imply n^{1+o(1)}-time algorithms and high parameter values imply the nonexistence of subquadratic-time algorithms assuming the Strong Exponential Time Hypothesis (SETH), given natural assumptions on f.
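To make the functional form concrete, here are two standard kernels written in terms of f( || u - v ||_2^2 ); the bandwidth sigma is an illustrative parameter, not something fixed by the abstract. Either can be plugged into the baseline sketch above.

    import numpy as np

    def gaussian_f(t, sigma=1.0):
        # Gaussian (RBF) kernel: K(u, v) = exp(-||u - v||_2^2 / (2 sigma^2))
        return np.exp(-t / (2.0 * sigma ** 2))

    def exponential_f(t):
        # Exponential kernel: K(u, v) = exp(-||u - v||_2), i.e. f(t) = exp(-sqrt(t))
        return np.exp(-np.sqrt(t))

    # Example usage with the sketch above:
    # X = np.random.randn(200, 20); y = np.random.randn(200)
    # out = k_graph_laplacian_matvec(X, gaussian_f, y)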
As part of our results, we also show that the exponential dependence on the dimension d in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming SETH, for a broad class of functions f. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods (which was named one of the top 10 algorithms of the 20th century).
Joint work with Josh Alman (Harvard University), Timothy Chu (Carnegie Mellon University), and Aaron Schild (University of Washington).