Theoretical Machine Learning Seminar

Stability and Generalization in Adaptive Data Analysis

Datasets are often used multiple times, with each successive analysis depending on the outcomes of previous analyses of the same dataset. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. In this talk I will give an overview of a recent line of work on adaptive data analysis, which studies the problem of answering a sequence of "queries" about the data distribution, where each query may depend arbitrarily on answers to previous queries. I'll then focus on a new notion of algorithmic stability with the important property that it "composes" well when data is reused. I'll demonstrate that this notion of stability implies that the algorithm reveals little information about its input and, in particular, cannot lead to overfitting. Finally, I'll describe simple algorithms based on this notion that answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries.
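To make the setting concrete, here is a toy sketch (an illustration of the general idea, not the speaker's exact algorithm) of answering adaptively chosen statistical queries by adding Gaussian noise whose scale is calibrated to each query's empirical standard deviation, so that low-variance queries receive more accurate answers. All function names and parameters here are hypothetical.

```python
import random
import statistics

def answer_query(data, query, rho=1.0, rng=None):
    """Answer a statistical query q: X -> [0, 1] on the dataset.

    Toy sketch: return the empirical mean plus Gaussian noise whose
    scale is calibrated to the query's empirical standard deviation.
    Near-constant (low-variance) queries are thus answered very
    accurately, while the added noise limits how much any single
    answer reveals about the dataset.
    """
    rng = rng or random.Random()
    vals = [query(x) for x in data]
    mean = statistics.fmean(vals)
    std = statistics.pstdev(vals)
    n = len(data)
    # Noise scale shrinks with both sqrt(n) and the empirical std;
    # the 1/n term keeps the scale positive for constant queries.
    scale = (std + 1.0 / n) / (rho * n ** 0.5)
    return mean + rng.gauss(0.0, scale)

# An adaptive analyst: the second query depends on the first answer.
rng = random.Random(0)
data = [rng.random() for _ in range(1000)]
a1 = answer_query(data, lambda x: x, rng=rng)
a2 = answer_query(data, lambda x: float(x > a1), rng=rng)
```

The point of the sketch is the composition behavior described in the abstract: because each answer is a stable (noisy) function of the data, a sequence of adaptively chosen queries cannot collectively overfit the dataset the way exact empirical answers can.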

Based in part on joint work with Thomas Steinke.

Date & Time

April 12, 2018 | 12:15pm – 1:45pm

Location

White-Levy Room

Speaker

Vitaly Feldman

Affiliation

Google Brain