IDSC was honored to present Claus Ekstrøm, professor and vice-chair at the Section of Biostatistics, University of Copenhagen, for a lecture at the Richter Library on Wednesday, December 1, 2021. Dr. Ekstrøm’s primary research interests center on developing methods for analyzing high-dimensional data and for causal discovery. Dr. Ekstrøm has authored two books on statistics and frequently appears as a statistics expert in Danish news media. He has, he says, been a grumpy old man from a young age.
Title of the lecture: “Validation of visual inference methods in statistics by use of deep learning”
When does inspecting a certain graphical plot allow an investigator to reach the right statistical conclusion? Visual inference is commonly used for various tasks in statistics, including model diagnostics and exploratory data analysis. Although it is attractive because of its intuitive nature, it has a major drawback: there are few methods available for validating the plots it relies on.
We propose a new validation method for visual inference. Our method trains deep neural networks to distinguish between plots simulated under two different data-generating mechanisms (null or alternative), and we use the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different choices of data-generating mechanisms, thereby providing a meaningful scale that new visual inference procedures can be validated against.
We apply the method to three popular diagnostic plots for linear regression: the scatter plot, the quantile-quantile plot, and the residual plot. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with sample size and decrease as the discrimination task becomes more difficult, which indicates that the TVS is a meaningful measure of validity.
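The abstract includes no code, so the following Python sketch is only a rough, small-scale illustration of the idea behind a technical validation score for the residual plot. Several of its choices are assumptions, not details from the lecture. The null is taken to be a correctly specified straight-line model, and the alternative adds an unmodelled quadratic term whose size is set by a hypothetical `curvature` parameter. Each plot is rasterized into a coarse grid of point fractions instead of being rendered as an image. A simple nearest-centroid classifier stands in for the deep neural network. All function names are hypothetical.

```python
import random


def simulate_residual_plot(n, curvature, rng):
    """Simulate one data set, fit a straight line by least squares, and
    return (x, standardized residual) pairs for a residual plot."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # curvature == 0 is the null (correctly specified linear model);
    # curvature != 0 adds an unmodelled quadratic term (the alternative).
    ys = [x + curvature * x * x + rng.gauss(0.0, 1.0) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in res) / (n - 2)) ** 0.5  # residual standard error
    return [(x, r / sd) for x, r in zip(xs, res)]


def render(points, bins=6, limit=3.0):
    """Rasterize a residual plot into a bins x bins grid of point fractions,
    a crude stand-in for the plot image a neural network would see."""
    grid = [0.0] * (bins * bins)
    for x, r in points:
        col = min(int((x + 1.0) / 2.0 * bins), bins - 1)
        r = max(-limit, min(limit, r))  # clip residuals to the plot window
        row = min(int((r + limit) / (2 * limit) * bins), bins - 1)
        grid[row * bins + col] += 1.0 / len(points)
    return grid


def technical_validation_score(n, curvature, n_train=200, n_test=200, seed=1):
    """Hypothetical TVS: test accuracy of a classifier that tries to tell
    null plots (label 0) from alternative plots (label 1)."""
    rng = random.Random(seed)

    def sample(label):
        return render(simulate_residual_plot(n, curvature if label else 0.0, rng))

    # Training: average image per class (nearest-centroid classifier).
    centroids = []
    for label in (0, 1):
        imgs = [sample(label) for _ in range(n_train)]
        centroids.append([sum(px) / n_train for px in zip(*imgs)])

    # Testing: fraction of fresh, balanced plots assigned to the right class.
    correct = 0
    for i in range(n_test):
        label = i % 2
        img = sample(label)
        dists = [sum((a - b) ** 2 for a, b in zip(img, c)) for c in centroids]
        predicted = 0 if dists[0] <= dists[1] else 1
        correct += predicted == label
    return correct / n_test
```

Calling `technical_validation_score(n=100, curvature=0.0)` pits the null against itself, so the score should hover around 0.5 (chance level), while a clearly visible curvature should push it toward 1. Varying `n` gives a toy analogue of the sample-size comparison described above; a real study would, as the abstract states, train deep neural networks on the rendered plots instead.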