Loop Polarity Analysis to Avoid Underspecification in Deep Learning

Donald Martin, David Kinney

TL;DR

The paper tackles the problem of brittle out-of-distribution generalization in deep learning when the causal structure of the data-generating process is underspecified. It introduces loop polarity analysis from system dynamics to create polarity-based data representations and couples this with LSTM-based latent-parameter inference to learn from time-series data. A polarity classifier trained on loop-polarity features demonstrates improved robustness to distribution shifts in a simulated SIR epidemic, outperforming a baseline trained on raw data. The work highlights the value of incorporating causal-dynamics knowledge into the ML development pipeline to mitigate underspecification and enhance real-world applicability.

Abstract

Deep learning is a powerful set of techniques for detecting complex patterns in data. However, when the causal structure of the data-generating process is underspecified, deep learning models can be brittle and lack robustness to shifts in the distribution that process produces. In this paper, we turn to loop polarity analysis as a tool for specifying the causal structure of a data-generating process, in order to encode a more robust understanding of the relationship between system structure and system behavior within the deep learning pipeline. We use simulated epidemic data based on an SIR model to demonstrate how measuring the polarity of the different feedback loops that compose a system can lead to more robust inferences on the part of neural networks, improving the out-of-distribution performance of a deep learning model and infusing a system-dynamics-inspired approach into the machine learning development pipeline.

Paper Structure

This paper contains 7 sections, 10 equations, and 4 figures.

Figures (4)

  • Figure 1: Simple neural network with a four-dimensional input layer, a single four-dimensional hidden layer, and one-dimensional output layer.
  • Figure 2: Stock-flow diagram representing the set of differential equations that define the epidemic system considered here. Note that $\Lambda$ is the birth rate in the population, $\mu$ is the death rate, $\beta$ is the average number of interactions between infected and susceptible people in the population per time step, and $\gamma$ is the recovery rate.
  • Figure 3: Mean accuracy, across 20 iterations of 100 simulated out-of-distribution (OOD) epidemics, of a neural network trained on raw data and a neural network trained on polarity data.
  • Figure 4: A high-level overview of the ML development pipeline and our proposed intervention.
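
To make the epidemic system in Figure 2 concrete, the following is a minimal sketch of how such data might be simulated and how the polarity of one feedback loop might be measured. It is not the authors' implementation. It assumes the standard SIR form with births and deaths (new infections equal to $\beta S I / N$), forward-Euler integration, and the common system-dynamics convention that a loop's polarity is the sign of the derivative of a stock's net flow with respect to that stock. The function names `simulate_sir` and `infection_loop_polarity` and all parameter values are hypothetical.

```python
def simulate_sir(birth_rate, death_rate, beta, gamma, s0, i0, r0, steps, dt=1.0):
    """Forward-Euler integration of an SIR model with births and deaths.

    Assumed equations (standard form, not taken from the paper):
        dS/dt = birth_rate - beta*S*I/N - death_rate*S
        dI/dt = beta*S*I/N - gamma*I - death_rate*I
        dR/dt = gamma*I - death_rate*R
    Returns a list of (S, I, R) tuples of length steps + 1.
    """
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(steps):
        n = s + i + r
        new_infections = beta * s * i / n
        ds = birth_rate - new_infections - death_rate * s
        di = new_infections - gamma * i - death_rate * i
        dr = gamma * i - death_rate * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((s, i, r))
    return trajectory


def infection_loop_polarity(s, i, r, death_rate, beta, gamma):
    """Sign of d(dI/dt)/dI with S and N held fixed (an assumed definition).

    +1 means the loop through the infected stock is reinforcing,
    -1 means it is balancing, and 0 means it is neutral.
    """
    n = s + i + r
    gain = beta * s / n - gamma - death_rate
    return (gain > 0) - (gain < 0)


# Hypothetical parameters: births balance deaths for a population of 1000.
trajectory = simulate_sir(10.0, 0.01, 0.3, 0.1, s0=990, i0=10, r0=0, steps=50)
polarities = [infection_loop_polarity(s, i, r, 0.01, 0.3, 0.1)
              for s, i, r in trajectory]
```

Under these assumptions, `polarities` is a time series of signs that could stand in for the raw counts as model input, which is the kind of polarity-based representation the paper contrasts with training on raw data.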