Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

Mario Leiva, Noel Ngu, Joshua Shay Kricheli, Aditya Taparia, Ransalu Senanayake, Paulo Shakarian, Nathaniel Bastian, John Corcoran, Gerardo Simari

TL;DR

The paper tackles the robustness of perception under novel environmental shifts by introducing a consistency-based abductive reasoning framework that combines predictions from multiple pre-trained models at test time, using per-model metacognitive rules and domain constraints. It provides an exact Integer Programming (IP) method and a greedy Heuristic Search (HS) method, plus a tie-breaker (TB) mechanism, and demonstrates substantial improvements over single-model baselines on a controlled AirSim aerial dataset. Across 15 test sets with varied distribution shifts, IP+TB achieves average relative improvements of approximately 13.6% in F1-score and 16.6% in accuracy over the best individual model. The work validates consistency-based abductive reasoning as an effective way to robustly fuse imperfect models in challenging, unseen environments, with practical implications for deployment in critical applications.
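
The headline figures are relative gains, not absolute percentage-point gains. A minimal sketch of the arithmetic follows, assuming relative improvement is (method - baseline) / baseline and that the average is taken over per-test-set percentages; the summary does not state the paper's exact averaging, and the helper names are illustrative.

```python
from statistics import mean


def relative_improvement(method_score: float, baseline_score: float) -> float:
    """Percentage change of method_score relative to baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (method_score - baseline_score) / baseline_score


def average_relative_improvement(method_scores, baseline_scores) -> float:
    """Mean of per-test-set relative improvements.

    Assumption: scores are aligned by test set, and the average is taken
    over per-set percentages rather than computed from averaged scores.
    """
    if len(method_scores) != len(baseline_scores):
        raise ValueError("score lists must be aligned per test set")
    return mean(
        relative_improvement(m, b) for m, b in zip(method_scores, baseline_scores)
    )


def absolute_gain_points(method_score: float, baseline_score: float) -> float:
    """Absolute difference in percentage points, for scores in [0, 1]."""
    return 100.0 * (method_score - baseline_score)
```

For example, an F1-score rising from 0.50 to 0.60 is a 20% relative improvement but only a 10-point absolute gain. With the +TB value as the baseline, the same helper gives the kind of percentage difference reported in parentheses in the Figure 3 ablation.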

Abstract

The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although recent artificial intelligence approaches for metacognition use logical rules to characterize and filter model errors, improving precision often comes at the cost of reduced recall. This paper investigates the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem, building on the idea of abductive learning (ABL) but applying it at test time rather than during training. The input predictions and the learned error detection rules derived from each model are encoded in a logic program. We then seek an abductive explanation (a subset of model predictions) that maximizes prediction coverage while keeping the rate of logical inconsistencies (derived from domain constraints) below a specified threshold. We propose two algorithms for this knowledge representation task: an exact method based on Integer Programming (IP) and an efficient Heuristic Search (HS). Through extensive experiments on a simulated aerial imagery dataset featuring controlled, complex distributional shifts, we demonstrate that our abduction-based framework outperforms individual models and standard ensemble baselines, achieving, for instance, average relative improvements of approximately 13.6% in F1-score and 16.6% in accuracy over the best individual model across 15 diverse test datasets. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect models in challenging, novel scenarios.
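
A minimal sketch of how the selection problem described above could be written as an integer program, under simplifying assumptions that are ours rather than the paper's: each candidate prediction $p \in P$ (one model's label for one sample) gets a binary variable $x_p$; $c_p \in \{0,1\}$ marks whether the logic program derives an inconsistency from $p$ (for instance, a learned error-detection rule fires); $P_s$ is the set of candidates for sample $s \in S$; and $\delta$ is the inconsistency threshold. Multiplying the rate bound through by the number of selected predictions keeps the constraint linear.

```latex
$$
\begin{aligned}
\max_{x}\quad & \sum_{p \in P} x_p \\
\text{s.t.}\quad & \sum_{p \in P} c_p\, x_p \;\le\; \delta \sum_{p \in P} x_p \\
& \sum_{p \in P_s} x_p \;\le\; 1 \qquad \forall s \in S \\
& x_p \in \{0,1\} \qquad \forall p \in P
\end{aligned}
$$
```

Whenever at least one prediction is selected, the first constraint is equivalent to requiring an inconsistency rate of at most $\delta$; with nothing selected, both sides are zero. The paper encodes per-model error-detection rules and domain constraints in a logic program, and this sketch collapses that encoding into the single flag $c_p$.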

Paper Structure

This paper contains 17 sections, 13 equations, 11 figures, 1 table, and 1 algorithm.

Figures (11)

  • Figure 1: Overview of our consistency-based abductive reasoning approach. Here, $\eta$ machine learning models perceive a novel environment. Their outputs are combined with domain knowledge and metacognitive information about the models (learned independently, using the same training data) to abduce a consistent set of results with fewer perceptual errors.
  • Figure 2: Images captured at the same position in AirSim under various weather conditions, each shown with the distribution of weather conditions in the dataset it represents. Bottom right: histogram of average intensity in selected datasets.
  • Figure 3: Left: Performance (F1 and accuracy) across all test sets. Best values per test set are in bold and second-best values are underlined. Right: Ablation study of performance without the tie-breaker (TB). Values show the F1-score or accuracy of each method without TB; the percentage difference relative to the corresponding +TB version (the values on the left) is shown in parentheses.
  • Figure 4: F1-scores for IP+TB and HS+TB vs. baselines (Best Ind. Model, Avg. Models, and Maj. Vote) across the 15 test datasets under increasing average weather intensity.
  • Figure 5: Hyperparameter sensitivity for the MDS-A_1 test set. Left: Performance metrics and inconsistency rates of the Error Detection Rule (EDR) stage across varying $\epsilon$ values. Center: Heuristic Search (HS+TB) performance as a function of the inconsistency threshold $\delta$. Right: IP+TB accuracy shown as a surface plot over $\delta$ and the internal $\epsilon$.
  • ...and 6 more figures
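
Figure 5 (center) tracks HS performance as the inconsistency threshold $\delta$ varies. The Python sketch below illustrates only the budget logic behind such a threshold, under hypothetical assumptions that do not reproduce the paper's HS: each prediction carries a single `flagged` bit standing in for "the logic program derives an inconsistency from this prediction", at most one label is kept per sample, and labels are chosen by vote count. The names `Candidate` and `greedy_select` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One model's prediction for one sample (hypothetical representation)."""
    model: str
    sample: int
    label: str
    flagged: bool  # True if an error-detection rule fires on this prediction


def greedy_select(candidates, delta):
    """Pick at most one label per sample while keeping the share of
    flagged (inconsistent) picks at or below delta.

    Returns (chosen, rate): chosen maps sample -> label, and rate is the
    fraction of chosen samples whose selected prediction is flagged.
    """
    by_sample = {}
    for c in candidates:
        by_sample.setdefault(c.sample, []).append(c)

    chosen = {}
    n_selected = 0
    n_inconsistent = 0

    # Pass 1: unflagged predictions never add inconsistencies, so cover every
    # sample that has one. The most-voted clean label wins; on a tie,
    # Counter.most_common keeps the label encountered first.
    for sample, cands in by_sample.items():
        clean = [c.label for c in cands if not c.flagged]
        if clean:
            chosen[sample] = Counter(clean).most_common(1)[0][0]
            n_selected += 1

    # Pass 2: spend the remaining inconsistency budget on samples whose
    # candidates are all flagged, as long as the rate stays <= delta.
    for sample, cands in by_sample.items():
        if sample in chosen:
            continue
        if n_inconsistent + 1 <= delta * (n_selected + 1):
            chosen[sample] = Counter(c.label for c in cands).most_common(1)[0][0]
            n_selected += 1
            n_inconsistent += 1

    rate = n_inconsistent / n_selected if n_selected else 0.0
    return chosen, rate
```

Raising `delta` lets the second pass cover more samples whose only candidates are flagged, trading a higher inconsistency rate for more coverage. Under this single-flag model the two passes already reach maximum coverage, so the gap between an exact IP and a heuristic only appears with a richer rule encoding such as the one the paper uses.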