Learning Visual Information Utility with PIXER

Yash Turkar, Timothy Chase, Christo Aluckal, Karthik Dantu

TL;DR

PIXER addresses the lack of a universal, uncertainty-aware measure of visual information utility for feature extraction by introducing featureness, a per-pixel metric derived from a lightweight, single-shot Bayesian framework. A three-stage training process yields a compact model (~1M parameters) that outputs both a per-pixel feature-likelihood map $P$ and an uncertainty map $U$. Featureness $F$ is then computed from $P$ and $U$ to guide downstream processing such as feature filtering. In visual odometry, this filtering improves trajectory RMSE by about 31% on average while using about 49% fewer features across multiple datasets and detectors. The results indicate that estimating holistic visual uncertainty before feature processing can improve perception reliability at modest computational cost, suggesting broad applicability to autonomous robotics and other vision tasks.

Abstract

Accurate feature detection is fundamental for various computer vision tasks, including autonomous robotics, 3D reconstruction, medical imaging, and remote sensing. Despite advancements in enhancing the robustness of visual features, no existing method measures the utility of visual information before processing by specific feature-type algorithms. To address this gap, we introduce PIXER and the concept of "Featureness," which reflects the inherent interest and reliability of visual information for robust recognition, independent of any specific feature type. Leveraging a generalization of Bayesian learning, our approach quantifies both the probability and uncertainty of a pixel's contribution to robust visual utility in a single-shot process, avoiding costly operations such as Monte Carlo sampling and permitting customizable featureness definitions adaptable to a wide range of applications. We evaluate PIXER on visual odometry with featureness selectivity, achieving an average 31% improvement in trajectory RMSE with 49% fewer features.
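
To make the idea of a customizable featureness definition concrete, the sketch below shows one possible way to combine a probability map $P$ and an uncertainty map $U$ into a featureness score and a binary mask. This summary does not give the paper's actual definition of $F$, so the function name `featureness`, the product-style score, the min-max rescaling of $U$, and both thresholds are illustrative assumptions, not the authors' method.

```python
import numpy as np


def featureness(prob, unc, prob_thresh=0.5, unc_thresh=0.3):
    """Hypothetical featureness definition (an assumption, not the paper's).

    prob: per-pixel feature probability P, values in [0, 1].
    unc:  per-pixel uncertainty U, any non-negative scale.
    Returns a soft score map and a boolean mask of the same shape.
    """
    prob = np.asarray(prob, dtype=np.float64)
    unc = np.asarray(unc, dtype=np.float64)
    if prob.shape != unc.shape:
        raise ValueError("P and U must have the same shape")

    # Rescale U to [0, 1] so that one threshold works across images.
    span = unc.max() - unc.min()
    unc_norm = (unc - unc.min()) / span if span > 0 else np.zeros_like(unc)

    # Soft score: high where a pixel is both likely and confidently so.
    score = prob * (1.0 - unc_norm)

    # Binary mask: likely enough AND certain enough.
    mask = (prob >= prob_thresh) & (unc_norm <= unc_thresh)
    return score, mask


# Toy 2x2 example with made-up values.
P = np.array([[0.9, 0.9], [0.2, 0.6]])
U = np.array([[0.0, 1.0], [0.0, 0.1]])
score, mask = featureness(P, U)
```

In this toy case the top-right pixel is probable but highly uncertain, so it is excluded from the mask; other definitions (for example, a threshold on the soft score alone) would fit the same interface.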

Paper Structure (15 sections, 1 equation, 3 figures, 1 table)


Figures (3)

  • Figure 2: PIXER is trained in three steps. First, we train a network with a general understanding of interestingness (i.e., feature point detection); this work uses SiLK [gleize_silk_2023] (top left). Next, we convert this model to a Bayesian Neural Network (BNN) and train it again with additional probabilistic losses (e.g., KL divergence [gal_dropout_2016], top middle). Finally, we train a specialized uncertainty head, supervised by feature variance computed from Monte Carlo sampling of the BNN (top right). The PIXER inference model is then the joint feature-point probability and uncertainty networks (bottom middle). The combination of pixel-wise probability and uncertainty forms our definition of featureness $F$ (bottom right), which describes the general utility of the visual information.
  • Figure 3: Visual Odometry pipeline (grey blocks) with PIXER feature filtering based on featureness masks $F$ (blue blocks).
  • Figure 4: We evaluate PIXER-aided visual odometry on a custom dataset collected using a ZED 2i camera and a Mosaic X5 GNSS on a Boston Dynamics Spot quadruped. Results in Table 1 show superior estimation performance, with a mean RMSE improvement of 34% and a mean feature reduction of 41%.
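
The feature-filtering step in the Figure 3 pipeline can be pictured as discarding detected keypoints that fall outside a featureness mask before they reach the rest of the visual odometry front end. The following is a minimal sketch under stated assumptions: the helper name `filter_keypoints`, the `(x, y)` keypoint layout, and nearest-pixel lookup are hypothetical choices for illustration, and the mask is assumed to be a boolean image already derived from $F$.

```python
import numpy as np


def filter_keypoints(keypoints, mask):
    """Keep keypoints that land on True pixels of a featureness mask.

    keypoints: array-like of (x, y) pixel coordinates, shape (N, 2).
    mask:      boolean array of shape (H, W); True marks useful pixels.
    Returns the surviving keypoints and the boolean keep-vector.
    """
    mask = np.asarray(mask, dtype=bool)
    kps = np.asarray(keypoints, dtype=np.float64).reshape(-1, 2)
    h, w = mask.shape

    # Nearest-pixel lookup: x indexes columns, y indexes rows.
    cols = np.rint(kps[:, 0]).astype(int)
    rows = np.rint(kps[:, 1]).astype(int)

    # Keypoints outside the image are always dropped.
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    keep = np.zeros(len(kps), dtype=bool)
    keep[inside] = mask[rows[inside], cols[inside]]
    return kps[keep], keep


# Toy example: a 4x4 mask with a single useful pixel at row 1, column 2.
toy_mask = np.zeros((4, 4), dtype=bool)
toy_mask[1, 2] = True
toy_kps = [(2.0, 1.0), (0.0, 0.0), (2.2, 0.9), (10.0, 10.0)]
kept, keep = filter_keypoints(toy_kps, toy_mask)
```

Because the filter only needs keypoint coordinates, it is independent of the detector that produced them, which matches the detector-agnostic role the captions describe for featureness masks.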