Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization

He Li; Wenyue He; Weihang Kong; Xingchen Zhang

TL;DR

A joint position-color optimization framework (AP-PCO) for generating adversarial patches in visual-infrared settings that achieves consistently strong attack performance across multiple architectures, providing a practical benchmark for robustness evaluation in VI perception systems.

Abstract

Multimodal adversarial attacks for dense prediction remain largely underexplored. In particular, visual-infrared (VI) perception systems introduce unique challenges due to heterogeneous spectral characteristics and modality-specific intensity distributions. Existing adversarial patch methods are primarily designed for single-modal inputs and fail to account for cross-spectral inconsistencies, leading to reduced attack effectiveness and poor stealthiness when applied to VI dense prediction models. To address these challenges, we propose a joint position-color optimization framework (AP-PCO) for generating adversarial patches in visual-infrared settings. The proposed method optimizes patch placement and color composition simultaneously using a fitness function derived from model outputs, enabling a single patch to perturb both visible and infrared modalities. To further bridge spectral discrepancies, we introduce a cross-modal color adaptation strategy that constrains patch appearance according to infrared grayscale characteristics while maintaining strong perturbations in the visible domain, thereby reducing cross-spectral saliency. The optimization procedure requires no internal model information, supporting flexible black-box attacks. Extensive experiments on visual-infrared dense prediction tasks demonstrate that the proposed AP-PCO achieves consistently strong attack performance across multiple architectures, providing a practical benchmark for robustness evaluation in VI perception systems.
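The gradient-free joint search described in the abstract can be illustrated with a minimal differential-evolution sketch. This is not the authors' implementation: the candidate encoding `(x, y, r, g, b)`, the hyperparameters `pop_size`, `f_scale`, and `cr`, and the stand-in `toy_fitness` are all assumptions made for illustration. A real attack would replace `toy_fitness` with a score computed from the outputs of the target VI model.

```python
import numpy as np

def toy_fitness(candidate):
    """Hypothetical stand-in for the black-box score from the VI model."""
    # Assumption: the score peaks at position (0.7, 0.3) and color (1, 0, 0).
    target = np.array([0.7, 0.3, 1.0, 0.0, 0.0])
    return -float(np.sum((candidate - target) ** 2))

def joint_search(fitness, pop_size=20, generations=60, f_scale=0.5, cr=0.9, seed=0):
    """Search position and color together; every gene is normalized to [0, 1]."""
    rng = np.random.default_rng(seed)
    dim = 5  # (x, y, r, g, b)
    pop = rng.random((pop_size, dim))
    scores = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = pop[rng.choice(others, size=3, replace=False)]
            mutant = r1 + f_scale * (r2 - r3)
            # Crossover: mix mutant and parent gene by gene.
            mask = rng.random(dim) < cr
            mask[rng.integers(dim)] = True  # keep at least one mutant gene
            child = np.where(mask, mutant, pop[i])
            # Boundary handling: clip back into the valid range.
            child = np.clip(child, 0.0, 1.0)
            # Selection: the child replaces the parent if it scores at least as well.
            child_score = fitness(child)
            if child_score >= scores[i]:
                pop[i], scores[i] = child, child_score
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])

best_candidate, best_score = joint_search(toy_fitness)
```

Because position and color sit in one candidate vector, each mutation and crossover step changes both at once, and only fitness values are queried, which matches the black-box setting.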

Paper Structure (28 sections, 16 equations, 8 figures, 14 tables, 1 algorithm)

Introduction
Related work
Visual-infrared perception
Adversarial attacks
Positions and content optimization of patches
Methodology
Threat model
Problem formulation
Global Optimization Framework
Fitness function
Representation of patch position
Representation of patch color
Joint optimization
Optimization Coupling Analysis
Experiments
...and 13 more sections

Figures (8)

Figure 1: Comparison of existing adversarial attack work and this study. The security of visual–infrared dense prediction models remains underexplored compared with single-modal settings. This study aims to fill this gap.
Figure 2: Comparison between traditional adversarial patches (“Trad-Patch”) and the proposed patch (“AP-PCO”). Traditional methods determine the patch position based on a single forward pass of the target model and do not optimize color for multimodal data, which limits their effectiveness and stealthiness in visual–infrared settings. In contrast, our approach performs iterative, gradient-free global search to jointly determine position and color, and incorporates a cross-modal color reuse strategy to achieve stronger attacks and better stealthiness.
Figure 3: Framework of cross-modal adversarial patches with position-color joint optimization. The initial population consists of randomly generated circular patch samples. Mutation, crossover, and boundary handling then produce a child population with diverse positions and colors. A fitness function evaluates the parent and child populations across both modalities, and the loop repeats until the optimal patch is obtained and deployed against dense prediction tasks.
Figure 4: Variation trajectory of patch positions across iterations. Yellow markers denote the patch center coordinates of all individuals in each generation, while red markers indicate the center coordinate of the individual with the highest fitness in that generation. As iterations progress, the population gradually converges toward crowd-dense regions and ultimately stabilizes at the optimal patch location.
Figure 5: Visualizations of patch attacks for the crowd counting, semantic segmentation, and image fusion tasks. The proposed method shows strong attack performance across all three VI dense prediction tasks.
...and 3 more figures
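
The circular patch and cross-modal color reuse mentioned in the captions for Figures 2 and 3 can be sketched as follows. This is a hedged illustration only: the function name `paste_circular_patch` is hypothetical, and deriving the infrared gray level from the patch color with BT.601 luma weights is an assumption, not the adaptation rule confirmed by the paper.

```python
import numpy as np

def paste_circular_patch(visible, infrared, center, radius, rgb):
    """Paste one circular patch into an RGB image and its aligned infrared image."""
    h, w = infrared.shape
    yy, xx = np.ogrid[:h, :w]
    cy, cx = center
    # Boolean mask of pixels inside the circle.
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    vis_out = visible.copy()
    ir_out = infrared.copy()
    vis_out[mask] = rgb
    # Assumption: reuse the color in the infrared image as its BT.601 luma.
    r, g, b = rgb
    ir_out[mask] = 0.299 * r + 0.587 * g + 0.114 * b
    return vis_out, ir_out, mask

# Usage on blank 32x32 images with values in [0, 1].
visible = np.zeros((32, 32, 3))
infrared = np.zeros((32, 32))
vis_adv, ir_adv, mask = paste_circular_patch(
    visible, infrared, center=(16, 16), radius=4, rgb=(1.0, 0.0, 0.0)
)
```

A single `(center, radius, rgb)` tuple fully determines the patch in both modalities, so a search over position and color only needs to evaluate one candidate per query.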
