Amnesia as a Catalyst for Enhancing Black Box Pixel Attacks in Image Classification and Object Detection

Dongsu Song, Daehwa Ko, Jay Hoon Jung

TL;DR

This work tackles the realism gap in black-box vision attacks by introducing Remember and Forget Pixel Attack using Reinforcement Learning (RFPAR), a framework that perturbs only a small number of pixels under an $L_0$ budget to mislead image classifiers and object detectors. It combines a Remember phase, which searches for perturbations with a CNN-based policy guided by a one-step REINFORCE objective, and a Forget phase, which resets exploration to prevent overfitting. RFPAR achieves state-of-the-art attack performance on ImageNet-1K for classification and competitive mean Average Precision (mAP) reductions on MS-COCO and Argoverse for object detection, all with substantially fewer queries than prior pixel attacks. The results show that sparse, patch-independent perturbations can effectively compromise modern vision systems, highlighting the need for defenses such as adversarial training and query-rate protections to mitigate such black-box threats.
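A one-step REINFORCE objective treats each perturbation as a single-step episode: the policy samples an action, the black-box model is queried once, and the resulting reward immediately weights the gradient of the action's log-probability, i.e. the update ascends E[(R - b) ∇ log π(a)]. The NumPy sketch below illustrates only that update rule, under heavy simplifying assumptions: the policy is a plain categorical distribution over pixel indices (standing in for RFPAR's CNN policy, which is not reproduced here), the reward is a hypothetical confidence drop from a toy victim, and a batch-mean baseline b is used for variance reduction. The names `reinforce_step` and `drop` are illustrative and do not come from the RFPAR repository.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(logits, reward_fn, rng, lr=1.0, n_samples=8):
    """One-step REINFORCE update of a categorical policy over pixel indices.

    Each sampled action (a pixel index) is a complete episode: its reward is
    observed immediately, so the return is just that single reward.
    """
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=n_samples, p=probs)
    rewards = np.array([reward_fn(a) for a in actions])
    advantages = rewards - rewards.mean()  # batch-mean baseline
    grad = np.zeros_like(logits)
    for a, adv in zip(actions, advantages):
        grad_log_pi = -probs.copy()  # gradient of log softmax(logits)[a]
        grad_log_pi[a] += 1.0        # w.r.t. the logits: onehot(a) - probs
        grad += adv * grad_log_pi
    return logits + lr * grad / n_samples  # gradient ascent on expected reward

# Hypothetical toy victim: perturbing pixel 11 drops the confidence by 0.9,
# any other pixel by only 0.1. The reward is that confidence drop.
drop = np.full(16, 0.1)
drop[11] = 0.9

rng = np.random.default_rng(0)
logits = np.zeros(16)
for _ in range(300):
    logits = reinforce_step(logits, lambda a: drop[a], rng)

print(int(np.argmax(logits)), round(float(softmax(logits)[11]), 3))
```

With this toy reward the policy concentrates its probability mass on pixel 11, the only location with a large confidence drop; in the actual attack the reward would instead come from querying the victim model.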

Abstract

Among adversarial black-box attacks, query-based attacks are known to achieve relatively high success rates. While black-box attacks are being actively researched, relatively few studies have focused on pixel attacks, which target only a limited number of pixels. In image classification, query-based pixel attacks often rely on patches, which depend heavily on randomness and neglect the fact that scattered pixels are better suited to adversarial attacks. Moreover, to the best of our knowledge, query-based pixel attacks have not been explored in object detection. To address these issues, we propose a novel pixel-based black-box attack called Remember and Forget Pixel Attack using Reinforcement Learning (RFPAR), which consists of two main components: the Remember and Forget processes. RFPAR mitigates randomness and avoids patch dependency by perturbing pixels according to rewards generated by a one-step RL algorithm. RFPAR effectively creates perturbed images that minimize the confidence scores while adhering to a limited pixel budget. Furthermore, we extend the attack beyond image classification to object detection, where RFPAR reduces the confidence scores of detected objects so that they evade detection. Experiments on the ImageNet-1K dataset for classification show that RFPAR outperforms state-of-the-art query-based pixel attacks. For object detection on the MS-COCO dataset with YOLOv8 and DDQ, RFPAR achieves an mAP reduction comparable to that of the state-of-the-art query-based attack while requiring fewer queries. Further experiments on the Argoverse dataset with YOLOv8 confirm that RFPAR effectively removes objects on a larger-scale dataset. Our code is available at https://github.com/KAU-QuantumAILab/RFPAR.
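Because the pixel budget is an $L_0$ constraint, an attack is measured by how many pixel locations it changes, not by how far each value moves. The sketch below, assuming an H×W×C `uint8` image and an attack that overwrites whole pixels, counts the changed locations and converts them into the percentage of attacked pixels quoted in the figure captions. The helpers `perturb_pixels`, `l0_pixels`, and `attacked_ratio_percent` are hypothetical, and how RFPAR actually selects locations and values (its RL policy) is not modeled here.

```python
import numpy as np

def perturb_pixels(image: np.ndarray, coords, values: np.ndarray) -> np.ndarray:
    """Return a copy of `image` (H, W, C) with the pixels at `coords` overwritten.

    coords: sequence of (row, col) pairs; values: array of shape (len(coords), C).
    """
    adv = image.copy()
    rows, cols = zip(*coords)
    adv[list(rows), list(cols)] = values
    return adv

def l0_pixels(clean: np.ndarray, adv: np.ndarray) -> int:
    """L0 distance in pixels: locations that differ in at least one channel."""
    return int(np.any(clean != adv, axis=-1).sum())

def attacked_ratio_percent(clean: np.ndarray, adv: np.ndarray) -> float:
    """Percentage of attacked pixels relative to all H*W pixel locations."""
    h, w = clean.shape[:2]
    return 100.0 * l0_pixels(clean, adv) / (h * w)

# Uniform gray 224x224 RGB image (an assumed resolution, for illustration only).
clean = np.full((224, 224, 3), 128, dtype=np.uint8)
coords = [(10, 20), (100, 150)]
# Push each chosen pixel to an extreme color (a hypothetical choice of values).
values = np.array([[255, 0, 0], [0, 0, 255]], dtype=np.uint8)
adv = perturb_pixels(clean, coords, values)

print(l0_pixels(clean, adv), f"{attacked_ratio_percent(clean, adv):.4f}%")
```

For scale, two overwritten pixels in a 224×224 image amount to 2 / 50,176 ≈ 0.004% of the image, which shows how small such $L_0$ budgets are.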

Paper Structure

This paper contains 33 sections, 5 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Adversarial examples generated by RFPAR. The first column shows images from ImageNet (image classification), the second from MS-COCO (object detection), and the third from Argoverse (object detection). The first row shows clean images, the second row shows adversarially perturbed images, and the third row shows the perturbation, annotated with the ratio of attacked pixels to total pixels. Labels give the predicted class or the number of detected objects, such as "Cock" in ImageNet, "2 Objects" in MS-COCO, and "5 Objects" in Argoverse. In the adversarial row, the perturbations change these labels, causing misclassification or missed detections: "Coil" instead of "Cock" in ImageNet, and no objects detected in MS-COCO or Argoverse. The attacked pixels made up 0.004% of the image for ImageNet, 0.027% for MS-COCO, and 0.114% for Argoverse.
  • Figure 2: The model architecture of RFPAR: the Remember and Forget process. During the Remember process, the RL model generates perturbed images and their corresponding rewards. Memory compares these with the previously stored values and retains only the highest reward and its associated image. Once the rewards converge, the Forget process starts: it resets the RL agent and memory, then feeds the perturbed images that achieved the highest reward back into the Remember process. The cycle repeats until an adversarial image is generated or a predefined number of cycles is reached.
  • Figure 3: Ablation study. The x-axis shows the victim models and the y-axis the attack success rate. The subscript $I$ indicates that the Forget process includes the initialization step, and the subscript $M$ indicates that the Remember process uses memory.
  • Figure 4: Adversarial examples generated by RFPAR on the ImageNet dataset. The "Original Image" is the unaltered input, the "Delta" is the difference between the Original Image and the Adversarial Image, and the "Adversarial Image" is the perturbed image with an altered prediction. Predicted labels are shown below the Original Image and the Adversarial Image.
  • Figure 5: Adversarial examples generated by RFPAR on the MS-COCO dataset. The Original Image is the unaltered image, and the Delta shows the difference between the Original Image and the Adversarial Image. The hyperparameter $\alpha$ determines the attack level: a higher $\alpha$ perturbs more pixels. We conducted experiments with $\alpha$ ranging from 0.01 to 0.05. Columns 2 to 6 show the Delta images for these $\alpha$ values, and columns 7 to 11 show the corresponding Adversarial Images. Here, "Adversarial Image" denotes the perturbed image regardless of whether the attack succeeded, rather than only an image with a changed prediction.
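The Remember and Forget cycle described in the Figure 2 caption is essentially a restart loop wrapped around the RL search. A minimal Python sketch of that control flow follows, under stated assumptions: the convergence test (no reward improvement larger than `tol` for `patience` consecutive proposals), the agent interface (`propose`, `update`), the toy linear victim, and the random-pixel agent that stands in for the RL policy are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

W = np.arange(1.0, 9.0)  # hypothetical toy victim weights for an 8-"pixel" image

def confidence(x: np.ndarray) -> float:
    """Toy black-box victim: confidence in the true class, in [0, 1]."""
    return float(x @ W / W.sum())

class RandomPixelAgent:
    """Hypothetical stand-in for the RL agent: zeroes one random nonzero pixel."""

    def __init__(self, rng: np.random.Generator):
        self.rng = rng

    def propose(self, image: np.ndarray):
        candidate = image.copy()
        idx = self.rng.choice(np.flatnonzero(image > 0))
        candidate[idx] = 0.0
        # Reward = drop in the victim's confidence caused by this perturbation.
        return candidate, confidence(image) - confidence(candidate)

    def update(self, reward: float) -> None:
        pass  # a policy-gradient agent would learn from the reward here

def remember_and_forget(image, make_agent, is_adversarial,
                        max_cycles=10, patience=3, tol=1e-3):
    current = image
    for cycle in range(max_cycles):
        # Forget: each cycle starts with a fresh agent and an empty memory.
        agent = make_agent()
        best_reward, best_image = -np.inf, current
        stale = 0
        # Remember: keep only the highest-reward image until rewards stop improving.
        while stale < patience:
            candidate, reward = agent.propose(current)
            agent.update(reward)
            if reward > best_reward + tol:
                best_reward, best_image, stale = reward, candidate, 0
            else:
                stale += 1
            if is_adversarial(best_image):
                return best_image, cycle + 1  # success: number of cycles used
        current = best_image  # reintroduce the best image into the next cycle
    return current, max_cycles  # cycle budget exhausted

rng = np.random.default_rng(0)
img = np.ones(8)
adv, cycles_used = remember_and_forget(
    img, lambda: RandomPixelAgent(rng), lambda x: confidence(x) < 0.3)
print(cycles_used, round(confidence(adv), 3), np.count_nonzero(adv != img))
```

Each cycle keeps only the best image in memory and hands it to a freshly reset agent, so perturbations accumulate across cycles while the agent's internal state does not. In this toy run every cycle adds exactly one zeroed pixel, so the returned image differs from the input in as many pixels as cycles were used.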