
Improving Object Detection by Modifying Synthetic Data with Explainable AI

Nitish Mital, Simon Malzard, Richard Walters, Celso M. De Melo, Raghuveer Rao, Victoria Nockles

TL;DR

This work introduces a novel human-in-the-loop framework that uses robust Explainable AI to guide the design of synthetic data for object detection. It applies SHAP saliency maps to identify features that are unique to a class and features that are common between classes. It then iteratively modifies 3D mesh models in Unity to reinforce or disrupt those features, which improves detector performance even on unseen orientations. On infrared vehicle detection, the initial synthetic data boosted mAP50 by 4.6 percentage points. SHAP-guided refinements yielded a further 1.5-point gain while also reducing misclassifications. The approach reduces human workload in dataset curation and holds promise for automation and broader application to robust, unbiased detection systems.
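SHAP attributes a model's output to its input features using Shapley values from cooperative game theory. The standard definition is given below for reference. This summary does not say which SHAP estimator or feature grouping (e.g., pixels or superpixels) the authors used, so treat the formula as background rather than as their exact computation. $F$ is the set of all input features. $f_x(S)$ is the model output when only the features in $S$ are kept and the rest are replaced by background values. $\phi_0 = f_x(\varnothing)$ is the output with no features present.

$$
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f_x(S \cup \{i\}) - f_x(S)\bigr],
\qquad
f(x) = \phi_0 + \sum_{i \in F} \phi_i .
$$

In a saliency map, each pixel (or pixel group) is coloured by its $\phi_i$. Positive values push the model towards the explained class, and the values sum to the model output.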

Abstract

Limited real-world data severely impacts model performance in many computer vision domains, particularly for samples that are underrepresented in training. Synthetically generated images are a promising solution, but two problems remain. First, it is unclear how to design synthetic training data to optimally improve model performance (e.g., whether and where to introduce more realism or more abstraction). Second, the domain expertise, time and effort required from human operators for this design and optimisation process represent a major practical challenge. Here we propose a novel conceptual approach to improve the efficiency of designing synthetic images, by using robust Explainable AI (XAI) techniques to guide a human-in-the-loop process of modifying the 3D mesh models used to generate these images. Importantly, this framework allows modifications that either increase or decrease the realism of synthetic data, and both kinds can improve model performance. We illustrate this concept using a real-world example where data are sparse: detection of vehicles in infrared imagery. We fine-tune an initial YOLOv8 model on the DSIAC ATR infrared dataset and on synthetic images generated from 3D mesh models in the Unity gaming engine. We then use XAI saliency maps to guide modification of our Unity models. We show that synthetic data can improve detection of vehicles in orientations unseen in training by 4.6 percentage points (to mAP50 = 94.6%). Our new XAI-guided approach improves performance by a further 1.5 percentage points (to 96.1%). It reduces misclassifications by both increasing and decreasing the realism of different parts of the synthetic data. Our proof-of-concept results pave the way for fine-grained, XAI-controlled curation of synthetic datasets tailored to improve object detection performance, whilst simultaneously reducing the burden on human operators in designing and optimising these datasets.
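mAP50 is the mean, over classes, of average precision (AP). A detection counts as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below is a minimal, stdlib-only illustration of that metric for one class, under two assumptions:

- Matching is greedy, in descending confidence order.
- The precision-recall curve uses all-point interpolation.

It is not the Ultralytics YOLOv8 implementation, whose interpolation details may differ. The helper names `iou`, `ap50` and `map50` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap50(detections: List[Tuple[int, float, Box]],
         ground_truth: List[Tuple[int, Box]],
         iou_thresh: float = 0.5) -> float:
    """AP for one class (hypothetical helper).

    detections: (image_id, confidence, box); ground_truth: (image_id, box).
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits: List[bool] = []
    # Greedy matching: the most confident detections claim ground truth first.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (g_img, g_box) in enumerate(ground_truth):
            if g_img == img and not matched[j]:
                overlap = iou(box, g_box)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= iou_thresh
        if is_tp:
            matched[best_j] = True
        hits.append(is_tp)

    # Cumulative precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))

    # Precision envelope: make precision non-increasing in rank order.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # All-point interpolation: sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50(per_class_ap: List[float]) -> float:
    """Mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, take one image with two ground-truth boxes and three detections ranked hit, miss, hit. Here `ap50` returns 5/6 (about 0.833): the first half of recall is reached at precision 1.0, the second half at precision 2/3.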

Paper Structure

This paper contains 14 sections, 25 figures, 1 table.

Figures (25)

  • Figure 1: Conceptual illustration of our proposed approach for improving the performance of object detection and classification algorithms trained on synthetic images. SHAP saliency maps guide modification of the mesh models used to generate the synthetic data. Numbered circles represent the six key steps of our approach, outlined in Section \ref{sec:procedure} and demonstrated with examples in Section \ref{sec:XAI_examples}.
  • Figure 2: Illustration of the DSIAC ATR dataset and synthetic images. (a) Example real MWIR images showing different vehicle classes (BTR70 and 2S3) at different ranges (2000 m and 1000 m). (b) Screenshot of the ProBuilder tool [probuilder] in Unity for the BTR70 class, where we set the material properties of all faces of the object mesh in order to generate synthetic MWIR images. (c) Real and (d) synthetic MWIR images of the BTR70, shown for comparison.
  • Figure 3: Experimental setup. (left) Training data for the real-only setting, showing vehicle orientations (blue segments) for all classes in the training set drawn from the DSIAC ATR dataset. (middle) Test data, showing vehicle orientations (gray segments) for all classes in the test set. (right) Training data for the real + synthetic setting, showing vehicle orientations for real data (blue segments) and synthetic data (red striped segments). Example cropped images of the BTR70 class are shown for the orientations given in white text.
  • Figure 4: Confusion matrices averaged over 4 different seeds for models trained on real data plus (a) no synthetic data, (b) synthetic data v0, and (c) synthetic data v(R+D). v(R+D) combines a reinforcing modification, vR, to the features of the SUV that are unique relative to the BTR70, and a disruptive modification, vD, to the features common to the ZSU23 and BTR70.
  • Figure 5: Illustration of the synthetic data modification process for reducing confusion. a,d: Ground-truth samples. b,e: Top: confusion matrix for the model trained on the initial synthetic data, with misclassifications highlighted. Bottom: synthetic data samples (left) and SHAP contribution plots (right). The plots for correct classifications capture the unique features, and those for misclassifications capture the common features. c,f: Top: confusion matrix for the model trained on the modified synthetic data, showing reduced misclassifications (see Supplementary Fig. 3 for full confusion matrices). Bottom: samples of the modified synthetic data (left) and SHAP contribution plots (right) for correct classifications, illustrating an increased focus on unique features and a reduced focus on common features. Pixels with a contribution score below 50% of the highest score are masked in purple; this threshold was chosen as a trade-off between information and noise.
  • ...and 20 more figures
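The captions of Figures 4 and 5 describe two simple post-processing steps:

- averaging confusion matrices over 4 seeds;
- masking pixels whose SHAP contribution falls below 50% of the highest score.

Below is a minimal stdlib-only sketch of both. It assumes that each confusion matrix is row-normalized (rows = true class) before averaging. It also assumes that the 50% threshold is taken relative to the maximum per-pixel contribution within a single map. Neither detail is stated in the captions, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def normalize_rows(cm: Matrix) -> Matrix:
    """Convert counts to per-true-class fractions (non-empty rows sum to 1)."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def mean_confusion(per_seed: List[Matrix]) -> Matrix:
    """Element-wise mean of row-normalized confusion matrices across seeds."""
    normed = [normalize_rows(cm) for cm in per_seed]
    n = len(normed)
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[i][j] for m in normed) / n for j in range(cols)]
            for i in range(rows)]


def mask_low_contributions(shap_map: Matrix, frac: float = 0.5) -> List[List[bool]]:
    """True where a pixel is masked: its score is below frac * the map's peak score."""
    peak = max(max(row) for row in shap_map)
    if peak <= 0:
        # No positive contribution anywhere: mask the whole map.
        return [[True] * len(row) for row in shap_map]
    cutoff = frac * peak
    return [[v < cutoff for v in row] for row in shap_map]
```

Averaging normalized rather than raw counts gives each seed equal weight even if test-set sizes differ. If the paper instead averaged raw counts, `normalize_rows` would simply be dropped.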