ACTesting: Automated Cross-modal Testing Method of Text-to-Image Software
Siqi Gu, Chunrong Fang, Quanjun Zhang, Zhenyu Chen
TL;DR
ACTesting introduces an automated black-box cross-modal testing framework for Text-to-Image (T2I) software. It builds on metamorphic testing to mitigate the lack of test oracles. It uses cross-modal semantic consistency, expressed through Entity-Relationship (ER) triples, as its metamorphic relation (MR). It then applies three mutation operators (EC, ER_R, ER_A), guided by an adaptability density constraint, to generate informative test inputs. Object and scene-graph analyses of the generated images verify whether the MR is satisfied. This reveals entity and relationship errors and improves fault detection across five T2I engines on the MS-COCO dataset. Experimental results show that mutated inputs lower text-image consistency and image realism relative to baselines, and ablations demonstrate the value of combining operators. Overall, ACTesting reliably identifies errors in T2I software and offers a path toward robustness improvements through targeted, mutation-driven data generation.
Abstract
Recently, creative generative artificial intelligence software has emerged as a pivotal assistant, enabling users to generate content and seek inspiration rapidly. Text-to-Image (T2I) software, one of the most widely used kinds, synthesizes images from text input through a cross-modal process. However, despite substantial advances in T2I engines, T2I software still produces errors when generating complex or non-realistic scenes, including omitted focal entities, low image realism, and mismatched text-image information. The cross-modal nature of T2I software makes such errors hard to detect with traditional testing methods, and the absence of test oracles further complicates testing. To fill this gap, we propose ACTesting, an Automated Cross-modal Testing Method of Text-to-Image Software and the first testing method explicitly designed for T2I software. ACTesting applies the metamorphic testing principle to address the oracle problem and adopts cross-modal semantic consistency, expressed through entity-relationship (ER) triples, as its fundamental metamorphic relation (MR). Under the guidance of the MR and an adaptability density constraint, we design three kinds of mutation operators to construct new input texts. After images are generated from these texts, ACTesting checks whether the MR is satisfied by comparing the ER triples detected in the two modalities, thereby revealing errors in T2I software. In our experiments on five popular T2I software systems, ACTesting effectively generates error-revealing tests, decreasing text-image consistency by up to 20% compared to the baseline. Additionally, an ablation study demonstrates the efficacy of the proposed mutation operators. The experimental results validate that ACTesting can reliably identify errors within T2I software.
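To make the MR check concrete, here is a minimal sketch of how cross-modal ER consistency might be verified once triples have been extracted from both modalities. It assumes that a text parser, an object detector, and a scene-graph generator have already produced their outputs. The `Triple` type, the `check_mr` function, and its parameters are hypothetical illustrations, not ACTesting's actual implementation. It also uses exact, case-insensitive label matching, which is a simplifying assumption; a practical checker would likely need to handle synonyms and plurals.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """Hypothetical (subject, relation, object) entity-relationship triple."""
    subject: str
    relation: str
    obj: str


def _norm(label: str) -> str:
    # Simplifying assumption: exact, case-insensitive matching only.
    return label.strip().lower()


def _norm_triple(t: Triple) -> Triple:
    return Triple(_norm(t.subject), _norm(t.relation), _norm(t.obj))


def check_mr(
    text_triples: Iterable[Triple],
    image_objects: Iterable[str],
    image_triples: Iterable[Triple],
) -> tuple[bool, set[str], set[Triple]]:
    """Check cross-modal ER consistency between a prompt and its image.

    text_triples:  triples parsed from the input text (parser assumed).
    image_objects: entity labels from an object detector (detector assumed).
    image_triples: triples from a scene-graph generator (generator assumed).
    Returns (mr_satisfied, missing_entities, missing_triples).
    """
    expected = {_norm_triple(t) for t in text_triples}
    observed = {_norm_triple(t) for t in image_triples}
    detected = {_norm(o) for o in image_objects}

    # Entity-level check: every entity named in the text must be detected.
    expected_entities = {e for t in expected for e in (t.subject, t.obj)}
    missing_entities = expected_entities - detected

    # Relationship-level check: every text triple must appear in the scene graph.
    missing_triples = expected - observed

    satisfied = not missing_entities and not missing_triples
    return satisfied, missing_entities, missing_triples


if __name__ == "__main__":
    prompt = [Triple("dog", "on", "sofa")]
    # The image shows a dog but no sofa, and the scene graph found no relations.
    ok, ents, trips = check_mr(prompt, ["dog"], [])
    print(ok, ents, trips)
```

In this usage example, the check reports an MR violation. It flags "sofa" as an omitted entity and the dog-on-sofa triple as a missing relationship, which correspond to two of the error types the abstract describes. Keeping the entity-level and relationship-level checks separate lets a tester tell "entity omitted" apart from "entities present but wrongly related".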
