Synergistic Perception and Generative Recomposition: A Multi-Agent Orchestration for Expert-Level Building Inspection

Hui Zhong, Yichun Gao, Luyan Liu, Xusen Guo, Zhaonian Kuang, Qiming Zhang, Xinhu Zheng

Abstract

Building facade defect inspection is fundamental to structural health monitoring and sustainable urban maintenance, yet it remains a formidable challenge due to extreme geometric variability, low contrast against complex backgrounds, and the inherent complexity of composite defects (e.g., cracks co-occurring with spalling). These characteristics lead to severe pixel imbalance and feature ambiguity, which, coupled with the critical scarcity of high-quality pixel-level annotations, hinder the generalization of existing detection and segmentation models. To address these gaps, we propose FacadeFixer, a unified multi-agent framework that treats defect perception as a collaborative reasoning task rather than isolated recognition. FacadeFixer orchestrates specialized agents for detection and segmentation to handle multi-type defect interference, working in tandem with a generative agent to enable semantic recomposition. This process decouples intricate defects from noisy backgrounds and realistically synthesizes them onto diverse clean textures, generating high-fidelity augmented data with precise expert-level masks. To support this, we introduce a comprehensive multi-task dataset covering six primary facade categories with pixel-level annotations. Extensive experiments demonstrate that FacadeFixer significantly outperforms state-of-the-art (SOTA) baselines. In particular, it excels at capturing pixel-level structural anomalies and highlights generative synthesis as a robust solution to data scarcity in infrastructure inspection. Our code and dataset will be made publicly available.
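The recomposition idea in the abstract (lifting a defect off its original background and placing it on a clean texture while keeping an exact mask) can be illustrated with a minimal sketch. This is not the FacadeFixer generative agent: it is a plain mask-based copy-paste in NumPy, used here only as a simplified stand-in. The function name `recompose` and its parameters are hypothetical.

```python
import numpy as np

def recompose(defect_img, defect_mask, clean_texture, top, left):
    """Paste the masked defect pixels onto a clean texture.

    Simplified copy-paste stand-in for generative recomposition (an assumption,
    not the paper's method). Returns the composited image and a pixel-level
    mask aligned with it.
    """
    h, w = defect_mask.shape
    tex_h, tex_w = clean_texture.shape[:2]
    if top < 0 or left < 0 or top + h > tex_h or left + w > tex_w:
        raise ValueError("defect patch does not fit inside the texture")

    mask = defect_mask.astype(bool)
    out = clean_texture.copy()                  # leave the input texture untouched
    region = out[top:top + h, left:left + w]    # view into the output image
    region[mask] = defect_img[mask]             # copy only the defect pixels

    new_mask = np.zeros((tex_h, tex_w), dtype=bool)
    new_mask[top:top + h, left:left + w] = mask
    return out, new_mask

# Toy example: a 2x2 white "defect" with a diagonal mask on a black 4x4 texture.
defect = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)
texture = np.zeros((4, 4, 3), dtype=np.uint8)
augmented, augmented_mask = recompose(defect, mask, texture, top=1, left=1)
```

Because the mask is carried through the paste, every synthesized image comes with its annotation for free, which is the property that makes this kind of augmentation attractive when pixel-level labels are scarce. A generative model would additionally blend the defect realistically into the new texture, which this sketch does not attempt.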

Paper Structure (24 sections, 5 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Architectural framework of FacadeFixer. The Orchestrator Agent parses user instructions to coordinate specialized experts for defect detection, segmentation, and generative recomposition through a shared Memory Bank.
  • Figure 2: End-to-end reasoning and execution pipeline of the FacadeFixer agent. The agentic workflow proceeds from natural-language instruction parsing, through hierarchical task decomposition, to final adjudicated perception and generative restoration.
  • Figure 3: Qualitative error analysis and comparative visualization of defect perception. The figure highlights the semantic ambiguities and localization failures of individual SOTA experts in complex facade environments, contrasted with the adjudication of the Gemini-guided selection strategy.
  • Figure 4: Illustrative assessment of synthesized facade defects. The figure showcases the high-fidelity fusion capabilities of the Generative Recomposition Agent alongside inherent limitations in physical-spatial logic observed in certain defect categories.