VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction and Recognition

Rahul Moorthy, Jun-Jee Chao, Volkan Isler

TL;DR

This work tackles the problem of reconstructing and characterizing simple polygons from visibility graphs, which lack a natural distance metric. It introduces VisDiff, a diffusion-based method that first predicts the polygon's signed distance function (SDF) conditioned on the visibility graph and then extracts the vertex coordinates from the SDF's zero level set. The approach achieves a 26% improvement in F1-score over strong baselines on visibility reconstruction, demonstrates the ability to generate diverse polygons for a given graph, and generalizes to triangulation graphs for polygon generation. Beyond reconstruction, VisDiff enables high-diversity sampling and interpolation in polygon space, with potential applications in data augmentation, privacy-conscious floorplan design, and planning under occlusion.
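To make the SDF representation concrete, the sketch below samples the signed distance function of a simple polygon on a square grid and flags the grid cells that its zero level set passes through. This is a minimal pure-Python illustration, not VisDiff's implementation. The grid extent [-1, 1]², the resolution, the negative-inside sign convention, and every function name (`polygon_sdf`, `zero_crossing_cells`, and the helpers) are assumptions made here for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_polygon(p, verts):
    """Even-odd ray-casting test: is p inside the simple polygon verts?"""
    px, py = p
    inside = False
    for i in range(len(verts)):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % len(verts)]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_sdf(verts, resolution=32, lo=-1.0, hi=1.0):
    """Sample the SDF on a resolution x resolution grid over [lo, hi]^2.

    Assumed convention: negative inside, positive outside. sdf[r][c] is the
    value at x = lo + c * step, y = lo + r * step.
    """
    step = (hi - lo) / (resolution - 1)
    n = len(verts)
    sdf = []
    for r in range(resolution):
        row = []
        for c in range(resolution):
            p = (lo + c * step, lo + r * step)
            d = min(point_segment_distance(p, verts[i], verts[(i + 1) % n])
                    for i in range(n))
            row.append(-d if inside_polygon(p, verts) else d)
        sdf.append(row)
    return sdf


def zero_crossing_cells(sdf):
    """Grid cells whose four corner values straddle zero, i.e. the cells
    a marching-squares pass would route the zero level set through."""
    cells = []
    for r in range(len(sdf) - 1):
        for c in range(len(sdf[0]) - 1):
            corners = (sdf[r][c], sdf[r][c + 1], sdf[r + 1][c], sdf[r + 1][c + 1])
            if min(corners) <= 0.0 < max(corners):
                cells.append((r, c))
    return cells


square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
sdf = polygon_sdf(square, resolution=33)
# Grid centre (0, 0) is inside; grid corner (-1, -1) is outside.
print(sdf[16][16], round(sdf[0][0], 4))  # -0.5 0.7071
```

The sign-change cells only localize the boundary. A marching-squares routine (for example `skimage.measure.find_contours`) could turn them into an ordered contour from which initial vertices are picked; the figure captions below describe a learned vertex-prediction block that refines such initial vertices in VisDiff.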

Abstract

The ability to capture rich representations of combinatorial structures has enabled the application of machine learning to tasks such as the analysis and generation of floorplans, terrains, images, and animations. Recent work has primarily focused on understanding structures with well-defined features, neighborhoods, or underlying distance metrics, while structures lacking such characteristics remain largely unstudied. Polygons provide examples of such combinatorial structures: a small change in the vertex locations causes a significant rearrangement of the combinatorial structure, expressed as a visibility graph or a triangulation graph. Current representation learning approaches fail to capture structures without well-defined features and distance metrics. In this paper, we study the open problem of Visibility Reconstruction: Given a visibility graph $G$, construct a polygon $P$ whose visibility graph is $G$. We introduce VisDiff, a novel diffusion-based approach to generate polygon $P$ from the input visibility graph $G$. The main novelty of our approach is that, rather than generating the polygon's vertex set directly, we first estimate the signed distance function (SDF) associated with the polygon. The SDF is then used to extract the vertex locations representing the final polygon. We show that going through the SDF allows VisDiff to learn the visibility relationship much more effectively than generating vertex locations directly. To train VisDiff, we create a carefully curated dataset. We use this dataset to benchmark our method and achieve a 26% improvement in F1-score over standard methods as well as state-of-the-art approaches.
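Visibility Reconstruction inverts the map from a polygon $P$ to its visibility graph $G$. The forward direction is easy to compute, and the sketch below does so by brute force in O(n³). It is an illustrative sketch, not code from the paper. It assumes a simple polygon in general position (no three collinear vertices), counts boundary edges as visible, and leaves the matrix diagonal at 0; the function names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_intersect(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    return (orient(q1, q2, p1) * orient(q1, q2, p2) < 0
            and orient(p1, p2, q1) * orient(p1, p2, q2) < 0)


def inside_polygon(p, verts):
    """Even-odd ray-casting test: is p inside the simple polygon verts?"""
    px, py = p
    inside = False
    for i in range(len(verts)):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % len(verts)]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def visibility_graph(verts):
    """Adjacency matrix G with G[i][j] = 1 iff vertices i and j see each other."""
    n = len(verts)
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                visible = True  # consecutive vertices share a polygon edge
            else:
                a, b = verts[i], verts[j]
                # The segment must not cross any edge that is not incident to i or j ...
                crosses = any(
                    properly_intersect(a, b, verts[k], verts[(k + 1) % n])
                    for k in range(n)
                    if k not in (i, j) and (k + 1) % n not in (i, j)
                )
                # ... and if it does not, it lies wholly inside or wholly outside
                # the polygon, so testing its midpoint decides which.
                mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
                visible = not crosses and inside_polygon(mid, verts)
            G[i][j] = G[j][i] = int(visible)
    return G


# A square with a notch cut from the top; (2, 1) is the reflex vertex.
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
G = visibility_graph(notched)
for row in G:
    print(row)
# (4, 4) and (0, 4) cannot see each other across the notch: G[2][4] == 0.
```

The reconstruction problem asks for the reverse: given only a matrix like `G`, produce vertex coordinates whose `visibility_graph` reproduces it. Many different polygons share the same `G`, which is why the source frames generation of diverse polygons as a feature rather than an ambiguity to resolve.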

Paper Structure (44 sections, 2 equations, 20 figures, 11 tables)

Figures (20)

  • Figure 1: a) A polygon $P$ is given by an ordered list of vertex locations $X$; the visible edges of the polygon are also shown in green. b) The visibility graph $G$ of polygon $P$, represented as an adjacency matrix in which black denotes a non-visible edge and white denotes a visible edge. We seek to answer the question: How much information about $X$ can be recovered from $G$ alone? We show the output of VisDiff for the reconstruction, characterization, and recognition problems associated with $G$. c) GNN output for the reconstruction problem on $G$. Clearly, standard GNN embedding methods are not sufficient to recover the vertex locations $X$ from $G$.
  • Figure 2: VisDiff architecture. The model consists of two main components: the U-Net SDF Diffusion block and the Vertex Extraction block. U-Net Diffusion Block: A noisy SDF, denoted $X_T$, is first sampled from a Gaussian distribution. $X_T$ then passes through $T$ timesteps of the reverse diffusion process to generate the clean SDF $X_0$. This denoising process is conditioned on the input graph $G$ using transformer cross-attention blocks represented by K, Q, and V, which correspond to the key, query, and value terms, respectively. In our approach, Q is obtained from the learned spatial CNN features, while K and V are derived from $G$. An initial set of vertices P is then estimated from $X_0$ via contour extraction. Vertex Extraction Block: Given the predicted SDF $X_0$, the SDF encoder generates pixel-aligned features $Z_{pix}$ and global features $Z_{global}$. These features, along with the initial vertices P, are fed into the vertex prediction block to predict the final vertex locations. During training, the model is supervised using both the ground-truth SDF and the corresponding polygon. During testing, only the visibility graph $G$ is provided as input.
  • Figure 3: Visibility Characterization: The top row shows multiple polygons generated by VisDiff for the same visibility graph $G$. The first vertex is shown in deep purple and the last vertex in yellow (anticlockwise ordering). The second row shows the visibility graph of each polygon, where white denotes a visible edge and black denotes a non-visible edge. Each caption shows the F1-score compared to the ground-truth (GT) visibility graph.
  • Figure 4: We provide qualitative and quantitative results of VisDiff on the Visibility Recognition problem.
  • Figure 5: Train vs. in-distribution test set analysis. Density panel: the density is inversely proportional to the diameter, so uniform sampling of the diameter results in a bimodal density. Diameter panel: the training and testing sets are uniform in terms of the link diameter of the visibility graph.
  • ...and 15 more figures
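The captions above score generated polygons by the F1-score of their visibility graph against the ground truth (Figure 3) and describe the dataset by graph density and link diameter (Figure 5). The sketch below shows one plausible way to compute these quantities from adjacency matrices; it is not the paper's evaluation code. It assumes F1 is taken over unordered vertex pairs with "visible" as the positive class, that predicted and ground-truth graphs index the same vertices in the same order, that density means the fraction of visible pairs, and that "link diameter" can be read as the hop diameter of $G$ (longest shortest path, in edges). All function names are hypothetical.

```python
from collections import deque


def upper_pairs(n):
    """Unordered vertex pairs (i, j) with i < j."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def visibility_f1(pred, gt):
    """F1-score of predicted visible pairs against ground-truth visible pairs."""
    n = len(gt)
    if len(pred) != n:
        raise ValueError("pred and gt must have the same number of vertices")
    tp = fp = fn = 0
    for i, j in upper_pairs(n):
        if pred[i][j] and gt[i][j]:
            tp += 1
        elif pred[i][j]:
            fp += 1
        elif gt[i][j]:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def edge_density(adj):
    """Fraction of the n(n-1)/2 vertex pairs that are visible."""
    n = len(adj)
    return sum(adj[i][j] for i, j in upper_pairs(n)) / (n * (n - 1) / 2)


def hop_diameter(adj):
    """Longest shortest path in G, counted in edges (BFS from every vertex)."""
    n = len(adj)
    diameter = 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if -1 in dist:
            return float("inf")  # disconnected; cannot happen for a polygon's G
        diameter = max(diameter, max(dist))
    return diameter


# Hand-built visibility graph of a 5-vertex polygon with one reflex vertex.
gt = [[0, 1, 0, 1, 1],
      [1, 0, 1, 1, 0],
      [0, 1, 0, 1, 0],
      [1, 1, 1, 0, 1],
      [1, 0, 0, 1, 0]]
complete = [[int(i != j) for j in range(5)] for i in range(5)]
print(visibility_f1(gt, gt), round(visibility_f1(complete, gt), 3))  # 1.0 0.824
print(edge_density(gt), hop_diameter(gt))  # 0.7 2
```

Predicting the complete graph (every pair visible) is a useful sanity baseline. It attains perfect recall, so its F1 is bounded only by the density of the ground-truth graph, which is one reason the density distribution in Figure 5 matters when comparing scores.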