LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Mayu Otani, Naoto Inoue, Kotaro Kikuchi, Riku Togashi

TL;DR

The paper tackles the challenge of evaluating generated layouts by introducing LTSim, a layout similarity measure based on optimal transport that enables flexible, cross-category matching and robust comparison across diverse layout differences. It extends to collection-level evaluation via LTSim-MMD, avoiding reliance on dataset-specific feature extractors. The approach addresses key limitations of existing measures (e.g., DocSim, MeanIoU, FID, Max.IoU) by recognizing many-to-many and cross-category alignments, and it demonstrates superior reliability in distinguishing varying degrees of differences and generation quality. Empirical results on RICO and PubLayNet show LTSim provides more reliable and interpretable comparisons across unconditional and label-conditioned generation tasks, while reducing the need for learned representations. The work offers practical guidance for researchers and practitioners seeking robust, scalable layout evaluation tools applicable to UI and document layouts.

Abstract

We introduce a layout similarity measure designed to evaluate the results of layout generation. While several similarity measures have been proposed in prior research, their behaviors have not been discussed comprehensively. Our research shows that the majority of these measures are unable to handle various layout differences, primarily because they depend on strict element matching, that is, one-by-one matching of elements within the same category. To overcome this limitation, we propose a new similarity measure based on optimal transport, which facilitates a more flexible matching of elements. This approach allows us to quantify the similarity between any two layouts, even those sharing no element categories, making our measure applicable to a wide range of layout generation tasks. For tasks such as unconditional layout generation, where FID is commonly used, we also extend our measure to collection-level similarities between groups of layouts. The empirical results suggest that our collection-level measure offers more reliable comparisons than existing ones like FID and Max.IoU.
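To make the idea of flexible, transport-based matching concrete, here is a minimal sketch in Python. It is not the paper's formulation of LTSim. The ground cost (`element_cost`), the uniform element masses, the entropic Sinkhorn solver, and the `exp(-cost)` conversion to a similarity are all assumptions chosen for illustration. It shows how a transport plan can spread one element's mass over several elements, including elements of a different category, which strict one-by-one same-category matching cannot do.

```python
import math

# A layout is a list of (label, (x, y, w, h)) with coordinates in [0, 1].
# All names and modelling choices below are illustrative assumptions.

def element_cost(a, b, label_penalty=1.0):
    """Assumed ground cost: box distance plus a penalty for differing labels."""
    (label_a, box_a), (label_b, box_b) = a, b
    box_dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(box_a, box_b)))
    return box_dist + (0.0 if label_a == label_b else label_penalty)

def sinkhorn_plan(cost, reg=0.1, n_iters=500):
    """Entropy-regularised transport plan between uniform masses (Sinkhorn)."""
    n, m = len(cost), len(cost[0])
    row_mass = [1.0 / n] * n
    col_mass = [1.0 / m] * m
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target masses.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def ot_similarity(layout_a, layout_b):
    """Map the total transport cost to (0, 1]; higher means more similar."""
    cost = [[element_cost(a, b) for b in layout_b] for a in layout_a]
    plan = sinkhorn_plan(cost)
    total = sum(plan[i][j] * cost[i][j]
                for i in range(len(layout_a)) for j in range(len(layout_b)))
    return math.exp(-total)

layout_a = [("text", (0.1, 0.1, 0.8, 0.2)), ("image", (0.1, 0.4, 0.8, 0.4))]
# Same page, but the text block is split into two narrower text blocks.
layout_b = [("text", (0.1, 0.1, 0.4, 0.2)), ("text", (0.5, 0.1, 0.4, 0.2)),
            ("image", (0.1, 0.4, 0.8, 0.4))]

sim_aa = ot_similarity(layout_a, layout_a)
sim_ab = ot_similarity(layout_a, layout_b)
print(f"identical: {sim_aa:.4f}, split text block: {sim_ab:.4f}")
```

Because the two layouts have different element counts, the plan must split mass: part of the image in `layout_a` is transported to a text element in `layout_b`. The label penalty makes such cross-category transport costly, but the pair still gets a well-defined score.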

Paper Structure (30 sections, 11 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: We propose LTSim, a layout similarity measure based on optimal transportation. This flexible measure allows us to define the similarity among arbitrary layout pairs. Since LTSim does not depend on learned representations, it can be applied to any dataset without the need for training.
  • Figure 2: Existing measures are unable to quantify certain differences between layouts. (a) DocSim and DocEMD fail to identify which layout has smaller differences from the anchor. (b) All measures except ours judge that two layouts have the same similarity to an anchor.
  • Figure 3: DocSim's drawbacks. Top: DocSim assigns much higher similarity values for layout pairs with larger elements than those with small elements. Bottom: DocSim happens to reward differences between layouts.
  • Figure 4: Retrieval examples on RICO and PubLayNet. The leftmost is the query layout and the others are top-1 retrieval by LTSim, DocSim, MeanIoU, and FID-FeatSim.
  • Figure 5: These box plots show the responses to perturbations by Max.IoU, FID, and LTSim-MMD. The overlaps imply that the measure may fail to distinguish the quality of layout collections.
  • ...and 1 more figure