Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design

Yuxuan Yang, Xiaotong Mao, Jingyao Wang, Fuchun Sun

Abstract

Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.
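The three steps named in the abstract (programmatic feasibility checks, aesthetic scoring restricted to feasible candidates, and group-relative optimization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `Box` footprint representation, the two constraints checked (room bounds and pairwise collisions), the zero reward for infeasible layouts, the externally supplied `aesthetic_score`, and the reading of "group-relative" as mean/standard-deviation normalization within a group of sampled candidates.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned furniture footprint (units are arbitrary)."""
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width along x
    d: float  # depth along y


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share interior area (touching edges are allowed)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def is_feasible(layout: list[Box], room_w: float, room_d: float) -> bool:
    """Assumed hard constraints: every box lies inside the room and no two collide."""
    in_room = all(
        b.x >= 0 and b.y >= 0 and b.x + b.w <= room_w and b.y + b.d <= room_d
        for b in layout
    )
    no_collision = all(
        not overlaps(a, b)
        for i, a in enumerate(layout)
        for b in layout[i + 1:]
    )
    return in_room and no_collision


def gated_reward(layout: list[Box], aesthetic_score: float,
                 room_w: float, room_d: float) -> float:
    """Feasibility-first gate: the aesthetic score counts only for feasible layouts."""
    return aesthetic_score if is_feasible(layout, room_w, room_d) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own candidate group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two candidate layouts for the same prompt in a 4 x 3 room.
feasible_layout = [Box(0, 0, 2, 1), Box(2.5, 0, 1, 1)]
colliding_layout = [Box(0, 0, 2, 1), Box(1, 0, 2, 1)]  # the two boxes overlap

rewards = [
    gated_reward(feasible_layout, aesthetic_score=0.6, room_w=4, room_d=3),
    gated_reward(colliding_layout, aesthetic_score=0.9, room_w=4, room_d=3),
]
advantages = group_relative_advantages(rewards)
```

In this sketch the colliding layout receives zero reward despite its higher assumed aesthetic score, so its group-relative advantage is negative. This mirrors the abstract's point that visually appealing but unexecutable candidates should not be reinforced.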

Paper Structure

This paper contains 24 sections, 21 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Examples of critical deficiencies identified in our empirical analysis.
  • Figure 2: Empirical Analysis. (a) Geometric Viability: Models frequently violate hard constraints like collisions and circulation blockage. (b) Style Drift: Confusion matrix showing the probability of models hallucinating "Modern Luxury" elements when prompted for "Wabi-sabi" or "Minimalist" styles due to data bias. (c) Rationale-Execution Gap: A persistent decoupling where high-quality textual reasoning (CoT) does not translate into high-quality spatial execution.
  • Figure 3: Qualitative Comparison of Generated Layouts across Diverse Scenarios. We evaluate four methods on challenging prompts involving strict spatial constraints, irregular object geometries, and abstract themes.
  • Figure 4: Integrated Quantitative and Qualitative Robustness Analysis. Top Left: The bar charts quantify physical violations across all stress-test prompts. Bottom: Qualitative comparison across four challenging scenarios. To make the methods easier to compare, the prompts are identical except for the description of the color scheme.
  • Figure 5: Qualitative comparison of 2D floor plans. The columns represent different methods (from left to right): LayoutGPT, I-Design, FlairGPT, and Design-MLLM.
  • ...and 2 more figures