MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Jonas Belouadi, Tamy Boubekeur, Adrien Kaiser
TL;DR
Procedural materials are represented as directed acyclic graphs that generate texture maps for physically-based rendering, but prior work largely relies on text-only graph representations. MultiMat introduces a multimodal program synthesis framework that conditions node-graph generation on visual feedback from intermediate graph states $G_t$ and their rendered outputs $I_t$, organized as a multimodal program tree $\mathcal{T}$, and uses a transpiler to convert graphs into Substance Designer formats. At inference time, an incremental tree search with automatic error repair validates generations and backtracks when necessary, ensuring correctness and efficiency. Trained on a large, production-grade Substance Designer dataset and evaluated on unconditional and conditional synthesis tasks, MultiMat achieves state-of-the-art visual fidelity and generation efficiency, offering a practical path toward accessible, production-grade procedural materials for artists.
Abstract
Material node graphs are programs that generate the 2D channels of procedural materials, including geometry such as roughness and displacement maps, and reflectance such as albedo and conductivity maps. They are essential in computer graphics for representing the appearance of virtual 3D objects parametrically and at arbitrary resolution. In particular, their directed acyclic graph structure and intermediate states enable a modular, interpretable workflow for interactive appearance modeling. However, creating such graphs remains challenging and typically requires professional training. While recent neural program synthesis approaches attempt to simplify this process, they solely represent graphs as textual programs, failing to capture the inherently visual-spatial nature of node graphs that makes them accessible to humans. To address this gap, we present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures static correctness while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.
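To make the notion of static correctness concrete, the following is a minimal sketch of one check that a constrained search could run on a candidate node graph: every input must refer to an existing node, and the graph must be acyclic. It is an illustration only, not the authors' implementation. The function `validate_graph`, the dictionary encoding, and the node names are all hypothetical and do not correspond to the Substance Designer API.

```python
from collections import deque


def validate_graph(nodes):
    """Return a topological order of `nodes`, or raise ValueError.

    `nodes` maps a node name to the list of node names it reads from.
    This encoding is a hypothetical stand-in for a material node graph.
    """
    # Every input must refer to a node that exists in the graph.
    for name, inputs in nodes.items():
        for src in inputs:
            if src not in nodes:
                raise ValueError(f"node {name!r} reads undefined node {src!r}")

    # Kahn's algorithm: repeatedly emit nodes whose inputs were all emitted.
    pending = {name: len(set(inputs)) for name, inputs in nodes.items()}
    consumers = {name: [] for name in nodes}
    for name, inputs in nodes.items():
        for src in set(inputs):
            consumers[src].append(name)

    ready = deque(sorted(n for n, k in pending.items() if k == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in consumers[current]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    # Nodes left over can only be part of a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical material graph: a noise source feeding two output channels.
graph = {
    "noise": [],
    "blur": ["noise"],
    "levels": ["blur"],
    "roughness": ["levels"],
    "albedo": ["noise", "levels"],
}
order = validate_graph(graph)
```

In a search loop, a candidate that raises `ValueError` would be rejected or passed to a repair step, while a valid candidate can be evaluated node by node in the returned order to produce its intermediate outputs.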
