Teamwork: Collaborative Diffusion with Low-rank Coordination and Adaptation

Sam Sartor, Pieter Peers

TL;DR

Teamwork addresses the challenge of expanding the input and output channels of pretrained diffusion models. It coordinates multiple adapted instances of the base model ('teammates') through a novel low-rank offset that jointly models adaptation and coordination. The core idea extends LoRA to a shared, non-block-diagonal ΔW. This enables information flow between teammates without architectural changes or added training cost, and it supports dynamic (de)activation of channels. The approach delivers competitive or superior results across inpainting, SVBRDF estimation, intrinsic decomposition, neural shading, and intrinsic image synthesis. It also reduces training time and operates flexibly on heterogeneous data. For graphics pipelines that need richer conditioning and outputs, this offers a scalable path to complex, multi-channel diffusion tasks without retraining large models.
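
The difference between per-teammate LoRA and a shared, non-block-diagonal ΔW can be illustrated on a single linear layer. The following is a minimal sketch, not the authors' implementation. It rests on three assumptions:

- each teammate's features are concatenated into one vector;
- the frozen base weight W is applied to each teammate independently, as in plain batching;
- the offset is factored LoRA-style as ΔW = BA over the concatenated features.

The helper names `teamwork_linear` and `per_teammate_lora` are hypothetical. Because A mixes the features of all teammates through the rank-r bottleneck, one teammate's output can depend on another teammate's input. Independent per-teammate LoRA corresponds to a block-diagonal ΔW and allows no such flow.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def teamwork_linear(W: Matrix, A: Matrix, B: Matrix, xs: List[Vector]) -> List[Vector]:
    """Frozen W per teammate plus a shared low-rank offset B @ A over all teammates.

    Assumed shapes (illustrative): W is d_out x d_in, A is r x (T*d_in),
    B is (T*d_out) x r, where T = len(xs) is the number of teammates.
    """
    T, d_out = len(xs), len(W)
    # Frozen base layer, applied to each teammate independently (like batching).
    base = [matvec(W, x) for x in xs]
    # Shared offset: A reads the concatenated features of every teammate ...
    z = matvec(A, [v for x in xs for v in x])
    # ... and B writes one d_out-sized slice of the result back to each teammate.
    delta = matvec(B, z)
    return [[b + delta[t * d_out + i] for i, b in enumerate(base[t])]
            for t in range(T)]


def per_teammate_lora(W: Matrix, As: List[Matrix], Bs: List[Matrix],
                      xs: List[Vector]) -> List[Vector]:
    """Independent LoRA per teammate, i.e. a block-diagonal offset."""
    out = []
    for x, A, B in zip(xs, As, Bs):
        delta = matvec(B, matvec(A, x))  # only sees this teammate's own input
        out.append([b + d for b, d in zip(matvec(W, x), delta)])
    return out


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]      # d_in = d_out = 2
    A = [[1.0, 1.0, 1.0, 1.0]]        # rank r = 1, T = 2 teammates
    B = [[0.5], [0.5], [0.5], [0.5]]
    print(teamwork_linear(W, A, B, [[1.0, 2.0], [3.0, 4.0]]))  # [[6.0, 7.0], [8.0, 9.0]]
    # Only teammate 1's input changes, yet teammate 0's output changes too.
    print(teamwork_linear(W, A, B, [[1.0, 2.0], [0.0, 0.0]]))  # [[2.5, 3.5], [1.5, 1.5]]
```

If B is initialized to zero, as in standard LoRA, the coordinated layer starts out identical to running the frozen base model on each teammate independently. Coordination is then learned only to the extent that training moves B away from zero.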

Abstract

Large pretrained diffusion models can provide strong priors that benefit many graphics applications. However, generative applications such as neural rendering, and inverse methods such as SVBRDF estimation and intrinsic image decomposition, require additional input or output channels. Current solutions for channel expansion are often application-specific and can be difficult to adapt to different diffusion models or new tasks. This paper introduces Teamwork: a flexible and efficient unified solution for jointly increasing the number of input and output channels and adapting a pretrained diffusion model to new tasks. Teamwork achieves channel expansion without altering the pretrained diffusion model architecture by coordinating and adapting multiple instances of the base diffusion model (i.e., teammates). We employ a novel variation of Low-Rank Adaptation (LoRA) to jointly address both adaptation and coordination between the different teammates. Furthermore, Teamwork supports dynamic (de)activation of teammates. We demonstrate the flexibility and efficiency of Teamwork on a variety of generative and inverse graphics tasks such as inpainting, single-image SVBRDF estimation, intrinsic decomposition, neural shading, and intrinsic image synthesis.
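
The abstract does not spell out how teammates are (de)activated. The following is a hypothetical sketch only, under these assumptions:

- the shared offset is split into per-teammate column blocks A_t of A and row blocks B_t of B;
- a deactivated teammate is simply skipped, so it neither contributes features to the rank-r bottleneck nor receives an offset.

Under these assumptions, dropping a teammate gives the remaining teammates the same result as feeding it all-zero features. The same weights could then serve samples that provide different subsets of channels. The function name `shared_offset` and the block layout are illustrative, not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def shared_offset(A_blocks: Sequence[Matrix], B_blocks: Sequence[Matrix],
                  xs: Sequence[Vector], active: Sequence[int]) -> Dict[int, Vector]:
    """Low-rank offset restricted to the active teammates (hypothetical scheme).

    A_blocks[t] (r x d_in) and B_blocks[t] (d_out x r) are teammate t's
    slices of the shared factors; inactive teammates are skipped entirely.
    """
    r = len(A_blocks[0])
    z = [0.0] * r
    for t in active:  # only active teammates write into the shared bottleneck
        z = [zi + ai for zi, ai in zip(z, matvec(A_blocks[t], xs[t]))]
    return {t: matvec(B_blocks[t], z) for t in active}  # and only they read it


if __name__ == "__main__":
    A_blocks = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]  # r = 1, d_in = 2
    B_blocks = [[[2.0]], [[3.0]], [[4.0]]]                 # d_out = 1
    xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(shared_offset(A_blocks, B_blocks, xs, active=[0, 1, 2]))
    # {0: [32.0], 1: [48.0], 2: [64.0]}
    print(shared_offset(A_blocks, B_blocks, xs, active=[0, 1]))
    # {0: [10.0], 1: [15.0]}
```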

Paper Structure

This paper contains 26 sections, 3 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Schematic overview of common input and output channel expansion techniques for pretrained diffusion models. Input expansion: (a) Zero-convolution expands the input head with zero-initialized weights and subsequently finetunes the model. This finetuning increases the risk of overfitting, and thus of destroying potentially valuable embedded priors. (b) ControlNet reduces the risk of overfitting by keeping the pretrained diffusion model frozen and finetuning a copy of it, whose outputs are injected as offsets into each layer of the original model. Output expansion: (c) Batching, a common training optimization, allows multiple instances of a model to run in parallel at inference time. However, each model instance in a batch is unaware of the others, so no coordination occurs between the different instances. (d) Joint Attention coordinates different adapted instances of a diffusion model by replacing self-attention layers with joint-attention layers that span the multiple instances. Conceptually, Joint Attention is the dual of Teamwork (e), which keeps attention computations within each instance but instead shares the features from the linear layers.
  • Figure 2: Teamwork performs qualitatively similarly to different Stable Diffusion 3-based inpainting methods.
  • Figure 3: Qualitative comparison of SVBRDFs estimated by Teamwork variants and MatFusion Sartor:2023:MGD from real-world photographs captured under colocated (1st row), environment (2nd row), and flash/no-flash (3rd row) lighting.
  • Figure 4: Qualitative comparison of Teamwork variants and MatFusion Sartor:2023:MGD on simulated captures under colocated lighting for three synthetic SVBRDFs. For each SVBRDF we show a rerendering under novel lighting and the estimated diffuse albedo, specular albedo, roughness, and normal maps.
  • Figure 5: Qualitative comparison (on examples from HyperSim Roberts:2021:HPS) of the pretrained Intrinsic Image Diffusion Kocsis:2023:IID and the pretrained RGB→X Zeng:2024:RGB against a Stable Diffusion 3-based Teamwork variant trained on the heterogeneous training set. For each method, the resulting intrinsic components (where available) are organized as follows. 1st row: summed albedo, diffuse albedo, and specular albedo. 2nd row: shading, diffuse shading, and specular residual. 3rd row: roughness, normals, and depth.
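
The zero-convolution input expansion in Figure 1(a) can be illustrated with a per-pixel (1×1) convolution, which acts as a linear map over channels. In this minimal sketch, the helper names `conv1x1` and `expand_input_channels` are hypothetical. The pretrained input weights are extended with zero-initialized columns for the new conditioning channels. As a result, the expanded model initially produces exactly the same output as the pretrained one, whatever values the new channels hold. Subsequent finetuning moves these weights away from zero, and that finetuning is where the overfitting risk noted in the caption arises.

```python
from typing import List

Matrix = List[List[float]]


def conv1x1(weight: Matrix, bias: List[float], pixel: List[float]) -> List[float]:
    """A 1x1 convolution at one pixel: a linear map over the channel dimension."""
    return [b + sum(w * c for w, c in zip(row, pixel)) for row, b in zip(weight, bias)]


def expand_input_channels(weight: Matrix, extra: int) -> Matrix:
    """Return a copy of a pretrained weight with `extra` zero-initialized input columns."""
    return [row + [0.0] * extra for row in weight]


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.0, 0.0, 0.25]]  # 3 input channels -> 2 output channels
    b = [0.1, -0.2]
    pixel = [1.0, 2.0, 4.0]
    W_exp = expand_input_channels(W, extra=2)
    # Whatever the two new conditioning channels contain, the output is unchanged at init.
    print(conv1x1(W, b, pixel) == conv1x1(W_exp, b, pixel + [7.0, -3.0]))  # True
```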