FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression

Yifei Gao, Yong Chen, Chen Zhang

TL;DR

FAME tackles function-on-function regression (FoFR) by learning a continuous, data-driven operator directly on irregularly sampled functional data. It combines a bidirectional NCDE-based continuous attention block with a Mixture-of-Experts router to capture intra-function dynamics and a multi-head cross-attention module to fuse inter-function interactions, followed by an NCDE decoder that outputs continuous targets at arbitrary query points. The framework is proven to be Lipschitz-stable, sampling-invariant, and universally expressive, and empirical results show state-of-the-art accuracy on synthetic and real FoFR benchmarks along with robustness to irregular sampling and heterogeneity. Overall, FAME offers a principled, end-to-end approach that overcomes the limitations of fixed-basis and grid-discretization methods while enabling accurate, flexible function-to-function mappings in challenging settings.

Abstract

Functional data play a pivotal role across science and engineering, yet their infinite-dimensional nature makes representation learning challenging. Conventional statistical models depend on pre-chosen basis expansions or kernels, limiting the flexibility of data-driven discovery, while many deep-learning pipelines treat functions as fixed-grid vectors, ignoring inherent continuity. In this paper, we introduce Functional Attention with a Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a bidirectional neural controlled differential equation with MoE-driven vector fields to capture intra-functional continuity, and further fuses inter-functional dependencies via multi-head cross-attention. Extensive experiments on synthetic and real-world functional-regression benchmarks show that FAME achieves state-of-the-art accuracy and strong robustness to arbitrarily sampled discrete observations of functions.
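
To make the phrase "bidirectional neural controlled differential equation with MoE-driven vector fields" more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a softmax router over tanh experts, a time-augmented piecewise-linear control path, and explicit Euler steps; the names `init_params`, `moe_field`, `cde_euler`, and `bi_cde` are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def softmax(v):
    m = max(v)                                   # subtract max for stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def init_params(d_z, d_x, n_experts, seed=0):
    """Random (untrained) router and expert weights; illustrative only."""
    rng = random.Random(seed)
    rand = lambda r, c: [[rng.gauss(0.0, 0.3) for _ in range(c)] for _ in range(r)]
    return {
        "gate": rand(n_experts, d_z),                          # router weights
        "experts": [rand(d_z * d_x, d_z) for _ in range(n_experts)],
        "d_z": d_z,
        "d_x": d_x,
    }

def moe_field(z, p):
    """Gated vector field F(z) = sum_k g_k(z) F_k(z), a d_z x d_x matrix."""
    g = softmax(matvec(p["gate"], z))            # routing weights, sum to 1
    d_z, d_x = p["d_z"], p["d_x"]
    F = [[0.0] * d_x for _ in range(d_z)]
    for g_k, W in zip(g, p["experts"]):
        flat = [math.tanh(a) for a in matvec(W, z)]   # expert output in (-1, 1)
        for i in range(d_z):
            for j in range(d_x):
                F[i][j] += g_k * flat[i * d_x + j]
    return F

def cde_euler(path, z0, p):
    """Explicit Euler for dz = F(z) dX along a list of control points."""
    z, traj = list(z0), [list(z0)]
    for prev, cur in zip(path, path[1:]):
        dX = [c - a for a, c in zip(prev, cur)]  # control increment, any spacing
        dz = matvec(moe_field(z, p), dX)
        z = [zi + di for zi, di in zip(z, dz)]
        traj.append(z)
    return traj

def bi_cde(times, values, z0, p_fwd, p_bwd):
    """Run the CDE forward and backward in time and concatenate the states."""
    path = [[t, x] for t, x in zip(times, values)]    # time-augmented control
    fwd = cde_euler(path, z0, p_fwd)
    bwd = cde_euler(path[::-1], z0, p_bwd)[::-1]      # realign to forward order
    return [f + b for f, b in zip(fwd, bwd)]

# Irregularly sampled observations of one function.
times = [0.0, 0.1, 0.35, 0.4, 0.9, 1.0]
values = [math.sin(2 * math.pi * t) for t in times]
z0 = [0.1, -0.2, 0.3]
states = bi_cde(times, values, z0,
                init_params(3, 2, 4, seed=1), init_params(3, 2, 4, seed=2))
```

Because the dynamics are driven by increments of the control path rather than by a fixed grid index, the same code handles any sampling pattern. A practical NCDE would typically replace the piecewise-linear path and Euler steps with a smoother interpolant and an adaptive solver.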

Paper Structure

This paper contains 43 sections, 12 theorems, 29 equations, 7 figures, 8 tables.

Key Result

Theorem 1

Under Assumption (ass:bidir-lip), both the forward and backward CDEs in Eq. (eq:bidirCDE) admit unique solutions on $[t_0,T_j]$. Moreover, letting $L_j=\max\{L^{\text{fwd}}_j,\,L^{\text{bwd}}_j\}$, the latent paths satisfy

Figures (7)

  • Figure 1: Architecture of the FAME.
  • Figure 2: Visualization of Functional Features.
  • Figure 3: Parameter-sensitivity curves for FAME.
  • Figure 4: Sensitivity to $K$—test MSE (left axis) and $\tilde{H}$ (right axis).
  • Figure 5: Training and validation loss over epochs. The monotonic convergence illustrates stable optimisation behaviour.
  • ...and 2 more figures

Theorems & Definitions (18)

  • Theorem 1: Existence and uniqueness of the Bi-NCDE
  • Proposition 1: Lipschitz stability
  • Proposition 2: Universal approximation
  • Lemma 1: Mixed-field Lipschitz bound
  • Theorem 2: Well-posed MoE-Driven Bi-NCDE
  • Theorem 3: Well-Posedness and Lipschitz stability of the decoder
  • Lemma 2: Soft-max contraction
  • proof
  • Theorem 4: Global Lipschitz bound
  • proof
  • ...and 8 more