Separable neural architectures as a primitive for unified predictive and generative intelligence

Reza T. Batley, Apurba Sarker, Rajib Mostakim, Andrew Klichine, Sourav Saha

Abstract

Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristic of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.
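
As a rough illustration of the factorisation the abstract describes, the sketch below evaluates a rank-$R$ separable map in canonical (CP-style) form, $f(x) = \sum_{r=1}^{R} \prod_{d=1}^{D} \phi_{r,d}(x_d)$, where each $\phi_{r,d}$ is a univariate learnable component. It is a toy under stated assumptions and not the paper's implementation: the Gaussian-bump basis, the random initialisation, and the names `SeparableModel`, `gaussian_basis` and `n_basis` are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_basis(x, centres, width):
    """Evaluate Gaussian bumps at points x; returns shape (n_points, n_basis)."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)


class SeparableModel:
    """Toy rank-R separable map f(x) = sum_r prod_d phi_{r,d}(x_d).

    Each univariate atom phi_{r,d} is a linear combination of fixed
    Gaussian bumps on [0, 1] (an illustrative choice, not from the paper).
    """

    def __init__(self, n_dims, rank, n_basis, seed=0):
        rng = np.random.default_rng(seed)
        self.centres = np.linspace(0.0, 1.0, n_basis)
        self.width = 1.0 / n_basis
        # coeffs[d, r, k]: weight of basis function k in atom (r, d)
        self.coeffs = rng.normal(size=(n_dims, rank, n_basis))

    def __call__(self, x):
        """x has shape (n_points, n_dims); returns shape (n_points,)."""
        n_points, n_dims = x.shape
        rank = self.coeffs.shape[1]
        modes = np.ones((n_points, rank))
        for d in range(n_dims):
            basis = gaussian_basis(x[:, d], self.centres, self.width)
            # Multiply in the d-th univariate atom of every rank-1 mode.
            modes *= basis @ self.coeffs[d].T  # shape (n_points, rank)
        # Sum the rank-1 modes to obtain the rank-R output.
        return modes.sum(axis=1)

    def n_parameters(self):
        return self.coeffs.size


model = SeparableModel(n_dims=6, rank=4, n_basis=8)
points = np.random.default_rng(1).uniform(size=(5, 6))
values = model(points)
```

In this toy setting the parameter count is `n_dims * rank * n_basis` = 192, growing linearly with dimension, whereas a full tensor-product grid with 8 basis functions per axis in six dimensions would need $8^6 = 262{,}144$ coefficients.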

Paper Structure (9 sections, 9 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: The separable neural architecture (SNA) as a unified primitive for predictive and generative intelligence. The SNA formalises a representational class that constructs high-dimensional mappings by combining lower-arity learnable components (atoms) selected by an interaction tensor. By constraining interaction order and tensor rank, this formalism subsumes generalised additive, quadratic and tensor-decomposed neural models.
  • Figure 2: Prediction and inversion with a canonical separable neural architecture. a, A schematic of the experimental setup. A laser directed energy deposition machine builds thin-walled structures layer by layer on a stainless steel 304 substrate, whilst an infrared camera records the evolving thermal field during the build. These measurements are subsequently linked to the mechanical response of the material through tensile testing of extracted coupons. b, Predictive performance versus trainable parameters on the Inconel 718 thermal-history dataset. KHRONOS achieves state-of-the-art accuracy in both yield stress (YS) and ultimate tensile strength (UTS) with up to five orders of magnitude fewer parameters than prior models from the literature [xie2021mechanistic, fang2022data], and a thousand times fewer than XGBoost [chen2016xgboost]. c, Generative inversion of target mechanical properties to thermal histories. KHRONOS's lightweight structure enables rapid recovery of entire ensembles of plausible histories consistent with queried YS and UTS targets. Here, 47 trajectories converged for YS (399.9 MPa), recovered in 47.3 ms, and 64 for UTS (670.4 MPa), recovered in 39.5 ms. The illustrated mean and range of converged trajectories closely match the ground-truth thermal history.
  • Figure 3: Variational separable neural architectures recover high-dimensional PDE solution manifolds with favourable scaling. a, Spatiotemporal evolution of the field for fixed $\omega=\frac{\pi}{3}$ and $D=0.001$. The top and middle rows compare KHRONOS's predicted solution with the exact solution, and the bottom row shows KHRONOS's relative error. b, The six-dimensional spatiotemporal-parametric advection-diffusion field learned by KHRONOS. Stacked ($x-y$) spatial slices across time $t$ are shown across rotation-diffusivity ($\omega-D$) parameter space, illustrating the recovery of the entire solution manifold in a single global representation. c, Approximation ($L^2$) error versus trainable parameters for the same system under refinement of rank $R$ and resolution $C$. Rank-isolines are connected and colour-coded. Along rank-isolines, errors decrease with resolution at slope $=-4$ before saturating at the rank capacity limit. Across ranks, an efficient frontier emerges (fitted slope $\approx-0.68$ in log-log space), sustained across four orders of magnitude in parameter count.
  • Figure 4: Bidirectional generative framework and realisation of seamless, multiscale metamaterials. a, Schematic of Janus's architecture. A three-dimensional convolutional autoencoder encodes a voxelised unit-cell microstructure into a 64-dimensional latent space, from which it learns to reconstruct the microstructure. A separable neural architecture head, similar to KHRONOS, predicts physical properties from the latent. This head is readily inverted to generate a new microstructure given target properties. b, Forward prediction accuracy of Janus on key components of the stress tensor from a held-out test set, demonstrating near-perfect correlation. c, Principal component analysis (PCA) of the latent space coloured by axial stiffness $C_{1111}$, highlighting the smooth manifold learned by the network. d, Macroscale $C_{1111}$ stiffness targets as prescribed by the cantilever beam model, volumetric rendering of the 40-cell multiscale beam with Janus-designed microstructures, and rendering shaded with local relative errors as determined by FFT homogenisation. e, Beamwise validation of the designed property field. Actual volume fraction exactly tracks the target, and axial stiffness closely agrees across the beam with low relative errors. f, Summary of local stiffness-field and global beam-level metrics. Local $C_{1111}$ errors remain below $3.5\%$, whilst global metrics remain below $2\%$, confirming the intended structural-level response.
  • Figure 5: Analysis of Leviathan as a foundation model for turbulence across three rollout seeds. a-c, Three-dimensional principal components of the embeddings of the entire vocabulary set. a, Leviathan generates a continuous embedding manifold of low intrinsic dimensionality, with the visualised components explaining $85\%$ of the variance. b, A dense Transformer embeds isotropically, explaining only $14\%$ of the variance. c, The isotropic cloud of Leviathan's embedding space when trained on the unstructured o200k_base tokeniser. Despite the mathematical structure of quantised vorticity, the dense embedding space in b more closely resembles that of an unstructured language tokeniser. d, Quantitative validation of long-horizon (20-timestep) physical consistency. Leviathan, under four sampling techniques (expectation, top-50, top-5, greedy), outperforms deterministic operators (DeepONet, Fourier neural operator, U-Net) across all metrics (left to right: enstrophy log-ratio error, total spectral energy log-ratio error, spectral slope error, Jensen-Shannon divergence) when controlling for parameters. The dense Transformer is competitive in enstrophy and Jensen-Shannon divergence. e, Evolution of radial energy spectra in time, with Leviathan best maintaining inertial-range statistics. The deterministic operators rapidly fall away from the direct numerical simulation (DNS) ground truth. The Fourier neural operator fades to a constant field in a single step, with a flat spectrum. f, Evolution of the probability density function $P(\omega)$ of vorticity. Deterministic models drift catastrophically to a non-physical mean state (a delta distribution), whereas Leviathan preserves the heavy-tailed structure of the chaotic attractor. The dense Transformer retains some structure, avoiding collapse to a mean state.
  • ...and 1 more figures
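
The Figure 5 caption lists Jensen-Shannon divergence among its physical-consistency metrics. The sketch below shows the standard definition applied to two histograms, which is one plausible way to compare a predicted vorticity distribution with a reference. It is an assumption-laden illustration: the binning, the natural-log base, the synthetic samples and the function name `jensen_shannon_divergence` are hypothetical and are not taken from the paper.

```python
import numpy as np


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise counts to probability mass functions.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute zero; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


rng = np.random.default_rng(0)
# Synthetic stand-ins (not paper data): a heavy-tailed reference sample
# and a sample concentrated near zero, mimicking collapse to a mean state.
reference = rng.standard_t(df=3, size=10_000)
collapsed = rng.normal(scale=0.05, size=10_000)

# Both histograms must share the same bin edges to be comparable.
bins = np.linspace(-10.0, 10.0, 101)
p_ref, _ = np.histogram(reference, bins=bins)
p_col, _ = np.histogram(collapsed, bins=bins)

jsd_same = jensen_shannon_divergence(p_ref, p_ref)
jsd_collapsed = jensen_shannon_divergence(p_ref, p_col)
```

With the natural logarithm the divergence is zero for identical distributions and bounded above by $\ln 2$, so a distribution that has collapsed towards a delta scores well above one that matches the reference.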