A Tutorial on Dimensionless Learning: Geometric Interpretation and the Effect of Noise

Zhengtao Jake Gan, Xiaoyu Xie

TL;DR

This work addresses the challenge of automatically discovering dimensionless numbers and scaling laws from experimental data by marrying Buckingham's π theorem with a geometric, data-driven approach. It introduces a five-module pipeline that computes a null-space basis of the dimension matrix, reduces dimensionality with PCA or SIR, and discovers dimensionless groups via learnable coefficients in a neural-network framework, aided by a quantization regularizer that enforces simple, interpretable coefficients. The approach is validated on synthetic cases, demonstrating robustness to noise and discrete sampling, and extended to multiple dominant dimensionless numbers, where the learned representations form low-dimensional manifolds and subspaces of equivalent forms. An open-source Streamlit-based interface (PyDimension) is provided to make dimensionless learning accessible to experimentalists, with discussions of current limitations and directions for improving input selection, scalability, and user accessibility.
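The null-space step mentioned above can be illustrated with a small, self-contained sketch. It does not use the paper's data or PyDimension's API; the pipe-flow variables (density, velocity, diameter, viscosity), the matrix `D`, and the helper `null_space_basis` are assumptions chosen for illustration. Each column of `D` holds the exponents of mass, length, and time for one variable, and any vector `w` with `D @ w = 0` defines a dimensionless group.

```python
import numpy as np

# Illustrative dimension matrix (an assumption, not taken from the paper).
# Rows: mass M, length L, time T. Columns: rho, v, d, mu.
D = np.array([
    [1.0, 0.0, 0.0, 1.0],     # M exponents
    [-3.0, 1.0, 1.0, -1.0],   # L exponents
    [0.0, -1.0, 0.0, -1.0],   # T exponents
])

def null_space_basis(dim_matrix, tol=1e-10):
    """Return a matrix whose columns span the null space of dim_matrix."""
    _, s, vt = np.linalg.svd(dim_matrix)
    rank = int(np.sum(s > tol))
    # Rows of vt beyond the rank are orthogonal to the row space.
    return vt[rank:].T

basis = null_space_basis(D)

# Rescale so the first exponent is one, which makes the group readable.
w = basis[:, 0] / basis[0, 0]
print(w)  # exponents of rho, v, d, mu in the dimensionless group
```

For this matrix the null space is one-dimensional and the rescaled vector corresponds to the Reynolds number, $\rho v d / \mu$. With more variables the null space has several basis vectors, and the learnable coefficients $\boldsymbol{\gamma}$ select a combination of them.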

Abstract

Dimensionless learning is a data-driven framework for discovering dimensionless numbers and scaling laws from experimental measurements. This tutorial introduces the method, explaining how it transforms experimental data into compact physical laws that reveal dimensional invariance between variables. The approach combines classical dimensional analysis with modern machine learning techniques. Starting from measurements of physical quantities, the method identifies the fundamental ways to combine variables into dimensionless groups, then uses neural networks to discover which combinations best predict the experimental output. A key innovation is a regularization technique that encourages the learned coefficients to take simple, interpretable values like integers or half-integers, making the discovered laws both accurate and physically meaningful. We systematically investigate how measurement noise and discrete sampling affect the discovery process, demonstrating that the regularization approach provides robustness to experimental uncertainties. The method successfully handles cases with single or multiple dimensionless numbers, revealing how different but equivalent representations can capture the same underlying physics. Despite recent progress, key challenges remain, including managing the computational cost of identifying multiple dimensionless groups, understanding the influence of data characteristics, automating the selection of relevant input variables, and developing user-friendly tools for experimentalists. This tutorial serves as both an educational resource and a practical guide for researchers seeking to apply dimensionless learning to their experimental data.
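To make the idea of a regularizer that favors integers or half-integers concrete, here is a minimal sketch. The exact functional form used in the paper is not given in this summary, so the penalty below (mean squared distance to the nearest multiple of `step`) is an assumed stand-in, and the name `quantization_penalty` is hypothetical.

```python
import numpy as np

def quantization_penalty(gamma, step=0.5):
    """Mean squared distance (in units of step) from each coefficient
    to its nearest multiple of step. Assumed form, for illustration only."""
    scaled = np.asarray(gamma, dtype=float) / step
    return float(np.mean((scaled - np.round(scaled)) ** 2))

on_grid = quantization_penalty([1.0, 1.0, 0.0])      # integers
half_grid = quantization_penalty([-0.5, -0.5, 0.0])  # half-integers
off_grid = quantization_penalty([0.8, 0.7, 0.1])     # no simple values

print(on_grid, half_grid, off_grid)
```

The penalty vanishes for integer and half-integer coefficients and grows as coefficients drift away from them. In training, a term of this kind would presumably be added to the prediction loss with a tunable weight.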

Paper Structure

This paper contains 26 sections, 27 equations, 14 figures.

Figures (14)

  • Figure 1: Schematic diagram of the dimensionless learning pipeline workflow showing the five main modules that progressively transform data from high-dimensional inputs through basis space to compact scaling laws.
  • Figure 2: Neural network architecture used in the example walkthrough, following the same structure as DimensionNet [saha2021]. The network has seven inputs, one linear combination layer (with no activation function) that learns the $\boldsymbol{\gamma}$ coefficients, four hidden layers with ten nodes each using ReLU activation, and one output. The linear layer weights directly represent the $\boldsymbol{\gamma}$ values that combine the basis dimensionless groups.
  • Figure 3: Learned $\boldsymbol{\gamma}$ coefficients from twenty training runs with different random seeds, normalized so the first component is one. The first two components cluster around one, while the third clusters around zero, successfully recovering the target $\boldsymbol{\gamma}$ vector of $[1, 1, 0]^T$.
  • Figure 4: Correlation analysis between output and dimensionless groups. Subfigures (a), (b), and (c) show the correlations between the output $p^*$ and the three basis dimensionless groups $\log \Pi_{b1}$, $\log \Pi_{b2}$, and $\log \Pi_{b3}$ respectively. These plots reveal scattered relationships, indicating that no single basis dimensionless group alone captures the output structure. Subfigure (d) shows the discovered scaling law using the optimal combination $\log \Pi = \log \Pi_{b1} + \log \Pi_{b2}$, where the data points align with the polynomial relationship $p^* = 2 + \Pi + 2\Pi^2$, demonstrating successful dimension reduction from seven inputs to a single dimensionless group.
  • Figure 5: Three-dimensional visualization of learned $\boldsymbol{\gamma}$ coefficient vectors from multiple training runs, showing the effect of quantization regularization. The plot displays the coefficients in the three-dimensional space spanned by the basis vectors. The green dashed line represents the true direction $[1, 1, 0]^T$ (and all its scalar multiples $[c, c, 0]^T$), along which equivalent solutions lie. Without regularization, solutions are randomly distributed along this line. With quantization regularization, solutions cluster at three distinct half-integer points: $[-1, -1, 0]^T$, $[-0.5, -0.5, 0]^T$, and $[0.5, 0.5, 0]^T$, making the discovered dimensionless groups more interpretable while maintaining predictive accuracy.
  • ...and 9 more figures
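The captions for Figures 3 to 5 rely on two facts: the basis groups combine as $\log \Pi = \sum_i \gamma_i \log \Pi_{bi}$, and any scalar multiple $c\,\boldsymbol{\gamma}$ describes the same physics because it yields $\Pi^c$. The sketch below checks both numerically with the scaling law $p^* = 2 + \Pi + 2\Pi^2$ from Figure 4. The sampled values of the basis groups are synthetic and hypothetical, and the helper names `combine` and `scaling_law` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples of the three basis groups on a log scale.
log_pi_b = rng.uniform(-1.0, 1.0, size=(200, 3))

def combine(log_pi_b, gamma):
    """Pi = exp(sum_i gamma_i * log Pi_bi), evaluated per sample."""
    return np.exp(log_pi_b @ np.asarray(gamma, dtype=float))

def scaling_law(pi):
    """Polynomial relationship quoted in the Figure 4 caption."""
    return 2.0 + pi + 2.0 * pi**2

gamma_true = np.array([1.0, 1.0, 0.0])
p_star = scaling_law(combine(log_pi_b, gamma_true))

# One of the half-integer cluster points listed in the Figure 5 caption.
gamma_learned = np.array([-0.5, -0.5, 0.0])
pi_learned = combine(log_pi_b, gamma_learned)

# gamma_learned = c * gamma_true, so pi_learned = Pi**c and Pi is recoverable.
c = gamma_learned[0] / gamma_true[0]
p_from_learned = scaling_law(pi_learned ** (1.0 / c))

# Normalization used in Figure 3: divide by the first component.
gamma_normalized = gamma_learned / gamma_learned[0]

print(np.allclose(p_from_learned, p_star), gamma_normalized)
```

Since the output can be reproduced exactly from the rescaled group, every nonzero point on the line $[c, c, 0]^T$ is an equally valid solution. This is why unregularized runs scatter along that line, and why normalizing by the first component maps them all back to $[1, 1, 0]^T$.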