Beyond Language: Applying MLX Transformers to Engineering Physics

Stavros Kassinos; Alessio Alexiadis

TL;DR

This paper introduces a physics-informed Transformer model for solving the heat conduction problem in a 2D plate with Dirichlet boundary conditions. The model is implemented in the machine learning framework MLX and leverages the unified memory of Apple M-series processors.

Abstract

Transformer Neural Networks are driving an explosion of activity and discovery in the field of Large Language Models (LLMs). In contrast, there have been only a few attempts to apply Transformers in engineering physics. Aiming to offer an easy entry point to physics-centric Transformers, we introduce a physics-informed Transformer model for solving the heat conduction problem in a 2D plate with Dirichlet boundary conditions. The model is implemented in the machine learning framework MLX and leverages the unified memory of Apple M-series processors. The use of MLX means that the models can be trained and perform predictions efficiently on personal machines with only modest memory requirements. To generate the training, validation and test sets for the Transformer model, we solve the 2D heat conduction problem using central finite differences. Each finite difference solution in these sets is initialized with four random Dirichlet boundary conditions, a uniform but random internal temperature distribution and a randomly selected thermal diffusivity. Validation is performed in-line during training to monitor against over-fitting. The performance of the trained model is demonstrated by predicting the evolution of the temperature field to steady state for the unseen test set of conditions.
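
The data-generation step described in the abstract can be illustrated with a minimal sketch. It assumes an explicit forward-Euler time step with second-order central differences in space on a unit square plate; the abstract specifies only central finite differences, so the time-stepping scheme, the grid size, the sampling ranges, and the function name simulate_plate are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_plate(n=16, steps=200, seed=0):
    """Generate one heat-conduction sample on an n x n grid (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Assumed sampling ranges; the paper's actual ranges are not given here.
    t_bounds = rng.uniform(0.0, 1.0, size=4)  # top, bottom, left, right edges
    t_init = rng.uniform(0.0, 1.0)            # uniform interior temperature
    alpha = rng.uniform(0.01, 0.1)            # thermal diffusivity

    dx = 1.0 / (n - 1)
    # Choose dt so that r = alpha * dt / dx**2 = 0.2, which is below the
    # explicit-scheme stability limit of 0.25 in two dimensions.
    dt = 0.2 * dx**2 / alpha
    r = alpha * dt / dx**2

    T = np.full((n, n), t_init)
    # Dirichlet boundary conditions: edge values are fixed for all time.
    T[0, :] = t_bounds[0]
    T[-1, :] = t_bounds[1]
    T[:, 0] = t_bounds[2]
    T[:, -1] = t_bounds[3]

    frames = [T.copy()]
    for _ in range(steps):
        # Five-point central-difference Laplacian on the interior nodes.
        lap = (
            T[2:, 1:-1] + T[:-2, 1:-1]
            + T[1:-1, 2:] + T[1:-1, :-2]
            - 4.0 * T[1:-1, 1:-1]
        )
        # Only interior nodes are updated, so the boundaries stay fixed.
        T[1:-1, 1:-1] += r * lap
        frames.append(T.copy())

    # Shape (steps + 1, n, n): the temperature field at every time step.
    return np.stack(frames), t_bounds, t_init, alpha


frames, t_bounds, t_init, alpha = simulate_plate(n=16, steps=200, seed=0)
print(frames.shape)
```

Calling simulate_plate with different seeds would produce distinct samples, each a sequence of temperature fields that a sequence model could be trained to predict frame by frame.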

Paper Structure (31 sections, 29 equations, 31 figures, 1 table)

Introduction and motivation
A High-Level Introduction to Transformers
Understanding Transformers One Step Further: The Encoder and Decoder Architecture
Inside the ‘Input Transformation’ Box: Self-Attention Mechanism
The engineering perspective: Self-Attention Mechanism
Inside the ‘Output Transformation’ Box: Cross-Attention Mechanism
The Engineering Perspective: Cross-Attention Mechanism
Multi-Headed Attention
Positional embeddings
The final touches (inference)
Training
The final touches (training)
Methods
The physical problem: basic configuration
The physical problem: challenge configuration 1
...and 16 more sections

Figures (31)

Figure 1: Step-by-step process of text generation using a Transformer model. At each iteration, the model takes the initial input ("I am") and the previously generated output to predict the next token. The model adds a special token <s> to the user's input, indicating the start of the sequence. The process continues, adding tokens to the sequence, until the end-of-sequence token <eos> is generated.
Figure 2: Encoder-Decoder architecture. This figure corresponds to Iteration 3 in Figure 1, where the encoder has processed the input ("I am") into an encoded representation. The decoder uses this encoded input and the previously generated output (<s> a teacher) to predict the next word in the sequence ("and").
Figure 3: Encoder-Decoder architecture. Building on Figure 2, this figure shows how the decoder transforms its input sequence (<s> a teacher) into an internal representation before generating the next token ("and"). This process is similar to having an 'encoder within the decoder,' as the input transformation in the decoder works in a way that is comparable to how the encoder processes the original input.
Figure 4: Self-attention and cross-attention. This figure expands on Figure 3 by 'opening up' the Input Transformation and Output Generation boxes to show the self-attention mechanism inside the encoder and decoder, and the cross-attention mechanism inside the decoder. Self-attention helps capture dependencies within the input, while cross-attention allows the decoder to focus on relevant parts of the encoder's output.
Figure 5: Ineffective methods for adding positional information to token embeddings. (a) Index-based encoding assigns a simple index to each token (e.g., 0, 1, 2, 3), which can lead to large gradients at higher positions. (b) Normalized index encoding scales positional values between 0 and 1 by dividing each index by the sequence length, but it may cause ambiguity across sequences of different lengths. (c) Binary encoding represents positions using fixed-length binary vectors, but lacks smoothness in positional transitions.
...and 26 more figures
