Multitask-Informed Prior for In-Context Learning on Tabular Data: Application to Steel Property Prediction

Dimitrios Sinodinos, Bahareh Nikpour, Jack Yi Wei, Sushant Sinha, Xiaoping Ma, Kashif Rehman, Stephen Yue, Narges Armanfard

Abstract

Accurate prediction of the mechanical properties of steel during hot rolling processes, such as Thin Slab Direct Rolling (TSDR), remains challenging due to complex interactions among chemical compositions, processing parameters, and resultant microstructures. Traditional empirical and experimental methodologies, while effective, are often resource-intensive and adapt poorly to varied production conditions. Moreover, most existing approaches do not explicitly leverage the strong correlations among key mechanical properties, missing an opportunity to improve predictive accuracy through multitask learning. To address this, we present a multitask learning framework that injects multitask awareness into the prior of TabPFN, a transformer-based foundation model for in-context learning on tabular data, through novel fine-tuning strategies. Because TabPFN was originally designed for single-target regression or classification, we augment its prior with two complementary approaches: (i) target averaging, which provides a unified scalar signal compatible with TabPFN's single-target architecture, and (ii) task-specific adapters, which introduce task-specific supervision during fine-tuning. These strategies jointly guide the model toward a multitask-informed prior that captures cross-property relationships among key mechanical metrics. Extensive experiments on an industrial TSDR dataset demonstrate that our multitask adaptations outperform classical machine learning methods and recent state-of-the-art tabular learning models across multiple evaluation metrics. Notably, our approach improves both predictive accuracy and computational efficiency relative to task-specific fine-tuning, demonstrating that multitask-aware prior adaptation enables foundation models for tabular data to deliver scalable, rapid, and reliable deployment for automated industrial quality control and process optimization in TSDR.
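The two fine-tuning signals named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: mean squared error stands in for the regression loss, the adapter is a hypothetical one-hidden-layer MLP, the task-specific losses are simply summed with a hypothetical weight `task_weight`, and the array `Y_bar_hat` stands in for the output of TabPFN's transformer, which is not modeled here.

```python
import numpy as np

def average_targets(Y):
    """Target averaging: collapse an (N, T) target matrix to an (N, 1) scalar signal."""
    return Y.mean(axis=1, keepdims=True)

def mse(pred, target):
    # Assumption: MSE is used as a stand-in for the regression loss.
    return float(np.mean((pred - target) ** 2))

def averaged_target_loss(Y_bar_hat, Y):
    """Loss on the averaged target only (multitask fine-tuning without adapters)."""
    return mse(Y_bar_hat, average_targets(Y))

def adapter_forward(Y_bar_hat, W1, b1, W2, b2):
    """Hypothetical MLP adapter mapping the (N, 1) averaged prediction to (N, T)."""
    hidden = np.maximum(Y_bar_hat @ W1 + b1, 0.0)  # ReLU hidden layer
    return hidden @ W2 + b2

def adapter_loss(Y_bar_hat, Y, params, task_weight=1.0):
    """Averaged-target loss plus one task-specific loss per target column."""
    Y_hat = adapter_forward(Y_bar_hat, *params)
    task_losses = [mse(Y_hat[:, i], Y[:, i]) for i in range(Y.shape[1])]
    return averaged_target_loss(Y_bar_hat, Y) + task_weight * sum(task_losses)

# Toy data: 8 samples, 5 targets (the sizes are arbitrary).
rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 5))
Y_bar_hat = average_targets(Y) + 0.1 * rng.normal(size=(8, 1))  # stand-in prediction
hidden_units = 4
params = (
    rng.normal(size=(1, hidden_units)), np.zeros(hidden_units),
    rng.normal(size=(hidden_units, 5)), np.zeros(5),
)
print(averaged_target_loss(Y_bar_hat, Y), adapter_loss(Y_bar_hat, Y, params))
```

In an actual fine-tuning loop both the transformer weights and the adapter parameters would be updated by gradient descent on these losses; the sketch only shows how the scalar training signals are formed from a multi-column target matrix.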

Paper Structure (18 sections, 18 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Schematic of Direct Strip Processing Complex
  • Figure 2: An overview of our proposed multitask fine-tuning techniques and the standard TabPFN inference pipeline. (A) Fine-tuning: $X_{\text{train}} \in \mathbb{R}^{N_{\text{train}},D}$ and $Y_{\text{train}} \in \mathbb{R}^{N_{\text{train}},T}$ represent the training data and training labels, respectively, where $N_{\text{train}}$ is the number of training samples, $D$ is the number of features per sample, and $T$ is the number of tasks. To obtain $\bar{Y}_{\text{train}}\in \mathbb{R}^{N_{\text{train}},1}$, we average (Avg) $Y_{\text{train}}$ along the task dimension. Using $X_{\text{train}}$ and $\bar{Y}_{\text{train}}$ as inputs to the underlying transformer of TabPFN, we obtain the prediction $\hat{\bar{Y}}_{\text{train}}$. In standard multitask fine-tuning (M.F.T.), we use only the regression loss $\mathcal{L}_{\text{reg}}$ as the training signal. In multitask adapter fine-tuning (M.A.F.T.), we additionally use task-specific losses, computed on the task-specific predictions $\hat{Y}_{\text{train},i}\in \mathbb{R}^{N_{\text{train}},1}$ for $i\in[1,T]$ that are obtained by passing $\hat{\bar{Y}}_{\text{train}}$ through an MLP adapter. (B) Inference: We sequentially predict the target of the $i$th task by following the standard TabPFN inference pipeline. This involves first updating the prior statistics of our pre-trained TabPFN using the $\texttt{.fit()}$ function with $X_{\text{train}}$ and $Y_{\text{train},i}$ as inputs, followed by a forward pass using $\texttt{.predict()}$ with $X_{\text{test}}\in \mathbb{R}^{N_{\text{test}},D}$ as input to obtain $\hat{Y}_{\text{test},i}\in \mathbb{R}^{N_{\text{test}},1}$, the predicted target for the $i$th task, where $N_{\text{test}}$ is the number of test samples.
  • Figure 3: Spearman correlation matrix of the five mechanical property targets (LYS, OYS, EYS, UTS, ELO). Values represent pairwise rank correlations, with positive correlations shown in red and negative correlations in blue.
  • Figure 4: Multitask gain ($\Delta_m$) as a function of fine-tuning time budget for four strategies: no fine-tuning (n.f.t), single-task fine-tuning (s.f.t), multitask fine-tuning (m.f.t), and multitask adapter fine-tuning (m.a.f.t).
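
The sequential inference described in Figure 2 (B) can be sketched as a loop that calls `.fit()` and `.predict()` once per task. The sketch below is illustrative only. The helper `predict_all_tasks`, the factory argument `make_model`, and the stand-in `MeanRegressor` are hypothetical names introduced here so that the example runs without TabPFN installed. With the `tabpfn` package, `make_model` would presumably return a `TabPFNRegressor` carrying the fine-tuned weights; how those weights are loaded is not specified in this summary.

```python
import numpy as np

def predict_all_tasks(make_model, X_train, Y_train, X_test):
    """Fit and predict one task at a time, then stack the results to (N_test, T).

    make_model: zero-argument callable returning an estimator that exposes
    scikit-learn style .fit(X, y) and .predict(X) methods (an assumption).
    """
    columns = []
    for i in range(Y_train.shape[1]):
        model = make_model()               # fresh estimator for task i
        model.fit(X_train, Y_train[:, i])  # training set becomes the context
        columns.append(np.asarray(model.predict(X_test)))
    return np.column_stack(columns)

class MeanRegressor:
    """Stand-in estimator that predicts the training mean; not part of TabPFN."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Toy usage: 2 training rows, 3 features, 2 tasks, 4 test rows.
X_train = np.zeros((2, 3))
Y_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.zeros((4, 3))
Y_test_hat = predict_all_tasks(MeanRegressor, X_train, Y_train, X_test)
print(Y_test_hat.shape)
```

Because each task is handled by its own fit-and-predict pass, inference cost grows linearly with the number of tasks $T$, while the multitask-informed weights are shared across all passes.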