Interpretable Multi-task Learning with Shared Variable Embeddings
Maciej Żelaszczyk, Jacek Mańdziuk
TL;DR
The paper tackles learning across multiple tasks with heterogeneous input/output spaces by introducing shared variable embeddings (SVE) that reuse a common embedding base through attention. Predictions are made with a cross-attention mechanism in which raw variable embeddings query a compact set of shared embeddings, followed by a shared encoder–decoder pipeline, trained end-to-end with a squared hinge loss. Key contributions include the SVE architecture, demonstrations that it matches vanilla variable embeddings in accuracy while offering interpretability benefits, and systematic ablations showing that sparse attention improves training efficiency at some cost to interpretability. The work advances interpretable multi-task learning on tabular data and exposes a practical trade-off between interpretability and performance, with potential extensions to other domains and self-supervised settings.
Abstract
This paper proposes a general interpretable predictive system with shared information. The system is able to perform predictions in a multi-task setting where distinct tasks are not bound to have the same input/output structure. Embeddings of input and output variables in a common space are obtained, where the input embeddings are produced through attending to a set of shared embeddings, reused across tasks. All the embeddings are treated as model parameters and learned. Specific restrictions on the space of shared embeddings and the sparsity of the attention mechanism are considered. Experiments show that the introduction of shared embeddings does not deteriorate the results obtained from a vanilla variable embeddings method. We run a number of further ablations. Inducing sparsity in the attention mechanism leads to both an increase in accuracy and a significant decrease in the number of training steps required. Shared embeddings provide a measure of interpretability in terms of both a qualitative assessment and the ability to map specific shared embeddings to pre-defined concepts that are not tailored to the considered model. There seems to be a trade-off between accuracy and interpretability. The basic shared embeddings method favors interpretability, whereas the sparse attention method promotes accuracy. The results lead to the conclusion that variable embedding methods may be extended with shared information to provide increased interpretability and accuracy.
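
The attention step described above, in which per-variable embeddings attend to a small set of shared embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attend_to_shared`, the dimensions, the absence of learned query/key/value projections, and the top-k masking used as a stand-in for sparse attention are all assumptions made for illustration. In the actual system the embeddings are learned parameters; here they are random.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_to_shared(raw, shared, top_k=None):
    """Hypothetical helper: each raw variable embedding queries the shared set.

    raw:    (n_vars, d)   raw variable embeddings (queries)
    shared: (n_shared, d) shared embeddings (used as both keys and values)
    top_k:  if given, keep only the top_k scores per variable (an assumed,
            simple way to induce sparsity; the paper's method may differ)
    Returns the attended embeddings (n_vars, d) and the attention weights.
    """
    d = raw.shape[-1]
    scores = raw @ shared.T / np.sqrt(d)            # (n_vars, n_shared)
    if top_k is not None:
        # Threshold at the top_k-th largest score in each row.
        kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
        scores = np.where(scores >= kth, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ shared, weights

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 8))      # 5 variables, embedding size 8 (assumed)
shared = rng.normal(size=(3, 8))   # 3 shared embeddings (assumed)

out, w = attend_to_shared(raw, shared)
sparse_out, sparse_w = attend_to_shared(raw, shared, top_k=1)
```

Each row of `w` shows how strongly one variable draws on each shared embedding, which is the quantity one would inspect when looking for interpretable structure. With `top_k=1`, each variable is tied to a single shared embedding.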
