How to Encode Domain Information in Relation Classification
Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank
TL;DR
This work tackles domain-specific relation classification (RC) by exploring multi-domain training with explicit domain encoding. It introduces CrossRE 2.0 to balance data across six domains and compares three encoding strategies: dataset embeddings, special domain markers, and domain-aware entity types. The study finds that simple domain markers outperform the other methods, yielding a Macro-F1 gain of more than 2 points on average. The gain is largest for domain-dependent relations, while labels whose interpretation is stable across domains benefit less. Overall, the results show that domain-aware input conditioning improves RC performance in a multi-domain setup and offers a practical way to combine heterogeneous data sources. The CrossRE 2.0 extension and the per-label analysis provide a benchmark for studying how domain signals influence RC.
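To make the "special domain markers" strategy concrete, the following is a minimal sketch of how a domain token could be prepended to an entity-marked input. The marker strings (`[AI]`, `<E1>`, `<E2>`), the function name `mark_instance`, and the example sentence are illustrative assumptions, not the exact format used in the paper.

```python
# Minimal sketch of domain-marker input conditioning for relation classification.
# Assumptions (not from the paper): marker strings, function name, example data.

def mark_instance(tokens, e1, e2, domain):
    """Wrap two entity spans in markers and prepend a domain token.

    tokens: list of str
    e1, e2: (start, end) inclusive token indices, assumed non-overlapping
    domain: short domain name, e.g. "ai" (hypothetical label)
    """
    marked = []
    for i, tok in enumerate(tokens):
        # Open entity markers before the first token of each span.
        if i == e1[0]:
            marked.append("<E1>")
        if i == e2[0]:
            marked.append("<E2>")
        marked.append(tok)
        # Close entity markers after the last token of each span.
        if i == e1[1]:
            marked.append("</E1>")
        if i == e2[1]:
            marked.append("</E2>")
    # The domain marker is a single special token at the start of the input.
    return [f"[{domain.upper()}]"] + marked


example = mark_instance(
    ["Turing", "worked", "at", "Bletchley", "Park"],
    e1=(0, 0),
    e2=(3, 4),
    domain="ai",
)
print(" ".join(example))
```

In a real setup the marker strings would also need to be registered as special tokens in the encoder's vocabulary so that they are not split into subwords.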
Abstract
Current language models require a lot of training data to obtain high performance. For Relation Classification (RC), many datasets are domain-specific, so combining datasets to obtain better performance is non-trivial. We explore a multi-domain training setup for RC, and attempt to improve performance by encoding domain information. Our proposed models improve by more than 2 Macro-F1 points over the baseline setup, and our analysis reveals that not all labels benefit equally: the classes that occupy a similar space across domains (i.e., their interpretation is close across them, for example "physical") benefit the least, while domain-dependent relations (e.g., "part-of") improve the most when encoding domain information.
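Because the reported gains are in Macro-F1 and the analysis is per label, a small sketch of the metric may help. It computes per-label F1 and their unweighted mean over the labels that occur in the gold or predicted data. The function names and the toy labels are assumptions for illustration; the paper's evaluation script may fix the label set differently.

```python
# Minimal sketch of per-label F1 and Macro-F1 (unweighted mean over labels).
# Assumptions (not from the paper): function names, toy data, and the choice
# to average over labels seen in either gold or predicted annotations.

def per_label_f1(gold, pred):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Average the per-label F1 scores, weighting every label equally."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


gold = ["physical", "part-of", "part-of", "physical"]
pred = ["physical", "part-of", "physical", "physical"]
print(per_label_f1(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

Since every label counts equally regardless of frequency, improvements on rare, domain-dependent relations move Macro-F1 as much as improvements on frequent ones.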
