How to Encode Domain Information in Relation Classification

Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank

TL;DR

This work tackles the challenge of domain-specific relation classification by exploring multi-domain training with explicit domain encoding. It introduces CrossRE 2.0 to balance data across six domains and compares three encoding strategies: dataset embeddings, special domain markers, and domain-aware entity types. The study finds that simple domain markers outperform other methods, yielding a Macro-F1 gain of over $2$ points on average, particularly improving domain-dependent relations, while many labels with stable cross-domain interpretations benefit less. Overall, the results demonstrate that domain-aware input conditioning can significantly enhance cross-domain RC performance and offer a practical path for leveraging heterogeneous data sources. The CrossRE 2.0 extension and analysis provide a valuable benchmark and insight into how domain signals influence RC across domains.
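As an illustration of the domain-marker strategy mentioned above, the following minimal sketch prepends a special token naming the domain to each input instance. The marker format (`[MUSIC]`), the function names, and the example sentence are assumptions made for this illustration; the exact tokens and their placement in the authors' models are not specified here.

```python
from __future__ import annotations


def domain_marker(domain: str) -> str:
    """Build a special token such as "[MUSIC]" (hypothetical format)."""
    return f"[{domain.upper()}]"


def add_domain_marker(tokens: list[str], domain: str) -> list[str]:
    """Return a new token list with the domain marker prepended.

    The input list is left unchanged.
    """
    return [domain_marker(domain)] + tokens


# Illustrative instance; the sentence and domain name are made up.
example = add_domain_marker(["Mozart", "composed", "the", "Requiem"], "music")
print(example)
```

In a real setup the marker strings would typically also be registered as special tokens in the encoder's vocabulary, so that each one is kept as a single unit and receives its own trainable embedding.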

Abstract

Current language models require a lot of training data to obtain high performance. For Relation Classification (RC), many datasets are domain-specific, so combining datasets to obtain better performance is non-trivial. We explore a multi-domain training setup for RC, and attempt to improve performance by encoding domain information. Our proposed models improve by more than 2 Macro-F1 points over the baseline setup, and our analysis reveals that not all labels benefit equally: the classes which occupy a similar space across domains (i.e., whose interpretation is close across domains, for example "physical") benefit the least, while domain-dependent relations (e.g., "part-of") improve the most when encoding domain information.
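Because the results are reported in Macro-F1 and analyzed per label, a short sketch of the metric may help: Macro-F1 is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The function names and the toy gold/predicted labels below are assumptions for illustration, and the paper's evaluation may differ in details such as which labels are included.

```python
from __future__ import annotations


def per_label_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Compute F1 for every label seen in the gold or predicted labels."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of the per-label F1 scores (0.0 for empty input)."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy data, not taken from the paper.
gold = ["physical", "physical", "part-of", "part-of"]
pred = ["physical", "physical", "part-of", "physical"]
print(per_label_f1(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

Comparing the per-label dictionary between a baseline and a domain-aware model is one simple way to see which relations gain the most from domain information.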

Paper Structure (16 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Domain Representation. PCA plot of the untrained embeddings of the instances in the development set, colored by domain.
  • Figure 2: Relation Representation. PCA plot of the trained embeddings of the most frequent relation labels in the development set, colored by relation labels and shaped by domain.