MetaphorShare: A Dynamic Collaborative Repository of Open Metaphor Datasets
Joanne Boisson, Arif Mehmood, Jose Camacho-Collados
TL;DR
MetaphorShare addresses fragmentation in metaphor research by offering an open repository that unifies diverse metaphor datasets under a minimal CSV-based format with a common tagging scheme. The platform provides four core functions (upload, download, search, and online labeling), supported by Elasticsearch-backed search and a validation workflow intended to ensure data quality and interoperability. A cross-dataset evaluation using RoBERTa demonstrates its practical value, illustrating how researchers can fine-tune models on specific datasets and assess how well they generalize across resources. The work aims to foster interdisciplinary collaboration, expand multilingual coverage, and enable automatic or semi-automatic labeling of metaphors to accelerate research on metaphor processing in NLP.
Abstract
The metaphor studies community has developed numerous valuable labelled corpora in various languages over the years. Many of these resources are not only unknown to the NLP community but are also not easily shared among researchers. In both the human sciences and NLP, researchers could benefit from a centralised database of labelled resources that is easily accessible and unified under a single format. To facilitate this, we present MetaphorShare, a website that integrates metaphor datasets, making them open and accessible. With this effort, our aim is to encourage researchers to share and upload more datasets in any language in order to facilitate metaphor studies and the development of future NLP systems for metaphor processing. The website has four main functionalities: uploading, downloading, searching and labelling metaphor datasets. It is accessible at www.metaphorshare.com.
