Words as Bridges: Exploring Computational Support for Cross-Disciplinary Translation Work

Calvin Bao, Yow-Ting Shiue, Marine Carpuat, Joel Chan

TL;DR

This work reframes cross-domain scholarly information seeking as translation work between domain-specific communities and tests a jargon-preserving approach based on cross-domain embedding alignment. It develops a prototype cross-domain search engine that aligns domain-specific word embeddings using unsupervised methods (MUSE and VecMap) and validates the concept through two case studies with interdisciplinary researchers. Case Study 1 shows that MUSE can yield novel yet relevant cross-domain mappings, while Case Study 2 finds VecMap mappings to be more reliable for qualitative exploration; its think-aloud assessments also compare against a GPT-4 baseline. The results suggest that keeping domains separate and aligning them in a shared embedding space can reveal novel conceptual bridges, which informs interface designs for cross-domain information seeking and guides future multilingual NLP explorations in scholarly translation tasks.

Abstract

Scholars often explore literature outside of their home community of study. This exploration process is frequently hampered by field-specific jargon. Past computational work often focuses on supporting translation work by removing jargon through simplification and summarization; here, we explore a different approach that preserves jargon as useful bridges to new conceptual spaces. Specifically, we cast different scholarly domains as different language-using communities, and explore how to adapt techniques from unsupervised cross-lingual alignment of word embeddings to explore conceptual alignments between domain-specific word embedding spaces. We developed a prototype cross-domain search engine that uses aligned domain-specific embeddings to support conceptual exploration, and tested this prototype in two case studies. We discuss qualitative insights into the promises and pitfalls of this approach to translation work, and suggest design insights for future interfaces that provide computational support for cross-domain information seeking.

Paper Structure

This paper contains 43 sections, 1 equation, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Term search user interface for exploring potential home $\rightarrow$ target bridges
  • Figure 2: Overview of the pipeline from researcher-provided seed papers, through corpus development, mono-domain embedding training, and alignment to support retrieval across domains.
  • Figure 3: Visual example using a cross-lingual alignment method to align two research communities (Psychology and Management / Organization Science research on creativity and innovation). The intuition is to find a "rotation" that can align two domain-specific embedding spaces such that terms that are semantically similar across the two embedding spaces are close together in the aligned space. Here we show an example from our data, where the MUSE alignment mapped the term "examples" (from Psychology) to the conceptually related term "substitutes" (from Management) in the aligned space. Visual of rotating spaces adapted from Conneau et al. (2017).
  • Figure 4: Results and corresponding relevance and novelty ratings from each pipeline for the query: Psychology[stimulus] $\rightarrow$ Management. More terms in the MUSE pipeline were rated as having both higher relevance and novelty, compared to other pipelines, which were highly relevant but not as novel.
  • Figure 5: Results and corresponding relevance and novelty ratings from each retrieval pipeline for the query: Psychology[examples] $\rightarrow$ Management. Across retrieval systems, many of the terms were relevant but not very novel.
  • ...and 1 more figure
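
The "rotation" intuition in the Figure 3 caption can be illustrated with a minimal sketch. It is not the paper's pipeline: MUSE and VecMap are unsupervised and induce the term correspondence themselves, whereas this sketch assumes paired anchor rows are already given and solves only the orthogonal Procrustes problem for them. All data here is synthetic, and the function names, term labels, and dimensions are hypothetical.

```python
import numpy as np

def procrustes_rotation(home_anchors, target_anchors):
    """Return the orthogonal W minimizing ||home_anchors @ W - target_anchors||_F.

    Assumption: row i of both matrices refers to the same concept.
    """
    u, _, vt = np.linalg.svd(home_anchors.T @ target_anchors)
    return u @ vt

def nearest_target_terms(query_vec, rotation, target_vecs, target_terms, k=3):
    """Map a home-domain vector into the target space and rank
    target terms by cosine similarity to the mapped vector."""
    mapped = query_vec @ rotation
    mapped = mapped / np.linalg.norm(mapped)
    unit_targets = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    sims = unit_targets @ mapped
    top = np.argsort(-sims)[:k]
    return [(target_terms[i], float(sims[i])) for i in top]

# Synthetic toy data (not from the paper): the "target" space is the
# "home" space transformed by a random orthogonal matrix.
rng = np.random.default_rng(0)
dim, n_terms = 5, 50
home_vecs = rng.normal(size=(n_terms, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target_vecs = home_vecs @ true_rotation
target_terms = [f"target_term_{i}" for i in range(n_terms)]  # hypothetical labels

# Recover the mapping, then query with the first home-domain vector.
rotation = procrustes_rotation(home_vecs, target_vecs)
results = nearest_target_terms(home_vecs[0], rotation, target_vecs, target_terms)
print(results[0][0])
```

Because the toy target space is an exact orthogonal transform of the home space, the recovered matrix matches the true one and the top-ranked term is the query's own counterpart. Real domain-specific embeddings are only approximately related in this way, which is where the novelty and relevance judgments described in the case studies come in.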