VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information-Seeking

Rui Qiu, Yamei Tu, Po-Yin Yen, Han-Wei Shen

TL;DR

VADIS introduces a prompt-based attention model (PAM) that generates dynamic document embeddings and document relevance scores adjusted to the user's query. Users can then refine, update, and introduce new queries, which makes information seeking dynamic and iterative.

Abstract

In the biomedical domain, visualizing the document embeddings of an extensive corpus has been widely used in information-seeking tasks. However, three key challenges with existing visualizations make it difficult for clinicians to find information efficiently. First, the document embeddings used in these visualizations are generated statically by pretrained language models and cannot adapt to the user's evolving interest. Second, existing document visualization techniques cannot effectively display how relevant the documents are to the user's interest, making it difficult for users to identify the most pertinent information. Third, existing embedding generation and visualization processes lack interpretability, making it difficult to understand, trust, and use the results for decision-making. In this paper, we present a novel visual analytics pipeline for user-driven document representation and iterative information seeking (VADIS). VADIS introduces a prompt-based attention model (PAM) that generates dynamic document embeddings and document relevance adjusted to the user's query. To effectively visualize these two pieces of information, we design a new document map that leverages a circular grid layout to display documents based on both their relevance to the query and their semantic similarity. Additionally, to improve interpretability, we introduce a corpus-level attention visualization method that improves the user's understanding of the model's focus and enables users to identify potential oversights. This visualization, in turn, empowers users to refine, update, and introduce new queries, thereby facilitating a dynamic and iterative information-seeking experience. We evaluated VADIS quantitatively and qualitatively on a real-world dataset of biomedical research papers to demonstrate its effectiveness.

Paper Structure

This paper contains 29 sections, 10 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Illustration of a traditional document map compared with a document map for a specific perspective. $m$ and $n$ are two documents that study the same population, as shown in panel (b), but test different treatments, as shown in panel (c).
  • Figure 2: The illustration of VADIS's iterative pipeline, which consists of the prompt-attention model and subsequent visualizations.
  • Figure 3: Relevance of documents in a corpus based on query "side-effect of methylphenidate for Children with ADHD."
  • Figure 4: The learning pipeline of the relevance-preserving mapping.
  • Figure 5: Illustration of attention decomposition. $V$ is the attention matrix of documents $(d_i)$ $\times$ tokens $(w_j)$, $W$ is the weight matrix of documents $(d_i)$ $\times$ attention topics $(z_k)$, and $H$ gives each attention topic's focus, with shape attention topics $(z_k)$ $\times$ tokens $(w_j)$.
  • ...and 5 more figures
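
The attention decomposition in the Figure 5 caption has the shape of a matrix factorization, $V \approx WH$. The sketch below illustrates that shape relationship on a toy matrix. It assumes a non-negative factorization solved with multiplicative updates, which is a stand-in chosen for illustration: the caption does not say which algorithm the paper uses. The function name `factorize_attention`, its parameters, and the values in `V` are all hypothetical.

```python
import numpy as np

def factorize_attention(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (documents x tokens) as W @ H.

    W has shape (documents x attention topics) and H has shape
    (attention topics x tokens). Assumption: non-negative factorization
    with multiplicative updates; the source does not name the method.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_tokens = V.shape
    # Random positive initialization keeps every update well defined.
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_tokens)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy attention matrix: 4 documents x 5 tokens (values are made up).
V = np.array([
    [0.6, 0.3, 0.1, 0.0, 0.0],
    [0.5, 0.4, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.5, 0.4],
    [0.0, 0.1, 0.1, 0.4, 0.4],
])

W, H = factorize_attention(V, n_topics=2)
error = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
print("W shape:", W.shape, "H shape:", H.shape)
print("reconstruction error:", round(float(error), 4))
```

In this reading, row $i$ of `W` says how strongly document $d_i$ loads on each attention topic $z_k$, and row $k$ of `H` says which tokens $w_j$ that topic focuses on.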