SlopeSeeker: A Search Tool for Exploring a Dataset of Quantifiable Trends

Alexander Bendeck, Dennis Bromley, Vidya Setlur

TL;DR

The paper tackles the gap in natural-language interfaces for exploring time-series data by introducing a quantified semantic label dataset that maps slope/angle information to semantically meaningful trend terms. It then operationalizes this dataset in SlopeSeeker, a tool that supports NL queries over quantifiable trends via a semantic parser, a graph-based index, and a visual-saliency–aware ranking framework. The authors validate the approach through a design-focused evaluation with 12 participants, showing intuitive use and the ability to distinguish nuanced trend descriptors while acknowledging limitations in pragmatics and context integration. The work provides a publicly available dataset and a scalable search architecture, enabling broader research and potential extensions with LLMs, narratives, and domain-specific trend descriptions.

Abstract

Natural language and search interfaces intuitively facilitate data exploration and provide visualization responses to diverse analytical queries based on the underlying datasets. However, these interfaces often fail to interpret more complex analytical intents, such as discerning subtleties and quantifiable differences between terms like "bump" and "spike" in the context of COVID cases, for example. We address this gap by extending the capabilities of a data exploration search interface for interpreting semantic concepts in time series trends. We first create a comprehensive dataset of semantic concepts by mapping quantifiable univariate data trends such as slope and angle to crowdsourced, semantically meaningful trend labels. The dataset contains quantifiable properties that capture the slope-scalar effect of semantic modifiers like "sharply" and "gradually," as well as multi-line trends (e.g., "peak," "valley"). We demonstrate the utility of this dataset in SlopeSeeker, a tool that supports natural language querying of quantifiable trends, such as "show me stocks that tanked in 2010." The tool incorporates novel scoring and ranking techniques based on semantic relevance and visual prominence to present relevant trend chart responses containing these semantic trend concepts. In addition, SlopeSeeker provides a faceted search interface for users to navigate a semantic hierarchy of concepts from general trends (e.g., "increase") to more specific ones (e.g., "sharp increase"). A preliminary user evaluation of the tool demonstrates that the search interface supports greater expressivity of queries containing concepts that describe data trends. We identify potential future directions for leveraging our publicly available quantitative semantics dataset in other data domains and for novel visual analytics interfaces.

Paper Structure (32 sections, 5 equations, 18 figures, 1 algorithm)

Figures (18)

  • Figure 1: The interface for the data collection web tool used in Experiment 1. (1) The participant is prompted with a word and asked to select all arrows that best visualize the word. Once complete, the participant can click the "Next" button to proceed to the next word. (2) The participant is shown $13$ arrows corresponding to an array of angles between -90° and 90°. Clicking anywhere inside an arrow's box applies the current word as a label to the clicked arrow. Note that the interface is similar in Experiment 2, except that in (1), participants are first shown an individual anchor word and then four compound labels, which can be assigned to arrows in turn.
  • Figure 2: Experiment 1: One-dimensional KDEs indicating probability density for each label over the range of -90° to 90°. Peak probability density was used to sort the labels from the most negative angle (steepest down) to the most positive angle (steepest up) from the top left ("tanking") to the bottom right ("booming"), respectively. Note that the distributions are not normal.
  • Figure 3: Experiment 2: One-dimensional KDEs indicating probability density for each label over the range of -90° to 90°. Peak probability density was used to sort the labels from the most negative angle to the most positive angle from the top left ("sharply collapsing") to the bottom right ("sharply booming"), respectively. Note that the distributions are not normal.
  • Figure 4: One-dimensional KDEs indicating the scalar range over which different label modifiers scaled the base angle of the label. The solid green line indicates the 1.0 line, and the labeled dotted blue line indicates the scalar value at the peak probability density. Notice that "slowly" and "gradually" have scalar values between $0$ and $1$ ($0.4$ and $0.6$, respectively), i.e., they reduce the steepness of a label's angle, while "quickly" and "sharply" have scalar values greater than $1$ ($1.3$ and $1.5$, respectively), i.e., they increase the steepness of a label's angle.
  • Figure 5: Screenshot of the user interface for Experiment 3. Participants dragged descriptive labels from the left onto shapes on the right. Top Inset: Shapes were generated by transforming two-segment angles: angles became more obtuse from top to bottom and were rotated from left to right. Non-monotonic shapes (shown in red) were removed and not shown to the user. Bottom Inset: User labels were recorded in a PostgreSQL database; a snapshot of illustrative data rows is shown.
  • ...and 13 more figures
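
The modifier effect described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that a modifier acts as a simple multiplier on a label's base angle in degrees, using the peak scalar values quoted in the caption. The function name `modified_angle`, the base angle of 40° in the usage note, and the clamping to the [-90°, 90°] range are illustrative assumptions, not details taken from the paper or from SlopeSeeker's implementation.

```python
from typing import Optional

# Peak scalar values quoted in the Figure 4 caption.
MODIFIER_SCALARS = {
    "slowly": 0.4,
    "gradually": 0.6,
    "quickly": 1.3,
    "sharply": 1.5,
}


def modified_angle(base_angle_deg: float, modifier: Optional[str] = None) -> float:
    """Scale a label's base angle by a modifier's scalar.

    Assumption: the modifier multiplies the angle directly, and the result
    is clamped to the [-90, 90] degree range used in the experiments.
    """
    scalar = 1.0 if modifier is None else MODIFIER_SCALARS[modifier]
    scaled = base_angle_deg * scalar
    return max(-90.0, min(90.0, scaled))
```

For a hypothetical label whose base angle is 40°, `modified_angle(40.0, "sharply")` gives a steeper 60°, while `modified_angle(40.0, "slowly")` gives a shallower angle of about 16°. The sign of the base angle is preserved, so the same scalars steepen or flatten downward trends as well.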