WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis

Burak Yetiştiren, Hong Jin Kang, Miryung Kim

TL;DR

WhyFlow addresses a gap in end-user sensemaking for taint analysis: it offers an interrogative debugging interface that supports why, why-not, and what-if questions about dataflows and third-party library models. It combines CodeQL-based taint analysis with Soufflé-backed logic queries for rapid speculative analysis, and it renders taint paths as a graphical, color-coded global view. In a within-subject study in which 11 participants completed the tasks after one drop-out, WhyFlow improved accuracy by about 21% and reduced reported mental demand (NASA-TLX) by about 45%, while also increasing confidence and perceived usability compared to CodeQL's visualizer. The findings suggest that WhyFlow can improve end-user debugging, sensemaking, and trust in taint-analysis results. The approach is designed to be extensible through templated queries and interactive visualization.

Abstract

Taint analysis is a security analysis technique used to track the flow of potentially dangerous data through an application and its dependent libraries. Investigating why certain unexpected flows appear and why expected flows are missing is an important sensemaking process during end-user taint analysis. Existing taint analysis tools often do not provide this end-user debugging capability, in which developers can ask why, why-not, and what-if questions about dataflows and reason about the impact of configuring sources, sinks, and models of third-party libraries that abstract permissible and impermissible data flows. Furthermore, the tree view or list view used in existing taint analyzer visualizations makes it difficult to reason about the global impact on connectivity between multiple sources and sinks. Inspired by the insight that sensemaking of tool-generated results can be significantly improved by a QA inquiry process, we propose WhyFlow, the first end-user question-answer style debugging interface for taint analysis. It enables a user to ask why, why-not, and what-if questions to investigate the existence of suspicious flows, the non-existence of expected flows, and the global impact of third-party library models. WhyFlow performs speculative what-if analysis to help a user debug how different connectivity assumptions affect overall results. A user study with 12 participants shows that participants using WhyFlow achieved 21% higher accuracy on average, compared to CodeQL. They also reported a 45% reduction in mental demand (NASA-TLX) and rated higher confidence in identifying relevant flows using WhyFlow.
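
To make the three question types concrete, the following is a minimal sketch that treats taint analysis as reachability over a toy dataflow graph. It is not WhyFlow's implementation (which builds on CodeQL and Soufflé); the graph, the node names, and the functions `why`, `why_not`, and `what_if_removed` are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy dataflow graph: node -> nodes that taint can flow to.
EDGES = {
    "request.getParameter": ["sanitize", "lib.format"],
    "lib.format": ["db.execute"],   # assumed third-party library model
    "sanitize": [],
    "config.read": ["logger.info"],
}


def why(edges, source, sink):
    """Why is there a flow? Return one shortest source-to-sink path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def _reach(edges, start):
    """All nodes reachable from start (including start itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def why_not(edges, source, sink):
    """Why is there no flow? Show how far taint gets from the source and
    which nodes could still reach the sink; a flow needs these to overlap."""
    reverse = {}
    for node, successors in edges.items():
        for nxt in successors:
            reverse.setdefault(nxt, []).append(node)
    return {
        "reached_from_source": _reach(edges, source),
        "can_reach_sink": _reach(reverse, sink),
    }


def what_if_removed(edges, removed_edge, source, sink):
    """What if one edge (e.g. a library model) is dropped? Re-run `why`."""
    modified = {
        node: [nxt for nxt in successors if (node, nxt) != removed_edge]
        for node, successors in edges.items()
    }
    return why(modified, source, sink)


if __name__ == "__main__":
    print(why(EDGES, "request.getParameter", "db.execute"))
    print(why_not(EDGES, "request.getParameter", "logger.info"))
    print(what_if_removed(EDGES, ("lib.format", "db.execute"),
                          "request.getParameter", "db.execute"))
```

In this sketch, a why-not answer is just two node sets that fail to overlap, and a what-if answer is the same reachability query re-run on a modified graph. A real analyzer would work over far richer program facts.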

Paper Structure

This paper contains 43 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: WhyFlow: "Why is there a taint flow from a source to a sink?"
  • Figure 2: WhyNotFlow: "Why is there no taint flow from a source to a sink?"
  • Figure 3: AffectedSinks: "If we alter a third-party library's model, which sinks are affected?"
  • Figure 4: DivergentSinks: "Which third-party library model could influence multiple taint flows from the same source?"
  • Figure 5: GlobalImpact: "Which third-party library model could have the largest global influence on dataflows from a source to a sink?"
  • ...and 4 more figures
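
The questions behind the AffectedSinks and GlobalImpact figures can be approximated on a small graph, as in the sketch below. It rests on assumptions that are not taken from the paper: "altering" a library model is simplified to disabling the model's node entirely, and "global influence" is measured as the number of source-sink pairs that disappear. The graph and the function names are hypothetical.

```python
from collections import deque

# Hypothetical dataflow edges; "lib.*" nodes stand in for third-party library models.
EDGES = [
    ("src_a", "lib.join"), ("lib.join", "sink_x"), ("lib.join", "sink_y"),
    ("src_b", "lib.copy"), ("lib.copy", "sink_y"),
    ("src_a", "sink_z"),
]
SOURCES = ["src_a", "src_b"]
SINKS = ["sink_x", "sink_y", "sink_z"]
MODELS = ["lib.join", "lib.copy"]


def connected_pairs(edges, sources, sinks, disabled=frozenset()):
    """Return every (source, sink) pair connected by a path that avoids `disabled`."""
    successors = {}
    for a, b in edges:
        if a in disabled or b in disabled:
            continue  # treat a disabled model as propagating no taint
        successors.setdefault(a, []).append(b)
    pairs = set()
    for source in sources:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in successors.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        pairs |= {(source, sink) for sink in sinks if sink in seen}
    return pairs


def lost_pairs(model):
    """Source-sink pairs that exist in the baseline but vanish without `model`."""
    baseline = connected_pairs(EDGES, SOURCES, SINKS)
    without = connected_pairs(EDGES, SOURCES, SINKS, disabled={model})
    return baseline - without


def affected_sinks(model):
    """AffectedSinks-style question: which sinks lose at least one flow?"""
    return {sink for _, sink in lost_pairs(model)}


def rank_models_by_impact(models):
    """GlobalImpact-style question: order models by how many flows depend on them."""
    counts = [(model, len(lost_pairs(model))) for model in models]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(affected_sinks("lib.join"))
    print(rank_models_by_impact(MODELS))
```

Recomputing reachability once per model is affordable at this scale. A speculative analysis over a real codebase would need something faster, which is presumably where a Datalog engine such as Soufflé becomes useful.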