Understanding Documentation Use Through Log Analysis: An Exploratory Case Study of Four Cloud Services

Daye Nam, Andrew Macvean, Brad Myers, Bogdan Vasilescu

TL;DR

This study demonstrates that documentation page-view logs can be mined at scale to reveal meaningful patterns in how developers use API documentation. Using a two-phase approach on four Google services, the authors first cluster usage patterns from monthly dwell-time vectors across 11 page types, then test hypotheses linking user experience, product type, and intent to both documentation usage and subsequent API adoption via logistic regression. Key findings show that experience and product context shape which documentation genres are consulted, and that engagement with guide documentation strongly predicts future API use, with variations across products. The authors argue for the feasibility of log-based documentation reviews and outline practical design recommendations and a longer-term personalization vision to improve developer onboarding and information foraging.
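As a rough illustration of the first phase, the sketch below aggregates raw page-view records into one dwell-time vector per user, with one slot per page type. The record layout, the placeholder `PAGE_TYPES` codes, and the `build_dwell_vectors` helper are assumptions made for this example and are not the authors' pipeline; the source only states that the vectors hold monthly dwell time across 11 page types.

```python
from collections import defaultdict

# Hypothetical page-type codes: the paper defines 11 types, but these
# labels are placeholders, not the codes used by the authors.
PAGE_TYPES = [f"T{i:02d}" for i in range(1, 12)]
INDEX = {code: i for i, code in enumerate(PAGE_TYPES)}


def build_dwell_vectors(records):
    """Sum dwell time (minutes) per user and page type.

    records: iterable of (user_id, page_type, dwell_minutes) tuples
             covering one month of page views (assumed layout).
    Returns {user_id: [minutes per page type, in PAGE_TYPES order]}.
    """
    vectors = defaultdict(lambda: [0.0] * len(PAGE_TYPES))
    for user_id, page_type, minutes in records:
        if page_type not in INDEX:
            continue  # skip pages outside the assumed taxonomy
        vectors[user_id][INDEX[page_type]] += minutes
    return dict(vectors)


# Made-up records, for illustration only.
logs = [
    ("u1", "T01", 2.0),
    ("u1", "T01", 1.5),
    ("u1", "T03", 0.5),
    ("u2", "T11", 4.0),
]
vectors = build_dwell_vectors(logs)
```

Vectors of this shape could then be passed to any clustering routine; the choice of algorithm is not specified in this summary.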

Abstract

Almost no modern software system is written from scratch, and developers are required to effectively learn to use third-party libraries or software services. Thus, many practitioners and researchers have looked for ways to create effective documentation that supports developers' learning. However, few efforts have focused on how people actually use the documentation. In this paper, we report on an exploratory, multi-phase, mixed methods empirical study of documentation page-view logs from four cloud-based industrial services. By analyzing page-view logs for over 100,000 users, we find diverse patterns of documentation page visits. Moreover, we show statistically that which documentation pages people visit often correlates with user characteristics such as past experience with the specific product, on the one hand, and with future adoption of the API on the other hand. We discuss the implications of these results on documentation design and propose documentation page-view log analysis as a feasible technique for design audits of documentation, from ones written for software developers to ones designed to support end users (e.g., Adobe Photoshop).

Paper Structure

This paper contains 25 sections, 2 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Overview of our data collection and analysis.
  • Figure 2: The heatmap of centroids of the 320 clusters (left), and a subset of them highlighted (right). Each row represents the documentation usage of one cluster (see Table \ref{tab:type} for the documentation type codes). The color indicates the dwell time in minutes on an exponential scale, with intensity level $n$ corresponding to $e^n$ minutes. The average total counts (number of documentation pages visited in May) and the average total dwell time (sum of dwell time on the 11 documentation types) are also shown for the selected clusters (right) to help with interpretation, and the rows are sorted by the average total dwell time. For example, users of Cluster 18 (2nd row of the selected clusters) spent 3.28 minutes on average on the product documentation across 2.27 page visits on average, and spent $\approx e^1=2.7$ minutes on Concept type documentation.
  • Figure 3: Highlights of the clustering analysis. Each polar plot displays the average time spent on each type of documentation (see Table \ref{tab:type} for the documentation type codes). The small polar plots show the average dwell time in the previous three months. Note that the ranges of the axes of the plots vary. Bar charts below the polar plots show the proportions (%) of each group in the cluster. For example, the charts of Cluster 21 can be interpreted as "In cluster 21, users without platform and product experience predominantly used Tutorial documentation ($\approx$ 6 minutes) of P2 (81.1%) and P1 (18.9%), mostly for clarification purposes, without subsequent API requests."
  • Figure 4: Top: Estimated odds ratios from the regression modeling $\text{dwell time} > 0$ for our four documentation genres. For example, the odds of accessing Dev type documentation (pink) are 1.01 times higher among users with one extra year of platform experience. Bottom: Estimated odds ratios from the regression modeling $\text{subsequent requests} > 0$. Variables without statistically significant coefficients (adjusted $p \geq 0.01$) are omitted.
  • Figure A1: Distribution of the log-transformed total dwell time (in minutes) on documentation.
  • ...and 10 more figures
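
The two numeric conventions used in the captions above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the standard logistic-regression reading in which a per-unit odds ratio compounds multiplicatively; the helper names `compound_odds_ratio` and `scale_to_minutes` are hypothetical, and the five-year horizon is an arbitrary illustration rather than a figure from the paper.

```python
import math


def compound_odds_ratio(per_unit_or, units):
    """Odds ratio implied by `units` one-unit increases of a predictor,
    assuming the log-odds are linear in that predictor."""
    return per_unit_or ** units


def scale_to_minutes(n):
    """Convert a heatmap intensity level n to minutes under the e^n encoding."""
    return math.exp(n)


# Figure 4 example: odds ratio of 1.01 per extra year of platform experience.
# Five years is an illustrative choice, not a value reported in the source.
five_year_or = compound_odds_ratio(1.01, 5)

# Figure 2 example: intensity level 1 corresponds to e^1 minutes.
one_step_minutes = scale_to_minutes(1)
```

Under these assumptions a per-year odds ratio of 1.01 implies only about a 5% increase in odds over five years, which shows how small such an effect is even when statistically significant.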