Automating Complex Document Workflows via Stepwise and Rollback-Enabled Operation Orchestration

Yanbin Zhang, Hanhui Ye, Yue Bai, Qiming Zhang, Liao Xiang, Wu Mianzhi, Renjun Hu

TL;DR

AutoDW tackles the challenge of automating long-horizon, interdependent document workflows by combining stepwise planning with adaptive rollback. The framework incrementally selects and verifies API actions conditioned on evolving document state, reducing error propagation. Using the DWBench benchmark, AutoDW achieves state-of-the-art instruction- and session-level completion rates and demonstrates robustness across LLM backbones and increasing task difficulty. A single-round dual-level rollback provides most performance gains with reasonable cost, and the work offers open-source code and a direction for broader, production-grade document automation.

Abstract

Workflow automation promises substantial productivity gains in everyday document-related tasks. While prior agentic systems can execute isolated instructions, they struggle with automating multi-step, session-level workflows due to limited control over the operational process. To this end, we introduce AutoDW, a novel execution framework that enables stepwise, rollback-enabled operation orchestration. AutoDW incrementally plans API actions conditioned on user instructions, intent-filtered API candidates, and the evolving states of the document. It further employs robust rollback mechanisms at both the argument and API levels, enabling dynamic correction and fault tolerance. These designs together ensure that the execution trajectory of AutoDW remains aligned with user intent and document context across long-horizon workflows. To assess its effectiveness, we construct a comprehensive benchmark of 250 sessions and 1,708 human-annotated instructions, reflecting realistic document processing scenarios with interdependent instructions. AutoDW achieves 90% and 62% completion rates on instruction- and session-level tasks, respectively, outperforming strong baselines by 40% and 76%. Moreover, AutoDW remains robust to the choice of backbone LLM and across tasks of varying difficulty. Code and data will be open-sourced at https://github.com/YJett/AutoDW
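The stepwise, rollback-enabled loop described in the abstract can be illustrated with a minimal sketch. This is not AutoDW's implementation: the document model (a plain dictionary), the toy APIs `set_title` and `set_body`, the `execute_step` helper, and the verification callback are all hypothetical names chosen for illustration. The sketch assumes one plausible reading of dual-level rollback: after a failed call the document is restored from a snapshot, the executor first retries the same API with a different argument (argument level), and only then moves to a different API (API level).

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical document model and API type; not taken from the AutoDW code base.
Document = Dict[str, str]
ApiFn = Callable[[Document, str], None]


def set_title(doc: Document, text: str) -> None:
    """Toy API: overwrite the document title."""
    doc["title"] = text


def set_body(doc: Document, text: str) -> None:
    """Toy API: overwrite the document body."""
    doc["body"] = text


def execute_step(
    doc: Document,
    plan: List[Tuple[str, ApiFn, List[str]]],
    check: Callable[[Document], bool],
) -> Optional[Tuple[str, str]]:
    """Run one step with snapshot-based rollback.

    `plan` lists candidate APIs in preference order, each with its own
    ordered argument candidates. Returns the (api_name, argument) pair
    that passed `check`, or None if every candidate failed.
    """
    for name, fn, arg_candidates in plan:   # API level: switch API when all arguments fail
        for arg in arg_candidates:          # argument level: retry the same API
            snapshot = copy.deepcopy(doc)   # state before the call
            try:
                fn(doc, arg)
                ok = check(doc)
            except Exception:
                ok = False
            if ok:
                return name, arg
            doc.clear()                     # roll back to the snapshot
            doc.update(snapshot)
    return None


doc = {"title": "", "body": ""}
plan = [
    ("set_body", set_body, ["Quarterly Report"]),                       # wrong API
    ("set_title", set_title, ["quarterly report", "Quarterly Report"]),  # wrong, then right argument
]
result = execute_step(doc, plan, lambda d: d["title"] == "Quarterly Report")
print(result)  # ('set_title', 'Quarterly Report')
```

In this toy run the first API is rejected and undone, the second API's first argument is rejected and undone, and the final state contains only the accepted edit, so the earlier mistakes do not propagate to later steps.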

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of the AutoDW framework, which comprises three core modules: stepwise planning, API execution & state tracking, and adaptive rollback. The overview also includes an illustrative example showing how AutoDW selects one API call at a time and corrects its mistakes (i.e., APIs A and A') through rollback.
  • Figure 2: Distributional statistics of DWBench, which includes 250 sessions and 1,708 instructions. Left: Number of instructions per session (range: 4--8, mean=6.8), with a peak at 8 (89 sessions, 35.6%). Middle: Number of API calls per session (range: 15--75, mean=34.8), with a peak at 32 (78 sessions, 31.2%). Right: Number of API calls per instruction (range: 2--22, mean=5.1). Most instructions require 2--4 API calls, while complex instructions with $\ge 10$ calls account for 14.8%.
  • Figure 3: Accuracy and token usage of different rollback strategies.
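
The DWBench statistics quoted in the Figure 2 caption are mutually consistent, which a short arithmetic check makes explicit. The variable names are illustrative, and the total number of API calls is only approximated here from the rounded per-session mean, since the caption does not state it directly.

```python
# Figures quoted in the Figure 2 caption.
sessions = 250
instructions = 1708
mean_calls_per_session = 34.8  # rounded value from the caption

# Mean instructions per session: 1708 / 250 = 6.832, reported as 6.8.
mean_instr_per_session = instructions / sessions

# Shares of sessions at the two reported peaks.
share_8_instructions = 89 / sessions  # 0.356, reported as 35.6%
share_32_calls = 78 / sessions        # 0.312, reported as 31.2%

# Approximate total API calls (an estimate from the rounded mean),
# then calls per instruction, reported as 5.1.
approx_total_calls = mean_calls_per_session * sessions
approx_calls_per_instruction = approx_total_calls / instructions

print(round(mean_instr_per_session, 1))
print(round(approx_calls_per_instruction, 1))
```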