TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs

Rong Fu; Yibo Meng; Guangzhen Yao; Jiaxuan Lu; Zeyu Zhang; Zhaolu Kang; Ziming Guo; Jia Yee Tan; Xiaojing Du; Simon James Fong

TL;DR

TempoNet is a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation, offering a practical framework for Transformer-based decision making in high-throughput real-time scheduling.

Abstract

Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, stabilizing value learning and capturing deadline proximity. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts contextualized Q-scores into processor assignments through masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline fulfillment over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-driven policy interpretation, hardware-in-the-loop and kernel micro-benchmarks, and robustness under stress with simple runtime mitigations; we also report sample-efficiency benefits from behavioral-cloning pretraining and compatibility with an actor-critic variant without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.
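The slack discretization performed by the Urgency Tokenizer can be illustrated with a minimal sketch. It assumes a uniform bin width of `s_max / num_bins`, clipping of negative slack to zero, and a plain list of vectors standing in for the learned embedding table; the names `quantize_slack`, `urgency_tokens`, `s_max`, and `num_bins` are hypothetical and are not taken from the paper.

```python
import math


def quantize_slack(slack, s_max, num_bins):
    """Map a continuous slack value to a bin index by clip-then-floor.

    Assumption: bins are uniform with width s_max / num_bins, and slack
    outside [0, s_max] is clipped to the nearest boundary.
    """
    step = s_max / num_bins                      # width of one slack bin
    clipped = min(max(slack, 0.0), s_max)        # clip into [0, s_max]
    index = int(math.floor(clipped / step))      # floor to a bin index
    return min(index, num_bins - 1)              # slack == s_max falls in the last bin


def urgency_tokens(slacks, embedding, s_max):
    """Look up one embedding row per job (hypothetical stand-in for a learned table)."""
    num_bins = len(embedding)
    return [embedding[quantize_slack(s, s_max, num_bins)] for s in slacks]


# Toy embedding table with 5 bins and 2-dimensional vectors (illustrative values only).
toy_embedding = [[0.1 * i, -0.1 * i] for i in range(5)]
tokens = urgency_tokens([0.5, 9.9, -3.0], toy_embedding, s_max=10.0)
```

Because each job is tokenized independently of its position in the list, reordering `slacks` only reorders `tokens`, which is consistent with the permutation-invariant treatment of task sets described above.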

Paper Structure (157 sections, 10 theorems, 99 equations, 12 figures, 20 tables, 1 algorithm)

Introduction
Related Work
Classical real-time scheduling
Learning-based and RL schedulers
Transformer-based RL and explicit comparisons
Transformers, sparse attention and efficient architectures
Where TempoNet stands
Methodology
Problem formulation
Urgency Tokenizer (UT): a pluggable learnable quantization layer
Unified algorithm: UT-enabled training and online decision
Encoder, attention and positional strategy
Action-value projection and multicore mapping
Learning objective and optimization
Interpretability diagnostics
...and 142 more sections

Key Result

Theorem A.1

Given a task distribution $\mathcal{D}$ and a target scheduling function that is $L$-Lipschitz with respect to slack, there exists a non-zero lower bound on the expected miss rate gap between continuous and quantized architectures: where $\mathcal{M}(\pi)$ denotes the miss rate of policy $\pi$, $L$ is the Lipschitz continuity constant of the optimal scheduling manifold, and $\Delta = S_{\max}/Q$

Figures (12)

Figure 1: Overview of the TempoNet architecture for adaptive deadline-centric real-time dispatching. The pipeline begins with the Urgency Tokenizer (UT), which transforms continuous per-job slack $s_i(t)$ into a discrete vocabulary via Slack Quantization (clip and floor) and retrieves learned Urgency Tokens $\mathbf{x}_i(t)$ from an embedding matrix $\mathbf{E}$. These tokens are gathered into a Token Assembly matrix $\mathbf{X}(t)$, maintaining permutation invariance. At the core, a Transformer Encoder stacks $L$ blocks of Multi-Head Attention and Position-wise Feed-Forward Networks to generate contextualized task representations $\mathbf{H}^{(L)}$. The Q-Value Projection layer maps these representations to per-token Q-scores $\mathbf{q}(t)$, which are then passed to a Multicore Mapping module that uses an Iterative Masked-Greedy or bipartite matching strategy to determine the final action $a_t$. The framework is optimized via a Deep Q-Learning loop, in which experiences are stored in a Replay Buffer $\mathcal{D}$ to update the primary network $Q_\theta$ against a soft-updated Target Network $Q_{\theta^-}$.
Figure 2: Attention-Criticality Correlation Analysis
Figure 3: Attention Focus Distribution Across Tasks heatmap
Figure 4: Computational Time Scaling with System Size
Figure 5: Entropy Distribution Across Transformer Layers
...and 7 more figures
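
The Iterative Masked-Greedy mapping named in the Figure 1 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: it assumes one job per core per decision step, cores filled in index order, ties broken toward the lowest job index, and no explicit idle action. The function name `masked_greedy_assign` is hypothetical.

```python
def masked_greedy_assign(q_scores, num_cores):
    """Assign up to num_cores distinct jobs to cores, highest Q-score first.

    Assumption: after a job is selected it is masked out so that later
    cores cannot pick it again; the loop stops early if jobs run out.
    Returns a list of (core_index, job_index) pairs.
    """
    masked = [False] * len(q_scores)
    assignment = []
    for core in range(num_cores):
        best = None
        for job, q in enumerate(q_scores):
            # Skip already-assigned jobs; keep the best remaining score.
            if not masked[job] and (best is None or q > q_scores[best]):
                best = job
        if best is None:          # fewer jobs than cores: leave the rest unassigned
            break
        masked[best] = True
        assignment.append((core, best))
    return assignment


pairs = masked_greedy_assign([0.2, 0.9, 0.5], num_cores=2)
```

With these toy scores the first core receives job 1 and the second core receives job 2. The bipartite matching alternative mentioned in the caption would replace the inner loop with a global assignment solver and is not shown here.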

Theorems & Definitions (17)

Theorem A.1
proof
Theorem C.1
proof
Lemma D.1
proof
Theorem D.2
proof
Lemma G.1: Regret-to-Bellman residual decomposition
Lemma G.2: Approximation bias from slack quantization
...and 7 more
