A Tensor Low-Rank Approximation for Value Functions in Multi-Task Reinforcement Learning

Sergio Rozada, Santiago Paternain, Juan Andres Bazerque, Antonio G. Marques

TL;DR

This paper tackles data efficiency in reinforcement learning for physical environments by formulating multi-task value function learning as a low-rank tensor problem. It represents the collection of task-specific Q-functions as a single Q-tensor $\mathbf{Q}$ with PARAFAC rank $K$, enabling a compact factorization into state, action, and task components. The authors propose the online S-TLR-Q algorithm, which performs stochastic block-coordinate updates of the factors $Q_1$, $Q_2$, and $Q_3$ using semi-gradients and an $\varepsilon$-greedy policy to handle the max operator, thereby learning all tasks jointly with reduced data. Empirical results on inverted pendulums and a wireless scheduling scenario show faster convergence and lower sample complexity than per-task learning and naive sharing, highlighting the method's practical value for data-limited multi-task RL. Overall, the approach demonstrates that exploiting a low-rank Q-tensor structure can capture cross-task similarities and improve learning efficiency in real-world RL settings.
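The block-coordinate semi-gradient idea summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' S-TLR-Q implementation: the class name `LowRankQTensor`, the helper `epsilon_greedy`, the random initialization, the step size, and the choice to recompute the TD error before each factor update are all hypothetical.

```python
import numpy as np


class LowRankQTensor:
    """Rank-K PARAFAC model: Q[s, a, t] = sum_k Q1[s, k] * Q2[a, k] * Q3[t, k]."""

    def __init__(self, n_states, n_actions, n_tasks, rank, seed=0, scale=0.1):
        rng = np.random.default_rng(seed)
        # Small random initialization (an assumption, not taken from the paper).
        self.Q1 = scale * rng.standard_normal((n_states, rank))   # state factor
        self.Q2 = scale * rng.standard_normal((n_actions, rank))  # action factor
        self.Q3 = scale * rng.standard_normal((n_tasks, rank))    # task factor

    def value(self, s, a, t):
        """Single entry Q[s, a, t] of the factorized tensor."""
        return float(np.sum(self.Q1[s] * self.Q2[a] * self.Q3[t]))

    def action_values(self, s, t):
        """Vector Q[s, :, t] over all actions."""
        return self.Q2 @ (self.Q1[s] * self.Q3[t])

    def td_error(self, s, a, t, reward, s_next, done, gamma):
        """Temporal-difference error with a max over next actions."""
        target = reward
        if not done:
            target += gamma * np.max(self.action_values(s_next, t))
        return target - self.value(s, a, t)

    def update(self, s, a, t, reward, s_next, done, gamma=0.9, lr=0.1):
        """One block-coordinate pass: update Q1, then Q2, then Q3.

        Each step is a semi-gradient step: the bootstrapped target is
        treated as a constant, so only Q[s, a, t] is differentiated.
        """
        delta = self.td_error(s, a, t, reward, s_next, done, gamma)
        self.Q1[s] += lr * delta * (self.Q2[a] * self.Q3[t])
        delta = self.td_error(s, a, t, reward, s_next, done, gamma)
        self.Q2[a] += lr * delta * (self.Q1[s] * self.Q3[t])
        delta = self.td_error(s, a, t, reward, s_next, done, gamma)
        self.Q3[t] += lr * delta * (self.Q1[s] * self.Q2[a])


def epsilon_greedy(model, s, t, epsilon, rng):
    """Random action with probability epsilon, greedy action otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(model.Q2.shape[0]))
    return int(np.argmax(model.action_values(s, t)))


# Usage on a made-up transition (state 0 -> state 3, reward 1.0, task 1).
model = LowRankQTensor(n_states=5, n_actions=3, n_tasks=2, rank=2)
rng = np.random.default_rng(1)
action = epsilon_greedy(model, s=0, t=1, epsilon=0.1, rng=rng)
model.update(s=0, a=action, t=1, reward=1.0, s_next=3, done=False)
```

Because the task factor `Q3` multiplies the shared state and action factors, a sample collected in one task also changes the value estimates of the other tasks, which is one way to picture the cross-task sharing described in the summary.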

Abstract

In pursuit of reinforcement learning systems that could train in physical environments, we investigate multi-task approaches as a means to alleviate the need for massive data acquisition. In a tabular scenario where the Q-functions are collected across tasks, we model our learning problem as optimizing a higher-order tensor structure. Recognizing that closely related tasks may require similar actions, our proposed method imposes a low-rank condition on this aggregated Q-tensor. The rationale behind this approach to multi-task learning is that the low-rank structure enforces the notion of similarity without the need to explicitly prescribe which tasks are similar; instead, this information is inferred from a reduced amount of data simultaneously with the stochastic optimization of the Q-tensor. The efficiency of our low-rank tensor approach to multi-task learning is demonstrated in two numerical experiments, first in a benchmark environment formed by a collection of inverted pendulums, and then in a practical scenario involving multiple wireless communication devices.
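To make the low-rank condition concrete, a rank-$K$ PARAFAC model of a three-way Q-tensor can be written as follows. The index names $s$, $a$, $t$, the dimension sizes $S$, $A$, $T$, and the numbers in the example are assumptions chosen for illustration; the paper's exact tensor layout may differ.

$$
[\mathbf{Q}]_{s,a,t} = \sum_{k=1}^{K} [Q_1]_{s,k}\,[Q_2]_{a,k}\,[Q_3]_{t,k},
\qquad Q_1 \in \mathbb{R}^{S \times K},\quad Q_2 \in \mathbb{R}^{A \times K},\quad Q_3 \in \mathbb{R}^{T \times K}.
$$

Storing the three factors takes $(S + A + T)K$ parameters instead of the $S \cdot A \cdot T$ entries of the full tensor. For a hypothetical setting with $S = 100$, $A = 10$, $T = 4$, and $K = 5$, that is $(100 + 10 + 4) \cdot 5 = 570$ parameters versus $100 \cdot 10 \cdot 4 = 4000$ entries.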

Paper Structure (7 sections, 13 equations, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: Trajectories sampled using the estimated optimal policy learned with LR-Q for different scenarios. From left to right: the pendulum problem with $(g_1=0.01,d_1=1.0)$ (1st panel) and $(g_1=0.01,d_1=1.0)$ (2nd panel); and the wireless setup with $\alpha_1=1.0$, $b_1=0.5$, $p_{\alpha_1}=p_{b_1}=0.2$ (3rd panel) and $\alpha_4=2.0$, $b_4=3.0$, $p_{\alpha_4}=p_{b_4}=0.8$ (4th panel). While the optimal trajectories differ across the scenarios, the estimated policies show structural similarities.
  • Figure 2: Performance of S-TLR-Q in (a) the pendulum problem and (b) the wireless setup, measured in terms of average return over $100$ experiments. S-TLR-Q requires fewer samples to converge than LR-Q and achieves higher returns than C-LR-Q consistently across all tasks.