Hybrid Feedback-Guided Optimal Learning for Wireless Interactive Panoramic Scene Delivery

Xiaoyi Wu, Juaren Steiger, Bin Li, R. Srikant

TL;DR

This work addresses efficient wireless panoramic scene delivery by formulating viewport-portion selection as an online learning problem under a novel two-level hybrid feedback model (2/F/B), in which prediction outcomes provide full-information feedback and transmission outcomes provide bandit feedback. The authors propose AdaPort, which combines empirical prediction-rate estimates with Thompson Sampling for transmission-rate estimates, and prove a 2/F/B regret lower bound that is smaller than those of the 2/B/B and 1/B settings. AdaPort achieves an asymptotically matching regret upper bound, which establishes its asymptotic optimality, and experiments on real-world traces show that it consistently outperforms state-of-the-art baselines. The results indicate that exploiting full-information feedback on predictions improves learning efficiency in edge-assisted panoramic delivery over practical, trace-driven wireless channels.

Abstract

Immersive applications such as virtual and augmented reality impose stringent requirements on frame rate, latency, and synchronization between physical and virtual environments. To meet these requirements, an edge server must render panoramic content, predict user head motion, and transmit a portion of the scene that is large enough to cover the user viewport while remaining within wireless bandwidth constraints. Each portion produces two feedback signals: prediction feedback, indicating whether the selected portion covers the actual viewport, and transmission feedback, indicating whether the corresponding packets are successfully delivered. Prior work models this problem as a multi-armed bandit with two-level bandit feedback, but fails to exploit the fact that prediction feedback can be retrospectively computed for all candidate portions once the user head pose is observed. As a result, prediction feedback constitutes full-information feedback rather than bandit feedback. Motivated by this observation, we introduce a two-level hybrid feedback model that combines full-information and bandit feedback, and formulate the portion selection problem as an online learning task under this setting. We derive an instance-dependent regret lower bound for the hybrid feedback model and propose AdaPort, a hybrid learning algorithm that leverages both feedback types to improve learning efficiency. We further establish an instance-dependent regret upper bound that matches the lower bound asymptotically, and demonstrate through real-world trace-driven simulations that AdaPort consistently outperforms state-of-the-art baseline methods.
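The hybrid feedback idea can be illustrated with a minimal Python sketch. It is not the paper's AdaPort algorithm, only an illustration under stated assumptions: each candidate portion is an arm, a round is rewarded when the chosen portion both covers the viewport and is delivered, the two events are independent Bernoulli outcomes, and the arm index is the product of the empirical prediction rate and a Thompson sample of the transmission rate. The function name `hybrid_ts_sketch`, the Beta(1, 1) prior, and the example rates are hypothetical.

```python
import random


def hybrid_ts_sketch(pred_probs, tx_probs, horizon, seed=0):
    """Illustrative hybrid-feedback learner (an assumption-laden sketch, not AdaPort itself).

    pred_probs[i]: assumed probability that portion i covers the viewport.
    tx_probs[i]:   assumed probability that portion i is delivered successfully.
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(pred_probs)
    pred_hits = [0] * k   # full-information counts: updated for every arm each round
    tx_succ = [0] * k     # bandit counts: updated only for the arm that was played
    tx_fail = [0] * k
    pulls = [0] * k
    total_reward = 0

    for t in range(1, horizon + 1):
        # Empirical prediction rate; before any observation, use 1.0 (hypothetical choice).
        pred_hat = [pred_hits[i] / (t - 1) if t > 1 else 1.0 for i in range(k)]
        # Thompson sample of each transmission rate from a Beta(1 + s, 1 + f) posterior.
        tx_sample = [rng.betavariate(1 + tx_succ[i], 1 + tx_fail[i]) for i in range(k)]
        # Play the arm with the largest estimated success probability.
        arm = max(range(k), key=lambda i: pred_hat[i] * tx_sample[i])

        # Simulated environment: one head-pose draw decides coverage for ALL portions,
        # so prediction feedback is available for every arm (full information).
        u = rng.random()
        covered = [u < p for p in pred_probs]
        # Transmission outcome is observed only for the played arm (bandit feedback).
        delivered = rng.random() < tx_probs[arm]

        for i in range(k):
            pred_hits[i] += int(covered[i])
        if delivered:
            tx_succ[arm] += 1
        else:
            tx_fail[arm] += 1
        pulls[arm] += 1
        total_reward += int(covered[arm] and delivered)

    return pulls, total_reward


if __name__ == "__main__":
    # Larger portions cover the viewport more often but are harder to deliver.
    pulls, reward = hybrid_ts_sketch([0.5, 0.8, 0.95], [0.95, 0.8, 0.4], horizon=5000)
    print("pulls per portion:", pulls, "total reward:", reward)
```

In this sketch only the transmission rates require exploration, because coverage is observed for every portion in every round regardless of which one was sent. That is the intuition behind the hybrid model having a smaller regret lower bound than the pure bandit settings.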

Paper Structure (28 sections, 6 theorems, 65 equations, 5 figures, 1 algorithm)

Key Result

Theorem 1

(Lower bound) Consider an online learning algorithm under 2/F/B feedback that achieves $R(T) = o(T^\delta)$ for all $\delta > 0$. Then the algorithm is subject to the following regret lower bound:

Figures (5)

  • Figure 1: Relationship between the user's viewport, the predicted viewport, and delivery portions.
  • Figure 2: Lower bound constant comparison for two arms.
  • Figure 3: The range of $\epsilon_3$.
  • Figure 4: Trace-based simulations: AdaPort compared against Thompson sampling under 2/B/B and 1/B feedback. (a) and (b) show the relative throughput degradation compared to the optimal policy using fixed sending rates at 100 Mbps and 150 Mbps, respectively.
  • Figure 5: Trace-based simulations: AdaPort compared against Thompson sampling and EXP3 under 1/B feedback. (a) and (b) show the relative throughput degradation compared to the optimal policy using fixed sending rates at 100 Mbps and 150 Mbps, respectively.

Theorems & Definitions (16)

  • Theorem 1
  • proof
  • Remark 1: lower bound constant comparison
  • Theorem 2
  • proof
  • Definition 1
  • Lemma 1
  • Remark 2
  • Lemma 2
  • proof
  • ...and 6 more