Online Learning for Position-Aided Millimeter Wave Beam Training

Vutha Va, Takayuki Shimizu, Gaurav Bansal, Robert W. Heath

TL;DR

This paper uses a multi-armed bandit framework to develop online learning algorithms for beam pair selection and refinement: the beam pair selection uses the upper confidence bound (UCB) with a newly proposed risk-aware feature, while the beam refinement uses a modified optimistic optimization algorithm.

Abstract

Accurate beam alignment is essential for beam-based millimeter wave communications. Conventional beam sweeping solutions often have large overhead, which is unacceptable for mobile applications like vehicle-to-everything. Learning-based solutions that leverage sensor data like position to identify good beam directions are one approach to reduce the overhead. Most existing solutions, though, rely on supervised learning, where the training data is collected beforehand. In this paper, we use a multi-armed bandit framework to develop online learning algorithms for beam pair selection and refinement. The beam pair selection algorithm learns coarse beam directions in some predefined beam codebook, e.g., in discrete angles separated by the 3 dB beamwidths. The beam refinement fine-tunes the identified directions to match the peak of the power angular spectrum at that position. The beam pair selection uses the upper confidence bound (UCB) with a newly proposed risk-aware feature, while the beam refinement uses a modified optimistic optimization algorithm. The proposed algorithms learn to recommend good beam pairs quickly. When using 16x16 arrays at both the transmitter and receiver, they achieve on average 1 dB gain over the exhaustive search (over 271x271 beam pairs) on the unrefined codebook within 100 time-steps with a training budget of only 30 beam pairs.
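The beam pair selection described in the abstract can be illustrated with a minimal sketch. It assumes the textbook UCB1 index and picks the top `budget` beam pairs at each time-step. The risk-aware feature and the paper's actual reward definition are not given in this summary, so they are not modeled here. The class name `UcbBeamSelector`, the helper `run_toy_demo`, and the noise-free toy rewards (0.9 for one good pair, 0.1 otherwise) are hypothetical and chosen only for illustration.

```python
import math


class UcbBeamSelector:
    """Textbook UCB1 over beam pairs (assumption: not the paper's risk-aware variant)."""

    def __init__(self, num_pairs, budget):
        self.num_pairs = num_pairs
        self.budget = budget              # beam pairs trained per time-step
        self.counts = [0] * num_pairs     # how often each pair was trained
        self.means = [0.0] * num_pairs    # running mean reward of each pair
        self.t = 0                        # completed time-steps

    def _index(self, pair):
        # Untried pairs get an infinite index so that they are explored first.
        if self.counts[pair] == 0:
            return math.inf
        bonus = math.sqrt(2.0 * math.log(self.t + 1) / self.counts[pair])
        return self.means[pair] + bonus

    def select(self):
        # Rank all pairs by UCB index and keep the `budget` highest.
        ranked = sorted(range(self.num_pairs), key=self._index, reverse=True)
        return ranked[: self.budget]

    def update(self, rewards):
        # `rewards` maps each trained pair to a reward in [0, 1].
        self.t += 1
        for pair, reward in rewards.items():
            self.counts[pair] += 1
            self.means[pair] += (reward - self.means[pair]) / self.counts[pair]


def run_toy_demo(num_pairs, budget, steps, best_pair):
    """Hypothetical noise-free environment: one good pair, all others poor."""
    selector = UcbBeamSelector(num_pairs, budget)
    for _ in range(steps):
        chosen = selector.select()
        rewards = {p: (0.9 if p == best_pair else 0.1) for p in chosen}
        selector.update(rewards)
    return selector
```

Calling `run_toy_demo(num_pairs=20, budget=5, steps=200, best_pair=13)` returns a selector whose `counts` show how the training budget was spread across pairs. Because only `budget` pairs are measured per step, the overhead per training request stays fixed regardless of the codebook size.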

Paper Structure

This paper contains 25 sections, 3 theorems, 44 equations, 11 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

Assuming that the ideal reward signal (the true reward as defined in the paper) is accessible, the expected regret at time $n$ of the greedy UCB algorithm is upper bounded by

Figures (11)

  • Figure 1: Beam patterns in our codebook for an $8\times8$ array. The array is assumed to face upward in the $+z$ direction. The codebook covers the directions in the $+z$ half-space (i.e., assuming no radiation in the backplane).
  • Figure 2: A snapshot of the ray-tracing simulation in an urban street. The street has two lanes, and two types of vehicles (cars and trucks) are simulated. The BS's antenna is at 7m and the MU's antenna is at 1.5m from the ground.
  • Figure 3: Illustration of the intuition behind the proposed position-aided beam alignment. Consider a vehicle at position A. The geometry of the environment allows only two possible pointing directions: the LOS path and the building-reflection path. If the system can learn from past beam measurement results at position A to identify these two beam directions, then beam training can be reduced to training just these two directions. In an actual setting there will be position error. In our proposed solution, we use location bins, which tolerate position inaccuracy up to the bin size.
  • Figure 4: An illustration of position-aided beam alignment in the uplink. It consists of two phases. Phase 1 is the training request, in which the MU position is sent to the BS. The BS uses the position and its learned database to determine a list of promising beam pairs $\mathcal{S}$. In Phase 2, the beam pairs in the list are trained, and feedback indicating the best beam index is sent at the end. The database used for the beam pair selection is stored and maintained at the BS without any burden on the MU.
  • Figure 5: A flowchart of the two-layer online learning. The algorithm starts with a training request detection loop. When it detects a request, the algorithm decodes the user's position and passes it to the beam selection procedure, which then reads the learning parameters corresponding to the position and determines a subset of promising beam pairs. If beam refinement is enabled, the refinement parameters of the selected pairs are also determined. The beam subset is then sent to the user and the beam pairs in the subset are trained. The beam measurements are used to update the learning parameters, and the algorithm returns to the training request detection loop.
  • ...and 6 more figures
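The request-handling loop described in the captions of Figures 3 to 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the bin size `BIN_SIZE_M`, the names `location_bin`, `BeamDatabase`, and `handle_training_request`, and the `measure` callback are all hypothetical. The selection rule here is a simple placeholder (untried pairs first, then highest empirical mean) standing in for the paper's UCB-based rule, and beam refinement is omitted.

```python
import math
from collections import defaultdict

BIN_SIZE_M = 5.0  # hypothetical location-bin size in metres


def location_bin(x_m, y_m, bin_size_m=BIN_SIZE_M):
    """Map a reported position to a bin, absorbing errors smaller than the bin."""
    return (int(x_m // bin_size_m), int(y_m // bin_size_m))


class BeamDatabase:
    """Per-bin learning parameters, kept at the BS as in Figure 4."""

    def __init__(self, num_pairs):
        self.num_pairs = num_pairs
        self._bins = defaultdict(
            lambda: {"counts": [0] * num_pairs, "sums": [0.0] * num_pairs}
        )

    def params(self, bin_key):
        return self._bins[bin_key]


def handle_training_request(db, position, budget, measure):
    """One pass through the loop of Figure 5, without beam refinement."""
    # Phase 1: decode the position and read that bin's learning parameters.
    p = db.params(location_bin(*position))

    def score(pair):
        # Placeholder rule (assumption): explore untried pairs, then exploit.
        if p["counts"][pair] == 0:
            return math.inf
        return p["sums"][pair] / p["counts"][pair]

    subset = sorted(range(db.num_pairs), key=score, reverse=True)[:budget]

    # Phase 2: train only the selected pairs and report the best one.
    powers = {pair: measure(pair) for pair in subset}
    best = max(powers, key=powers.get)

    # Use the measurements to update the learning parameters of this bin.
    for pair, power in powers.items():
        p["counts"][pair] += 1
        p["sums"][pair] += power
    return best, subset
```

A caller would supply `measure` as a function that returns the measured power for a given beam pair index. Keying the database by location bin means two requests from nearby positions share the same statistics, which is what lets past measurements at a position shorten later beam training there.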

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Lemma 1