Dual Control of Exploration and Exploitation for Auto-Optimisation Control with Active Learning

Zhongguo Li; Wen-Hua Chen; Jun Yang; Yunda Yan

TL;DR

This work tackles auto-optimisation under unknown environmental parameters by introducing a dual control framework of exploration and exploitation (DCEE) with an ensemble-based active learning engine. The approach jointly learns environment parameters and tracks the unknown optimal operation through a reward model $J(\theta,y)=\phi(y)^T\theta$, yielding dual effects that balance identification and performance. The authors provide convergence analyses for both single-integrator and general linear systems, and demonstrate the method via a numerical example and a photovoltaic MPPT application, where DCEE achieves high efficiency and robust adaptation without external perturbations. The results suggest substantial practical potential for real-time autonomous control in uncertain settings beyond MPPT, including nonlinear extensions and broader engineering domains.

Abstract

Achieving optimal operation in environments with unknowns and uncertainties is highly desirable but critically challenging across numerous fields. This paper develops a dual control framework for exploration and exploitation (DCEE) to solve an auto-optimisation problem in such complex settings. In general, there is a fundamental conflict between tracking an unknown optimal operational condition and identifying the unknown parameters. The DCEE framework stands out by eliminating the need for additional perturbation signals, a common requirement in existing adaptive control methods. Instead, it inherently incorporates an exploration mechanism, actively probing the uncertain environment to diminish belief uncertainty. An ensemble-based multi-estimator approach is developed to learn the environmental parameters and, at the same time, quantify the estimation uncertainty in real time. The control action is devised with dual effects: it not only minimises the tracking error between the current state and the believed unknown optimal operational condition but also reduces belief uncertainty by proactively exploring the environment. Formal properties of the proposed DCEE framework, such as convergence, are established. A numerical example is used to validate the effectiveness of the proposed DCEE. Simulation results for maximum power point tracking are provided to further demonstrate the potential of this new framework in real-world applications.
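
The interplay described above (an ensemble of estimators that learns the parameters while the control action both tracks the believed optimum and reduces disagreement within the ensemble) can be illustrated with a minimal sketch. This is not the authors' algorithm: the quadratic basis `phi`, the true parameter `theta_true`, the normalised gradient update, the step sizes, and the dual cost `D(y)` (the ensemble mean of the squared believed reward gradient, which splits into a squared mean-gradient term and a gradient-variance term) are all assumptions chosen for illustration on a scalar single integrator.

```python
import random

def phi(y):
    """Hypothetical basis for the reward model J(theta, y) = phi(y)^T theta."""
    return [1.0, y, y * y]

def dphi(y):
    """Derivative of phi with respect to y."""
    return [0.0, 1.0, 2.0 * y]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def reward(theta, y):
    return dot(phi(y), theta)

def update_estimator(theta, y, z, eta=0.5):
    """One normalised gradient step on the squared prediction error (assumed rule)."""
    p = phi(y)
    gain = eta * (z - dot(p, theta)) / (1.0 + dot(p, p))
    return [t + gain * pi for t, pi in zip(theta, p)]

def ensemble_spread(ensemble):
    """Sum of squared deviations from the ensemble mean (an uncertainty proxy)."""
    n = len(ensemble)
    mean = [sum(col) / n for col in zip(*ensemble)]
    return sum((t - m) ** 2 for th in ensemble for t, m in zip(th, mean))

def dual_cost_gradient(ensemble, y):
    """d/dy of the assumed dual cost D(y) = mean_i (dJ(theta_i, y)/dy)^2.

    D(y) = (mean gradient)^2 + (variance of the gradient across members):
    the first term pulls y towards the believed optimum (exploitation), the
    second penalises operating points where the members disagree.
    """
    total = 0.0
    for th in ensemble:
        g = dot(dphi(y), th)   # believed reward gradient for this member
        dg = 2.0 * th[2]       # derivative of that gradient with respect to y
        total += 2.0 * g * dg
    return total / len(ensemble)

def run_dcee(steps=200, n_members=10, delta=0.05, noise_std=0.01, seed=0):
    rng = random.Random(seed)
    theta_true = [0.0, 4.0, -1.0]  # assumed: J = 4y - y^2, optimum at y = 2
    ensemble = [[t + rng.uniform(-0.2, 0.2) for t in theta_true]
                for _ in range(n_members)]
    y = 0.0
    initial_spread = ensemble_spread(ensemble)
    for _ in range(steps):
        z = reward(theta_true, y) + rng.gauss(0.0, noise_std)  # noisy reward reading
        ensemble = [update_estimator(th, y, z) for th in ensemble]
        u = -delta * dual_cost_gradient(ensemble, y)           # dual control action
        y = y + u                                              # single integrator
    return y, reward(theta_true, y), initial_spread, ensemble_spread(ensemble)

if __name__ == "__main__":
    y, r, s0, s1 = run_dcee()
    print(f"final y = {y:.3f}, reward = {r:.3f}, spread {s0:.4f} -> {s1:.4f}")
```

No external dither is injected in this sketch; the operating point moves only through the gradient of the assumed dual cost. Because every member is updated with the same measurement, the ensemble spread cannot grow under this particular update rule, which is a property of the sketch rather than a result taken from the paper.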

Paper Structure (11 sections, 71 equations, 7 figures, 1 table)

Introduction
Problem Statement
Dual Control Reformulation
Ensemble based Active Learning
DCEE for Single Integrator
Algorithm Development
Convergence Analysis
DCEE for Linear Systems
Numerical Example
Application for MPPT
Conclusion

Figures (7)

Figure 1: Simulation results with environment uncertainties using active learning based DCEE.
Figure 2: Simulation results with environment uncertainties using passive learning based auto-optimisation approach.
Figure 3: Time-varying solar irradiance profile.
Figure 4: Power profile using different algorithms.
Figure 5: Voltage profile using different algorithms.
...and 2 more figures
