Deep Reinforcement Learning for Radiative Heat Transfer Optimization Problems

Eva Ortiz-Mansilla, Juan José García-Esteban, Jorge Bravo-Abad, Juan Carlos Cuevas

TL;DR

This work shows that reinforcement learning (RL) can be effectively applied to optimization problems in radiative heat transfer, using near-field heat transfer between multilayer hyperbolic metamaterials as a test case. By formulating layer configurations as sequential decisions, the authors compare a suite of RL algorithms (including SARSA, Double DQN, REINFORCE, A2C, and PPO) and demonstrate that Double DQN offers the best sample efficiency while PPO delivers robust performance with fewer explored states. The results reveal that RL can surpass physically intuitive baselines, achieving up to ~21% higher heat transfer coefficient (HTC) for 16-layer stacks, with gains that extend to 24-layer configurations, thereby providing a practical toolkit for optimization and inverse design in radiative heat transfer. The authors also provide public code to facilitate applying these RL methods to similar thermal-radiation problems and outline guidance for selecting algorithms based on problem characteristics.

Abstract

Reinforcement learning is a subfield of machine learning that is having a huge impact on many conventional disciplines, including the physical sciences. Here, we show how reinforcement learning methods can be applied to solve optimization problems in the context of radiative heat transfer. We illustrate their use with the optimization of the near-field radiative heat transfer between multilayer hyperbolic metamaterials. Specifically, we show how this problem can be formulated in the language of reinforcement learning and tackled with a variety of algorithms. We show that these algorithms allow us to find solutions that outperform those obtained using physical intuition. Overall, our work shows the power and potential of reinforcement learning methods for the investigation of a wide variety of problems in the context of radiative heat transfer and related topics.
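As a rough illustration of how a layer-stacking problem might be phrased as a sequential decision process, the sketch below defines a toy environment in which each action picks the material of the next layer and the reward arrives only once the stack is complete. Everything here is an assumption made for illustration: the class `LayerStackEnv`, the 0/1 material encoding, the sparse terminal reward, and especially `toy_htc`, which is a made-up stand-in and not a physical HTC calculation or the authors' implementation.

```python
import random
from typing import Callable, List, Tuple

METAL, DIELECTRIC = 1, 0  # hypothetical encoding of the two layer materials


def toy_htc(layers: List[int]) -> float:
    """Placeholder objective, NOT a physical HTC calculation.

    It just counts metal/dielectric interfaces so the example can run.
    """
    return float(sum(a != b for a, b in zip(layers, layers[1:])))


class LayerStackEnv:
    """Episode = build a stack layer by layer; action = material of next layer."""

    def __init__(self, n_layers: int = 16,
                 objective: Callable[[List[int]], float] = toy_htc) -> None:
        self.n_layers = n_layers
        self.objective = objective
        self.layers: List[int] = []

    def reset(self) -> Tuple[int, ...]:
        """Start a new episode with an empty stack and return the state."""
        self.layers = []
        return tuple(self.layers)

    def step(self, action: int) -> Tuple[Tuple[int, ...], float, bool]:
        """Append one layer and return (state, reward, done)."""
        if action not in (METAL, DIELECTRIC):
            raise ValueError("action must be 0 (dielectric) or 1 (metal)")
        if len(self.layers) >= self.n_layers:
            raise RuntimeError("episode finished; call reset()")
        self.layers.append(action)
        done = len(self.layers) == self.n_layers
        # Assumed sparse reward: the objective is evaluated only for a full stack.
        reward = self.objective(self.layers) if done else 0.0
        return tuple(self.layers), reward, done


def random_episode(env: LayerStackEnv,
                   rng: random.Random) -> Tuple[Tuple[int, ...], float]:
    """Random-policy baseline: choose each layer's material uniformly."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(rng.choice((DIELECTRIC, METAL)))
        total += reward
    return state, total
```

Swapping `toy_htc` for a real solver would leave the interface unchanged, which is the main point of the sketch: the RL algorithm only ever sees states, actions, and a scalar reward.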

Paper Structure (14 sections, 24 equations, 9 figures, 6 tables, 5 algorithms)

This paper contains 14 sections, 24 equations, 9 figures, 6 tables, 5 algorithms.

Figures (9)

  • Figure 1: The reinforcement learning control loop diagram.
  • Figure 2: (a) Schematic representation of the physical system under study. It features two identical hyperbolic metamaterials comprising alternating metallic (grey) and dielectric (blue) layers. Both reservoirs have infinitely extended layers and are separated by a distance $d_0 = 10$ nm. Each layer has a thickness of 5 nm and both subsystems are backed by a metallic substrate. (b) Transmission of evanescent waves as a function of the frequency ($\omega$) and the parallel wavevector ($k$) for the periodic structure of panel (a) composed of 16 active layers per subsystem. (c) The corresponding spectral heat transfer coefficient $h_{\omega}$ at room temperature ($T=300$ K) as a function of the frequency (labeled "baseline" in the legend). The result is compared to that of two metallic plates (bulk) with the same gap.
  • Figure 3: Training of the SARSA algorithm for our physical problem of interest. (a) Largest HTC discovered as a function of the number of found states in the problem with 16 layers, obtained with the SARSA algorithm. We also present the results obtained with the random algorithm. (b) Evolution of the corresponding loss curve of the SARSA algorithm. (c) Return obtained in a simulation of an episode with the $Q$-network of the SARSA algorithm at each training step. The dashed line corresponds to the value of $\varepsilon$ (right scale). In all panels the solid lines correspond to the mean value and the shaded areas to the standard deviations, as obtained in 40 independent runs of the SARSA algorithm.
  • Figure 4: Training of the Double DQN algorithm. (a) Largest HTC discovered as a function of the number of found states in the problem with 16 layers, obtained with the Double DQN algorithm. We also present the results obtained with the random algorithm. (b) Evolution of the corresponding loss curve of the Double DQN algorithm. (c) Return obtained in a simulation of an episode with the $Q$-network of the Double DQN algorithm at each training step. The dashed line corresponds to the value of $\varepsilon$ (right scale). The vertical line and the green shaded area correspond to the training steps at which the highest HTC of the runs is found. In all panels the solid lines correspond to the mean value and the shaded areas to the standard deviations, as obtained in 40 independent runs. In all cases, 4 experiences were stored per training step.
  • Figure 5: Same as in Fig. 4 but with 25 experiences stored per training step.
  • ...and 4 more figures
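
The captions above refer to a $Q$-network, an exploration rate $\varepsilon$, and the Double DQN algorithm. As a hedged reminder of what those terms usually mean, the sketch below shows the standard Double DQN bootstrap target (the online network selects the next action, the target network evaluates it) together with $\varepsilon$-greedy action selection. The function names, the plain-list representation of $Q$-values, and the default discount factor are assumptions for illustration; the paper's actual networks and hyperparameters are not reproduced here.

```python
import random
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def double_dqn_target(reward: float,
                      done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Bootstrapped target for a single transition (s, a, r, s').

    q_online_next / q_target_next are the Q-values of the next state s'
    as estimated by the online and target networks, respectively.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # The online network selects the greedy action ...
    best_action = argmax(q_online_next)
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[best_action]


def epsilon_greedy(q_values: Sequence[float],
                   epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon explore uniformly, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return argmax(q_values)
```

Decoupling action selection from action evaluation is what distinguishes this target from the plain DQN one, which would take the maximum over `q_target_next` directly and tends to overestimate values.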