CineMPC: A Fully Autonomous Drone Cinematography System Incorporating Zoom, Focus, Pose, and Scene Composition

Pablo Pueyo, Juan Dendarieta, Eduardo Montijano, Ana C. Murillo, Mac Schwager

TL;DR

CineMPC tackles autonomous cinematography by jointly optimizing drone pose and camera intrinsics (focus, zoom, aperture) within a nonlinear Model Predictive Control loop guided by RGB-D perception. It introduces a perception-informed pipeline that tracks multiple dynamic targets and re-optimizes trajectories at each time step to satisfy artistic and technical objectives. The contributions include a four-term MPC cost covering depth of field, image composition, camera–target relative pose, and intrinsics; a low-level controller for smooth execution; and a ROS-based modular implementation validated in photorealistic simulation and real flights with occlusion and collision handling. The work demonstrates cinematographic effects that extrinsics-only control cannot achieve and provides open-source code for community use.

Abstract

We present CineMPC, a complete cinematographic system that autonomously controls a drone to film multiple targets according to user-specified aesthetic objectives. Existing solutions in autonomous cinematography control only the camera extrinsics, namely its position and orientation. In contrast, CineMPC is the first solution that includes the camera intrinsic parameters in the control loop, which are essential tools for controlling cinematographic effects like focus, depth of field, and zoom. The system estimates the relative poses between the targets and the camera from an RGB-D image and optimizes a trajectory for the extrinsic and intrinsic camera parameters to satisfy the artistic and technical requirements specified by the user. The drone and the camera are controlled in a nonlinear Model Predictive Control (MPC) loop by re-optimizing the trajectory at each time step in response to current conditions in the scene. The perception system of CineMPC can track the targets' position and orientation despite the camera effects. Experiments in a photorealistic simulation and with a real platform demonstrate the capabilities of the system to achieve a full array of cinematographic effects that are not possible without the control of the intrinsics of the camera. Code for CineMPC is implemented following a modular architecture in ROS and released to the community.
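
The receding-horizon idea described in the abstract (re-optimize at every time step, apply only the first control, repeat) can be illustrated with a deliberately small toy. The sketch below is not the CineMPC formulation. It assumes a 1D scene, a constant-velocity target prediction, a two-term cost (camera–target distance and focus distance) plus a small effort penalty, and a brute-force grid search standing in for a nonlinear solver. All names and constants are hypothetical.

```python
import itertools

DT = 0.1        # control period in seconds (assumed)
HORIZON = 10    # look-ahead steps N (assumed)
# Candidate rates for drone velocity (m/s) and focus-distance change (m/s).
CANDIDATES = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]


def rollout_cost(drone_x, focus, target_x, target_v, u_drone, u_focus, desired_dist):
    """Cost of holding (u_drone, u_focus) constant over the whole horizon."""
    cost = 0.0
    for _ in range(HORIZON):
        drone_x += u_drone * DT
        focus += u_focus * DT
        target_x += target_v * DT            # constant-velocity prediction
        dist = target_x - drone_x
        cost += (dist - desired_dist) ** 2   # relative-pose term (extrinsics)
        cost += (focus - dist) ** 2          # keep the target in focus (intrinsics)
        cost += 0.01 * (u_drone ** 2 + u_focus ** 2)  # small effort penalty
    return cost


def mpc_step(drone_x, focus, target_x, target_v, desired_dist):
    """Grid-search the best control pair; the caller applies it for one step only."""
    return min(
        itertools.product(CANDIDATES, CANDIDATES),
        key=lambda u: rollout_cost(
            drone_x, focus, target_x, target_v, u[0], u[1], desired_dist
        ),
    )


def simulate(steps=100, desired_dist=5.0):
    """Closed loop: re-optimize at every step from the current state."""
    drone_x, focus, target_x, target_v = 0.0, 2.0, 10.0, 0.5
    for _ in range(steps):
        u_drone, u_focus = mpc_step(drone_x, focus, target_x, target_v, desired_dist)
        drone_x += u_drone * DT
        focus += u_focus * DT
        target_x += target_v * DT
    return drone_x, focus, target_x


if __name__ == "__main__":
    drone_x, focus, target_x = simulate()
    print("camera-target distance:", target_x - drone_x)
    print("focus distance:", focus)
```

Because the control set is a coarse grid, the closed loop settles near the desired distance rather than exactly on it. A gradient-based nonlinear solver over a full control sequence would remove that chattering at the cost of a heavier dependency.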

Paper Structure (54 sections, 30 equations, 17 figures, 10 tables)

Figures (17)

  • Figure 1: CineMPC pipeline. (a) The drone carries a cinematographic camera that captures footage of the targets in a scene. (b) The perception module processes the recorded images to extract the targets' poses and computes the error with respect to the user instructions. (c) Visual representation of the user instructions: the in-focus area is highlighted in red, the blurry area in blue, and yellow lines mark the desired image positions of the upper and lower parts of the target. (d) The computed error is the input to the control module, which determines the next $N$ steps for both the drone and the camera to minimize that error. (e) The drone then acquires a new image and the loop restarts, continuously refining the shot to produce autonomous cinematographic filming.
  • Figure 2: CineMPC System Overview. A schematic summary of the platform, its modules, and their interactions. The cinematographic agents comprise the scene (containing the target(s)), the drone, the cine-camera, and the user providing instructions. The perception module utilizes camera images to extract the pose of targets, which are then fed to the control module. This module calculates the trajectory for the next $N$ steps for both the camera and drone, optimizing the cost function within an MPC framework. This trajectory is transmitted through a low-level controller, ensuring smooth recording.
  • Figure 3: Effect of intrinsics in the final image. The first row compares two aperture ($A$) values, affecting the portion of the scene shown in focus (depth of field). The left side, with a low f-stop, has a wider aperture and shallow depth of field. The right side, with a high f-stop, has a narrow aperture and a larger depth of field. The second row contrasts two focus distance ($F$ - distance from the camera to the center of the depth of field) values, focusing on a closer distance (left) and a further distance (right). The third row compares different focal length ($f$) values, affecting the zoom, field of view, and depth of field. The left side has a small focal length, providing a wide-angle view. The right side has a large focal length, resulting in a highly zoomed image. The camera maintains the same pose (same extrinsics) across all images.
  • Figure 4: Control module diagram. Components of the control module and their interaction. The module consists of the MPC framework, comprising the cost function, optimizer, and the low-level control submodule. The diagram incorporates inputs from other modules and the module's outputs.
  • Figure 5: Perception module diagram. Components of the perception module and their interaction. The module consists of depth and position measurements, along with the estimation of the targets' next poses. The diagram displays inputs from other modules and the outputs of this module.
  • ...and 12 more figures
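
The qualitative effects described in the Figure 3 caption (aperture $A$, focus distance $F$, and focal length $f$ all changing the depth of field) can be checked numerically. The sketch below uses the standard thin-lens depth-of-field approximation, which is an assumption here and not necessarily the exact model used in the paper. The circle-of-confusion value (0.03 mm) and the function names are likewise hypothetical.

```python
import math


def depth_of_field(focal_length, aperture, focus_distance, coc=0.00003):
    """Return (near_limit, far_limit) of acceptable sharpness, in meters.

    focal_length   -- f, in meters
    aperture       -- A, the f-stop number (dimensionless)
    focus_distance -- F, in meters
    coc            -- circle of confusion in meters (assumed 0.03 mm)
    """
    f2 = focal_length ** 2
    blur = aperture * coc * (focus_distance - focal_length)
    near = focus_distance * f2 / (f2 + blur)
    # Beyond the hyperfocal distance the far limit extends to infinity.
    far = focus_distance * f2 / (f2 - blur) if f2 > blur else math.inf
    return near, far


def dof_span(focal_length, aperture, focus_distance, coc=0.00003):
    """Length of the in-focus region (far limit minus near limit), in meters."""
    near, far = depth_of_field(focal_length, aperture, focus_distance, coc)
    return far - near


if __name__ == "__main__":
    # Same pose, different intrinsics, mirroring the rows of Figure 3.
    print("f=50mm  A=2.8 F=5m:", depth_of_field(0.05, 2.8, 5.0))
    print("f=50mm  A=16  F=5m:", depth_of_field(0.05, 16.0, 5.0))
    print("f=100mm A=2.8 F=5m:", depth_of_field(0.10, 2.8, 5.0))
```

Under these assumptions, a low f-stop or a long focal length shrinks the in-focus region, while a high f-stop enlarges it, which matches the behavior the caption describes.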