Compressing Multi-Task Model for Autonomous Driving via Pruning and Knowledge Distillation
Jiayuan Wang, Q. M. Jonathan Wu, Ning Zhang, Katsuya Suto, Lei Zhong
TL;DR
This work tackles the challenge of deploying multi-task panoptic perception models for autonomous driving by introducing a two-stage compression framework. The first stage is safe pruning driven by Taylor-based per-task channel importance (TCI) and an inter-task gradient conflict penalty (GCP). The second stage is head-agnostic, feature-level knowledge distillation (KD) from a teacher model. The pruning stage preserves channels that are essential for any task while removing redundant and conflicting ones, and the distillation stage transfers intermediate backbone and encoder representations to recover performance without relying on task-specific heads. On BDD100K, the approach achieves a 32.7% reduction in parameters with negligible loss in segmentation accuracy and only a minor decrease in detection (-1.2% Recall, -1.8% mAP50) relative to the teacher, while still running in real time at 32.7 FPS. Ablation studies indicate that combining TCI, GCP, and KD is crucial to balancing multi-task performance, supporting the method as a practical path toward real-time, low-resource multi-task panoptic perception on on-board hardware.
Abstract
Autonomous driving systems rely on panoptic perception to jointly handle object detection, drivable area segmentation, and lane line segmentation. Although multi-task learning is an effective way to integrate these tasks, the resulting growth in model parameters and complexity makes deployment on on-board devices difficult. To address this challenge, we propose a multi-task model compression framework that combines task-aware safe pruning with feature-level knowledge distillation. Our safe pruning strategy integrates Taylor-based channel importance with a gradient conflict penalty to keep important channels while removing redundant and conflicting ones. To mitigate performance degradation after pruning, we further design a task head-agnostic distillation method that transfers intermediate backbone and encoder features from a teacher model to a student model as guidance. Experiments on the BDD100K dataset demonstrate that our compressed model achieves a 32.7% reduction in parameters, with negligible loss in segmentation accuracy and only a minor decrease in detection (-1.2% in Recall and -1.8% in mAP50) compared to the teacher. The compressed model still runs in real time at 32.7 FPS. These results show that combining pruning and knowledge distillation provides an effective compression solution for multi-task panoptic perception.
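To make the two stages more concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes a first-order Taylor score of the form |sum(activation * gradient)| per task, aggregates it with a maximum over tasks so that a channel important to any single task is retained, and subtracts a penalty derived from negative cosine similarity between per-task gradients. The distillation term is assumed to be a plain mean squared error between student and teacher features. All function names, the weight `lam`, and the parameter `keep_ratio` are hypothetical and chosen only for illustration.

```python
import math

def taylor_importance(acts, grads):
    # First-order Taylor estimate for one channel and one task:
    # |sum(activation * gradient)| (assumed form).
    return abs(sum(a * g for a, g in zip(acts, grads)))

def cosine(u, v):
    # Cosine similarity between two gradient vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def conflict_penalty(task_grads):
    # Mean of max(0, -cos) over all task pairs: only opposing
    # gradient directions are penalised (assumed form).
    n = len(task_grads)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = sum(max(0.0, -cosine(task_grads[i], task_grads[j]))
                for i, j in pairs)
    return total / len(pairs)

def channel_score(acts, task_grads, lam=0.5):
    # Keep a channel if it matters to ANY task (max over tasks),
    # discounted by how strongly the tasks disagree on it.
    importance = max(taylor_importance(acts, g) for g in task_grads)
    return importance - lam * conflict_penalty(task_grads)

def select_channels(channels, keep_ratio=0.67, lam=0.5):
    # channels: list of (acts, task_grads) tuples, one per channel.
    # Returns the sorted indices of the channels to keep.
    scores = [channel_score(a, g, lam) for a, g in channels]
    k = max(1, int(round(keep_ratio * len(channels))))
    ranked = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def feature_distill_loss(student_feat, teacher_feat):
    # Head-agnostic distillation term: MSE between intermediate features.
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n

# Toy example with three channels and two tasks.
channels = [
    ([1.0, 2.0], [[1.0, 1.0], [1.0, 1.0]]),    # important, tasks agree
    ([1.0, 2.0], [[1.0, 1.0], [-1.0, -1.0]]),  # important, tasks conflict
    ([0.1, 0.1], [[1.0, 0.0], [0.0, 1.0]]),    # low importance
]
kept = select_channels(channels, keep_ratio=0.67, lam=0.5)
```

In this toy setup the conflicting channel is scored lower than the agreeing one even though both have the same Taylor importance, which is the intended effect of a conflict penalty. In a real model, activations and gradients would be collected per channel from the backbone and encoder over a calibration set, and the distillation loss would be summed over the selected intermediate layers.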
