
TwinMixing: A Shuffle-Aware Feature Interaction Model for Multi-Task Segmentation

Minh-Khoi Do, Huy Che, Dinh-Duy Phan, Duc-Khai Lam, Duc-Lung Vu

Abstract

Accurate and efficient perception is essential for autonomous driving, where segmentation tasks such as drivable-area and lane segmentation provide critical cues for motion planning and control. However, achieving high segmentation accuracy while maintaining real-time performance on low-cost hardware remains a challenging problem. To address this issue, we introduce TwinMixing, a lightweight multi-task segmentation model explicitly designed for drivable-area and lane segmentation. The proposed network features a shared encoder and task-specific decoders, enabling both feature sharing and task specialization. Within the encoder, we propose an Efficient Pyramid Mixing (EPM) module that enhances multi-scale feature extraction through a combination of grouped convolutions, depthwise dilated convolutions, and channel shuffle operations, effectively expanding the receptive field while minimizing computational cost. Each decoder adopts a Dual-Branch Upsampling (DBU) Block composed of a learnable, transposed-convolution-based fine-detail branch and a parameter-free, bilinear-interpolation-based coarse-grained branch, achieving detailed yet spatially consistent feature reconstruction. Extensive experiments on the BDD100K dataset validate the effectiveness of TwinMixing across three configurations: tiny, base, and large. Among them, the base configuration achieves the best trade-off between accuracy and computational efficiency, reaching 92.0% mIoU for drivable-area segmentation and 32.3% IoU for lane segmentation with only 0.43M parameters and 3.95 GFLOPs. Moreover, TwinMixing consistently outperforms existing segmentation models on the same tasks, as illustrated in Fig. 1. Thanks to its compact and modular design, TwinMixing demonstrates strong potential for real-time deployment in autonomous driving and embedded perception systems. The source code is available at https://github.com/Jun0se7en/TwinMixing.
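The channel shuffle operation mentioned above is the key step that lets information flow between groups after a grouped convolution. The following is a generic NumPy sketch of the standard ShuffleNet-style shuffle, shown for illustration only; it is not the authors' implementation, and the function name and shapes are illustrative:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups (ShuffleNet-style).

    x: array of shape (N, C, H, W), with C divisible by `groups`.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # Split channels into (groups, C // groups), swap the two axes,
    # then flatten back: channels from different groups become interleaved.
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)
```

For example, with 6 channels and 2 groups, channel order [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], so each group in the next grouped convolution sees channels from both previous groups.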

Paper Structure

This paper contains 25 sections, 1 equation, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: The horizontal axis represents FLOPs, the vertical axis denotes mIoU, and the circle radius corresponds to IoU.
  • Figure 2: Visual comparison of semantic scene understanding versus drivable area and lane segmentation, highlighting the focus on safety-critical and navigable regions for autonomous driving.
  • Figure 3: The architecture of TwinMixing. The model consists of a shared encoder and two task-specific decoders. The encoder integrates the proposed Efficient Pyramid Mixing (EPM) modules to enhance multi-scale feature extraction and contextual representation. Each decoder adopts a Dual-Branch Upsampling (DBU) Block composed of a fine-detail branch and a coarse-grained branch. The two decoders independently generate segmentation masks for lane lines and drivable areas, respectively.
  • Figure 4: Overview of the encoder in TwinMixing. The encoder extracts hierarchical multi-scale features from the input image through a combination of standard convolutional layers, Efficient Pyramid Mixing (EPM) modules, and Partial Class Activation Attention (PCAA) [pcaa], producing the shared representation $\mathcal{F}_e$ for subsequent decoding.
  • Figure 5: Illustration of the proposed Efficient Pyramid Mixing (EPM) module. The design is inspired by the ESP module [espnet]: the EPM performs a reduction step using an EPM Unit with a 1$\times$1 kernel before splitting features into multiple parallel branches. Each branch transforms the reduced feature through an EPM Unit with a different dilation rate to capture multi-scale spatial information. The outputs of all branches are then merged through the Hierarchical Feature Fusion (HFF) mechanism [espnet]. In the Stride EPM variant, the reduction step employs an EPM Unit with a 1$\times$1 kernel and a stride of 2 to achieve downsampling.
  • ...and 7 more figures
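The Hierarchical Feature Fusion (HFF) merge referenced in the Figure 5 caption can be sketched as below. This is a minimal NumPy illustration of the ESPNet-style progressive summation, assuming the parallel dilated branches all produce same-shaped outputs; the function name and shapes are hypothetical, not taken from the TwinMixing code:

```python
import numpy as np

def hierarchical_feature_fusion(branches):
    """ESPNet-style HFF: progressively sum branch outputs, then concatenate.

    branches: list of (N, C, H, W) arrays from parallel dilated branches,
    ordered by increasing dilation rate. The running sums suppress the
    gridding artifacts that independent dilated convolutions produce.
    """
    fused = [branches[0]]
    for b in branches[1:]:
        fused.append(fused[-1] + b)  # add each branch onto the running sum
    return np.concatenate(fused, axis=1)  # stack along the channel axis
```

With three branches of C channels each, the output has 3C channels, and the k-th chunk contains the sum of the first k branch outputs.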