International Journal of Control, Automation, and Systems 2024; 22(9): 2899-2908
https://doi.org/10.1007/s12555-024-0089-8
© The International Journal of Control, Automation, and Systems
In infrared (IR) small target detection, accurately localizing blurry, low-contrast targets is challenging because of the complex characteristics of IR images. To address this, we introduce CSI-Net, a novel network architecture that combines a CNN with a swin transformer. CSI-Net features a hybrid encoder that builds on the encoder-decoder layout of UNet and runs a swin transformer in parallel with the CNN. This combination enables the network to capture both local features and long-range dependencies, improving its ability to identify small targets accurately. By leveraging the hierarchical features of the swin transformer, CSI-Net captures the contextual information that is crucial for small target detection. Moreover, CSI-Net employs full-scale skip connections across the encoder-decoder and decoder-decoder paths, integrating multiscale CNN and swin transformer features to improve gradient propagation. Experimental results validate the superiority of the proposed method over conventional CNN-based and transformer-based methods. On the NUAA-SIRST dataset, the mIoU (0.7483), detection probability (0.9734), and false alarm rate (0.101 × 10⁻⁵) show a significant improvement. Similarly, on the NUDT-SIRST dataset, the mIoU (0.8887), detection probability (0.9894), and false alarm rate (0.431 × 10⁻⁵) show a notable improvement. The performance of the network scales with dataset size, and its robustness is confirmed by the area under the ROC curve (AUC). Additionally, an ablation study validates the efficacy of the hybrid encoder: enabling and disabling the parallel swin transformer module (PSM) shows that including it improves small target detection performance. The comprehensive evaluation shows that the swin transformer-enhanced UNet architecture effectively addresses the challenges of IR small target detection.
Keywords: Hybrid encoder, infrared (IR) image, small target detection, swin transformer, UNet.
Published online September 1, 2024
Lammi Choi, Won Young Chung, and Chan Gook Park*
Seoul National University
Vol. 22, No. 9, pp. 2673~2953
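The abstract reports three evaluation metrics: IoU, detection probability, and false alarm rate. The sketch below illustrates how such metrics are commonly computed for a single pair of binary masks. It is not the authors' evaluation code. The function names (`connected_components`, `evaluate`), the 8-connectivity, and the rule that a target counts as detected when any predicted pixel overlaps it are all assumptions made for illustration; the paper may use a different matching criterion and aggregates results over a whole dataset.

```python
from collections import deque


def connected_components(mask):
    """Return a list of pixel sets, one per 8-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            region, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(region)
    return regions


def evaluate(pred, truth):
    """Compute (IoU, detection probability, false alarm rate) for one image.

    Assumed definitions (illustrative, not taken from the paper):
      IoU = |pred AND truth| / |pred OR truth|            (pixel level)
      Pd  = targets hit by any predicted pixel / targets  (target level)
      Fa  = pixels in predicted regions that touch no target / all pixels
    """
    h, w = len(truth), len(truth[0])
    pred_px = {(r, c) for r in range(h) for c in range(w) if pred[r][c]}
    true_px = {(r, c) for r in range(h) for c in range(w) if truth[r][c]}

    union = len(pred_px | true_px)
    iou = len(pred_px & true_px) / union if union else 1.0

    targets = connected_components(truth)
    detected = sum(1 for t in targets if t & pred_px)
    pd = detected / len(targets) if targets else 1.0

    false_px = sum(len(p) for p in connected_components(pred)
                   if not (p & true_px))
    fa = false_px / (h * w)
    return iou, pd, fa


# Toy 6x6 example: two ground-truth targets, one detected, one missed,
# plus a single isolated false-alarm pixel in the prediction.
truth = [[0] * 6 for _ in range(6)]
pred = [[0] * 6 for _ in range(6)]
truth[1][1] = truth[1][2] = 1      # target A
truth[4][4] = 1                    # target B (missed)
pred[1][1] = pred[1][2] = pred[2][2] = 1
pred[0][5] = 1                     # false alarm

iou, pd, fa = evaluate(pred, truth)
print(f"IoU={iou:.3f}  Pd={pd:.3f}  Fa={fa:.5f}")
```

In this toy case the intersection is 2 pixels and the union is 5, one of two targets is hit, and one of 36 pixels is a false alarm. A dataset-level score would aggregate these quantities over all test images, and the aggregation scheme is again a choice the abstract does not specify.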