Abstract:
[Objective] Existing segmentation algorithms suffer from domain adaptation problems across metro track images and achieve low accuracy on track-wheel images with high inter-class similarity. To address this challenge, a few-shot metro track-wheel image segmentation algorithm based on a cross-attention network is proposed. [Method] The computational pipeline of the proposed few-shot metro track-wheel image segmentation algorithm is described in detail. First, a pair of weight-shared backbone networks maps the input track-wheel images from the support branch and the query branch into a deep feature space. Then, the low-, mid-, and high-level features from the two branches are fused across scales, and a cross-attention network mines the relational semantics between the fused features, capturing the semantic information shared in the deep feature space by different metro track-wheel images of the same class. Finally, average pooling converts the shared features of both branches into class-specific prototypes, which guide the segmentation of the unannotated track-wheel images in the query branch. Comparative and ablation experiments are conducted on a self-constructed metro track-wheel image dataset to verify the accuracy and effectiveness of the algorithm. [Result & Conclusion] Testing shows that the proposed algorithm achieves a mean intersection over union (mIoU) of 66.17% and a foreground-background intersection over union (FB-IoU) of 78.21%. Compared with current mainstream semantic segmentation algorithms, the proposed few-shot metro track-wheel image segmentation algorithm based on a cross-attention network delivers significantly better segmentation performance and shows potential for practical application.
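
The abstract does not include code; the following is a minimal, illustrative sketch of the described pipeline (shared backbone, cross-scale fusion, cross-attention between branches, prototype construction by average pooling, and prototype-guided query segmentation). It assumes a ResNet-50 backbone, element-wise summation for fusion, a standard multi-head cross-attention layer, masked average pooling over the support features, and cosine-similarity matching; all of these concrete choices, module names, and hyperparameters are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch only: shared backbone -> cross-scale fusion -> cross-attention
# between support and query features -> average-pooled class prototype ->
# prototype-guided query segmentation. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class FewShotCrossAttentionSeg(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Shared-weight backbone applied to both the support and query branches.
        resnet = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2, self.layer3 = resnet.layer1, resnet.layer2, resnet.layer3
        # 1x1 projections so low-, mid-, and high-level features share one width.
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in (256, 512, 1024)])
        # Cross-attention: query-branch features attend to support-branch features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def extract(self, x):
        x = self.stem(x)
        f1 = self.layer1(x)               # low-level features
        f2 = self.layer2(f1)              # mid-level features
        f3 = self.layer3(f2)              # high-level features
        size = f3.shape[-2:]
        feats = [p(F.interpolate(f, size=size, mode="bilinear", align_corners=False))
                 for p, f in zip(self.proj, (f1, f2, f3))]
        return sum(feats)                 # cross-scale fusion by element-wise sum

    def forward(self, support_img, support_mask, query_img):
        fs, fq = self.extract(support_img), self.extract(query_img)   # (B, C, H, W)
        B, C, H, W = fq.shape
        fs_seq = fs.flatten(2).transpose(1, 2)                        # (B, HW, C)
        fq_seq = fq.flatten(2).transpose(1, 2)
        # Mine relational semantics shared by the two branches.
        fq_attn, _ = self.cross_attn(fq_seq, fs_seq, fs_seq)
        fq_fused = (fq_seq + fq_attn).transpose(1, 2).reshape(B, C, H, W)
        # Average-pool support features inside the annotated region into a prototype.
        mask = F.interpolate(support_mask.float(), size=(H, W), mode="nearest")
        proto = (fs * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)
        # Prototype-guided segmentation: cosine similarity gives the foreground score.
        score = F.cosine_similarity(fq_fused, proto[..., None, None], dim=1)
        return score.unsqueeze(1)         # (B, 1, H, W) foreground logits
```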