ISSN 1004-4140
CN 11-3017/P
TANG S J, YUAN T Q, LI S Y, et al. Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism[J]. CT Theory and Applications, xxxx, x(x): 1-12. DOI: 10.15953/j.ctta.2024.033. (in Chinese).

Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism

Abstract: With the development of modern video technology, segmentation of periodic-motion videos has important applications in motion analysis, medical imaging, and other fields. In this study, we designed a novel deep-learning network for periodic motion detection and segmentation that combines a convolutional long short-term memory network (ConvLSTM) with a cross-attention mechanism. Using relatively few labels, the network effectively captures the spatiotemporal context and cross-frame consistency of the objects of interest in a video sequence and segments them accurately. Experimental results show that, with few labeled samples, the proposed method performs well on periodic-motion video datasets. On ordinary videos, the average region similarity and contour accuracy were 67.51% and 72.97%, respectively, generally 1% to 1.5% higher than those of conventional methods. On medical videos, the average region similarity and contour accuracy were 59.93% and 90.56%, respectively. Compared with DAN and Unet, the proposed method improved region similarity by 12.92% and 8.85% and contour accuracy by 20.09% and 12.89%, respectively, demonstrating higher accuracy and stability.
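
The abstract names a convolutional LSTM but does not give its equations. As an illustration only, the sketch below implements one time step of the standard ConvLSTM cell in NumPy, with the peephole terms omitted. The gate order (i, f, o, g), the single k x k kernel shared by all gates, and the helper names conv2d_same and convlstm_step are assumptions made for this example, not the authors' design. Because the gates are computed by convolution instead of fully connected layers, the hidden state h keeps the H x W layout of the feature map, which is what allows a recurrent model to carry spatial context from one frame to the next.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns a (C_out, H, W) array.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, height, width))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) for all channels at once.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + height, j:j + width])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM time step (standard form, no peephole terms).

    x: (C_in, H, W) input; h, c: (C_hid, H, W) hidden and cell states.
    w: (4*C_hid, C_in + C_hid, k, k) gate kernels; b: (4*C_hid,) gate biases.
    """
    z = conv2d_same(np.concatenate([x, h], axis=0), w) + b[:, None, None]
    i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates; candidate
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


# Usage: run the cell over a short clip of random "frame features".
rng = np.random.default_rng(0)
C_in, C_hid, H, W, k, T = 3, 4, 8, 8, 3, 5
w = rng.normal(scale=0.1, size=(4 * C_hid, C_in + C_hid, k, k))
b = np.zeros(4 * C_hid)
h = np.zeros((C_hid, H, W))
c = np.zeros((C_hid, H, W))
for x in rng.normal(size=(T, C_in, H, W)):
    h, c = convlstm_step(x, h, c, w, b)
print(h.shape)  # (4, 8, 8)
```

In a full model the input x would be encoder features of each frame and h would feed a segmentation head; those parts are outside the scope of this sketch.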
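
The abstract likewise does not specify how its cross-attention is wired. A common form is scaled dot-product attention in which the queries come from the frame being segmented and the keys and values come from another frame, for example a labeled reference frame; the sketch below assumes that arrangement. The function name cross_attention, the projection matrices w_q, w_k and w_v, and the flattening of a C x H x W feature map into H*W tokens are illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feat, ref_feat, w_q, w_k, w_v):
    """Scaled dot-product cross-attention.

    query_feat: (N_q, D) tokens of the frame being segmented.
    ref_feat:   (N_r, D) tokens of a reference frame.
    w_q, w_k:   (D, d_k) projections; w_v: (D, d_v) projection.
    Returns the (N_q, d_v) attended features and the (N_q, N_r) attention weights.
    """
    q = query_feat @ w_q
    k = ref_feat @ w_k
    v = ref_feat @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1 over reference positions
    return attn @ v, attn


# Usage: flatten two C x H x W feature maps into H*W tokens of dimension C.
rng = np.random.default_rng(0)
C, H, W, d = 8, 6, 6, 16
cur = rng.normal(size=(C, H, W)).reshape(C, -1).T  # (36, 8)
ref = rng.normal(size=(C, H, W)).reshape(C, -1).T  # (36, 8)
w_q, w_k, w_v = (rng.normal(scale=C ** -0.5, size=(C, d)) for _ in range(3))
out, attn = cross_attention(cur, ref, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (36, 16) (36, 36)
```

Each row of attn is a probability distribution over reference-frame positions, so every output token is a weighted average of reference-frame values. This is one way correspondences between frames can be exploited to keep segmentations consistent over time.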
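
The reported "region similarity" and "contour accuracy" appear to be the J and F measures of the DAVIS video-segmentation benchmark. Assuming that is what is meant, J is the intersection over union of the predicted and ground-truth masks, and F is the F-measure of boundary precision and recall. The sketch below computes both for a single frame. The fixed pixel tolerance tol, the square dilation, and the helper names are simplifications chosen for this example; it is not the benchmark's official evaluation code. The averaged figures in the abstract would presumably be these per-frame scores averaged over frames and expressed as percentages.

```python
import numpy as np


def region_similarity(pred, gt):
    """J score: intersection over union of two binary masks (1.0 if both are empty)."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def dilate(mask, r):
    """Binary dilation with a (2r+1) x (2r+1) square; pixels outside the image count as False."""
    h, w = mask.shape
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary(mask):
    """Foreground pixels that have at least one background pixel among their 8 neighbours."""
    mask = np.asarray(mask, dtype=bool)
    return mask & dilate(~mask, 1)


def contour_accuracy(pred, gt, tol=1):
    """Simplified F score: boundary precision and recall with a tolerance of `tol` pixels."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp.any() and not bg.any():
        return 1.0
    if not bp.any() or not bg.any():
        return 0.0
    precision = (bp & dilate(bg, tol)).sum() / bp.sum()
    recall = (bg & dilate(bp, tol)).sum() / bg.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Usage: a 6 x 6 ground-truth square and a prediction shifted down by one pixel.
gt = np.zeros((10, 10), dtype=bool)
gt[2:8, 2:8] = True
pred = np.zeros_like(gt)
pred[3:9, 2:8] = True
j = region_similarity(pred, gt)
f = contour_accuracy(pred, gt, tol=1)
print(f"J = {j:.3f}, F = {f:.3f}")  # J = 0.714, F = 1.000
```

The example also shows why the two measures can diverge: a one-pixel shift costs a noticeable amount of overlap (J), but every boundary pixel still lies within the one-pixel tolerance, so F stays perfect.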

     
