Swin-JDE: Joint Detection and Embedding Multi-Object Tracking in Crowded Scenes Based on Swin-Transformer

doi:10.1016/j.engappai.2022.105770

淡江大學機構典藏 > 工學院 > 電機工程學系暨研究所 > 期刊論文 > Item 987654321/123315

請使用永久網址來引用或連結此文件: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/123315

題名:	Swin-JDE: Joint Detection and Embedding Multi-Object Tracking in Crowded Scenes Based on Swin-Transformer
作者:	Tsai, Chi-Yi;Shen, Guan-Yu;Nisar, Humaira
日期:	2023-03
上傳時間:	2023-04-28 17:36:47 (UTC+8)
摘要:	Multi-object tracking (MOT) is a highly valued and challenging research topic in computer vision. To achieve more robust tracking performance, recently published MOT methods tend to use anchor-free object detectors, which have the advantage of dealing with the identity ambiguity problem encountered by anchor-based methods in learning appearance features. However, in practical applications, it is found that the detection accuracy of the anchor-free object detector based on classical convolutional neural networks in crowded scenes will be significantly reduced. In order to have better detection and tracking performance in crowded scenes, this paper proposes an anchor-free joint detection and embedding (JDE) MOT method based on Transformer architecture, called Swin-JDE. The proposed method includes a novel Patch-Expanding module, which can improve the spatial information of feature maps by up-sampling processing through neural network learning and Einops Notation-based rearrangement to enhance the detection and tracking performance of the MOT model. In terms of training method, we propose a two-step training method that trains the detection branch separately from the appearance branch to enhance the detection robustness of anchor-free predictors. Furthermore, during the training process, we also propose an examination method to remove occluded targets from the training dataset to improve the accuracy of the appearance embedding layer. In terms of data association, we propose a new post-processing method, which simultaneously considers the three factors of detection confidence, appearance embedding distance, intersection-over-union (IoU) distance to match each tracklet and the detection information to improve the tracking robustness of the MOT model. Experimental results show that the proposed method achieves 70.38% multiple object tracking accuracy (MOTA) and 69.53% identification F1-score (IDF1) results in the MOT20 benchmark dataset, and the identification switch (ID Switch) is reduced to 2026. Compared with FairMOT, the proposed method improves MOTA and IDF1 by 8.58% and 2.23%, respectively.
關聯:	Engineering Applications of Artificial Intelligence 119(2), p.1-16
DOI:	10.1016/j.engappai.2022.105770
顯示於類別:	[電機工程學系暨研究所] 期刊論文

文件中的檔案:

檔案	描述	大小	格式	瀏覽次數
index.html		0Kb	HTML	169	檢視/開啟

在機構典藏中所有的資料項目都受到原著作權保護.

TAIR相關文章

資料載入中.....