English  |  正體中文  |  简体中文  |  Items with full text/Total items : 62830/95882 (66%)
Visitors : 4061480      Online Users : 463
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.
Scope Tips:
  • please add "double quotation mark" for query phrases to get precise results
  • please goto advance search for comprehansive author search
  • Adv. Search
    HomeLoginUploadHelpAboutAdminister Goto mobile version
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/123315


    Title: Swin-JDE: Joint Detection and Embedding Multi-Object Tracking in Crowded Scenes Based on Swin-Transformer
    Authors: Tsai, Chi-Yi;Shen, Guan-Yu;Nisar, Humaira
    Date: 2023-03
    Issue Date: 2023-04-28 17:36:47 (UTC+8)
    Abstract: Multi-object tracking (MOT) is a highly valued and challenging research topic in computer vision. To achieve more robust tracking performance, recently published MOT methods tend to use anchor-free object detectors, which have the advantage of dealing with the identity ambiguity problem encountered by anchor-based methods in learning appearance features. However, in practical applications, it is found that the detection accuracy of the anchor-free object detector based on classical convolutional neural networks in crowded scenes will be significantly reduced. In order to have better detection and tracking performance in crowded scenes, this paper proposes an anchor-free joint detection and embedding (JDE) MOT method based on Transformer architecture, called Swin-JDE. The proposed method includes a novel Patch-Expanding module, which can improve the spatial information of feature maps by up-sampling processing through neural network learning and Einops Notation-based rearrangement to enhance the detection and tracking performance of the MOT model. In terms of training method, we propose a two-step training method that trains the detection branch separately from the appearance branch to enhance the detection robustness of anchor-free predictors. Furthermore, during the training process, we also propose an examination method to remove occluded targets from the training dataset to improve the accuracy of the appearance embedding layer. In terms of data association, we propose a new post-processing method, which simultaneously considers the three factors of detection confidence, appearance embedding distance, intersection-over-union (IoU) distance to match each tracklet and the detection information to improve the tracking robustness of the MOT model. Experimental results show that the proposed method achieves 70.38% multiple object tracking accuracy (MOTA) and 69.53% identification F1-score (IDF1) results in the MOT20 benchmark dataset, and the identification switch (ID Switch) is reduced to 2026. Compared with FairMOT, the proposed method improves MOTA and IDF1 by 8.58% and 2.23%, respectively.
    Relation: Engineering Applications of Artificial Intelligence 119(2), p.1-16
    DOI: 10.1016/j.engappai.2022.105770
    Appears in Collections:[電機工程學系暨研究所] 期刊論文

    Files in This Item:

    File Description SizeFormat
    index.html0KbHTML51View/Open

    All items in 機構典藏 are protected by copyright, with all rights reserved.


    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library & TKU Library IR teams. Copyright ©   - Feedback