Tamkang University Institutional Repository: Item 987654321/126772
    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/126772


    Title: HMTV: Hierarchical Multimodal Transformer for Video Highlight Query on Baseball
    Authors: Zhang, Qiaoyun; Chang, Chih-Yung; Su, Ming-Yang; Chang, Hsiang-Chuan; Roy, Diptendu Sinha
    Keywords: Hierarchical multimodal Transformer; BERT; highlight query
    Date: 2024-09-23
    Upload time: 2025-03-20 09:24:10 (UTC+8)
    Abstract: As watching baseball videos grows more popular, fans increasingly want to enjoy just the highlights. However, extracting highlights from lengthy baseball videos by hand is time-consuming and labor-intensive. To address this challenge, this paper proposes a novel mechanism called Hierarchical Multimodal Transformer for Video query (HMTV). HMTV works in two phases: Coarse-Grained clipping of candidate videos, followed by Fine-Grained identification of highlights. In the Coarse-Grained phase, a pitching detection model extracts relevant candidate videos from baseball videos, encompassing the features of pitch deliveries and pitching. In the Fine-Grained phase, a Transformer encoder captures relationship features between the frames of the candidate videos, and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model captures relationship features between the words of users' questions. These relationship features are then fed into the Video Query (VideoQ) model, which is implemented with Text Video Attention (TVA). The VideoQ model identifies the start and end positions, within the candidate videos, of the highlights mentioned in the query. Simulation results demonstrate that the proposed HMTV significantly improves the accuracy of highlight identification in terms of precision, recall, and F1-score.
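    Relation: Multimedia Systems 30(285), p. 1-18
    DOI: 10.1007/s00530-024-01479-6
    Appears in collections: [Department and Graduate Institute of Computer Science and Information Engineering] Journal Articles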
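
The Fine-Grained phase described in the abstract (query words attending to video frames, then selecting a start and end position) can be illustrated with a toy sketch. This is not the authors' VideoQ or TVA implementation: the function names, the plain dot-product attention, and the rise/fall heuristic that stands in for learned start and end heads are all assumptions made for illustration, and the feature vectors are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def frame_relevance(frames, words):
    """Scaled dot-product attention from each frame over the query words.

    Returns one relevance score per frame: the attention-weighted
    similarity between that frame and the words of the query.
    """
    scale = math.sqrt(len(words[0]))
    relevance = []
    for frame in frames:
        sims = [dot(frame, word) / scale for word in words]
        weights = softmax(sims)
        relevance.append(sum(w * s for w, s in zip(weights, sims)))
    return relevance


def best_span(start_scores, end_scores):
    """Return (start, end) maximizing start_scores[s] + end_scores[e], s <= e."""
    best, best_val, best_start = (0, 0), -math.inf, 0
    for e in range(len(end_scores)):
        # Track the best start position seen so far (index <= e).
        if start_scores[e] > start_scores[best_start]:
            best_start = e
        val = start_scores[best_start] + end_scores[e]
        if val > best_val:
            best_val, best = val, (best_start, e)
    return best


def locate_highlight(frames, words):
    """Hypothetical stand-in for learned start/end heads (an assumption):
    a start is scored by how much relevance rises at a frame, an end by
    how much it falls right after that frame."""
    rel = frame_relevance(frames, words)
    padded = [0.0] + rel + [0.0]
    start_scores = [padded[i + 1] - padded[i] for i in range(len(rel))]
    end_scores = [padded[i + 1] - padded[i + 2] for i in range(len(rel))]
    return best_span(start_scores, end_scores)


# Toy features: frames 2 and 3 match the single query word, the rest do not.
frames = [[0, 1], [0, 1], [1, 0], [1, 0], [0, 1]]
words = [[1, 0]]
print(locate_highlight(frames, words))
```

In a trained model the start and end scores would come from learned layers over the fused text-video features; the span search in `best_span` is the part that carries over unchanged.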

    Files in this item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     18

    All items in the institutional repository are protected by their original copyright.
