Precise news video text detection and text extraction based on multiple frames integration

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/54177

Title:	Precise news video text detection and text extraction based on multiple frames integration
Other Titles:	基於多影格的精確新聞影片文字偵測與擷取
Authors:	張曉維;Chang, Hsiao-Wei
Contributors:	Tamkang University, Doctoral Program, Department of Computer Science and Information Engineering; 顏淑惠; Yen, Shwu-Huey
Keywords:	text detection;text extraction;binarization;black-and-white transition;Canny edge detector
Date:	2011
Issue Date:	2011-06-16 22:07:00 (UTC+8)
Abstract:	Text in news video is crucial for news video indexing and summarization. In this thesis, we present a robust and efficient text detection algorithm, followed by a precise text extraction (binarization) algorithm that binarizes the detected text regions in news videos. The proposed text detection algorithm first uses the temporal information of the video together with the logical AND operation to remove most of the irrelevant background. Then a window-based method that counts black-and-white transitions is applied to the resulting edge map to obtain rough text blocks. A line deletion technique is applied twice to refine the text blocks. The proposed algorithm is applicable to multiple languages (e.g., English, Japanese, and Chinese) and is robust to text polarity (positive or negative), character size, and text alignment (horizontal or vertical). Three metrics (recall, precision, and quality of bounding preciseness) are adopted to measure the efficacy of the text detection algorithm. According to experimental results on various multilingual video sequences, the proposed algorithm achieves above 96% on all three metrics.
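The first two detection steps described above (temporal AND, then window-based transition counting) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it assumes binary edge maps have already been computed for consecutive frames, and the function names, the window size, and the `min_transitions` cutoff are hypothetical values chosen for the example. The line deletion refinement is not shown.

```python
import numpy as np

def and_edge_maps(edge_maps):
    """Logical AND of binary edge maps from consecutive frames.

    Edges that stay in place across frames (such as static captions)
    survive; edges of a changing background are mostly removed.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in edge_maps])
    return np.logical_and.reduce(stacked, axis=0)

def count_transitions(window):
    """Count horizontal black-and-white transitions inside a window."""
    w = np.asarray(window, dtype=bool)
    return int(np.count_nonzero(w[:, 1:] != w[:, :-1]))

def candidate_windows(edge_map, win_h=8, win_w=16, min_transitions=6):
    """Return (row, col) origins of windows with many transitions.

    win_h, win_w and min_transitions are assumed values for this sketch;
    the abstract does not specify them.
    """
    h, w = edge_map.shape
    hits = []
    for r in range(0, h - win_h + 1, win_h):
        for c in range(0, w - win_w + 1, win_w):
            window = edge_map[r:r + win_h, c:c + win_w]
            if count_transitions(window) >= min_transitions:
                hits.append((r, c))
    return hits
```

Windows returned by `candidate_windows` would correspond to the rough text blocks that the thesis then refines.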
Following text detection, the proposed text extraction (binarization) algorithm first applies the Canny edge detector to the text box. Next, vertical line scanning from left to right across the text box is performed twice. The vertical line traverses downwards until it hits an edge pixel or reaches the bottom of the box. Similarly, the vertical line traverses upwards until it hits an edge pixel or reaches the top of the box. All traversed pixels are classified as background pixels. The algorithm then locates the peak intensity p and evaluates the standard deviation σ from the histogram of the non-background pixels. Finally, the threshold is set to T = [0, p+kσ] or T = [p-kσ, 255], depending on text polarity, and the algorithm obtains the text extraction (binarization) result. Notably, the proposed method is parameter-free, places no restriction on text polarity, and can handle cases in which the background and text of news video have similar intensity. The method has been tested extensively on text boxes from various news videos, historical archive documents, and other documents. The proposed algorithm outperforms well-known methods such as Otsu, Niblack, and Sauvola in precision and quality.
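The scanning and thresholding steps described above might look roughly like the sketch below. It is a minimal illustration under assumptions, not the thesis's code: the edge map is taken as an input (for example, the output of a Canny detector), the text polarity is passed in by the caller as `dark_text` because the abstract does not say how it is determined, the default `k=1.0` is a hypothetical value, and excluding the scanned background pixels from the final result is an assumed reading of "classified as background pixels". The function names are also hypothetical.

```python
import numpy as np

def background_mask(edges):
    """Mark pixels crossed by the downward and upward column scans.

    In each column, the downward scan stops at the first edge pixel and
    the upward scan stops at the last one; a column with no edge pixel
    is traversed completely.
    """
    edges = np.asarray(edges, dtype=bool)
    h, w = edges.shape
    mask = np.zeros((h, w), dtype=bool)
    for c in range(w):
        rows = np.flatnonzero(edges[:, c])
        if rows.size == 0:
            mask[:, c] = True
        else:
            mask[:rows[0], c] = True
            mask[rows[-1] + 1:, c] = True
    return mask

def binarize_text_box(gray, edges, dark_text, k=1.0):
    """Return a boolean text mask for an 8-bit grayscale text box.

    dark_text and k are assumptions of this sketch (see the lead-in).
    """
    gray = np.asarray(gray, dtype=np.uint8)
    bg = background_mask(edges)
    fg = gray[~bg]                      # non-background pixel intensities
    if fg.size == 0:
        return np.zeros(gray.shape, dtype=bool)
    p = int(np.argmax(np.bincount(fg, minlength=256)))  # histogram peak
    sigma = float(fg.std())
    if dark_text:
        lo, hi = 0.0, p + k * sigma     # T = [0, p + k*sigma]
    else:
        lo, hi = p - k * sigma, 255.0   # T = [p - k*sigma, 255]
    return (gray >= lo) & (gray <= hi) & ~bg
```

Because the threshold is estimated only from pixels enclosed by edges, a background pixel that happens to share the text intensity does not shift the peak p and, in this sketch, is not reported as text.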
Appears in Collections:	[Department of Computer Science and Information Engineering] Theses and Dissertations
