Visual Web Page Block Extraction

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74609

Title:	Visual Web Page Block Extraction (視覺化網頁區塊擷取)
Other Titles:	Visual block-based data extraction from web page
Authors:	Chuang, Cheng-Yang (莊政洋)
Contributors:	Tamkang University, Master's Program, Department of Computer Science and Information Engineering; Tsai, Yih-Jia (蔡憶佳)
Keywords:	Data extraction; Page segmentation; Visual block
Date:	2011
Issue Date:	2011-12-28 19:02:35 (UTC+8)
Abstract:	The amount of data on the Internet is large, increasingly rich, and growing very quickly, and the web has become a major source of information for many users. Users often need to collect new information from several major content sites, and many of them visit the same specific pages almost every day to check for updates. This forces them to jump between content sites merely to see whether anything of interest has appeared, and such high-frequency browsing from page to page raises the cost of gathering information. Making information retrieval faster and more convenient is therefore an important problem.

Approaches to extracting web page content automatically fall into two broad categories. The traditional approach is to write an extraction program for each target site, designing the extraction procedure by manually inspecting the page source. Besides requiring programming skills, this means that when there is more than one target site, a separate extraction procedure must be written for each site. The second approach uses predefined rules to determine automatically where the data lies in different pages and then extracts it. However, which data on a page is actually of interest to a user is very hard to define precisely, so it is difficult to give a general definition of the data region that such a system can rely on.

This thesis proposes the VBDE (Visual Block-based Data Extraction) algorithm and combines it with a visual user interface to implement a visual data extraction system. Users need no background knowledge, such as how to define extraction rules or how to program in a particular language: in an intuitive environment, they specify the blocks of a web page they want to extract, and the system adapts to different web pages and extracts the data correctly.
Appears in Collections:	[Department and Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations
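
The abstract does not describe how VBDE represents or matches blocks, so the following is not the thesis's algorithm. It is a minimal, hypothetical Python sketch of the underlying idea: the user selects a block once, and the system reuses that selection on another page. The sketch assumes both pages come from the same site template. Unlike VBDE, which by its name works on visual blocks, it uses only the DOM structure. It records the selected element's path as (tag, class, position) steps and replays that path on the second page. The names `Node`, `TreeBuilder`, `parse`, `block_path`, `locate`, and `block_text` are illustrative and do not come from the thesis.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed as containers.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

class Node:
    """A minimal element node (hypothetical stand-in for a real DOM)."""
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = []

class TreeBuilder(HTMLParser):
    """Build a Node tree from an HTML string using the standard-library parser."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def block_path(node):
    """Record a block as (tag, class, nth-among-same-tag-and-class) steps from the root."""
    steps = []
    while node.parent is not None:
        key = (node.tag, node.attrs.get("class", ""))
        same = [c for c in node.parent.children
                if (c.tag, c.attrs.get("class", "")) == key]
        steps.append((key[0], key[1], same.index(node)))
        node = node.parent
    return steps[::-1]

def locate(root, path):
    """Replay a recorded path on another page; return None if the structure differs."""
    node = root
    for tag, cls, nth in path:
        same = [c for c in node.children
                if c.tag == tag and c.attrs.get("class", "") == cls]
        if nth >= len(same):
            return None
        node = same[nth]
    return node

def block_text(node):
    """Collect a block's text: the node's own text first, then its children's, recursively."""
    parts = list(node.text)
    for child in node.children:
        parts.extend(block_text(child))
    return parts

page_a = """<html><body>
<div class="nav"><a href="/">Home</a></div>
<div class="news"><h2>Today</h2><ul><li>Item A1</li><li>Item A2</li></ul></div>
</body></html>"""

page_b = """<html><body>
<div class="nav"><a href="/">Home</a></div>
<div class="news"><h2>Today</h2><ul><li>Item B1</li><li>Item B2</li><li>Item B3</li></ul></div>
</body></html>"""

# Stand-in for the user's visual selection: the <ul> inside div.news on page A.
selected = parse(page_a).children[0].children[0].children[1].children[1]
path = block_path(selected)
print(block_text(locate(parse(page_b), path)))  # ['Item B1', 'Item B2', 'Item B3']
```

Exact path matching fails and returns `None` as soon as the second page's structure differs. A practical system therefore needs a more tolerant notion of block similarity than this sketch provides.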

