Tamkang University Institutional Repository: Item 987654321/74609
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74609


    Title: Visual Web Page Block Extraction (視覺化網頁區塊擷取)
    Other Titles: Visual block-based data extraction from web page
    Authors: 莊政洋;Chuang, Cheng-Yang
    Contributors: Master's Program, Department of Computer Science and Information Engineering, Tamkang University
    蔡憶佳;Tsai, Yih-Jia
    Keywords: Data extraction; Page segmentation; Visual block
    Date: 2011
    Issue Date: 2011-12-28 19:02:35 (UTC+8)
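    Abstract: The amount of data on the Internet keeps getting larger and richer, and it continues to grow very quickly. Users often need to gather new information from each of the major content-source websites, and many people browse a fixed set of web pages almost every day to check for updated content. This need forces people to jump between content-source websites merely to see whether there is any data they are interested in. Switching between pages this frequently raises the cost of gathering information, so making information access more convenient and faster is an important problem.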
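    Methods for automatically extracting web page content fall roughly into two categories. The traditional approach is to write an extraction program for the target website, designing the extraction procedure by manually inspecting the page source. Besides requiring programming skills, this approach means that when there is more than one target website, a separate extraction procedure must be written for each one. The other category uses predefined criteria to automatically locate the data in different web pages and then extract it. However, it is very difficult to define precisely which data on a page interest a given user, so such systems struggle to arrive at a general definition of the data region. This thesis proposes VBDE (Visual Block-based Data Extraction), a web page block extraction algorithm, and combines it with a visual operation interface to implement a visual data extraction system. Without any related background knowledge, users can specify the particular blocks of a web page they want to extract in an intuitive environment, and the system adapts to different web pages and extracts the data correctly.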
    With the explosive growth of web pages available on the Internet, the Web has become a major source of information for a large number of users. Many of these users regularly browse specific websites to check for new information. Frequently jumping between different web pages increases the cost of collecting data. Therefore, how to efficiently retrieve the information users are interested in from different web pages is an important issue.
    There are two major categories of algorithms for automatically extracting the contents of web pages. In the first, one inspects the source code of a web page and writes a specific program to capture the data of interest. However, this requires programming skills, and capturing data from different web pages requires writing a different program for each page. The second category extracts content by defining rules and uses these extraction rules to locate the data of interest. Nevertheless, a general set of rules describing the data regions users are interested in is hard to define.
    In this thesis, we propose the VBDE (Visual Block-based Data Extraction) algorithm to extract specific data blocks from different web pages. We provide users with a visual data extraction system that does not require deep knowledge, such as how to define rules for capturing information in a web page or how to program in a specific programming language.
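    As a hedged illustration of the first category of extraction described in the abstract (a hand-written, site-specific program), the Python sketch below hard-codes a rule taken from one site's markup. It is not the thesis's VBDE algorithm; the two page layouts, the `SiteAExtractor` class, and the `extract` helper are hypothetical. The extractor finds the headlines on the page it was written for and nothing on a page with a different layout, which is why each target site needs its own program.

```python
from html.parser import HTMLParser


class SiteAExtractor(HTMLParser):
    """Hypothetical hand-written extractor for an assumed 'site A' whose
    headlines sit inside <h2 class="headline"> elements."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Rule hard-coded from inspecting site A's source (exact class match).
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        # Collect text only while inside a matching headline element.
        if self.in_headline:
            self.headlines[-1] += data


def extract(html):
    parser = SiteAExtractor()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines]


# Two assumed pages carrying the same content in different markup.
site_a = ('<div><h2 class="headline">Rates rise</h2><p>...</p>'
          '<h2 class="headline">Storm warning</h2></div>')
site_b = ('<ul><li class="title">Rates rise</li>'
          '<li class="title">Storm warning</li></ul>')

print(extract(site_a))  # ['Rates rise', 'Storm warning']
print(extract(site_b))  # [] because site B's layout breaks the hard-coded rule
```

    Supporting site B would mean writing a second extractor keyed on `<li class="title">`, which is the per-site maintenance cost the abstract attributes to this approach.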
    Appears in Collections:[Graduate Institute & Department of Computer Science and Information Engineering] Thesis

    Files in This Item:

    File          Size   Format
    index.html    0 Kb   HTML

    All items in the Tamkang University Institutional Repository are protected by copyright, with all rights reserved.

