Tamkang University Institutional Repository: Item 987654321/52344
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/52344


    Title: 植基於網頁結構的資料區塊化自動分類 (Automatic Classification of Data Blocks Based on Web Page Structure)
    Other Titles: Automatic identification of data blocks based on web page structure
    Authors: 廖益辰; Liao, Yi-chen
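    Contributors: Tamkang University, Department of Computer Science and Information Engineering, In-service Master's Program (淡江大學資訊工程學系碩士在職專班)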
    蔡憶佳;Tsai, Yih-jia
    Keywords: web page structure (網頁結構)
    Date: 2010
    Issue Date: 2010-09-23 17:33:50 (UTC+8)
    Abstract: As the internet has become ubiquitous and users' browsing habits have changed, much content that was once obtained on paper is now obtained online; news websites are one example. As the volume of information on the internet keeps growing, an automated data collection mechanism has become an indispensable tool.
    At present, unless a website offers a Really Simple Syndication (RSS) feed that users can subscribe to, data is usually collected by a site-specific program that parses the page structure. When the visual structure of the page changes, that program has to be rewritten. This thesis therefore proposes a method that analyzes web page structure automatically: it analyzes the page structure, identifies structural patterns, and, after validating a pattern, adopts it as an analysis rule.
    Using these rules, data was retrieved from the experimental target site once every hour, and the updated news items were compared between retrievals. The results confirm that the proposed method can analyze web page structure automatically and achieve the goal of data collection.
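    The general idea in the abstract (find a repeated structural pattern, validate it, reuse it as an extraction rule, then compare successive retrievals) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: the tag-path representation, the `min_repeats` validation threshold, the function names, and the two in-memory snapshots that stand in for consecutive hourly retrievals.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Records (tag path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.records.append(("/".join(self.stack), text))


def collect(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.records


def learn_rule(html, min_repeats=3):
    """Return the most repeated tag path, or None if nothing repeats enough.

    The min_repeats check is a hypothetical stand-in for pattern validation.
    """
    counts = Counter(path for path, _ in collect(html))
    if not counts:
        return None
    path, n = counts.most_common(1)[0]
    return path if n >= min_repeats else None


def extract_items(html, rule):
    """Apply a learned rule: keep the text nodes whose tag path matches it."""
    return [text for path, text in collect(html) if path == rule]


def new_items(previous, current):
    """Items present in the current retrieval but not in the previous one."""
    seen = set(previous)
    return [item for item in current if item not in seen]


# Two hypothetical snapshots of a news page, one retrieval apart.
SNAPSHOT_1 = """<html><body><h1>News</h1>
<ul><li><a href="/a">Story A</a></li>
<li><a href="/b">Story B</a></li>
<li><a href="/c">Story C</a></li></ul>
<p>Footer</p></body></html>"""
SNAPSHOT_2 = SNAPSHOT_1.replace("<ul>", '<ul><li><a href="/d">Story D</a></li>')

rule = learn_rule(SNAPSHOT_1)
old = extract_items(SNAPSHOT_1, rule)
new = extract_items(SNAPSHOT_2, rule)
print(rule)                  # html/body/ul/li/a
print(new_items(old, new))   # ['Story D']
```

    Because the rule is learned from the page rather than hard-coded, a layout change would in principle be handled by learning a new rule instead of rewriting the collector. A real collector would also need to fetch pages on a schedule and handle pages where several repeated patterns compete.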
    Appears in Collections: [Graduate Institute & Department of Computer Science and Information Engineering] Thesis

    Files in This Item:

    File          Size    Format
    index.html    0Kb     HTML      238

    All items in the Institutional Repository are protected by copyright, with all rights reserved.


    DSpace Software Copyright © 2002-2004 MIT & Hewlett-Packard / Enhanced by NTU Library & TKU Library IR teams.