Tamkang University Institutional Repository: Item 987654321/112117
    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/112117


    Title: Heuristics-Based Schema Extraction for Deep Web Query Interfaces
    Authors: Jou, Chichang; Cheng, Yucheng
    Keywords: Deep Web, Query Interface, Schema Extraction, Heuristic Rules, String Similarity
    Date: 2017-08-04
    Upload time: 2017-11-15 02:10:59 (UTC+8)
    Abstract: With the rapid growth of the internet, the contents of web databases are also increasing quickly. These data, hidden behind query interfaces, are called the Deep Web. The volume of deep web content has been estimated at around 500 times that of the surface web. To obtain the dynamic contents that satisfy the conditions imposed by the elements of an interface, users must fill in valid values, which is why search engines do not collect these contents. Many deep web applications, such as content collection, topic-focused crawling, and data integration, depend on understanding the schema of these query interfaces. The schema needs to cover the mappings between input elements and labels, the data types of valid input values, and the range constraints on those values, among other things. We propose a Heuristics-based deep web query interface Schema Extraction system (HSE) that identifies labels, elements, mappings among labels and elements, and relationships among elements. In HSE, texts surrounding elements are collected as candidate labels. We propose a string similarity definition and a dynamic similarity threshold setup to cleanse or modify candidate labels. Elements, candidate labels, and new lines in the query interface are streamlined to produce its Interface Expression (IEXP). By combining the users' view and the designer's view, with the aid of semantic information, we then build heuristic rules to extract the schema from the IEXP of query interfaces in the ICQ dataset. These rules are constructed by utilizing (1) the characteristics of labels and elements, and (2) the spatial, group, and range relationships of labels and elements. Our schema not only helps extract deep web contents, but also benefits schema matching and schema merging. The experimental results on the TEL-8 dataset show that HSE performs effectively.
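    Illustrative sketch (not from the paper): the abstract does not give the similarity definition or the threshold rule, so the following Python snippet only illustrates the general idea of filtering candidate labels with a string similarity score and a threshold that varies per candidate. It assumes the candidate texts are compared against the element's `name` attribute, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and invents a simple length-based threshold. The function names `similarity`, `dynamic_threshold`, and `pick_label` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the definition proposed in the paper."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dynamic_threshold(text: str, base: float = 0.6,
                      step: float = 0.05, cap: float = 0.9) -> float:
    """Hypothetical rule: require a closer match as the candidate gets longer."""
    words = len(text.split())
    return min(cap, base + step * max(0, words - 1))


def pick_label(element_name: str, candidates: List[str]) -> Optional[str]:
    """Return the surrounding text that best matches the element name.

    Candidates scoring below their own threshold are discarded; if none
    survive, the element is left without a label (None).
    """
    best, best_score = None, 0.0
    for text in candidates:
        score = similarity(element_name, text)
        if score >= dynamic_threshold(text) and score > best_score:
            best, best_score = text, score
    return best


if __name__ == "__main__":
    nearby_texts = ["Departure City", "Search flights", "Return date"]
    print(pick_label("departure_city", nearby_texts))
```

    In this toy setup, the texts found near an input element named `departure_city` are scored one by one, and only a text that clears its own threshold can be kept as the label. A real system would also need the spatial and grouping cues the abstract mentions, which this sketch leaves out.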
    Relation: 
    Appears in Collections: [Department and Graduate Institute of Information Management] Conference Papers

    Files in this item:

    There are no files associated with this item.

    All items in the institutional repository are protected by their original copyright.
