Tamkang University Institutional Repository: Item 987654321/108702
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.


    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/108702


    Title: Semantics-Assisted Deep Web Query Interface Classification
    Author: Jou, Chi-Chang
    Keywords: Query Interface Classification;Semantics;Web Database;Web Mining;Deep Web;Heuristics
    Date: 2015-07-13
    Upload time: 2016-12-03 02:11:26 (UTC+8)
    Abstract: Huge numbers of structured data sources are hidden in the databases behind web forms. The volume of deep web content has been estimated at around 500 times that of the surface web. However, many web forms are not deep web query interfaces. To retrieve the contents of web databases, an important task is to identify the web forms that are deep web query interfaces. Deep web content is normally associated with a specific domain, and many domain semantics are embedded in the web forms. Additionally, the HTML pages returned by deep web queries contain particular patterns, which can assist in identifying query interfaces. Thus, we collect the following semantics to assist the classification: (1) feature words, for non-query forms and for keyword fields in deep web query interfaces; (2) common fields in a particular domain, with their valid values, relationships, and synonyms. We design and implement a Semantics-Assisted deep Web Query Interface Classifier (SAWQIC) system based on heuristics. In the pre-query analysis of SAWQIC, feature words of non-query form attributes are combined with heuristics to filter out non-query forms. For web forms that pass the filtering, we use semantics to fill in valid input data for their components and submit the form. In the post-query analysis of SAWQIC, we then use heuristics to analyze the returned HTML pages and identify the deep web query interfaces. The SAWQIC system is evaluated against web forms for the "Book" and "Job" domains. The experimental results indicate that SAWQIC can achieve highly effective classification.
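    Relation: Conference proceedings
    Appears in collections: [Department and Graduate Institute of Information Management] Conference Papers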
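
The pre-query analysis described in the abstract (feature words of form attributes combined with heuristics to filter out non-query forms) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the SAWQIC implementation: the feature-word list, the password-field rule, and the names `FormCollector`, `passes_pre_query_filter`, and `candidate_forms` are all hypothetical, since the record does not give the actual words or heuristics.

```python
from html.parser import HTMLParser

# Hypothetical feature words for non-query forms; the abstract does not list the real ones.
NON_QUERY_WORDS = {"login", "signin", "register", "signup", "subscribe", "newsletter", "contact"}


class FormCollector(HTMLParser):
    """Collect, for each <form>, its attribute text and the types of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text": [], "field_types": []}
            self.forms.append(self._current)
        if self._current is None:
            return  # ignore fields that sit outside any form
        if tag in ("form", "input", "select", "textarea"):
            for key in ("id", "name", "action", "class", "value"):
                if attrs.get(key):
                    self._current["text"].append(attrs[key].lower())
        if tag == "input":
            # An <input> without a type attribute defaults to a text field.
            self._current["field_types"].append((attrs.get("type") or "text").lower())
        elif tag in ("select", "textarea"):
            self._current["field_types"].append(tag)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def passes_pre_query_filter(form):
    """Assumed heuristics: reject on feature words or password fields, require a query-like field."""
    text = " ".join(form["text"])
    if any(word in text for word in NON_QUERY_WORDS):
        return False
    if "password" in form["field_types"]:
        return False
    return any(t in ("text", "search", "select") for t in form["field_types"])


def candidate_forms(html):
    """Return the forms in an HTML page that survive the pre-query filter."""
    parser = FormCollector()
    parser.feed(html)
    return [form for form in parser.forms if passes_pre_query_filter(form)]
```

Forms returned by `candidate_forms` would then move on to the form-filling and post-query steps that the abstract describes; those steps depend on domain semantics that this record does not detail, so they are left out of the sketch.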

    Files in this item:

    File          Size    Format    Views
    index.html    0Kb     HTML      137

    All items in the institutional repository are protected by the original copyright.

