RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.


    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/87765


    Title: 商品展覽會深網整合及其關鍵字查詢排名策略
    Other title: Deep web integration of product exhibitions and its ranking strategy for keyword search
    Author: 石永瑜; Shih, Yung-Yu
    Contributors: 淡江大學資訊管理學系碩士班 (Master's Program, Department of Information Management, Tamkang University)
    周清江
    Keywords: 深網整合; 關鍵字查詢; 排名策略; Deep Web Integration; Keyword Search; Ranking Strategy
    Date: 2012
    Upload time: 2013-04-13 11:42:37 (UTC+8)
    Abstract: As use of the World Wide Web keeps growing, search engines have become an important tool for gathering information. However, much valuable data remains hidden in deep-web databases and cannot be found efficiently through traditional search engines. This study takes online product exhibition databases as an example and proposes a solution. Staff of small and medium enterprises (SMEs) and exhibitors often cannot reliably find out on the web when and where international exhibitions will be held, or which companies and related products will take part. Collecting this information takes a long time, and the results may be incomplete. This study integrates data from different exhibitions in the same field and lets users search for products by keyword. The query results include the company that offers each product and that company's other products related to the keyword. The work consists of two systems. (1) The data integration system uses web robots (crawlers) to collect data from multiple exhibition websites and integrates the information from the different sites into a relational database. (2) The ranking system processes keyword queries and provides a ranking strategy. In addition to the adjustment factors used in past research (tuple tree size normalization, document length normalization, inverse document frequency normalization, and inter-document weight normalization), this study adds a specific-field occurrence weight and a heterogeneous-data multiplier weight, so that companies and products more relevant to the user's keywords rank higher. A user evaluation shows that, with the specific-field occurrence weight set to 9 and the heterogeneous-data multiplier weight set to 2-7, the Mean Average Precision (MAP) is 0.6471, a 59.70% improvement over the approach that does not consider these two weights.
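The abstract reports ranking quality as Mean Average Precision (MAP). As a reminder of what that metric measures, here is a minimal sketch of the standard MAP definition. It is not the thesis's evaluation code: the function names, the item IDs, and the toy queries are all hypothetical, and the thesis may have handled details such as ties or result cut-offs differently.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list (standard definition).

    Precision is sampled at each rank where a relevant item appears,
    then divided by the total number of relevant items for the query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two keyword queries with made-up result IDs.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant at ranks 1 and 3
    (["x", "y", "z"], {"z"}),            # relevant at rank 3 only
]
map_score = mean_average_precision(toy_runs)
```

A ranking strategy that moves relevant companies and products toward the top of each list raises the per-query average precision, and therefore the MAP.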
    Appears in collections: [Department and Graduate Institute of Information Management] Theses and Dissertations

    Files in this item:

    File        Size  Format  Views
    index.html  0Kb   HTML    172

    All items in the institutional repository are protected by the original copyright.

