Deep Web Integration of Product Exhibitions and Its Ranking Strategy for Keyword Search

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/87765

Title:	商品展覽會深網整合及其關鍵字查詢排名策略
Other title:	Deep web integration of product exhibitions and its ranking strategy for keyword search
Author:	Shih, Yung-Yu (石永瑜)
Contributor:	Master's Program, Department of Information Management, Tamkang University; 周清江
Keywords:	Deep Web Integration; Keyword Search; Ranking Strategy
Date:	2012
Uploaded:	2013-04-13 11:42:37 (UTC+8)
Abstract:	As Internet use keeps growing, search engines have become an important tool for gathering information. However, much valuable data remains hidden in deep web databases, where traditional search engines cannot find it efficiently. This study takes online databases of product exhibitions as an example and proposes a solution. Staff of small and medium enterprises (SMEs) and exhibitors often cannot determine from the web when and where international exhibitions will be held, or which companies and related products will be shown. Searching takes too long, and the information found may be incomplete. This study integrates data from different exhibitions in the same field and lets users search for products by keyword; the results include the company that owns each product and that company's other products related to the keyword. The work consists of two systems. (1) The data integration system uses web robots to collect data from multiple exhibition websites and integrates the information from the different sites into a relational database. (2) The ranking system processes keyword queries and applies a ranking strategy. In addition to the adjustment factors used in past research (tuple tree size normalization, document length normalization, (inverse) document frequency normalization, and inter-document weight normalization), this study adds a specific field occurrences weight and heterogeneous data weights, so that companies and products more relevant to the user's keywords rank higher. In a user evaluation, with the specific field occurrences weight set to 9 and the heterogeneous data weights set to 2-7, the Mean Average Precision (MAP) was 0.6471, a 59.70% improvement over the approach that omits these two weights.
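The abstract reports its result as Mean Average Precision (MAP). As a reminder of what that metric measures, here is a minimal sketch of the standard textbook definition. The record does not describe the thesis's evaluation protocol (how relevance was judged, or whether a rank cutoff was used), so the function names, the input shape, and the toy data below are illustrative assumptions, not the author's implementation.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked result list (textbook definition).

    Precision is sampled at every rank where a relevant item appears,
    and the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two keyword queries with made-up result IDs.
toy_runs = [
    (["A", "B", "C", "D"], {"A", "C"}),  # relevant items at ranks 1 and 3
    (["X", "Y", "Z"], {"Y"}),            # relevant item at rank 2
]
toy_map = mean_average_precision(toy_runs)
```

A ranking strategy that moves relevant companies and products toward the top of each result list raises the per-query average precision, and therefore the MAP.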
Appears in Collections:	[Department and Graduate Institute of Information Management] Theses and Dissertations

All items in the institutional repository are protected by their original copyright.
