Integrating Deep Web Query Interfaces Based on Incremental Matching and Merging: Using the Book Domain as an Example

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/101617

Title:	基於漸進式匹配與合併之深網查詢介面整合 : 以書籍領域為例
Other Titles:	Integrating deep web query interfaces based on incremental matching and merging : using book domain as an example
Authors:	蕭子竣;Hsiao, Tzu-Chun
Contributors:	Tamkang University, Department of Information Management, Master's Program; 周清江
Keywords:	Deep Web; Schema Matching; Schema Merging; Integrated Search Interface
Date:	2014
Issue Date:	2015-05-01 16:12:15 (UTC+8)
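Abstract:	Compared with surface web data that search engines can index, the large volume of high-quality data in the deep web has attracted growing attention and can provide more useful information. However, deep web data reside in databases behind websites. To obtain them, users must go through the query interface provided by each site's developers, enter the correct query terms, and submit the form. To get satisfactory deep web results, users usually query repeatedly across several interfaces and sometimes must integrate the results by hand. Because this means visiting multiple websites with different designs and repeatedly re-entering adjusted query terms to satisfy each interface's input requirements, query costs rise sharply. It is therefore necessary to integrate the individual deep web query interfaces into a single query interface. To build an integrated deep web query interface that can later be extended with new query interfaces, this study proposes an incremental framework for matching and merging interface schemas. Most previously proposed matching methods first take all schemas as input and then match them using statistical information; our framework can flexibly add new schemas for matching and merging. Schema matching in this study is a two-level method based on label string similarity and label synonyms. When generating the integrated query interface, with user convenience in mind, we follow two guidelines: preserve the ordering of the original interfaces as far as possible, and keep input easy. We collected nine deep web query interfaces in the book domain from the open directory dmoz.org, including popular sites such as Amazon and eBay, as test subjects for integration, and ran queries through the integrated interface to test its feasibility and performance.

Data hidden inside the deep web are of much higher quality than those in the surface web. To obtain deep web data, internet users must fill in the query conditions of an HTML query interface and click the submit button. Unfortunately, the deep web data from a single site are normally not sufficient, so users usually need to integrate information from different deep web sites. They therefore have to enter duplicate queries in different query interfaces, and manually integrating the query results takes a lot of time. An integrated deep web query interface is needed to alleviate these burdens. However, web developers express query conditions in many different ways, which makes it difficult to match attributes across query interfaces. So that the integrated query interface can easily be extended in the future, we design and develop an incremental matching and merging methodology for interface schema integration. Our matching method is based on string similarity and synonyms for labels. After schema matching and merging, our system automatically constructs an integrated query interface that queries several deep web sites at the same time. In designing the integrated search interface, we consider how to make it convenient for users.
To test our methodology, we integrate nine search interfaces in the book domain from the open directory dmoz.org, including Amazon, eBay and other popular sites. We also conduct query experiments using our integrated query interface to check the feasibility and performance of the methodology.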
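The abstract describes matching form labels in two levels (string similarity first, then synonyms) and merging each new interface into the integrated schema incrementally while preserving the original field order. The Python sketch below illustrates that idea under stated assumptions; it is not the thesis's implementation. `difflib.SequenceMatcher` stands in for the unspecified string-similarity measure, the 0.8 threshold and the `SYNONYMS` table are hypothetical, and "insert an unmatched field right after the previous field of the same interface" is only one possible reading of "preserve the original ordering".

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym groups for the book domain (illustrative, not from the thesis).
SYNONYMS = [
    {"title", "book title", "name"},
    {"author", "writer", "by"},
    {"isbn", "isbn number"},
    {"publisher", "published by"},
    {"keyword", "keywords", "search"},
]

SIM_THRESHOLD = 0.8  # assumed cutoff for label string similarity


def normalize(label):
    """Lowercase a form label and drop punctuation such as trailing colons."""
    label = re.sub(r"[^a-z0-9 ]", " ", label.lower())
    return " ".join(label.split())


def string_match(a, b):
    """Level 1: character-level similarity between two normalized labels."""
    return SequenceMatcher(None, a, b).ratio() >= SIM_THRESHOLD


def synonym_match(a, b):
    """Level 2: both normalized labels belong to the same synonym group."""
    return any(a in group and b in group for group in SYNONYMS)


def labels_match(a, b):
    a, b = normalize(a), normalize(b)
    return string_match(a, b) or synonym_match(a, b)


class IntegratedSchema:
    """Ordered merged attributes; each attribute records the label used by each site."""

    def __init__(self):
        self.attributes = []  # each item: {"label": str, "sources": {site: label}}

    def find(self, label):
        """Return the index of the first attribute whose source labels match, else None."""
        for i, attr in enumerate(self.attributes):
            if any(labels_match(label, src) for src in attr["sources"].values()):
                return i
        return None

    def add_interface(self, site, labels):
        """Incrementally match and merge one new interface schema (a list of labels)."""
        anchor = -1  # index of the last attribute touched by this interface
        for label in labels:
            idx = self.find(label)
            if idx is None:
                # Unmatched field: place it right after this interface's previous
                # field so the interface's own field order is kept.
                idx = anchor + 1
                self.attributes.insert(idx, {"label": label, "sources": {}})
            self.attributes[idx]["sources"][site] = label
            anchor = idx
```

A usage example with two made-up interfaces: `"Book Title"` matches `"Title:"` only through the synonym level, while `"Authors"` would match `"Author:"` through string similarity alone.

```python
schema = IntegratedSchema()
schema.add_interface("site_a", ["Title:", "Author:", "ISBN:"])
schema.add_interface("site_b", ["Keywords", "Book Title", "Writer", "Publisher"])
print([a["label"] for a in schema.attributes])
# ['Keywords', 'Title:', 'Author:', 'Publisher', 'ISBN:']
```

Because each interface is merged on arrival, adding a tenth site only requires another `add_interface` call rather than re-running matching over all schemas, which is the extensibility argument the abstract makes for an incremental design.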
Appears in Collections:	[Graduate Institute & Department of Information Management] Thesis
