Tamkang University Institutional Repository: Item 987654321/34100
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.


    Please use this persistent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/34100


    Title: 以聚合法(AGNES)提升檢索效果之研究 : 以中文新聞為例
    Alternative title: The research on improving the performance of information retrieval with the agglomerative nesting (AGNES) algorithm : using a Chinese news dataset
    Author: 宋永杰; Sung, Yung-chieh
    Contributors: 淡江大學資訊管理學系碩士班 (Tamkang University, Department of Information Management, Master's Program)
    魏世杰; Wei, Shih-chieh
    Keywords: Information Retrieval; Agglomerative Nesting Algorithm; Clustering; Vector Space Model
    Date: 2007
    Uploaded: 2010-01-11 04:55:14 (UTC+8)
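    Abstract: The relevant documents returned by a traditional vector space model retrieval system are often cluttered and unorganized, so users must filter them step by step before obtaining the information they actually need. Building on the tree structure constructed by the agglomerative nesting method, this study dynamically clusters, from the bottom up, the results returned by a vector space model retrieval system into multiple clusters. Clusters are ranked by the average of the coupling and cohesion measures defined in this study, and documents within each cluster are ranked by their similarity to the query. This ranking adjustment raises precision, and the results are presented to users as clusters for browsing.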
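The bottom-up clustering step can be illustrated with a minimal AGNES sketch. It is not the thesis's implementation: it assumes each retrieved document is already a sparse term-weight vector (a dict), uses cosine similarity with average linkage, and stops merging at a hypothetical similarity `threshold`, whereas the thesis derives its clusters from a binary tree built by AGNES. The names `cosine` and `agnes` are illustrative only.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agnes(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Agglomerative (bottom-up) clustering with average linkage.

    Starts with one singleton cluster per document and repeatedly merges
    the two most similar clusters. Merging stops when the best
    average-linkage similarity drops below `threshold` (a hypothetical
    stopping rule used here instead of cutting a full AGNES tree).
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise similarity across the two clusters
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

With two typhoon stories and two stock-market stories, for example `[{"typhoon": 1.0, "rain": 0.5}, {"typhoon": 0.8, "rain": 0.6}, {"stock": 1.0, "rise": 0.7}, {"stock": 0.9, "fall": 0.4}]`, `agnes(docs, 0.3)` returns `[[0, 1], [2, 3]]`. Lowering the threshold to `0.0` merges everything into a single cluster, which corresponds to the root of the full AGNES tree.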
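    This study uses a Chinese document collection, processed through word segmentation, feature term selection, document vector construction, clustering, retrieval, clustering of the retrieval results, and ranking adjustment. Experimental results show that, in overall retrieval performance, the system improves the precision of a traditional vector space model retrieval system by approximately 20.9% to 24.0%. According to a Wilcoxon signed-rank test, the system outperforms the traditional vector space model retrieval system for queries with one or two keywords.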
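The vector construction and retrieval steps, together with a precision measure for evaluation, might look like the following sketch. Assumptions: documents arrive already segmented into term lists (the thesis's Chinese word segmentation and feature term selection are not reproduced), terms are weighted by raw term frequency times log(N/df), query terms get weight 1, and precision is measured at a cutoff `k`. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Build TF-IDF vectors (raw tf * log(N / df)) from pre-segmented documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: List[str], vectors: List[Vector]) -> List[Tuple[int, float]]:
    """Rank documents by cosine similarity to the query (query terms weighted 1)."""
    q = {t: 1.0 for t in query}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    # Keep only documents with a non-zero score; break ties by document index
    return sorted(((i, s) for i, s in scored if s > 0), key=lambda x: (-x[1], x[0]))


def precision_at_k(ranking: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k
```

For the toy collection `[["typhoon", "rain", "rain"], ["typhoon", "flood"], ["stock", "market"], ["stock", "market", "rise"]]`, the query `["stock"]` retrieves documents 2 and 3 in that order, and `precision_at_k([2, 3], {2}, 2)` is `0.5`.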
    The document ranking returned by an information retrieval system based on the traditional vector space model is usually unorganized: related documents often do not appear at adjacent ranks, so to avoid missing needed information, the user must read several unrelated documents before reaching the next related one. In this research, we cluster the documents returned by the traditional vector space model using the binary tree hierarchy constructed by the AGglomerative NESting (AGNES) algorithm. Clusters are ranked by the average of the coupling and cohesion measures proposed in this thesis, and documents within each cluster are ranked by their similarity to the query. This ranking adjustment aims to improve precision.
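The ranking adjustment can be sketched as follows. The thesis defines its own coupling and cohesion measures, which the abstract does not specify, so the two scoring functions below are hypothetical stand-ins: cohesion is taken as the mean pairwise cosine similarity within a cluster, and coupling as the mean cosine similarity between the query and the cluster's members. Only the overall scheme follows the abstract: clusters are ordered by the average of the two scores, and documents within a cluster by their similarity to the query.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cohesion(cluster: List[int], vectors: List[Vector]) -> float:
    """HYPOTHETICAL stand-in: mean pairwise cosine similarity inside a cluster."""
    if len(cluster) < 2:
        return 0.0  # assumption: a singleton gets no cohesion credit
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def coupling(cluster: List[int], vectors: List[Vector], query: Vector) -> float:
    """HYPOTHETICAL stand-in: mean cosine similarity between the query and members."""
    return sum(cosine(query, vectors[d]) for d in cluster) / len(cluster)


def adjusted_ranking(clusters: List[List[int]], vectors: List[Vector], query: Vector) -> List[int]:
    """Order clusters by (coupling + cohesion) / 2, then members by query similarity."""
    def score(c: List[int]) -> float:
        return (coupling(c, vectors, query) + cohesion(c, vectors)) / 2

    ranking: List[int] = []
    for c in sorted(clusters, key=score, reverse=True):
        ranking.extend(sorted(c, key=lambda d: cosine(query, vectors[d]), reverse=True))
    return ranking
```

With `vecs = [{"typhoon": 1.0, "rain": 1.0}, {"typhoon": 1.0, "rain": 2.0}, {"typhoon": 1.0, "stock": 1.5}]` and the query `{"typhoon": 1.0}`, plain cosine ranking orders the documents 0, 2, 1, separating the two rain-related documents. With clusters `[[0, 1], [2]]`, `adjusted_ranking` returns `[0, 1, 2]`, keeping related documents adjacent.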
    We used a Chinese news dataset and carried out word segmentation, vector representation, AGNES clustering, query-based document retrieval, and the final ranking adjustment for evaluation. As a result, our system improves precision by 20.9% to 24.0% compared with the traditional vector space model. A Wilcoxon signed-rank test further shows that our system is significantly better than the traditional vector space model for queries of one or two keywords.
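The significance test can be illustrated with a pure-Python Wilcoxon signed-rank computation over paired per-query precision scores. This sketch returns the rank sums W+ and W- and a large-sample z-score without tie correction; it does not reproduce the thesis's data or exact test settings, and for real analyses a library routine such as `scipy.stats.wilcoxon` is preferable. The `relative_improvement` helper shows one possible reading of a figure like "20.9%" (relative gain in mean precision); the abstract does not state how the percentage was computed.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(baseline: List[float], system: List[float]) -> Tuple[float, float, float]:
    """Wilcoxon signed-rank statistics for paired scores (e.g. per-query precision).

    Returns (W_plus, W_minus, z). Zero differences are discarded, tied
    absolute differences share the average rank, and z is the large-sample
    normal approximation without tie correction.
    """
    # Round to suppress floating-point noise so equal differences tie exactly
    diffs = [round(s - b, 12) for b, s in zip(baseline, system)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd if sd > 0 else 0.0
    return w_plus, w_minus, z


def relative_improvement(baseline: List[float], system: List[float]) -> float:
    """Relative gain of the system's mean score over the baseline's mean score."""
    base = sum(baseline) / len(baseline)
    return (sum(system) / len(system) - base) / base
```

For baseline precisions `[0.5, 0.25, 0.5, 0.75, 0.5]` and system precisions `[0.75, 0.5, 0.5, 1.0, 0.25]`, the four non-zero differences all have magnitude 0.25 and share rank 2.5, giving W+ = 7.5, W- = 2.5, z ≈ 0.91, and a relative improvement of 0.2. Five queries are far too few for a meaningful test; the example only shows the mechanics.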
    Appears in collections: [Department of Information Management] Theses


    All items in the institutional repository are protected by their original copyright.

