Tamkang University Institutional Repository: Item 987654321/37602


    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/37602


    Title: Using the keyword in context segmentation method for a Chinese website
    Authors: Chen, Jui-Fa;Lin, Wei-Chuan;Jian, Chih-Yu;Ho, Tzong-Yuh;Dai, Shi-Yao
    Contributor: Department of Computer Science and Information Engineering, Tamkang University
    Keywords: 中文斷詞;動態網站;加權計算系統;上下文比較系統;Chinese Segmentation;Dynamic Website;Weight Calculation System;Context Comparing System
    Date: 2003-04-24
    Uploaded: 2010-01-11 13:46:11 (UTC+8)
    Abstract: As network technology continues to develop, static websites no longer meet users' needs, and an active, interactive website has become a basic requirement. To provide such mechanisms, a variety of web agent technologies have emerged. Unlike English, Chinese does not separate words with spaces, which makes Chinese word segmentation considerably harder than English segmentation. An agent that processes users' input sentences and analyzes their meaning therefore needs a fast and accurate segmentation method. This paper proposes a method for word segmentation and part-of-speech (POS) definition based on the keyword in the context of a Chinese sentence. The method preprocesses input sentences to determine what the user wants, and the results serve as the basis for returning an appropriate message to the user. Because the method relies mainly on keyword relationships within the grammar of the input context, it is more semantically correct than approaches that use only corpora and word-building principles. To avoid unnecessary segmentation errors caused by unknown words, the method uses a specialized corpus matched to the needs of each kind of website. The paper describes the method and reports segmentation experiments whose results support its higher correctness, including on simple, casual colloquial conversation.
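    Relation: 第11屆國際電腦輔助教學研討會ICCAI2003暨第16屆中華民國電腦輔助教學研討會論文集 = Proceedings 2003 International Conference on Computer-Assisted Instruction, 6 pages
    Appears in Collections: [Department and Graduate Institute of Computer Science and Information Engineering] Conference Papers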
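
    The abstract notes that Chinese text has no spaces between words and that a website-specific corpus reduces errors on unknown words. The sketch below illustrates only that background point with forward maximum matching, a standard dictionary-based baseline. It is not the keyword-in-context method proposed in the paper, whose details this record does not give. The function name, the lexicons, and the sample sentence are all hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation (a generic baseline, not the paper's method).

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; characters not covered by any entry are emitted
    as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicons, made up for this illustration only.
GENERAL_LEXICON = {"我", "想", "查詢", "課程"}
DOMAIN_LEXICON = {"資料結構"}  # e.g. a term from a course website's own corpus

sentence = "我想查詢資料結構課程"
general_only = forward_max_match(sentence, GENERAL_LEXICON)
with_domain = forward_max_match(sentence, GENERAL_LEXICON | DOMAIN_LEXICON)

print(general_only)
print(with_domain)
```

    With the general lexicon alone, the unknown term is broken into single characters; adding the domain lexicon keeps it as one token. This is the kind of unknown-word error that, according to the abstract, the site-specific corpora are meant to reduce.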
