Using the keyword in context segmentation method for a Chinese website

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/37602

Title:	Using the keyword in context segmentation method for a Chinese website
Authors:	Chen, Jui-Fa;Lin, Wei-Chuan;Jian, Chih-Yu;Ho, Tzong-Yuh;Dai, Shi-Yao
Contributor:	Department of Computer Science and Information Engineering, Tamkang University
Keywords:	Chinese Segmentation;Dynamic Website;Weight Calculation System;Context Comparing System
Date:	2003-04-24
Uploaded:	2010-01-11 13:46:11 (UTC+8)
Abstract:	As network technology continues to develop, static websites can no longer meet users' needs, and an active, interactive mechanism has become a basic requirement for today's websites. To provide such a mechanism, a website needs to apply some form of agent technology. Unlike English, Chinese does not separate lexical entries with spaces, which makes Chinese word segmentation considerably harder than English segmentation. A fast and accurate segmentation method is therefore needed to help an agent process the sentences a user enters and analyze their meaning. This paper proposes a method for word segmentation and part-of-speech (POS) definition based on keywords in the context of a Chinese sentence. The method preprocesses input sentences to determine what the user wants, and the results can serve as the basis for returning an appropriate response. Because the method relies mainly on keyword relationships in the surrounding context, it produces fewer semantic errors than approaches that use only corpora and word-formation principles. To avoid unnecessary segmentation errors caused by unknown lexical entries, the method uses a specialized corpus matched to each kind of website. The method is also reported to achieve high accuracy on simple, colloquial conversation. The paper describes the method and presents segmentation experiments whose results the authors offer as evidence of its higher accuracy.
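Relation:	Proceedings of the 11th International Conference on Computer-Assisted Instruction (ICCAI 2003) and the 16th R.O.C. Conference on Computer-Assisted Instruction = Proceedings 2003 International Conference on Computer-Assisted Instruction, 6 pages
Appears in collections:	[Department and Graduate Institute of Computer Science and Information Engineering] Conference Papers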
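
The abstract notes that Chinese text has no spaces between words and that unknown words cause segmentation errors unless a domain-specific corpus is used. The sketch below illustrates only that point. It uses forward maximum matching, a common dictionary-based baseline, and is not the paper's keyword-in-context method, whose details are not given in this record. The function name `segment`, the two lexicons, and the example sentence are all hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest
    lexicon entry that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicons: a small general one, and the same one extended
# with terms from an assumed course-registration website.
general_lexicon = {"我", "想", "查詢", "課程"}
domain_lexicon = general_lexicon | {"選課", "系統", "選課系統"}

sentence = "我想查詢選課系統"  # "I want to look up the course registration system"

without_domain = segment(sentence, general_lexicon)
with_domain = segment(sentence, domain_lexicon)
print(without_domain)
print(with_domain)
```

With the general lexicon alone, the unknown term is broken into single characters. Adding the domain terms keeps it intact, which is the kind of error a website-specific corpus is meant to prevent.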

Files in this item:

There are no files associated with this item.

All items in the institutional repository are protected by the original copyright.
