Comparisons of sentiment analysis techniques based on different topics


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114442

Title:	基於不同主題的中文情感分析技術比較
Other Titles:	Comparisons of sentiment analysis techniques based on different topics
Authors:	吳登揚;Wu, Dang-Yang
Contributors:	Master's Program, Department of Statistics, Tamkang University; 陳景祥
Keywords:	PMI;Sentiment analysis;SVM;text mining;pointwise mutual information
Date:	2017
Issue Date:	2018-08-03 14:52:44 (UTC+8)
Abstract:	More and more people share their opinions about events online, and sentiment analysis is the technique used to determine the sentiment orientation of such comments. As online content accumulates at an increasing rate, analyzing the sentiment of online comments promptly and accurately has become an important research direction. Some words in online comments carry an associated sentiment, either positive or negative, which is generally referred to as lexical polarity. Manual annotation is the most accurate way to label lexical polarity, but it is also the most costly in time and money. This thesis first proposes an unsupervised method based on semantic pointwise mutual information (PMI) to build a sentiment lexicon specific to each topic. It then proposes a semi-supervised method that combines this unsupervised method with a supervised Support Vector Machine (SVM) classifier, aiming to further improve classification accuracy while reducing supervision costs. Different sentiment analysis techniques are compared across topics using real-world online reviews, and the methods are validated on several hundred randomly selected newspaper articles. The results show that the proposed method predicts more accurately than the purely unsupervised technique and offers a reasonable balance between classification accuracy and supervision cost.
Appears in Collections:	[Department of Statistics and Graduate Institute] Theses
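
The abstract mentions building a topic-specific sentiment lexicon with a PMI-based unsupervised method but gives no formula. As a rough illustration only, the sketch below scores each word by its PMI with a few positive seed words minus its PMI with a few negative seed words, using document-level co-occurrence. This is a generic seed-word scheme and not necessarily the thesis's method; the function name `so_pmi`, the seed words, the `smoothing` constant, and the toy corpus are all assumptions made for the example. Chinese text would first need word segmentation, which is not shown here.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score words by PMI with positive seeds minus PMI with negative seeds.

    docs: list of token lists (already segmented).
    Returns a dict mapping each non-seed word to a polarity score;
    a positive score suggests positive polarity, a negative score the opposite.
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    seeds = pos_seeds | neg_seeds
    n_docs = len(docs)

    doc_freq = Counter()  # number of documents containing each word
    co_freq = Counter()   # number of documents containing both (word, seed)
    for tokens in docs:
        vocab = set(tokens)
        doc_freq.update(vocab)
        for seed in seeds & vocab:
            for word in vocab - {seed}:
                co_freq[(word, seed)] += 1

    def pmi(word, seed):
        # Additive smoothing (an assumption of this sketch) avoids log(0)
        # when a word and a seed never appear in the same document.
        p_joint = (co_freq[(word, seed)] + smoothing) / n_docs
        p_word = doc_freq[word] / n_docs
        p_seed = (doc_freq[seed] + smoothing) / n_docs
        return math.log2(p_joint / (p_word * p_seed))

    scores = {}
    for word in doc_freq:
        if word in seeds:
            continue
        scores[word] = (sum(pmi(word, s) for s in pos_seeds)
                        - sum(pmi(word, s) for s in neg_seeds))
    return scores


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["good", "tasty"],
    ["good", "tasty", "fresh"],
    ["bad", "bland"],
    ["bad", "bland", "cold"],
]
scores = so_pmi(docs, pos_seeds={"good"}, neg_seeds={"bad"})
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:+.2f}")
```

Scores of this kind could serve as a topic-specific lexicon on their own, or as input features or pseudo-labels for a supervised classifier such as an SVM, which is the kind of combination the abstract describes.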
