Opinion Mining on Internet Forums

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/111336

Title:	網路論壇之輿情探勘
Other Titles:	Opinion mining on internet forums
Authors:	許維勻;Hsu, Wei-Yun
Contributors:	Master's Program, Department of Computer Science and Information Engineering, Tamkang University; 許輝煌;Hsu, Hui-Huang
Keywords:	Opinion Mining;Support Vector Machine;Natural Language Processing;Text Mining;Web Crawler
Date:	2016
Issue Date:	2017-08-24 23:50:18 (UTC+8)
Abstract:	Communicating over the Internet is now commonplace. Whatever the platform, every comment a person posts online leaves a record. If an enterprise can make good use of these records and find the negative evaluations of its own products or services among them, it can go on to fix problems or better understand market demand.

Taking the enterprise's perspective, this thesis uses a web crawler to collect comments from enterprise-related discussion boards on Internet forums as training data. It also combines the crawler with a new-comment detection function to obtain the data to be predicted. The training data are processed with the proposed text pre-processing and feature-vector transformation methods, labeled as negative or non-negative, and then used to train a Support Vector Machine (SVM). The trained SVM serves as the sentiment classifier for the comments fetched by the new-comment detection crawler. These comments go through the same pre-processing and feature-vector transformation before classification. Whenever the classifier judges a new comment to be negative, the comment is reported together with its author, content, date, and title.

The key question is whether the sentiment classifier can reliably recognize negative comments. The proposed feature-vector transformation maps each text to a two-dimensional feature vector, and both feature values are computed in a negative-oriented way. The first feature is derived from a pre-built corpus. The number of times each word appears in negative and non-negative sentences is counted, and these counts are combined through an equation proposed in the thesis. The second feature is the weight of positive and negative sentiment words in the sentence, computed from a lexicon of positive and negative sentiment words. Most prior work on sentiment classification emphasizes balance between the positive and negative classes. Compared with that work, the proposed method is better at identifying negative comments, which is the goal set for this study.

Experiments show that when the negative class is only about half the size of the non-negative class, classification performance on the negative class is poor. The thesis therefore addresses the class imbalance with SMOTE (Synthetic Minority Over-sampling Technique) and then performs sentiment classification with the proposed feature-vector transformation. After SMOTE, Recall rose by about 8% to 9%, which also raised the negative class's F1-score somewhat, and overall Accuracy remained good. The results indicate that, for this study's data, SMOTE slightly lowers classification performance on non-negative comments but finds negative comments on Internet forums more effectively.
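The abstract names SMOTE as the fix for the roughly 2:1 imbalance between non-negative and negative comments but gives no implementation details. What follows is a minimal, standard-library Python sketch of the basic SMOTE interpolation step applied to 2-D feature vectors like the ones the thesis produces. It is not the author's code. The function names `smote` and `k_nearest`, the neighbour count `k`, the seed, and the sample vectors are all hypothetical and serve only as illustration. The sketch draws the seed sample at random for each synthetic point, which is a common variant of the original per-sample loop.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Return indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority-class vectors by SMOTE interpolation.

    Each synthetic vector lies on the segment between a randomly chosen minority
    vector x and one of its k nearest minority neighbours nn:
        x_new = x + gap * (nn - x), with gap drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    rng = random.Random(seed)      # seeded so the output is reproducible
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical 2-D feature vectors: the negative class is half the size of the
# non-negative class, mirroring the imbalance described in the abstract.
negative = [(0.9, 0.8), (0.7, 0.6), (0.8, 0.9)]
non_negative = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.2), (0.2, 0.3), (0.1, 0.1), (0.25, 0.15)]

extra = smote(negative, n_new=len(non_negative) - len(negative), k=2)
balanced_negative = negative + extra
print(len(balanced_negative), len(non_negative))  # 6 6
```

The balanced negative vectors would then be passed to the SVM along with the non-negative vectors. Oversampling is normally applied to the training split only, so that synthetic points do not inflate the evaluation scores.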
Appears in Collections:	[Graduate Institute & Department of Computer Science and Information Engineering] Thesis
