A Study of Multi-Label Classification Methods for Points of Interest

Please use this permanent URL to cite or link to this document: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114801

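Title:	興趣點之多標籤分類方法研究 (A Study of Multi-Label Classification Methods for Points of Interest)
Other title:	A study of multi-label classification of POI
Author:	Chiu, Chieh-Chi (邱捷琦)
Contributor:	Master's Program, Department of Electrical Engineering, Tamkang University; Lee, Wei-Tsong (李維聰)
Keywords:	classification; machine learning; web crawler; similarity; support vector machine (SVM); k-nearest neighbors (kNN)
Date:	2017
Upload time:	2018-08-03 15:04:29 (UTC+8)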
Abstract:	With the rise of the Internet and mobile applications, the volume of available data keeps growing, yet finding the information one actually needs is not always easy. Recommendation systems address this problem, and a good recommendation system depends on good classification. Well-classified data lets a system answer user requests quickly and efficiently, whereas poor classification makes recommendations time-consuming and may return irrelevant results. Traditional single-label classification only indicates whether an item belongs to one given category; a user who does not know in advance which category the desired information falls under will spend longer searching. Multi-label classification, in contrast, captures more of the relevance between items and gives unknown data several clues by which it can be found, so users can reach the information they need faster.
This thesis therefore studies a multi-label classification mechanism for points of interest (POI), using five labels: food, clothing, accommodation, transportation, and education and recreation. First, a web crawler collects web page data: given a place, its name is used as a query to the Google Custom Search API, and the returned pages are gathered. The CKIP word segmentation system then segments the collected page content, and a weight is computed for each term to measure the relevance between page terms and categories. Next, the page content is used to build a keyword table for each of the five categories. Place names are then quantified into three features: weight, similarity, and similarity match rate. These features are fed to kNN and SVM classifiers to obtain single-label results, and the single-label results are classified once more to produce the multi-label classification. The experiments apply multi-label classification to unknown place information, so that a user in an unfamiliar location can enter a place name on a social networking site and find information about that place. The results show that classification improves as more data is collected and degrades when less data is available, and that larger values of k give better classification results. Future work will extend the scope of classification, since more data allows more categories, with the aim of increasing the diversity and accuracy of the data.
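The feature-and-classifier step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not define the three features, so the formulas for weight, similarity, and match rate below are assumptions, as are the toy keyword table `KEYWORDS`, the training pairs `TRAIN`, and the function names. The sketch uses a plain kNN vote per category (one yes/no decision for each label) and omits the SVM, the crawler, and the CKIP segmentation step.

```python
import math
from collections import Counter

# Hypothetical keyword table; the thesis builds one per category from crawled pages.
KEYWORDS = {
    "food": {"restaurant", "noodle", "menu"},
    "clothing": {"shirt", "shoes", "fashion"},
    "accommodation": {"hotel", "room", "checkin"},
    "transportation": {"bus", "station", "ticket"},
    "education_recreation": {"museum", "park", "school"},
}

# Hypothetical training data: (weight, similarity, match_rate) -> relevant or not.
TRAIN = [
    ((0.50, 0.60, 0.67), True),
    ((0.40, 0.50, 0.67), True),
    ((0.30, 0.40, 0.33), True),
    ((0.00, 0.00, 0.00), False),
    ((0.05, 0.10, 0.33), False),
    ((0.00, 0.00, 0.00), False),
]


def features(tokens, keywords):
    """Three assumed features for one (place, category) pair."""
    counts = Counter(tokens)
    hits = sum(counts[w] for w in keywords)
    # Weight: share of the place's tokens that are category keywords.
    weight = hits / len(tokens) if tokens else 0.0
    # Similarity: cosine between the token-count vector and the keyword indicator vector.
    norm = math.sqrt(sum(c * c for c in counts.values())) * math.sqrt(len(keywords))
    similarity = hits / norm if norm else 0.0
    # Match rate: share of the category's keywords that appear at least once.
    match_rate = sum(1 for w in keywords if w in counts) / len(keywords)
    return (weight, similarity, match_rate)


def knn_is_relevant(train, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return 2 * sum(1 for _, relevant in nearest if relevant) > len(nearest)


def predict_labels(tokens, train, k=3):
    """Return every category whose feature vector is voted relevant."""
    return [
        category
        for category, keywords in KEYWORDS.items()
        if knn_is_relevant(train, features(tokens, keywords), k)
    ]


# Tokens as they might come out of word segmentation for one place.
print(predict_labels(["hotel", "room", "restaurant", "menu", "view", "room"], TRAIN))
# prints ['food', 'accommodation']
```

Making one independent decision per category is what allows a place to receive several labels at once; a single five-way classifier would force exactly one label per place.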
Appears in collections:	[Department and Graduate Institute of Electrical Engineering] Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	177

All items in the institutional repository are protected by their original copyright.
