    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/114801


    Title: 興趣點之多標籤分類方法研究
    Other Titles: A study of multi-label classification of POI
    Authors: 邱捷琦;Chiu, Chieh-Chi
    Contributors: Tamkang University, Department of Electrical Engineering, Master's Program
    李維聰;Lee, Wei-Tsong
    Keywords: 分類;機器學習;爬蟲;相似度;支持向量機;最近鄰居法;classification;similarity;Machine learning;web crawler;SVM;KNN
    Date: 2017
    Issue Date: 2018-08-03 15:04:29 (UTC+8)
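    Abstract: In this age of information explosion and widespread Internet use, the online world holds a vast amount of information. Recommendation systems arose to help people find the data they want within it. A good recommendation system needs good classification, which is why classification systems became a field of study. A good classification system lets users find suitable data quickly, whereas a poor one is time-consuming and may return data the user does not want.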
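    Traditional single-label classification only tells whether a piece of information belongs to one category. If users do not know which category the information belongs to before searching, the search takes longer, which is costly for the user. Multi-label classification, in contrast, increases the relevance of information and gives unknown items several additional clues by which they can be found, so users can more quickly find data that matches their needs.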
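    Therefore, this thesis studies a multi-label classification mechanism for places that users are interested in, with food, clothing, accommodation, transportation, and education and recreation as the labels. We first use a web crawler to obtain web page data. When place data is received, the place name is used as the search query and the Google Custom Search API retrieves web pages from which data is collected. A word segmentation system (Ckip) is then used to process the collected web content, and the weights of all values are computed; these weights indicate how relevant the terms on a page are to each category. Next, keyword tables are built from the retrieved web content, one for each of the five categories. We then quantify the place names and take the weight, the similarity, and the similarity match rate as three features. Finally, these three features are used with kNN and SVM to obtain single-label classification results, and the single-label results are classified once more to achieve the multi-label classification that is the goal of this thesis.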
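The data-collection step can be illustrated with a minimal sketch of a query against the Google Custom Search JSON API. The abstract does not say how the thesis issued its requests, so the helper names `build_search_url` and `extract_links` are hypothetical, and the API key and search engine ID are placeholders. The sketch only builds the request URL and reads result links from an already-decoded JSON response; it performs no network access.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(place_name, api_key, engine_id, num=10):
    """Build a search URL for a place name (hypothetical helper).

    api_key and engine_id are placeholders supplied by the caller;
    num is the number of results to request per call.
    """
    params = {"key": api_key, "cx": engine_id, "q": place_name, "num": num}
    return ENDPOINT + "?" + urlencode(params)

def extract_links(response):
    """Return the result URLs from a decoded JSON response dict."""
    return [item["link"] for item in response.get("items", [])]
```

The returned links would then be fetched by the crawler and passed on to word segmentation.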
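    The experiments assign multiple labels to unknown place information, so that a user in an unfamiliar place can enter a place name on a social networking site and find information about that place. The results show that the more information is available, the better the classification, and the less information, the worse. We also conclude that a larger value of k gives better classification. In future work, we hope to widen the scope of classification: more information allows more categories, and the wider scope should improve the diversity and accuracy of the data.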
    With the rise of Internet technology and the development of mobile applications, more and more data surrounds us. However, it is not always easy to find the information people need. A good recommendation system is therefore required to surface useful or interesting information, and such a system in turn depends on good classification of the data. Good classification lets the system process users' requests easily and efficiently; poor classification makes recommendations useless and time-consuming.
    Traditional single-label classification can only indicate whether a piece of information belongs to a certain category. If users do not know the category of the information before searching, finding it takes longer, so the search is quite time-consuming. In contrast, multi-label classification captures the relevance of the information, providing a few more clues for finding unknown data and allowing users to obtain the information they need faster.
    Therefore, this thesis studies a multi-label classification mechanism that classifies points of interest into the following categories: food, clothing, accommodation, transportation, and education. We first use a web crawler to obtain web page information. When we receive information about a place, we search for its name with the Google Custom Search API to retrieve web pages and collect data. A word segmentation system (Ckip) is then used to process the collected web content and calculate the weight of all values. From these weights, the relevance of the page terms to the categories can be obtained.
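The abstract does not name the weighting formula, so the following is only a sketch under the assumption that a TF-IDF style weight is used to relate terms to categories. The function name `tfidf_weights` is hypothetical, and the input is assumed to be text that has already been segmented into tokens (the Ckip step itself is not shown).

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute TF-IDF style weights per category (assumed scheme).

    docs maps a category name to the list of tokens collected for it.
    Returns a dict: category -> {term: weight}.
    """
    n = len(docs)
    # Document frequency: in how many categories each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for category, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        # Term frequency times log inverse document frequency.
        weights[category] = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return weights
```

Under this scheme a term that appears in every category gets weight zero, while a term concentrated in one category gets a high weight there, which matches the stated goal of measuring how relevant a term is to a category.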



    Second, we use the web content to construct a keyword table containing words related to the food, clothing, accommodation, transportation, and education categories. Then we use three features with kNN and SVM to obtain single-label classification results. To improve the diversity of the information, the single-label results are sorted after the unknown information has been classified into food, clothing, accommodation, transportation, and education. The classifiers are then applied to these results to obtain the multi-label classification.
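One common way to turn single-label decisions into a multi-label result is to run one binary classifier per category and collect every category that is accepted. The sketch below does this with a small kNN classifier over three-value feature vectors. It is an illustration only: the thesis's actual two-stage procedure and feature definitions are not given in the abstract, and the names `knn_predict` and `multi_label` and the sample values are hypothetical.

```python
import math
from collections import Counter

CATEGORIES = ["food", "clothing", "accommodation", "transportation", "education"]

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def multi_label(train_by_category, features_by_category, k=3):
    """Return every category whose binary kNN classifier says True."""
    labels = []
    for category in CATEGORIES:
        if category not in train_by_category:
            continue  # no training data for this category
        query = features_by_category[category]
        if knn_predict(train_by_category[category], query, k):
            labels.append(category)
    return labels

# Hypothetical training data: three features per example, True = label applies.
samples = [((0.9, 0.8, 0.7), True), ((0.8, 0.9, 0.6), True),
           ((0.1, 0.2, 0.1), False), ((0.2, 0.1, 0.0), False)]
train = {"food": samples, "transportation": samples}
place = {"food": (0.85, 0.8, 0.7), "transportation": (0.1, 0.1, 0.1)}
print(multi_label(train, place, k=3))
```

An odd k avoids ties in the binary vote. Swapping `knn_predict` for an SVM decision function would keep the same per-category structure.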
    The experiment classifies unknown location information into multiple labels, allowing a user in an unfamiliar place to enter the place name on a social networking site and find information about that location. The results show that the more data we collect, the better the classification; conversely, when less data is obtained, the classification is poor. The results also show that classification improves as the value of k increases. In the future, we want to extend the scope of the classification to include more data and thereby increase the diversity and accuracy of the classification.
    Appears in Collections:[Graduate Institute & Department of Electrical Engineering] Thesis
