    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/111494


    Title: 基於內容感知的興趣點分類方法之研究
    Other Titles: A study of content-aware classification of POI
    Authors: 謝仲興;Xie, Zhong-Xing
    Contributors: 淡江大學電機工程學系碩士班
    衛信文
    Keywords: 分類;機器學習;爬蟲;相似度;支持向量機;最近鄰居法;classification;Machine learning;web crawler;similarity;SVM;KNN
    Date: 2016
    Issue Date: 2017-08-24 23:54:10 (UTC+8)
    Abstract: 隨著資訊科技不斷的進步,網際網路的盛行,網路平台上有越來越多的訊息分享,在臉書和google等社群網站中,他們為使用者抓取經緯度和地點資料。使用者可以分享他們的地點和狀態給他們的朋友,而若使用者無法找到目前地點,他們也允許使用者自定義地點,但是使用者可能只會輸入地點名稱,如何將其訊息做正確的分類便成為一門嚴峻的學問。
    我們設計網路爬蟲來取得網頁資料,當收到一個地點資料時,我們利用搜尋名稱並使用Google Search API來取得網頁以蒐集資料,我們使用CKIP來分類所有網頁內容的詞並計算所有值的權重,權重值是由(Term Frequency, TF)和(Inverse Document Frequency, IDF)所計算。
    我們製作與類別相關的關鍵字表,並且使用IPeen網站以及淡水周邊已知分類類型的地點名稱來製作它,並且保證一定數量的詞跟分類類型相關以外,亦保留部分隱藏或者潛在附加屬性的詞。
    我們將地點名稱量化,並且取權重值、相似度以及相似度符合率作為三個特徵值,並且利用這三個特徵值結合kNN以及SVM來達到分類的效果。
    在最後我們將地點分為食、住與育樂等三類,並得到使用三個特徵值的結果為最好,且得到在k值較小的情況,kNN的分類效果會較佳的結論。在未來,我們希望能提升至食、衣、住、行、育與樂六類,並且期望能將地點以原本的單標籤延伸為多標籤,以此讓地點資料更為多樣性。
    Abstract:
    As information technology advances and the Internet becomes more prevalent, more and more information is shared online. Social websites such as Facebook and Google Plus capture latitude/longitude and location data for their users, so users can share their location and status with friends. These sites also let users define a custom place when they cannot find their current location, but users may enter only a place name, which makes classifying that information correctly a hard problem.
    We first develop a web crawler to retrieve webpages. When the system receives a location (or place name) from an application or a user, it collects data about that location by searching for its name or geographic coordinates with the Google Search API. The system then segments the web content into terms using CKIP (Chinese Knowledge and Information Processing) and assigns each term a weight computed from its Term Frequency (TF) and Inverse Document Frequency (IDF).
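
The TF-IDF weighting described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state the exact formula, so the common form `tf * log(N / df)` is assumed, and the function name `tf_idf` and the sample tokens are hypothetical. The input is assumed to be already segmented into terms (the thesis uses CKIP for that step).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula: weight = (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical segmented pages; in the thesis the tokens would come from CKIP.
pages = [
    ["拉麵", "餐廳", "淡水"],
    ["旅館", "住宿", "淡水"],
    ["拉麵", "拉麵", "湯頭"],
]
weights = tf_idf(pages)
```

Under this assumed formula, a term that occurs in every page gets a weight of zero, so only terms that distinguish one page from the others contribute.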
    We build a category keyword table from content on the iPeen website and from places around Tamsui whose categories are already known. The table guarantees that a certain number of terms are directly related to each category, and it also retains some terms that carry hidden or latent additional attributes.
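
One way to picture a category keyword table is the sketch below. It is an assumption for illustration only: the abstract does not say how terms are selected, so this version simply keeps the most frequent terms per category, and it does not model the retained latent-attribute terms. The name `build_keyword_table`, the `top_n` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def build_keyword_table(labeled_docs, top_n=2):
    """Map each category to its top_n most frequent terms.

    labeled_docs: iterable of (category, list_of_terms) pairs.
    """
    counts = {}
    for category, terms in labeled_docs:
        counts.setdefault(category, Counter()).update(terms)
    return {
        category: [term for term, _ in counter.most_common(top_n)]
        for category, counter in counts.items()
    }


# Hypothetical labeled samples (the thesis draws its terms from iPeen content).
samples = [
    ("diet", ["noodles", "soup", "noodles"]),
    ("diet", ["soup", "dessert"]),
    ("accommodation", ["hotel", "room", "hotel"]),
]
table = build_keyword_table(samples)
```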
    We quantify location names and extract three features: weight value, similarity, and similarity matching rate. We then classify locations by feeding these three features to kNN and SVM.
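
The kNN step over three-feature vectors can be sketched as below. This is a generic majority-vote kNN with Euclidean distance, which the abstract does not confirm as the exact variant used; the function name `knn_predict` is hypothetical, and the feature values are invented purely for illustration, not taken from the thesis.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (weight, similarity, matching rate) vectors, invented for illustration.
train = [
    ((0.80, 0.90, 0.70), "diet"),
    ((0.75, 0.85, 0.65), "diet"),
    ((0.20, 0.30, 0.10), "accommodation"),
    ((0.25, 0.35, 0.15), "accommodation"),
    ((0.50, 0.10, 0.90), "recreation"),
]
label = knn_predict(train, (0.78, 0.88, 0.66), k=3)
```

The same feature vectors could be passed to an SVM from a library such as scikit-learn; only kNN is shown here to keep the sketch dependency-free.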
    Finally, we classify locations into three categories: diet, accommodation, and recreation. Classification with all three features gives the best results, and when k is small, kNN performs better than SVM. In the future, we hope to expand to six categories (diet, clothing, accommodation, transportation, education, and recreation) and to extend the single-label scheme to multiple labels so that location data becomes more diverse.
    Appears in Collections:[Graduate Institute & Department of Electrical Engineering] Thesis

