    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/111494

    Title: A Study of Content-Aware Classification Methods for Points of Interest (基於內容感知的興趣點分類方法之研究)
    Other Titles: A study of content-aware classification of POI
    Authors: 謝仲興;Xie, Zhong-Xing
    Contributors: Tamkang University, Department of Electrical Engineering, Master's Program
    Keywords: classification; machine learning; web crawler; similarity; support vector machine (SVM); k-nearest neighbors (KNN)
    Date: 2016
    Issue Date: 2017-08-24 23:54:10 (UTC+8)
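    Abstract: As information technology advances and the Internet becomes ubiquitous, more and more information is shared on online platforms. Social networking sites such as Facebook and Google capture latitude/longitude coordinates and place data for their users. Users can share their location and status with friends, and if a user cannot find the current place, the site lets the user define a custom place. However, a user may enter only the place name, so classifying such entries correctly becomes a difficult problem.
    We design a web crawler to obtain web page data. When a place record is received, we search for its name with the Google Search API to retrieve web pages and collect data. We then use CKIP to identify the terms in every page's content and compute a weight for each term from its Term Frequency (TF) and Inverse Document Frequency (IDF).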
    As information technology continues to progress and the Internet becomes more prevalent, more and more data are shared on websites. Many social networking sites, such as Facebook and Google Plus, provide geographic and location information so that users can share their status and location with their friends. These sites also allow users to upload information about a place if they cannot find the information they need about their current location.
    We first develop a web crawler to fetch web pages. When the system receives a location (or a place name) from an application or a user, it collects data about the location by searching for the location's name or geographic coordinates with the Google Search API. The system then identifies all terms in the web content using the Chinese Knowledge and Information Processing (CKIP) tool and assigns each term a weight calculated from its Term Frequency (TF) and Inverse Document Frequency (IDF).
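The TF-IDF weighting step can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes each page has already been segmented into terms (the thesis uses CKIP for that step), and it assumes the common definitions TF = relative frequency of the term in the document and IDF = log(N / df). The abstract does not state which TF-IDF variant was used.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    Assumption: documents are already segmented into terms (e.g. by CKIP).
    TF  = count of the term in the document / number of terms in the document.
    IDF = log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document receives weight 0, which is why generic words contribute little to the later features.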
    We use the content of the iPeen website to build a keyword table in which every term is associated with a known category. The keyword table also quantifies the relationship between each term and each category, as well as hidden or latent additional attributes.
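One way to build such a keyword table from category-labeled text is sketched below. The abstract does not describe how the term-category relationship is quantified, so the measure here is a hypothetical choice: the share of a term's occurrences that fall in each category. The input format (pairs of category label and pre-segmented terms, e.g. from iPeen reviews) is also an assumption.

```python
from collections import Counter, defaultdict


def build_keyword_table(labeled_docs):
    """Build a hypothetical keyword table from (category, terms) pairs.

    Returns {term: {category: share}}, where share is the fraction of the
    term's total occurrences that appear in documents of that category.
    """
    # Count term occurrences separately for each category.
    per_category = defaultdict(Counter)
    for category, terms in labeled_docs:
        per_category[category].update(terms)
    # Total occurrences of each term across all categories.
    totals = Counter()
    for counts in per_category.values():
        totals.update(counts)
    table = {}
    for category, counts in per_category.items():
        for term, count in counts.items():
            table.setdefault(term, {})[category] = count / totals[term]
    return table
```

A term whose share is 1.0 for a single category is a strong category indicator. A term split across categories carries weaker evidence.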
    We quantify location names as three features: weight value, similarity, and similarity matching rate. Classification is then performed on these three features using kNN and SVM.
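The abstract names the three features but does not define them. The sketch below is one plausible, hypothetical reading, computed per category. Weight value is the summed TF-IDF weight of the place's terms that are category keywords. Similarity is the cosine similarity between the place's term vector and the category's keyword vector. Matching rate is the fraction of the place's terms found among the category's keywords.

```python
import math


def features_for_category(
    location: dict[str, float], category: dict[str, float]
) -> tuple[float, float, float]:
    """Hypothetical versions of the three features for one category.

    location: {term: TF-IDF weight} for the text collected about a place.
    category: {term: relevance} taken from the keyword table for one category.
    """
    shared = location.keys() & category.keys()
    # Weight value: total TF-IDF weight of the place's terms that are category keywords.
    weight = sum(location[t] for t in shared)
    # Similarity: cosine similarity between the two sparse term vectors.
    dot = sum(location[t] * category[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in location.values())) * math.sqrt(
        sum(v * v for v in category.values())
    )
    similarity = dot / norm if norm else 0.0
    # Matching rate: fraction of the place's terms that appear in the category's keywords.
    match_rate = len(shared) / len(location) if location else 0.0
    return weight, similarity, match_rate
```

Computing these three values for each category (diet, accommodation, recreation) yields a fixed-length numeric vector per place, which is the kind of input that kNN and SVM expect.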
    Finally, we classify locations into three categories: diet, accommodation, and recreation. Classification using all three features yields highly precise results. For smaller values of k, kNN performs better than SVM. In future work, the categories will be expanded to cover daily life more broadly, such as clothing, accommodation, transportation, education, and recreation.
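The kNN side of the comparison can be sketched as a plain majority vote over the k nearest training places in feature space. Euclidean distance and the tie-breaking rule (on equal votes the label met first among the nearest neighbours wins, because `Counter.most_common` keeps insertion order for equal counts) are assumptions. The abstract does not specify either.

```python
import math
from collections import Counter


def knn_classify(
    train: list[tuple[tuple[float, ...], str]],
    query: tuple[float, ...],
    k: int = 3,
) -> str:
    """Label `query` by majority vote among its k nearest training examples.

    train: (feature_vector, label) pairs, e.g. labels "diet",
    "accommodation", "recreation" (feature layout is an assumption).
    """
    # Sort training examples by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For the SVM side of the comparison, a library classifier such as scikit-learn's `sklearn.svm.SVC` could be trained on the same feature vectors. Small k makes kNN follow local structure closely, which is consistent with the abstract's observation that kNN outperforms SVM for smaller k.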
    Appears in Collections: [Department and Graduate Institute of Electrical Engineering] Theses and Dissertations
