    Please use this permanent URL to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/111494


    Title: 基於內容感知的興趣點分類方法之研究
    Other title: A study of content-aware classification of POI
    Author: 謝仲興 (Xie, Zhong-Xing)
    Contributors: Master's Program, Department of Electrical Engineering, Tamkang University; 衛信文
    Keywords: classification; machine learning; web crawler; similarity; support vector machine (SVM); k-nearest neighbors (KNN)
    Date: 2016
    Uploaded: 2017-08-24 23:54:10 (UTC+8)
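    Abstract (Chinese version): With the continuing advance of information technology and the spread of the Internet, more and more information is shared on online platforms. Social networking sites such as Facebook and Google obtain latitude/longitude and place data for their users. Users can share their location and status with friends, and if a user cannot find their current location, these sites also let the user define a custom place. However, users may enter only the place name, so classifying that information correctly becomes a difficult problem.
    We designed a web crawler to retrieve web page data. When a place record is received, we search for its name with the Google Search API to retrieve web pages and collect data. We use CKIP to segment the text of every retrieved web page into terms and compute a weight for each term from its term frequency (TF) and inverse document frequency (IDF).
    We built a keyword table associated with the categories, using the iPeen website and place names of known category around Tamsui. Besides ensuring that a certain number of terms are related to each category, the table also retains some terms that carry hidden or latent additional attributes.
    We quantify each place name and take the weight value, the similarity, and the similarity matching rate as three features, then combine these three features with kNN and SVM to perform classification.
    Finally, we classify places into three categories: food, accommodation, and recreation. The results are best when all three features are used, and kNN classifies better when k is small. In the future, we hope to extend this to six categories (food, clothing, accommodation, transportation, education, and recreation) and to extend each place from a single label to multiple labels, making the place data more diverse.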
    Abstract:
    As information technology continues to advance and the Internet becomes more prevalent, more and more data are shared on websites. Many social networking sites, such as Facebook and Google Plus, provide geographic and location information for users, so that users can share their status and location with their friends. These sites also allow users to upload information about a place if they cannot find the needed information about their current location.
    We first develop a web crawler to retrieve web pages. When the system receives a location (or place name) from an application or a user, it collects data about the location from websites by searching for the location's name or geographic coordinates with the Google Search API. The system then identifies all terms in the web content using the Chinese Knowledge and Information Processing (CKIP) tools and assigns a weight to each term. The weight of a term is calculated from its term frequency (TF) and inverse document frequency (IDF).
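A minimal sketch of the TF-IDF weighting described above, in Python. It assumes each web page has already been segmented into terms (the CKIP segmentation step is not shown), and it uses the common variant tf = count / document length and idf = log(N / df). The abstract does not say which TF or IDF variant the thesis uses, so the formula choice is an assumption; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    docs: documents already segmented into terms (e.g. by a word
    segmenter such as CKIP; segmentation itself is not shown here).
    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # An empty document yields an empty dict, so no division by zero.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

For example, with the toy documents `[["beef", "noodles", "beef"], ["hostel", "room"], ["noodles", "hostel"]]`, the term `beef` in the first document receives (2/3)·log 3 ≈ 0.73, while a term that appears in every document receives weight 0.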
    We use the content of the iPeen website to define a keyword table in which every term is related to a known category. In addition, the keyword table guarantees that a specific number of terms are related to each category, and it also retains terms with hidden or latent additional attributes.
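The keyword table described above can be represented as a mapping from each category to its keywords. The abstract does not define how a place is compared with the table, so the sketch below shows one plausible reading, purely as an assumption: for each category it computes (1) the summed TF-IDF weight of the place's terms found in that category's keywords, (2) the cosine similarity between the place's weight vector and a 0/1 keyword vector, and (3) the fraction of the category's keywords that were matched. `KEYWORD_TABLE`, its entries, and `category_features` are hypothetical.

```python
import math

# Hypothetical keyword table: category -> keywords. The thesis builds its
# table from iPeen and known places; these entries are made up.
KEYWORD_TABLE: dict[str, set[str]] = {
    "food": {"noodles", "restaurant", "menu"},
    "accommodation": {"hostel", "room", "check-in"},
    "recreation": {"park", "museum", "ticket"},
}


def category_features(weights: dict[str, float],
                      keywords: set[str]) -> tuple[float, float, float]:
    """Hypothetical per-category features for one place.

    weights: {term: tf-idf weight} for the place's collected web pages.
    Returns (summed weight of matched keywords,
             cosine similarity to the 0/1 keyword vector,
             fraction of keywords that were matched).
    """
    matched = [t for t in keywords if t in weights]
    # Dot product of the weight vector with a 0/1 keyword vector.
    weight_sum = sum(weights[t] for t in matched)
    norm_place = math.sqrt(sum(w * w for w in weights.values()))
    norm_keys = math.sqrt(len(keywords))
    if norm_place and norm_keys:
        cosine = weight_sum / (norm_place * norm_keys)
    else:
        cosine = 0.0
    match_rate = len(matched) / len(keywords) if keywords else 0.0
    return weight_sum, cosine, match_rate
```

For a place whose pages yield `{"noodles": 0.5, "menu": 0.5}`, the `"food"` features are a weight sum of 1.0, a cosine of √(2/3) ≈ 0.82, and a match rate of 2/3; concatenating the features of every category would give one feature vector per place.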
    We quantize the place names and derive three features: weight value, similarity, and similarity matching rate. Classification is then performed by applying kNN and SVM to these three features.
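A minimal k-nearest-neighbors classifier over such feature vectors might look like the following. Euclidean distance and majority voting are assumptions (the abstract names kNN but not its distance metric or tie-breaking rule); `knn_predict` and the toy data are hypothetical, and the SVM classifier is not sketched here.

```python
import math
from collections import Counter


def knn_predict(train: list[tuple[list[float], str]],
                query: list[float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training points.

    train: (feature_vector, label) pairs, e.g. per-place feature vectors.
    Distance is Euclidean (an assumption, not taken from the thesis).
    """
    if not train or k < 1:
        raise ValueError("need at least one training point and k >= 1")
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common() orders equal counts by first occurrence, so a tie is
    # broken in favour of the label of the closer neighbour.
    return votes.most_common(1)[0][0]
```

With training points `([0.9, 0.8, 0.7], "food")`, `([0.8, 0.9, 0.6], "food")`, `([0.1, 0.2, 0.1], "accommodation")`, `([0.2, 0.1, 0.2], "accommodation")` and `([0.5, 0.5, 0.9], "recreation")`, the query `[0.85, 0.85, 0.65]` with `k=3` is labeled `"food"`. A small k keeps the vote local to the closest places, which is consistent with the abstract's observation that kNN did better at smaller k.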
    Finally, we classify the locations into three categories: food, accommodation, and recreation. Classification using all three features gives the best results, and when k is small, kNN performs better than SVM. In the future, the categories will be expanded to cover the range of daily living: food, clothing, accommodation, transportation, education, and recreation.
    Appears in collections: [Department of Electrical Engineering and Graduate Institute] Theses and Dissertations


    All items in the institutional repository are protected by their original copyright.

