    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/108275

    Title: Challenges of Statistical and Machine Learning on Supervised Learning with Class-imbalanced Data.
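    Other Titles: 監督式學習方法用於類別不平衡的資料下之統計與機器學習理論的挑戰 (Challenges in Statistical and Machine Learning Theory for Supervised Learning Methods Applied to Class-Imbalanced Data)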
    Authors: Lin, Sung-Chiang;Wang, Charlotte;Chang, Ivan Yuan-Chin
    Keywords: classification; class-imbalance; imbalanced data; partial area under the ROC curve (pAUC); supervised learning
    Date: 2014-03-31
    Issue Date: 2016-11-16 02:10:43 (UTC+8)
    Publisher: 中國統計學社 (Chinese Statistical Association)
    Abstract: Supervised learning uses training data with known class labels to build a classifier, which then serves as the basis for classifying new data. Class-imbalanced data are data in which one class has far more samples than the others, so that the class distribution is skewed. Ignoring this imbalance in a classification problem degrades classifier performance: traditional classification methods are developed by optimizing overall accuracy or similar criteria, and they therefore fail to identify the rare but often more important class. This paper reviews supervised learning methods developed for classifying class-imbalanced data. We discuss the settings in which class-imbalanced data arise and the difficulties and challenges they pose for classification, introduce several categories of strategies currently used in statistical and machine learning theory (focusing on binary classification), and then discuss performance measures suited to evaluating classifiers in this setting. Finally, we discuss possible future directions and newly arising problems, such as multi-class classification, multi-label classification, and big data classification.
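    Note: The abstract's point that accuracy-based criteria can miss the rare class can be illustrated with a minimal sketch. The toy labels, the always-majority "classifier", and the helper names `accuracy` and `recall` below are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical toy data: 990 majority-class (0) and 10 minority-class (1) labels.
y_true = [0] * 990 + [1] * 10

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of samples in the `positive` class that were predicted as such."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, minority recall = {rec:.2f}")
# prints: accuracy = 0.99, minority recall = 0.00
```

    In this sketch the overall accuracy looks excellent even though no minority-class sample is detected, which is why measures other than accuracy are usually considered for imbalanced data.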
    Relation: 中國統計學報 (Journal of the Chinese Statistical Association) 52(1), pp. 59-84
    Appears in Collections: [數學學系暨研究所 (Department and Graduate Institute of Mathematics)] Journal Articles
