    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/109869

    Title: Active Learning with Sequential Sampling and Dimension Reduction for Analyzing Large-Scale Datasets
    Authors: Wang, Charlotte; Chang, Yuan-chin Ivan
    Keywords: active learning; clustering; D-optimal design; sequential sampling
    Date: 2016-12-09
    Issue Date: 2017-03-10 02:19:47 (UTC+8)
    Abstract: Active learning is a kind of semi-supervised learning method in which the learning algorithm can interactively query for the labels/classes of new subjects. When labeling subjects is expensive, as in money laundering detection and disease screening, active learning can reduce cost because only the selected subjects need to be examined and labeled. For large-scale datasets, the large sample size and high dimension pose challenges for both analysis and computation. In this talk, we present an active learning algorithm for analyzing large-scale datasets. The proposed method is based on a logistic regression model with a modified iterative algorithm for estimating parameters, which improves computational efficiency without sacrificing too much statistical efficiency. In addition, shrinkage estimation and subject clustering are used to select effective variables and to reduce subject-searching time when analyzing large-scale datasets. From the perspectives of uncertainty sampling and the precision of parameter estimates, we search the representatives of subject clusters and select useful samples based on the concept of sequential D-optimal design. Real data applications and simulations are used to evaluate the performance of the proposed active learning algorithm.
    Relation: no proceeding
    Appears in Collections: [Department of Mathematics and Graduate Institute] Conference Papers
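
The abstract describes selecting samples by sequential D-optimal design under a logistic regression model. The following is a minimal sketch of that one idea only, not the authors' algorithm: it omits their modified iterative estimator, shrinkage estimation, and subject clustering. All function names (`fit_logistic`, `fisher_information`, `d_optimal_gain`, `active_learning`), the ridge term, and the simulated data are assumptions made for illustration. At each step the sketch queries the unlabeled subject that most increases the determinant of the estimated Fisher information, using the matrix determinant lemma det(M + w x xᵀ) = det(M)(1 + w xᵀ M⁻¹ x).

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25, ridge=1.0):
    """Newton (IRLS) fit with a ridge penalty (an assumption, for stability)."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(d)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fisher_information(X, beta, ridge=1.0):
    """Estimated information matrix of the labeled design, plus the ridge term."""
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])

def d_optimal_gain(X_pool, beta, info):
    """Return w_i * x_i^T M^{-1} x_i for every candidate row x_i.

    Adding candidate i multiplies det(M) by (1 + gain_i), so the largest
    gain is the greedy D-optimal choice.
    """
    p = sigmoid(X_pool @ beta)
    w = p * (1.0 - p)
    sol = np.linalg.solve(info, X_pool.T)          # shape (d, n)
    leverage = np.einsum("ij,ji->i", X_pool, sol)  # x_i^T M^{-1} x_i
    return w * leverage

def active_learning(X, oracle, n_init=10, n_queries=30, seed=0):
    """Greedy sequential selection; `oracle(i)` returns the label of row i."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labeled}
    for _ in range(n_queries):
        X_lab = X[labeled]
        y_lab = np.array([labels[i] for i in labeled], dtype=float)
        beta = fit_logistic(X_lab, y_lab)
        info = fisher_information(X_lab, beta)
        gain = d_optimal_gain(X, beta, info)
        gain[labeled] = -np.inf                    # never re-query a subject
        nxt = int(np.argmax(gain))
        labels[nxt] = oracle(nxt)
        labeled.append(nxt)
    y_lab = np.array([labels[i] for i in labeled], dtype=float)
    return fit_logistic(X[labeled], y_lab), labeled

if __name__ == "__main__":
    # Simulated data (hypothetical): intercept plus two standard normal features.
    rng = np.random.default_rng(42)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    y = (rng.random(500) < sigmoid(X @ np.array([-0.5, 2.0, -1.0]))).astype(float)
    beta_hat, chosen = active_learning(X, oracle=lambda i: y[i])
    print("labels used:", len(chosen), "estimate:", beta_hat)
```

In this sketch the whole pool is scored at every step, which costs one linear solve against all candidates per query. The abstract's use of cluster representatives suggests the authors restrict that search to a smaller candidate set; how they do so is not stated in this record.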
