Tamkang University Institutional Repository: Item 987654321/76939
    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/76939


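    Title: 不等機率抽樣下多零值資料的擬概度信賴區間 (Pseudo Likelihood Confidence Intervals for Many-Zero Data under Unequal Probability Sampling)
    Other Title: Pseudo Likelihood Confidence Intervals for the Mean of a Population Containing Many Zero Values under Varying Probability Sampling
    Author: 陳順益
    Contributor: Department of Mathematics, Tamkang University
    Keywords: Accounting; inclusion probability; mixture models; pseudo likelihood; stratified sampling; survey sampling
    Date: 2011
    Uploaded: 2012-05-22 22:15:53 (UTC+8)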
    Abstract: The many-zero-observation problem in survey sampling under complex probability sampling is considered. In this project the problem is addressed in the context of confidence interval estimation for the population mean. The traditional approach based on the central limit theorem (CLT) performs poorly because of the severe skewness of the population at zero. The maximum likelihood (ML) method does not work well in survey sampling applications either, because sampling designs are often so complex in practice that it is difficult to pin down the likelihood function and express it explicitly. The nonparametric approach suggested by Chen, Chen and Rao (2003) and Chen and Qin (2003) is completely free from the risk of model misspecification. When a suitable parametric model is available, however, parametric analysis has potential advantages in efficiency and simplicity. In this spirit, Chen, Chen and Chen (2010) consider the mixture model proposed by Kvanli, Shen and Deng (1998) and propose a pseudo likelihood method to attack the problem. The pseudo likelihood function is unbiased when the weights are chosen to be the reciprocals of the inclusion probabilities. Simulation results show that the pseudo likelihood method improves the coverage probability substantially when the inclusion probabilities are related to the unit values, and that it outperforms the CLT and ML methods in coverage probability, in the balance of non-coverage rates on the lower and upper sides, and in interval length. The pseudo likelihood method is intended to deal with complex survey sampling problems. The simulation results of Chen, Chen and Chen (2010) indicate that the pseudo likelihood method is quite robust against misspecification of superpopulation models (although their discussion covers only the normal and gamma distributions).
However, the reason for this robustness is unclear. We will investigate why the pseudo likelihood method is robust against misspecification of superpopulation models. Furthermore, several other distributions that are widely used in mixture models will be discussed and their applications derived, including the exponential, Weibull, and generalized gamma distributions. Regarding the choice of weights in the pseudo likelihood method, since the auxiliary information ($X_j$) in complex surveys is used and the correlation coefficient of ($Y_j$, $X_j$) is known, we will consider different weighting systems that can utilize the auxiliary information, such as, for unit $i$ in stratum $P_j$, $w_i^{-1} = \frac{x_{(i)}}{\sum_{l \in P_j} x_{(l)}}\, p_j$. Another question is whether it is reasonable for the inclusion probabilities above to be proportional to $w_i^{-1}$ for unit $i$; some modifications may be possible in future study. An unequal probability sampling plan is generally easier said than done, so we will have to provide more details regarding the weights and the inclusion probabilities. Finally, we will also apply the pseudo likelihood approach to data sets that contain many zero values under several sampling schemes, such as probability-proportional-to-size sampling and biased sampling. We will develop the related theory and perform extensive simulations. We will also look into the possibility of applying the new method to different sampling designs, e.g., simple random sampling and stratified random sampling.
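    To make the weighting idea concrete, the following is a minimal sketch of a pseudo log-likelihood in which each sampled unit is weighted by the reciprocal of its inclusion probability. It assumes a zero-inflated exponential model (Y = 0 with probability 1 - p, otherwise exponential with mean theta), one of the distributions the project proposes to study. The function names and the closed-form maximizers are illustrative assumptions, not the implementation of Chen, Chen and Chen (2010), and the sketch covers point estimation only, not the confidence interval construction.

```python
import math


def pseudo_loglik(y, incl_prob, p, theta):
    """Pseudo log-likelihood with weights w_i = 1 / pi_i.

    Assumed model: Y = 0 with probability 1 - p; otherwise Y is
    exponential with mean theta (density exp(-y / theta) / theta).
    """
    total = 0.0
    for value, pi in zip(y, incl_prob):
        w = 1.0 / pi
        if value > 0:
            total += w * (math.log(p) - math.log(theta) - value / theta)
        else:
            total += w * math.log(1.0 - p)
    return total


def pseudo_mle_zero_exponential(y, incl_prob):
    """Closed-form maximizers of pseudo_loglik under the assumed model.

    Returns (p_hat, theta_hat, mu_hat), where mu_hat = p_hat * theta_hat
    estimates the population mean.
    """
    if len(y) != len(incl_prob):
        raise ValueError("y and incl_prob must have the same length")
    weights = [1.0 / pi for pi in incl_prob]
    w_total = sum(weights)
    w_nonzero = sum(w for w, v in zip(weights, y) if v > 0)
    wy_nonzero = sum(w * v for w, v in zip(weights, y) if v > 0)
    if w_nonzero == 0:
        raise ValueError("at least one nonzero observation is required")
    p_hat = w_nonzero / w_total         # weighted share of nonzero units
    theta_hat = wy_nonzero / w_nonzero  # weighted mean of nonzero values
    return p_hat, theta_hat, p_hat * theta_hat


if __name__ == "__main__":
    # Hypothetical sample: zeros drawn with low probability, nonzeros with high.
    y = [0, 0, 0, 2, 4]
    incl_prob = [0.1, 0.1, 0.1, 0.5, 0.5]
    print(pseudo_mle_zero_exponential(y, incl_prob))
```

    Under this assumed model the mean estimate simplifies to the weighted ratio sum(w_i * y_i) / sum(w_i), so units that were more likely to be sampled count for less.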
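    This project studies survey data containing a large number of zero values under complex probability sampling, with the aim of constructing confidence intervals for the population mean. Data with many zeros are quite common. At a clinic, for example, most patients pay only the NT$150 registration fee; only a few incur charges for consultation items or medication beyond the registration fee and must pay extra according to their own circumstances, so the data show most patients paying NT$150 and only a few paying more. Likewise, when inspecting for defective items in quality control, usually only a few items are defective; if the many good items are recorded as 0 and the defective items as nonzero values, the resulting data contain a large number of zeros. Survey data are usually analyzed with the traditional central limit approximation, using the observed sample to estimate a confidence interval for the unknown population mean. When the sample carries a large number of zeros, however, the results of the traditional method become unreliable. If the maximum likelihood ratio method proposed by Kvanli, Shen and Deng (1998) is used instead, an accurate likelihood function cannot be obtained because of the complex probability sampling. A natural remedy is to adopt a nonparametric method. Chen, Chen and Rao (2003) and Chen and Qin (2003) developed the empirical likelihood ratio method to construct confidence intervals for the mean of a many-zero population. When each value in the many-zero data (Y) comes with auxiliary information from a correlated variable X, Chen and Sitter (1999) combined the empirical likelihood method with the auxiliary information, extending it to the pseudo empirical likelihood method and applying it to complex survey sampling designs, in which units carrying more information have a higher probability of being selected. In tax auditing, for example, high-income taxpayers are more likely than low-income taxpayers to be selected for audit, and taxpayers who were audited before and found to have made errors are quite likely to be selected again. In this example, the auxiliary information of high income and of previous audit errors raises the probability of selection. The resulting pseudo empirical likelihood confidence intervals are more precise than confidence intervals that do not use auxiliary information.
    When a suitable parametric distribution model is available, however, it is simple and efficient, so Chen, Chen and Chen (2010) proposed the pseudo likelihood method, which combines auxiliary information with unequal probability sampling to address this kind of problem. The resulting confidence intervals are more accurate and reliable than those built by the traditional method and the maximum likelihood method, and they are less affected by the proportion of nonzero values. The pseudo likelihood method can be applied to complex sampling designs. The simulation results of Chen, Chen and Chen (2010) show that the method is robust to misspecification of the superpopulation distribution, but it is not clear why. This project will examine whether this robustness holds, analyze its causes, and study applications to several other commonly used parametric distribution models. In addition, unequal probability sampling requires choosing corresponding weights; this project will study other choices of weights, especially weights related to the auxiliary information. For data with few nonzero values and a small total sample size, the project will use auxiliary information to explore the feasibility of various other unequal probability sampling methods, such as biased sampling and probability-proportional-to-size (PPS) sampling. It will also study applying the new method to various sampling designs, such as simple random sampling and stratified sampling.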
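    The tax-audit example, in which units with a larger auxiliary value are more likely to be sampled, can be illustrated with a small sketch of probability-proportional-to-size inclusion probabilities. It assumes a fixed sample size n and the common convention that any unit whose proportional share would exceed 1 is sampled with certainty, with the remaining probabilities recomputed. The function name `pps_inclusion_probabilities` and the income figures are hypothetical, and this is not the weighting system proposed in the project.

```python
def pps_inclusion_probabilities(sizes, n):
    """First-order inclusion probabilities proportional to size.

    sizes: positive auxiliary values x_i (for example, reported income).
    n: fixed sample size; the returned probabilities sum to n.
    Units whose proportional share exceeds 1 are capped at 1 and the
    remaining probabilities are recomputed over the other units.
    """
    if any(x <= 0 for x in sizes):
        raise ValueError("size measures must be positive")
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the population size")
    certain = set()  # indices of units sampled with certainty
    while True:
        remaining = n - len(certain)
        total = sum(x for i, x in enumerate(sizes) if i not in certain)
        probs = [1.0 if i in certain else remaining * x / total
                 for i, x in enumerate(sizes)]
        over = {i for i, p in enumerate(probs) if p > 1.0}
        if not over:
            return probs
        certain |= over


if __name__ == "__main__":
    incomes = [10, 20, 30, 40]  # hypothetical auxiliary values
    probs = pps_inclusion_probabilities(incomes, n=2)
    weights = [1.0 / p for p in probs]  # reciprocal-probability weights
    print(probs)
    print(weights)
```

    The reciprocals of these probabilities are the kind of weights that enter a pseudo likelihood: the highest-income unit is the most likely to be drawn and therefore receives the smallest weight.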
    Appears in Collections: [Department and Graduate Institute of Mathematics] Research Reports

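    Files in This Item:

    There are no files associated with this item.

    All items in the institutional repository are protected by the original copyright.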
