A Comparison of Different Credit Risk Scorecards for Personal Loans (個人信貸信用風險評分卡模型之探討)

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/52088

Title:	個人信貸信用風險評分卡模型之探討
Other Titles:	A comparison of different credit risk scorecards for personal loans
Authors:	范維真;Fan, Wei-jan
Contributors:	Tamkang University, Master's Program, Department of Statistics (淡江大學統計學系碩士班); 林志娟 (Lin, Jyh-jiuan)
Keywords:	信用風險評分卡模型;邏輯斯迴歸;支持向量機;核函數;credit scoring model;Logistic Regression;support vector machines;kernel function
Date:	2010
Issue Date:	2010-09-23 16:43:00 (UTC+8)
Abstract:	This study uses support vector machines (SVM), a data mining technique, to build a credit risk scorecard model for personal loans. Logistic regression is currently the method most often used to build credit risk scorecards. Although data mining methods are convenient to use and impose few restrictions, they are seldom used in practice for this purpose, mainly because the economic meaning of the variables selected by an SVM model is often hard to interpret. To examine whether SVM can offer a better alternative for credit risk scorecards, this study first builds models using all the variables provided by the bank, and then selects variables by four further methods: weight of evidence (WOE)/information value (IV), stepwise selection, deletion of abnormal variables, and correlation coefficients. The resulting five variable sets are each applied to logistic regression and to SVM. For SVM, four kernel functions are used to separate the data: linear, polynomial, radial basis function (RBF), and sigmoid. The aim is to find, among the resulting twenty-five models, a credit risk scorecard model that is both suitable and reasonably interpretable. The models are evaluated by accuracy rate, area under the receiver operating characteristic curve (AUROC), Gini coefficient, population stability index (PSI), and cross-validation. The empirical results show that SVM with the RBF kernel performs best: its accuracy rate is the highest, and although its AUROC and Gini coefficient are not the highest, they differ little from those of the best logistic regression model. This study therefore recommends it as the first choice for classification.
The main purpose of this research is to build a credit scoring model for personal loans with a data mining approach based on support vector machines (SVM). Although the logistic regression model is more commonly adopted by the credit card industry because its results are easier to explain, recent literature reports that SVM classify applicants more accurately. This research therefore applies SVM to features chosen under different selection criteria and suggests a better model for credit scoring problems. Five feature sets are considered: the original variables provided by the credit card department of a Taiwan financial holding company, and the variables chosen by four selection criteria, namely the stepwise procedure in logistic regression, weight of evidence/information value, abnormal deletion, and correlation coefficients. In addition, four kernel functions (linear, polynomial, radial basis function, and sigmoid) are adopted in SVM to find the optimal hyperplane. To evaluate the performance of SVM, we compare them with naïve logistic regression fitted to the same five feature sets. Population stability index and cross-validation are used to check the model fit of the 5 naïve logistic regression models and the 20 SVM, respectively. The empirical results show that SVM with the radial basis function kernel performs about the same as the naïve logistic regression models in terms of the area under the receiver operating characteristic curve and, equivalently, the Gini coefficient. However, it outperforms the other 24 models in terms of accuracy rate. Therefore, SVM with the radial basis function kernel is recommended.
Appears in Collections:	[Department and Graduate Institute of Statistics (統計學系暨研究所)] Theses and Dissertations (學位論文)
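
The abstract names several scorecard quantities without defining them: weight of evidence (WOE), information value (IV), population stability index (PSI), AUROC, and the Gini coefficient. The sketch below shows one common textbook formulation of each, using only the Python standard library. It is not taken from the thesis: the function names, the good-over-bad sign convention for WOE, the smoothing constant `eps`, and the binned-count inputs are all illustrative assumptions.

```python
import math


def woe_iv(good_counts, bad_counts, eps=0.5):
    """Per-bin weight of evidence and total information value.

    Assumed convention: WOE_i = ln(good share_i / bad share_i) and
    IV = sum_i (good share_i - bad share_i) * WOE_i.
    eps is a hypothetical smoothing constant that avoids log(0) in empty bins.
    """
    goods = [c + eps for c in good_counts]
    bads = [c + eps for c in bad_counts]
    good_total, bad_total = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        good_share, bad_share = g / good_total, b / bad_total
        w = math.log(good_share / bad_share)
        woe.append(w)
        iv += (good_share - bad_share) * w
    return woe, iv


def psi(expected_counts, actual_counts, eps=0.5):
    """Population stability index between two binned score distributions.

    PSI = sum_i (actual share_i - expected share_i) * ln(actual_i / expected_i).
    """
    exp = [c + eps for c in expected_counts]
    act = [c + eps for c in actual_counts]
    exp_total, act_total = sum(exp), sum(act)
    total = 0.0
    for e, a in zip(exp, act):
        e_share, a_share = e / exp_total, a / act_total
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


def auroc(scores, labels):
    """AUROC as the probability that a random positive (label 1) outscores
    a random negative (label 0), counting ties as one half.

    Pairwise O(n^2) loop, kept simple for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient derived from AUROC: Gini = 2 * AUROC - 1."""
    return 2.0 * auroc(scores, labels) - 1.0
```

Because the Gini coefficient here is a linear function of AUROC, the two measures always rank models identically, which is consistent with the abstract treating them as equivalent. A variable whose good and bad distributions coincide across bins gets an IV of zero under this formulation, and a score distribution that has not shifted gets a PSI of zero.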
