In recent years, with advances in genetic research and in medicine, analyzing data with a high-dimensional set of factors has become a major challenge for modern statistics. In data analysis, we want to identify the influential multifactor interactions that significantly affect the response variable in multifactor data, so the choice of model selection method is very important. Model selection methods have been developed through a variety of approaches. In this paper we study two model selection methods designed for a continuous response variable and categorical independent variables: MB-MDR and SPV.
To compare the two model selection methods, we used computer programs to simulate two sets of quantitative trait loci (QTL) data: one containing a large number of factors and the other a small number of factors. After analyzing the data with both MB-MDR and SPV, we evaluated the results using the average accuracy and the average error rate. SPV achieved a good average accuracy when identifying main effects in large samples, but performed worse in small samples. Its average accuracy for mixed models (main effects together with interactions) fell short of ideal under all sample settings; however, its average error rate was very low in all situations. MB-MDR achieved a good average accuracy for interactions under all sample settings, but had a higher average error rate in all situations.
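The abstract does not give the simulation model or the exact definitions of the two evaluation criteria, so the following is only a minimal sketch under stated assumptions. It assumes genotypes coded 0/1/2, a trait built from additive main effects and product-type interactions plus Gaussian noise, accuracy defined as the fraction of true effects that a method recovers, and error rate defined as the fraction of selected effects that are false. The function names (`simulate_qtl`, `accuracy_and_error`, `average_rates`) are hypothetical and are not part of MB-MDR or SPV.

```python
import random


def simulate_qtl(n_samples, n_factors, main=(0,), pairs=((1, 2),),
                 effect=1.0, seed=0):
    """Simulate categorical genotypes (0/1/2) and a continuous trait.

    Assumed model: trait = main effects + pairwise interactions + N(0, 1) noise.
    """
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_factors)]
                 for _ in range(n_samples)]
    trait = []
    for row in genotypes:
        value = rng.gauss(0.0, 1.0)
        value += sum(effect * row[j] for j in main)
        value += sum(effect * row[a] * row[b] for a, b in pairs)
        trait.append(value)
    return genotypes, trait


def accuracy_and_error(selected, truth):
    """Assumed definitions: share of true effects found, share of false picks."""
    selected, truth = set(selected), set(truth)
    accuracy = len(selected & truth) / len(truth) if truth else 1.0
    error = len(selected - truth) / len(selected) if selected else 0.0
    return accuracy, error


def average_rates(selections, truth):
    """Average accuracy and average error rate over simulated replicates."""
    results = [accuracy_and_error(s, truth) for s in selections]
    n = len(results)
    return (sum(a for a, _ in results) / n,
            sum(e for _, e in results) / n)


# Effects are written as tuples of factor indices:
# (0,) is a main effect of factor 0, (1, 2) is an interaction of factors 1 and 2.
truth = {(0,), (1, 2)}

# Hypothetical selections from three replicates (not real MB-MDR or SPV output).
selections = [{(0,), (1, 2)}, {(0,)}, {(0,), (3, 4)}]
avg_accuracy, avg_error = average_rates(selections, truth)
```

In an actual comparison, each entry of `selections` would be the set of effects returned by MB-MDR or SPV on one simulated data set, and the averages would be computed separately for main effects, interactions, and mixed models.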
According to the simulation results, the choice between MB-MDR and SPV depends on the user's requirements. For example, users may be interested in exploring main effects or interaction effects in a model, or may require either high accuracy or a low error rate, and they can choose the method that best fits these needs.