Determining the 3D structure of a protein by X-ray crystallography is costly and time-consuming. A major reason is that the protein must first be purified and crystallized, and the failure rate of protein crystallization is quite high. It is therefore desirable to have a computational method that predicts protein crystallizability from primary structure information before the whole process starts, which can dramatically lower the average cost of protein structure determination. In this paper, we investigate the feature sets used in previous research, with the support vector machine (SVM) chosen as the predictor. To deal with the imbalanced data problem, different weightings are set for the penalty parameters of the two classes. The results show that a combined set of features produces better results, especially in terms of specificity.
Proceedings of the 5th International Conference on Innovations in Information Technology (Innovations 2008), pp. 702-706
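To make the class-weighted penalty idea in the abstract concrete, the following is a minimal sketch of the standard cost-sensitive soft-margin SVM objective, in which each class has its own penalty parameter. It is not the authors' implementation. The function names, the inverse-class-frequency weighting rule, and the toy numbers are assumptions for illustration only; the abstract does not say how the weights were chosen or which features were used.

```python
def balanced_penalties(labels, c=1.0):
    """Return (c_pos, c_neg), scaled inversely to class frequency.

    Assumption: inverse-frequency weighting is one common heuristic;
    the paper may have selected its weights differently.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    n = len(labels)
    return c * n / (2 * n_pos), c * n / (2 * n_neg)


def weighted_hinge_objective(w, b, samples, labels, c_pos, c_neg):
    """Linear soft-margin SVM objective with per-class penalties:

    0.5 * ||w||^2 + sum_i C(y_i) * max(0, 1 - y_i * (w . x_i + b))
    """
    objective = 0.5 * sum(wj * wj for wj in w)
    for x, y in zip(samples, labels):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        penalty = c_pos if y == 1 else c_neg
        objective += penalty * max(0.0, 1.0 - margin)
    return objective


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: three positive examples and one negative example (labels +1 / -1).
toy_labels = [1, 1, 1, -1]
c_pos, c_neg = balanced_penalties(toy_labels)  # the rarer class gets the larger penalty
```

Raising the penalty on the rarer class makes margin violations on that class more expensive during training, which is the usual motivation for per-class weights on imbalanced data. Sensitivity and specificity are then reported separately, because overall accuracy can hide poor performance on the rarer class.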