In linear algebra, Cholesky factorization is useful for solving a system of equations whose coefficient matrix is symmetric positive definite. It is roughly twice as fast as LU factorization, which applies to general matrices. With recent advances in hardware, a Fermi GPU card accommodates hundreds of cores, compared with the 8 or 16 cores of a typical CPU. This has driven a trend toward using the graphics card as a general-purpose graphics processing unit (GPGPU) for parallel computation. In this work, Volkov's hybrid implementation of Cholesky factorization is evaluated on the new Fermi GPU alongside other implementations, and several improvement strategies are proposed. In our experiments, the proposed GPU strategy achieves a speedup of 3.85x over the CPU version based on the Intel Math Kernel Library (MKL) for the Cholesky factorization of a square matrix of dimension 10,000.
2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), pp. 896-900.
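The "twice as fast" claim follows from operation counts: Cholesky needs about n^3/3 floating-point operations, while LU needs about 2n^3/3, because symmetry lets Cholesky compute only one triangular factor (A = L L^T). The sketch below is a minimal, CPU-only illustration of the standard right-looking blocked Cholesky algorithm, not the authors' code or Volkov's. Each block step has three parts: factor the small diagonal block, solve for the panel beneath it, and update the trailing submatrix. Hybrid CPU/GPU schemes of the kind evaluated here commonly assign the small diagonal factorization to the CPU and the large, highly parallel trailing update to the GPU; that mapping is noted only in comments. The function names `blocked_cholesky` and `cholesky_solve` and the block size parameter `nb` are hypothetical.

```python
import math


def blocked_cholesky(a, nb=2):
    """Right-looking blocked Cholesky (illustrative sketch).

    Returns lower-triangular L with A = L L^T. Only the lower triangle of
    `a` is read. `nb` is a hypothetical block size.
    """
    n = len(a)
    A = [list(map(float, row)) for row in a]  # work on a float copy
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Step 1: unblocked Cholesky of the diagonal block A[k:e, k:e]
        # (in hybrid schemes this small step is often done on the CPU).
        for j in range(k, e):
            d = A[j][j] - sum(A[j][p] ** 2 for p in range(k, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            A[j][j] = math.sqrt(d)
            for i in range(j + 1, e):
                s = sum(A[i][p] * A[j][p] for p in range(k, j))
                A[i][j] = (A[i][j] - s) / A[j][j]
        # Step 2: triangular solve for the panel below, L21 = A21 * L11^{-T}.
        for i in range(e, n):
            for j in range(k, e):
                s = sum(A[i][p] * A[j][p] for p in range(k, j))
                A[i][j] = (A[i][j] - s) / A[j][j]
        # Step 3: trailing update A22 -= L21 * L21^T, lower triangle only
        # (the bulk of the work; typically offloaded to the GPU).
        for i in range(e, n):
            for j in range(e, i + 1):
                A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k, e))
    # Zero the strict upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A


def cholesky_solve(L, b):
    """Solve A x = b given A = L L^T: forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    L = blocked_cholesky(A, nb=2)
    print(L)                              # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
    print(cholesky_solve(L, [0, 6, 39]))  # [1.0, 1.0, 1.0]
```

Any block size gives the same factor; a larger `nb` simply moves more work into the trailing update, which is the part that maps well onto many GPU cores.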