An Improved Parallel Multiple Sequence Alignment Algorithm on Multi-core System
In this paper, we introduce an improved parallel algorithm for computing the number of exact matches nid (S,T) in the local alignment of two biological sequences S and T. This number is used in the first stage of progressive alignment to compute the distance between two sequences. The distance computations are usually its most computationally intensive part. Therefore, this work concentrates on improving an algorithm for this stage using vectorizing technique and running on multi-core. Our program is able to compute nid (S,T) between very long sequences, up to 34 k residues by C++ with OpenMP library on an Intel Core-i7-3770 quad-core processor of 3.40 GHz and main memory of 8 GB. It outperforms ClustalW-MPI 0.13 with 2.9-fold speedup, and the efficiency reached 0.35. Furthermore, a higher speedup with improved efficiency can be accomplished. Its performance figures vary from a low of 0.438 GCUPS to a high of 3.66 GCUPS as the lengths of the query sequences decrease from 34,500 to 9200.
Index Terms: Bioinformatics, Distance Computation, Multi-cores, Multiple Sequence Alignment, Parallel Programming
 T. F. Smith and M. S. Waterman. “Identification of common molecular subsequences.” Journal of Molecular Biology, vol. 147, no. 1, pp. 195-197, 1981.
 S. Henikoff, J. G. Henikoff and S. Pietrokovski. “Blocks+: A non-redundant database of protein alignment blocks derived from multiple
compilations.” Bioinformatics, vol. 15, no. 6, pp. 471-479, 1999.
 B. Schmidt. Bioinformatics: High Performance Parallel Computer Architectures. Florida: CRC Press, 2010.
 L. Wang and T. Jiang. “On the complexity of multiple sequence alignment.” Journal of Computational Biology, vol. 1, no. 4, pp. 337-348, 1994.
 R. C. Edgar and S. Batzoglou. “Multiple sequence alignment.” Current Opinion in Structural Biology, vol. 16, no. 3, pp. 368-373, 2006.
 D. F. Feng and R. F. Doolittle. “Progressive sequence alignment as a prerequisitetto correct phylogenetic trees.” Journal of Molecular
Evolution, vol. 25, no. 4, pp. 351-360, 1987.
 P. Sneath and R. Sokal. “Unweighted pair group method with arithmetic mean,” in Numerical Taxonomy, San Francisco: W.H. Freeman and Company, pp. 230-234, 1973.
 N. Saitou and M. Nei. “The neighbor-joining method: A new method for reconstructing phylogenetic trees.” Molecular Biology and Evolution, vol. 4, no. 4, pp. 406-425, 1987.
 K. Chaichoompu and S. Kittitornkun. “Multithreaded clustalw with im-proved optimization for intel multi-core processor,” in Communications and Information Technologies, 2006. ISCIT’06. International Symposium on. IEEE, 2006, pp. 590-594.
 W. Liu, B. Schmidt, G. Voss and W. Muller-Wittig. “Streaming algorithms for biological sequence alignment on gpus.” IEEE
Transactions on Parallel and Distributed Systems, vol. 18, no. 9, pp. 1270-1281, 2007.
 F. F. Ghaleb, N. M. Reda and M. W. Al-Neama. “An overview of multiple sequence alignment parallel tools.” in Proc. CSCCA ’13, Dubrovnik, Croatia, 2013, pp. 91-96.
 Y. Liu, B. Schmidt and D. L. Maskell. “Msaprobs: Multiple sequence alignment based on pair hidden markov models and partition
function posterior probabilities.” Bioinformatics, vol. 26, no. 16, pp. 1958-1964, 2010.
 J. D. Thompson, B. Linard, O. Lecompte and O. Poch. “A comprehensive benchmark study of multiple sequence alignment
methods: Current challenges and future perspectives.” Plosone, vol. 6, no. 3, pp. 2101-2113, 2011.
 J. Daugelaite, A. O. Driscoll and R. D. Sleator. “An overview of multiple sequence alignments and cloud computing in bioinformatics,” ISRN Biomathematics, 2013.
 J. R. Cole, Q. Wang, E. Cardenas, J. Fish, B. Chai, R. J. Farris, A. S. Kulam-Syed-Mohideen, D. M. McGarrell, T. Marsh, G. M. Garrity and J. M. Tiedje. “The ribosomal database project: improved alignments and new tools for rRNA analysis.” Nucleic Acids Research, vol. 37, no. 1, pp. D141-D145, 2009.
. Z. Ying., X. Lin., S. C. W. See and M. Li. “GPU-accelerated DNA distance matrix computation,” in Proc. ChinaGrid 2011, Dalian, Liaoning, China, 2011, pp. 42-47.
. A. Wirawan, C. K. Kwoh and B. Schmidt. “Multi threaded vectorized distance matrix computation on the Cell/BE and x86/SSE2 architectures.” Bioinformatics Advance, vol. 26, no. 10, pp. 1368-1369, 2010.
. M. W. Al-Neama, N. M. Reda and F. F. Ghaleb. “Multiple sequence alignment on multi-cores.” International Journal of Biomathematics, vol. 8, no. 6, p. 1550084, 2015.
. National Center for Biotechnology Information (NCBI); 2017. Available: https://www.ncbi.nlm.nih.gov/. [Last Accessed on 2017 Aug 24].
. G. Hager and G. Wellein. Introduction to High Performance Computing for Scientists and Engineers. Boca Raton, FL: CRC Press, 2010.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.