Detection of New Motifs Properties in Biodata

Authors

  • Nooruldeen Nasih Qader, College of Science and Technology, University of Human Development, Sulaymaniyah, Kurdistan Region, Iraq (http://orcid.org/0000-0001-7822-6868)
  • Hussein K. Al-Khafaji, Alrafidain University College, Baghdad, Iraq

DOI:

https://doi.org/10.21928/juhd.v1n4y2015.pp404-412

Keywords:

Motif model, mining, DNA, biodata, sequence, genome, k-mers, structure, bioinformatics, monad, composite, background frequency

Abstract

Biodata are rich in information. Knowing the properties of biological sequences can be valuable in analyzing data and drawing appropriate conclusions. This research applied a naturalistic methodology to investigate the structural properties of biological sequences (specifically, DNA). The research was carried out in the field of motif finding. Two new motif properties were discovered, named identical neighbors and adjacent neighbors. The analysis was performed under different conditions of background frequency and motif model, using distinct real data sets of varied sizes. The analysis demonstrated the strong presence of both properties. Exploiting these properties is considered a significant step toward developing powerful algorithms in molecular biology.

Published

2015-09-30

Section

Articles