Autosoft Journal



Leveraging Clustering Techniques to Facilitate Metagenomic Analysis


Authors



Abstract

Machine learning clustering algorithms provide efficient methods for conducting metagenomic analysis. This study uses two machine learning algorithms, the self-organizing map (SOM) and K-means, to cluster data from an environmental sample collected from a hot spring habitat and to provide a visual analysis of that data. A data processing pipeline is described that uses the clustering algorithms to identify which reference genomes should be included in further analysis to determine which organisms may be present in a metagenomic sample. The clustering revealed probable candidates for additional analysis, including a thermophilic, anaerobic bacterium; because such an organism is likely to be found in a hot spring environment, this result serves to validate the functionality of these tools. The machine learning techniques discussed here can provide a starting point for elucidating protein sequences that could act as reference comparisons for a specific metagenomic sample and prompt further study.
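The abstract does not spell out how clustering output becomes a list of reference genomes to keep. The sketch below shows one plausible reading under stated assumptions: sample reads and candidate reference genomes have each already been reduced to numeric feature vectors (the features, the toy 2-D values, and the genome names `thermophile_A`, `mesophile_B`, and `mesophile_C` are hypothetical), all vectors are pooled and clustered with plain Lloyd's K-means, and a reference is kept if it lands in any cluster that also contains sample vectors. The function names `kmeans` and `select_references` are illustrative, not the authors' code, and the SOM step is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's K-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def select_references(sample: List[Vector], references: Dict[str, Vector],
                      k: int, seed: int = 0) -> List[str]:
    """Hypothetical filter: keep references sharing a cluster with the sample."""
    names = list(references)
    pooled = list(sample) + [references[name] for name in names]
    _, labels = kmeans(pooled, k, seed=seed)
    sample_clusters = set(labels[:len(sample)])
    reference_labels = labels[len(sample):]
    return sorted(name for name, lab in zip(names, reference_labels)
                  if lab in sample_clusters)


if __name__ == "__main__":
    # Hypothetical toy vectors; not data from the study.
    sample = [(0.90, 0.10), (0.92, 0.08), (0.87, 0.13)]
    refs = {"thermophile_A": (0.88, 0.12),
            "mesophile_B": (0.10, 0.90),
            "mesophile_C": (0.12, 0.88)}
    print(select_references(sample, refs, k=2))  # prints ['thermophile_A']
```

With these well-separated toy vectors, the selection returns only `thermophile_A` whatever the seed. Clustering sample and reference vectors jointly, rather than separately, is what lets shared cluster membership act as a similarity filter; a SOM could fill the same role, with its map nodes standing in for the K-means clusters.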


Keywords


Pages

Total Pages: 13
Pages: 153-165

DOI
10.1080/10798587.2015.1073887



Published

Volume: 22
Issue: 1
Year: 2015




JOURNAL INFORMATION


ISSN PRINT: 1079-8587
ISSN ONLINE: 2326-005X
DOI PREFIX: 10.31209; 10.1080/10798587 with Taylor & Francis
IMPACT FACTOR: 0.652 (2017/2018)
Journal: 1995-Present




CONTACT INFORMATION


TSI Press
18015 Bullis Hill
San Antonio, TX 78258 USA
PH: 210 479 1022
FAX: 210 479 1048
EMAIL: tsiepress@gmail.com
WEB: http://www.wacong.org/tsi/