Autosoft Journal

Leveraging Clustering Techniques to Facilitate Metagenomic Analysis



Machine learning clustering algorithms offer efficient methods for metagenomic analysis. This study uses two such algorithms, the self-organizing map and K-means, to cluster and visualize data from an environmental sample collected from a hot springs habitat. A data processing pipeline is described that uses the clustering algorithms to identify which reference genomes to include in further analysis of the organisms that may be present in a metagenomic sample. The clustering revealed probable candidates for additional analysis, including a thermophilic, anaerobic bacterium. Because such an organism is likely to be found in a hot springs environment, this result helps validate the tools. The machine learning techniques discussed here can serve as a starting point for identifying protein sequences that could be used as reference comparisons for a specific metagenomic sample, and for guiding further study.
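
As a rough illustration of the K-means step named in the abstract, the sketch below clusters a few toy DNA reads. It is not the authors' pipeline: the abstract does not say which features were clustered, so the dinucleotide-frequency profile, the farthest-first initialization, the toy reads, and every function name here (`kmer_profile`, `farthest_first`, `kmeans`) are assumptions made only for this example.

```python
from itertools import product

K = 2  # k-mer length; a hypothetical choice for this sketch
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[k] / total for k in KMERS]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the cluster labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy reads (invented for illustration): three AT-rich, three GC-rich.
reads = [
    "ATATATATAT", "AATTAATTAA", "TATATTATAT",
    "GCGCGCGCGC", "GGCCGGCCGG", "CGCGGCGCGC",
]
profiles = [kmer_profile(r) for r in reads]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

On these toy reads the AT-rich and GC-rich sequences fall into separate clusters. With real metagenomic data, the number of clusters, the k-mer length, and the initialization would all need to be chosen and validated for the sample at hand.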



Total Pages: 13
Pages: 153-165



Volume: 22
Issue: 1
Year: 2015


ISSN PRINT: 1079-8587
ISSN ONLINE: 2326-005X
DOI PREFIX: 10.31209 (10.1080/10798587 with T&F)
IMPACT FACTOR: 0.652 (2017/2018)
Journal: 1995-Present


TSI Press
18015 Bullis Hill
San Antonio, TX 78258 USA
PH: 210 479 1022
FAX: 210 479 1048