Autosoft Journal


Implementation of Web Mining Algorithm Based on Cloud Computing



The Internet has grown faster than expected, and the analysis and mining of very large volumes of web data now faces bottlenecks in computing power and storage space. Cloud computing gives network access to powerful computing resources, storage capacity and infrastructure. By providing a highly reliable and scalable center for data processing and storage, it can address these bottlenecks, improving the ability to process web data while reducing the requirements placed on terminal devices. This paper studies web mining algorithms in a cloud computing environment by combining a web data mining algorithm with the MapReduce programming model. We study web mining techniques, in particular the K-centers clustering algorithm, explore how web mining algorithms can be combined with cloud computing technology, and improve the data mining algorithms so that they suit the analysis and processing of massive web data on cloud computing platforms. We construct a distributed cloud environment using the Hadoop framework. In this experimental environment, we analyze how different block size settings affect computational performance. The block size determines the number of pieces into which the pending data file is split, and therefore the scale and degree of parallel computation.
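As an illustration only, the relationship between block size and parallelism, and one MapReduce-style pass of a K-centers clustering step, might be sketched as follows. This is a minimal single-process sketch and not the paper's implementation: it assumes one input split per block, reads "K-centers" as a medoid-style update (each center is the cluster member with the smallest total distance to the other members), and uses Euclidean distance. The names `num_splits`, `map_assign`, `reduce_update` and `k_centers_iteration` are hypothetical.

```python
import math
from collections import defaultdict


def num_splits(file_size_bytes, block_size_bytes):
    """Number of pieces a file is cut into, assuming one split per block."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def distance(a, b):
    """Euclidean distance between two points given as tuples (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_assign(points, centers):
    """Map step: emit (index of nearest center, point) for each point in a split."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        yield idx, p


def reduce_update(members):
    """Reduce step: choose the member with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(distance(c, m) for m in members))


def k_centers_iteration(splits, centers):
    """One pass: map over every split, group by center index, reduce per group."""
    groups = defaultdict(list)
    for split in splits:  # on a cluster, each split would be a separate map task
        for idx, p in map_assign(split, centers):
            groups[idx].append(p)
    # An empty cluster keeps its previous center.
    return [
        reduce_update(groups[i]) if groups[i] else centers[i]
        for i in range(len(centers))
    ]
```

Under the one-split-per-block assumption, halving the block size doubles the number of splits for the same file, which increases the number of map tasks that can run in parallel while making each task smaller.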



Total Pages: 6
Pages: 599-604




Volume: 23
Issue: 4
Year: 2017



ISSN PRINT: 1079-8587
ISSN ONLINE: 2326-005X
DOI PREFIX: 10.31209
PREVIOUS DOI PREFIX (with T&F): 10.1080/10798587

Journal: 1995-Present


TSI Press