Autosoft Journal



An Improved k-nearest Neighbor Algorithm using Tree Structure and Pruning Technology


Author



Abstract

The k-nearest neighbor (KNN) algorithm is a simple and mature classification method. However, several factors can degrade its classification performance, including the choice of the k value, an overly large search space, and unbalanced, multi-class data. To address these problems, a new classification algorithm is proposed that combines a tree structure, tree pruning, and an adaptive k-value method. The proposed algorithm overcomes these shortcomings of KNN, improves performance on multi-class and unbalanced data, and reduces the size of the dataset while maintaining comparable classification accuracy. Simulations comparing the proposed algorithm with several existing algorithms show that it obtains higher classification efficiency and a smaller search reference set on UCI datasets. This illustrates that the proposed algorithm is an expedient method for designing nearest neighbor classifiers.
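The abstract names two ingredients, tree-based neighbor search and an adaptive k value, without giving details (the paper itself is behind the subscription). As a rough illustration only, the sketch below shows one simple adaptive-k voting rule: expand k until one class holds a strict majority. This is a hypothetical stand-in, not the paper's algorithm; the brute-force distance sort here is where a k-d tree with pruning would cut the search cost.

```python
# Hedged sketch of adaptive-k nearest-neighbor voting. NOT the paper's
# method: the majority-expansion rule and all parameters are assumptions
# made up for illustration. A k-d tree with branch-and-bound pruning
# would replace the brute-force sort in a real implementation.
import math
from collections import Counter

def knn_predict_adaptive(train, labels, x, k_min=1, k_max=9):
    """Classify x by growing k (odd values only, to avoid two-way ties)
    until one class holds a strict majority among the k nearest points."""
    # Brute-force: rank all training points by distance to x.
    order = sorted(range(len(train)),
                   key=lambda i: math.dist(train[i], x))
    for k in range(k_min, k_max + 1, 2):
        votes = Counter(labels[i] for i in order[:k])
        winner, count = votes.most_common(1)[0]
        if count > k // 2:          # strict majority reached: stop early
            return winner
    return winner                   # fall back to the largest k tried

# Toy 2-D data: two well-separated clusters.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict_adaptive(train, labels, (0.15, 0.1)))  # → A
print(knn_predict_adaptive(train, labels, (1.05, 1.0)))  # → B
```

With clean clusters like these the vote settles at k = 1; the adaptive loop matters near class boundaries or with unbalanced classes, where a small fixed k can be dominated by noise.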


Keywords


Pages

Total Pages: 14

DOI
10.31209/2018.100000003




Published: Online Article


References

Aha, D. W., Kibler, D., and Albert, M. K. (1991). Instance-based learning algorithms, Machine Learning, 6(1), 37-66. https://doi.org/10.1007/BF00153759

Barandela, R., Ferri, F. J., and Sánchez, J. S. (2005). Decision boundary preserving prototype selection for nearest neighbor classification, International Journal of Pattern Recognition and Artificial Intelligence, 19(6), 787-806. https://doi.org/10.1142/S0218001405004332

Fayed, H. A. & Atiya, A. F. (2009). A novel template reduction approach for the k-nearest neighbor method, IEEE Transactions on Neural Networks, 20(5), 890-896. https://doi.org/10.1109/TNN.2009.2018547

Franti, P., Virmajoki, O., and Hautamaki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1875-1881. https://doi.org/10.1109/TPAMI.2006.227

Friedman, J. H., Bentley, J. L., and Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time, ACM Transactions on Mathematical Software, 3(3), 209-226. https://doi.org/10.1145/355744.355745

Gong, A. & Liu, Y. (2011). Improved KNN classification algorithm by dynamic obtaining K, ECWAC 2011, Part I, CCIS 143: 320-324.

Han, E.-H., Karypis, G., and Kumar, V. (2001). Text categorization using weight adjusted k-nearest neighbor classification, Lecture Notes in Computer Science, 53-65. https://doi.org/10.1007/3-540-45357-1_9

Haro-García, A. D. & García-Pedrajas, N. (2009). A divide-and-conquer recursive approach for scaling up instance selection algorithms, Data Mining & Knowledge Discovery, 18(3), 392-418. https://doi.org/10.1007/s10618-008-0121-2

Jagadish, H. V., Ooi, B. C., Tan, K. L., Yu, C., and Zhang, R. (2005). iDistance: an adaptive B+-tree based indexing method for nearest neighbor search, ACM Transactions on Database Systems, 30(2), 364-397. https://doi.org/10.1145/1071610.1071612

Khazaee, A. & Ebrahimzadeh, A. (2013). Heart arrhythmia detection using support vector machines, Intelligent Automation and Soft Computing, 19(1), 1-9. https://doi.org/10.1080/10798587.2013.771456

Köknar-Tezel, S. & Latecki, L. J. (2011). Improving SVM classification on imbalanced time series data sets with ghost points, Knowledge & Information Systems, 28(1), 1-23. https://doi.org/10.1007/s10115-010-0310-3

Krishna, K., Thathachar, M. A. L., and Ramakrishnan, K. R. (2000). Voronoi networks and their probability of misclassification, IEEE Transactions on Neural Networks, 11(6), 1361-1372. https://doi.org/10.1109/72.883447

Li, B., Qin, L., and Yu, S. (2004). An adaptive k-nearest neighbor text categorization strategy, ACM Transactions on Asian Language Information Processing, 3(4), 215-226. https://doi.org/10.1145/1039621.1039623

Li, X., Shi, D., Charastrakul, V., and Zhou, J. (2009). Advanced p-tree based k-nearest neighbors for customer preference reasoning analysis, Journal of Intelligent Manufacturing, 20(5), 569-579. https://doi.org/10.1007/s10845-008-0146-9

Liu, W. & Chawla, S. (2011). Class confidence weighted kNN algorithms for imbalanced data sets, PAKDD '11: Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Part II, 345-356.

Lumini, A. & Nanni, L. (2006). A clustering method for automatic biometric template selection, Pattern Recognition, 39(3), 495-497. https://doi.org/10.1016/j.patcog.2005.11.004

McCallum, A. & Nigam, K. (1998). A comparison of event models for naive Bayes text classification, AAAI-98 Workshop on Learning for Text Categorization, 41-48.

Milli, M. & Bulut, H. (2017). The effect of neighborhood selection on collaborative filtering and a novel hybrid algorithm, Intelligent Automation and Soft Computing, 23(2), 261-269. https://doi.org/10.1080/10798587.2016.1204776

Mollineda, R. A., Ferri, F. J., and Vidal, E. (2002). An efficient prototype merging strategy for the condensed 1-nn rule through class-conditional hierarchical clustering, Pattern Recognition, 35(12), 2771-2782. https://doi.org/10.1016/S0031-3203(01)00208-4

Olvera-López, J. A., Carrasco-Ochoa, J. A., and Martínez-Trinidad, J. F. (2010). A new fast prototype selection method based on clustering, Pattern Analysis and Applications, 13, 131-141. https://doi.org/10.1007/s10044-008-0142-x

Platt, J. C. (1999). Using analytic QP and sparseness to speed training of support vector machines, Advances in Neural Information Processing Systems, 557-563.

Raicharoen, T., Lursinsap, C., and Lin, F. (2005). A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm for regression problem, Pattern Recognition Letters, 26(10), 1554-1567. https://doi.org/10.1016/j.patrec.2005.01.003

Rico-Juan, J. R. & Iñesta, J. M. (2012). New rank methods for reducing the size of the training set using the nearest neighbor rule, Pattern Recognition Letters, 33(10), 1434-1434. https://doi.org/10.1016/j.patrec.2012.04.001

Sample, N., Haines, M., Arnold, M., and Purcell, T. (2001). Optimizing search strategies in k-d trees, 5th WSES/IEEE World Multiconference (CSCC 2001).

Wang, J., Neskovic, P., and Cooper, L. N. (2006). Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence, Pattern Recognition, 39(3), 417-423. https://doi.org/10.1016/j.patcog.2005.08.009

Xu, Y., Shen, F., and Zhao, J. (2012). An incremental learning vector quantization algorithm for pattern classification, Neural Computing & Applications, 21(6), 1205-1215. https://doi.org/10.1007/s00521-010-0511-4

Zeng, Y., Yang, Y., and Zhao, L. (2009). Nonparametric classification based on local mean and class statistics, Expert Systems with Applications, 36(4), 8443-8448. https://doi.org/10.1016/j.eswa.2008.10.041

Zeng, Y., Yang, Y., and Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification, Expert Systems with Applications, 36(2), 3587-3595. https://doi.org/10.1016/j.eswa.2008.02.003

Zhang, B. & Srihari, S. N. (2004). Fast k-nearest neighbor classification using cluster-based trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4), 525-528. https://doi.org/10.1109/TPAMI.2004.1265868

Zheng, B., Lee, W. C., and Lee, D. L. (2003). Search k nearest neighbors on air, Mobile Data Management, Springer Berlin Heidelberg, 181-195.

Zou, T., Wang, Y., Wei, X., Li, Z., and Yang, G. (2014). An effective collaborative filtering via enhanced similarity and probability interval prediction, Intelligent Automation and Soft Computing, 20(4), 555-566. https://doi.org/10.1080/10798587.2014.934598

JOURNAL INFORMATION


ISSN PRINT: 1079-8587
ISSN ONLINE: 2326-005X
DOI PREFIX: 10.31209
PREVIOUS DOI PREFIX: 10.1080/10798587 (with Taylor & Francis)
IMPACT FACTOR: 0.652 (2017/2018)
Journal: 1995-Present




CONTACT INFORMATION


TSI Press
18015 Bullis Hill
San Antonio, TX 78258 USA
PH: 210 479 1022
FAX: 210 479 1048
EMAIL: tsiepress@gmail.com
WEB: http://www.wacong.org/tsi/