AUTHORS: M. N. Shah Zainudin, Md Nasir Sulaiman, Norwati Mustapha, Raihani Mohamed
ABSTRACT: In data mining, learning is broadly divided into two categories: supervised and unsupervised. In supervised learning, the model is trained on examples and then predicts the class of each example; the class is known, but it is hidden from the learning model. Unsupervised learning, by contrast, builds the model directly from unlabeled examples. Clustering is one way in data mining to predict the class, by separating the data into categories with similar features. Expectation maximization (EM) is a representative clustering algorithm that has been broadly applied to classification problems; it models the density of the data using probability density functions. The K-means clustering algorithm is also widely used for unsupervised classification problems. Unlike EM, K-means clusters the data by measuring the distance between each cluster centroid and the objects within the same cluster. In addition, the random forest ensemble classifier has been reported to perform well in most classification and pattern recognition problems. The layer of randomness it adds to the traditional decision tree increases the diversity among the trees and can improve classification accuracy. However, the combination of clustering and classification algorithms has rarely been explored, particularly in the context of an ensemble classifier model. Furthermore, classification using the original attributes is not guaranteed to achieve high accuracy: some attributes may overlap or be redundant, and some may be placed in the wrong cluster, which is believed to decrease classification accuracy. In this article, we explore the combination of clustering-based algorithms with ensemble classification learning. The EM and K-means clustering algorithms are used to cluster the attributes of multi-class classification data according to their relevance criteria, and the clustered attributes are then classified using a random forest ensemble classifier. Our experimental analysis uses ten widely used datasets from the UCI Machine Learning Repository and two additional accelerometer-based human activity recognition datasets.
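The contrast the abstract draws between K-means (distance to a centroid) and EM (probability density) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy data, the initial centers, and the function names `kmeans_1d`, `gaussian_pdf`, and `em_1d` are all assumptions made for illustration, and the random forest stage is omitted.

```python
import math

def kmeans_1d(points, centroids, iterations=10):
    """Hard clustering: assign each point to its nearest centroid, then update."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
                  for x in points]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = sum(members) / len(members)
    return centroids, labels

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(points, means, iterations=20, min_var=1e-6):
    """Soft clustering: EM for a 1-D Gaussian mixture (illustrative only)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k       # assumed initial variances
    weights = [1.0 / k] * k     # assumed equal initial mixing weights
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(points)
            means[j] = sum(r[j] * x for r, x in zip(resp, points)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, points)) / nj
            variances[j] = max(min_var, var)  # floor avoids a zero variance
    return means, resp

# Toy data with two well-separated groups (hypothetical values).
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, labels = kmeans_1d(points, centroids=[0.0, 6.0])
means, resp = em_1d(points, means=[0.0, 6.0])
```

K-means returns one hard label per point, while EM returns a row of membership probabilities per point. On well-separated data such as this the two agree closely; they diverge when clusters overlap.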
KEYWORDS: Expectation maximization, K-means, random forest, clustering, classification