**AUTHORS:** Nursel Selver Ruzgar

**ABSTRACT:**
In today's world, big data arises in many fields, and many computer-based methods have been developed, and are
still being developed, to analyze it more efficiently. Data mining is one of them.
In this paper, the daily stock market price changes of two Canadian banks are examined with ten data mining
algorithms to determine which algorithm or algorithms classify the financial data well. For this purpose, thirty-seven
years of daily stock price changes for the two banks, with 21 independent variables and one dependent
variable, price, were obtained from NASDAQ. The ten algorithms were applied to the two datasets
separately, and their performance was compared and tested on accuracy, kappa statistic,
processing time and confusion matrix. The tree algorithm J48 and the meta-analysis algorithms
Meta-Attribute Selected Classifier, Meta-Classification via Regression and Meta-LogitBoost classified the
financial data with high accuracy. These results suggest that J48 and these three meta-analysis algorithms
are promising alternatives to conventional methods for financial prediction.
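Two of the comparison criteria named above, accuracy and the kappa statistic, are both read off the confusion matrix. The sketch below shows that relationship. It is a minimal illustration and not the paper's own code. It assumes rows are actual classes and columns are predicted classes, and it uses Cohen's kappa, which is the usual "kappa statistic" reported by tools such as Weka. The two-class "down/up" matrix is hypothetical and is not taken from the study's data.

```python
from typing import Sequence


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: observed agreement between actual and predicted
    classes, corrected for the agreement expected by chance."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / total
    # Marginal totals: rows are actual classes, columns are predicted classes.
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical confusion matrix for a two-class "down"/"up" price change.
cm = [[40, 10],
      [5, 45]]
print(f"accuracy={accuracy(cm):.2f}, kappa={cohen_kappa(cm):.2f}")
# prints "accuracy=0.85, kappa=0.70"
```

In this example, accuracy (0.85) is noticeably higher than kappa (0.70). Half of the agreement would be expected by chance from the class marginals alone. This is why kappa is a useful complement to accuracy when ranking classifiers.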

**KEYWORDS:**
- Classification, Logistic Regression, Fuzzy-Rough NN, Genetic Programming, J48, Random Forest, Naive Bayes, Naive Net, Meta-Analysis, Weka, Data Mining
