WSEAS

Plenary Lecture

Better Decision Tree Models Regarding Class Distribution of Data

Professor Hyontai Sug
Division of Computer Engineering
Dongseo University
Korea
E-mail: sht@gdsu.dongseo.ac.kr

Abstract: Comprehensibility of data mining results is very important in fields where human interpretation is critical, such as medicine. Decision tree algorithms are good data mining tools because they generate understandable knowledge structures in the shape of a tree. To maximize accuracy, the training algorithms of decision trees give higher preference to major classes, which have more instances and better purity with respect to the distribution of class values. As a result, instances that belong to minority classes are neglected. Minority instances often need our attention, however; in medicine, for example, cases of illness are rarer than cases of good health. When the data set is limited, over-sampling and under-sampling have been considered good strategies for more balanced classification. Plain over-sampling supplies the same instances multiple times, so the trained models tend to rely heavily on the duplicated instances. To mitigate this problem, the synthetic minority over-sampling technique (SMOTE) supplies new, artificial instances of a minor class, generated by interpolation, to build better classification models. Even so, because of the limitations of data mining algorithms and of the data themselves, some instances can still be classified incorrectly. To build better data mining models for a minority class without sacrificing overall accuracy, we may select good data instances, including artificial ones, for our final models. By checking the quality of the data instances and supplying only the good ones to build target models such as decision trees, we may achieve this goal. Several examples from real-world domains will be shown to demonstrate the effect of the suggested method.
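As a rough illustration of the two ideas in the abstract, the following Python sketch generates synthetic minority instances by interpolating between a minority instance and one of its nearest minority neighbours, and then filters them with a simple quality check. The abstract does not say how instance quality is checked, so the `keep_plausible` rule (keep a synthetic instance only if its nearest original instance is a minority instance) is purely a hypothetical stand-in, and all function names, parameters, and toy data are assumptions made for this example rather than the speaker's method.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority instances by interpolation.

    Assumes `minority` holds at least two instances.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base instance, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def keep_plausible(synthetic, minority, majority):
    """Hypothetical quality check (an assumption, not from the abstract):
    keep a synthetic instance only if its nearest original instance
    belongs to the minority class."""
    kept = []
    for s in synthetic:
        d_min = min(euclidean(s, p) for p in minority)
        d_maj = min(euclidean(s, p) for p in majority)
        if d_min <= d_maj:
            kept.append(s)
    return kept


# Toy two-dimensional data, invented for illustration only.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0]]
majority = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.2], [2.2, 2.1], [5.0, 5.0], [5.2, 4.8]]

synthetic = smote_sketch(minority, n_new=6, k=3, seed=42)
selected = keep_plausible(synthetic, minority, majority)
print(len(synthetic), "synthetic instances,", len(selected), "kept")
```

In this sketch the selected synthetic instances, together with the original data, would form the training set for the final decision tree. Because each synthetic point lies on a segment between two minority instances, a point interpolated across a region dominated by the majority class can be discarded by the filter instead of being used for training.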

Brief Biography of the Speaker: Dr. Hyontai Sug received a BS degree in computer science and statistics from Busan National University, Korea, in 1983; an MS degree in applied computer science from Hankuk University of Foreign Studies, Korea, in 1986, majoring in natural language processing; and a Ph.D. degree in computer and information science and engineering from the University of Florida, USA, in 1998, majoring in data mining. He was a researcher at the Agency for Defense Development, Korea, from 1986 to 1992, and a full-time lecturer at Pusan University of Foreign Studies, Korea, from 1999 to 2001. He has been a professor at Dongseo University, Korea, since 2001. He has published several notable articles in the field of data mining and has been listed in Marquis Who's Who in the World since 2006. His research interests include data mining, especially decision trees and association rules, and he is also interested in database application development.

Plenary Lecture 2
