AUTHORS: Keon Myung Lee, Jaesoo Yoo, Jiman Hong
ABSTRACT: Machine learning is an approach that derives problem-solving algorithms from data in the problem domain rather than from hand-coded programs. Although various machine learning tools make it relatively easy to develop machine learning applications, non-expert developers still have difficulty building them: successful development requires an understanding of machine learning algorithms and the ability to make the right design choices. This paper addresses the design decisions that must be made, and the tasks a platform should automate, so that non-expert developers can obtain effective and efficient machine learning applications. It presents autonomicity levels that specify the degree of automation in machine learning application development, describes the requirements of an autonomic machine learning platform that helps non-expert developers build machine learning applications, and introduces an architecture for such a platform.
KEYWORDS: machine learning, distributed computing, autonomic computing, machine learning platform