Plenary Lecture

Modern Parallel Computing Using Heterogeneous Systems with Multicore CPUs and Accelerators

Assistant Professor Paweł Czarnul
Dept. of Computer Architecture
Faculty of Electronics, Telecommunications and Informatics
Gdansk University of Technology
POLAND
E-mail: pczarnul@eti.pg.gda.pl

Abstract: The presentation focuses on the state of the art in modern parallel and distributed computing, including key challenges and solutions. Parallel processing using multicore CPUs, GPUs and other accelerators such as Intel Xeon Phi will be discussed. Currently, there is a variety of programming APIs for parallel computing at various levels. This includes solutions for:

1. GPUs [1, 2], e.g. NVIDIA CUDA, OpenCL and OpenACC [3], as well as accelerators such as Intel Xeon Phi with MPI [4], OpenMP [5] and OpenCL,
2. multicore CPUs, e.g. OpenMP or OpenCL,
3. mixed GPU and CPU systems, e.g. OpenCL or hybrid approaches such as MPI combined with CUDA,
4. the cluster level, e.g. MPI,
5. grid systems with grid middleware such as the Globus Toolkit, BeesyCluster [6], etc.

This brings us to the need for knowledge of both parallelization techniques and particular APIs in order to make the most of today's high performance computing systems. In view of this, a new KernelHive system is presented that allows easy parallelization of computations in multi-level heterogeneous environments including CPUs and GPUs. On the one hand, it allows easy definition of applications using a graphical editor. A complex scenario might be expressed as a directed acyclic graph (workflow) in which vertices are assigned various types of code. These types include computational kernels in OpenCL, partitioning, merging and other pieces of code. Kernels are stored in a library from which they can be reused in subsequent projects and applications. The user points to input data sets in possibly remote locations by providing URLs to the data. If needed, new custom-built kernels may easily be developed using an editor built into the GUI.
The GUI of the system communicates with the engine, which is responsible for the management of several possibly distributed clusters whose dedicated managers communicate with the engine. Following a hierarchical structure, within each cluster several node managers report to the cluster manager, and within each node compute devices such as CPUs and GPUs are managed. At this low level, several custom processing frameworks can be provided, such as master-slave or SPMD.
The engine of the system has been designed in a modular way and accepts plugins for so-called optimizers. Optimizers provide solutions such as specific algorithms for data partitioning and for scheduling computations across the underlying collection of compute devices, taking granularity and communication into account. This is especially important because in modern HPC systems performance is not the only optimization goal.
Goals such as performance under power consumption constraints and the power efficiency of parallel systems [7] are also discussed. Modern CPUs and accelerators provide mechanisms for managing power consumption, including the imposition of constraints. Such mechanisms can be used for optimization within systems such as KernelHive [8]. This can be useful in real-life situations in which, for example, temporary power consumption limits are imposed on computing servers.
Finally, visualization of system state, application progress and runtime representation of computations running on both CPUs and GPUs within KernelHive will be presented. KernelHive separates processing within OpenCL kernels from visualization code implemented in Java: preview objects can be populated with data in the former and then flexibly visualized in the Java layer.

[1] D. B. Kirk, W.-m. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition, Morgan Kaufmann, ISBN-13: 978-0124159921, 2012.
[2] J. Sanders, E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional, ISBN-13: 978-0131387683, 2010.
[3] R. Reyes, I. López-Rodríguez, J. J. Fumero, F. de Sande, A preliminary evaluation of OpenACC implementations, The Journal of Supercomputing 65 (2013) 1063–1075.
[4] W. Gropp, E. L. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd Edition (Scientific and Engineering Computation), The MIT Press, ISBN 978-0262571326, 1999.
[5] B. Chapman, G. Jost, R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation), The MIT Press, 2007.
[6] P. Czarnul, Modeling, run-time optimization and execution of distributed workflow applications in the JEE-based BeesyCluster environment, Journal of Supercomputing 63 (2013) 46–71.
[7] K. Kasichayanula, D. Terpstra, P. Luszczek, S. Tomov, S. Moore, G. D. Peterson, Power Aware Computing on GPUs, in: 2012 Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 64–73, ISSN 2166-5133, doi:10.1109/SAAHPC.2012.26, 2012.
[8] P. Czarnul, P. Rosciszewski, Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs, in: M. Chatterjee, J.-N. Cao, K. Kothapalli, S. Rajsbaum (Eds.), ICDCN, vol. 8314 of Lecture Notes in Computer Science, Springer, ISBN 978-3-642-45248-2, 66–80, 2014.

Brief Biography of the Speaker: Paweł Czarnul is an Assistant Professor at the Dept. of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. He obtained his MSc and PhD degrees from Gdansk University of Technology in 1999 and 2003, respectively. In 2001-2002 he worked at the Radiation Laboratory in the Electrical Engineering and Computer Science Dept. of the University of Michigan, USA, on code parallelization using dynamic repartitioning techniques. He is the author of over 70 publications in the area of parallel and distributed computing, in journals such as the International Journal of High Performance Computing Applications, Journal of Supercomputing, Metrology and Measurement Systems, Multiagent and Grid Systems and Scalable Computing, and at conferences such as EuroPVM/MPI, PPAM, ICCS, IMCSIT and many others. He has actively participated in 12 international and national grants, serving as the director of 3 projects; he is currently the director of the national project “Modeling efficiency, reliability and power consumption of multilevel parallel HPC systems using CPUs and GPUs”. He is a reviewer for international journals such as IEEE Transactions on Parallel and Distributed Systems, Journal of Parallel and Distributed Computing, Journal of Supercomputing, and Computing and Informatics, and for conferences including PPAM, GRID, ICCS, the Workshops on Software Services, the Workshop on Cloud-enabled Business Process Management, BalticDB&IS and IDAACS. He served as an expert on an international board on Software Services within the SPRERS project.
His research interests include parallel and distributed computing, high performance computing including GPGPUs, and Internet and mobile technologies. He is the leader of the BeesyCluster project (https://beesycluster.eti.pg.gda.pl:10030/ek/AS_LogIn), which provides middleware for service-oriented integration of computational and general-purpose hardware resources with an advanced workflow management system and support for clouds. He is a co-designer of the COMCUTE project (http://comcute.eti.pg.gda.pl/) for volunteer computing within the user's browser with control of redundancy and reliability, and a co-designer of KernelHive (http://kask.eti.pg.gda.pl/en/projekty/kernelhive/) for automatic parallelization of computations among clusters with GPUs and multicore CPUs. He is also the author of DAMPVM/DAC (http://pczarnul.eti.pg.gda.pl/DAMPVM.html) for automatic parallelization of divide-and-conquer applications.
