Aachen, Training Programme, 2-5 September 2014
Day 1, Tuesday September 2
|09:00-10:30||Plenary Talk: HPC Today and Tomorrow [M. Müller, RWTH]|
|11:00-11:30||Parallel Programming Survey [C. Terboven, RWTH]
After an introduction to the principles of today's parallel computing architectures, the configuration of the new components of the RWTH Compute Cluster delivered by the company Bull will be explained.
|11:30-12:30||MPI: Introduction [H. Iliev, RWTH]
The Message Passing Interface (MPI) is the de facto standard for programming large HPC clusters. We will introduce the basic concepts and give an overview of some advanced features. Furthermore, we will introduce the TotalView debugger and a selection of performance tools.
|14:00-15:30||MPI: Advanced Issues|
|16:00-17:30||MPI: Hands-on practical|
|18:00-||Visit to Aachen Cathedral|
Day 2, Wednesday September 3: OpenMP
OpenMP is a widely used approach for programming shared memory architectures and is supported by most compilers today. We will cover the basics of the programming paradigm as well as some advanced topics, such as programming NUMA machines or clusters that are coherently coupled with the vSMP software from ScaleMP. We will also cover a selection of performance and verification tools for OpenMP. The RWTH Compute Cluster comprises a large number of big SMP machines (up to 128 cores and 2 TB of main memory), as we consider shared memory programming a viable alternative for applications that cannot be easily parallelized with MPI. We also expect a growing number of application codes to combine MPI and OpenMP on clusters whose nodes have a growing number of cores.
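As a flavour of the shared memory paradigm covered on this day, the following is a minimal sketch of an OpenMP work-sharing loop with a reduction. It is not taken from the course material; the function name `dot` and its parameters are chosen here for illustration only.

```c
/* Illustrative sketch (names are assumptions, not course code):
 * dot product of two vectors using an OpenMP parallel loop.
 * The reduction clause gives each thread a private partial sum
 * and combines the partial sums when the loop ends. */
double dot(const double *x, const double *y, long n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }

    return sum;
}
```

With GCC, OpenMP is enabled by the `-fopenmp` flag. Without it the pragma is ignored and the loop runs serially with the same result, which makes it easy to compare the serial and parallel versions.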
OpenMP [C. Terboven, RWTH]
|11:00-12:30||OpenMP: Advanced Issues|
|14:00-15:30||OpenMP: Hands-on practical|
Individually: 18:30-21:30: Aachener Domspringen (pole-vault competition; see http://www.netaachen.de/domspringen/)
Day 3, Thursday September 4: CUDA
Plenary Talk: HPC Software Development [C. Bischof, TU Darmstadt - The Importance of Software in High-Performance Computing]
Computers are getting ever more powerful, complex, and power-hungry. "Green IT" techniques aim at reducing operating costs, but do little to make computers easier to use. However, in the end it is software for real problems running on those computers that determines the impact realized from high-performance computing. In this talk, we illustrate recent developments in high-performance computing and advocate automated programming approaches to escape the so-called "software gap".
Through a combination of lectures and practicals, this course provides an introduction to the development of CUDA programs for execution on NVIDIA GPUs. Topics covered in the lectures will include: an overview of GPU hardware, in particular SIMT multithreading and the different kinds of memory; thread blocks and warps; launching CUDA kernels; efficient use of shared memory; conditional code and warp divergence; parallel reductions; profiling program execution; availability of libraries; resources for further study.
|11:00-12:30||CUDA: Introduction [J. du Toit, NAG Manchester]|
|14:00-15:30||CUDA: Hands-on practical part I|
CUDA: Hands-on practical part II
Day 4, Friday September 5: OpenACC
OpenACC is a directive-based programming model for accelerators that delegates the responsibility for low-level programming tasks (e.g. in CUDA or OpenCL) to the compiler. Using the OpenACC API, the programmer can offload compute-intensive loops to an attached accelerator with little effort. The open industry standard OpenACC was introduced in November 2011 and supports accelerating regions of code in standard C, C++ and Fortran. It provides portability across operating systems, host CPUs and accelerators.
During this workshop day, we will give an overview of OpenACC with a focus on NVIDIA GPUs. We will introduce the GPU architecture and explain very briefly what a typical CUDA program looks like. Then, we will dive into OpenACC and learn about its possibilities for accelerating code regions. We will cover topics such as offloading loops, managing data movement between host and device, tuning data movement and accesses, applying loop schedules, using multiple GPUs, and interoperating with CUDA libraries. At the end, we will give an outlook on the OpenMP 4.0 standard, which may include OpenMP for accelerators. Hands-on sessions take place on the RWTH Aachen GPU (Fermi) Cluster using PGI's OpenACC implementation.
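To illustrate what offloading a loop and managing data movement can look like, here is a minimal sketch of a SAXPY loop annotated with OpenACC directives. It is not part of the workshop material; the function name `saxpy` and its signature are assumptions made for this example.

```c
/* Illustrative sketch (names are assumptions, not workshop code):
 * y = a * x + y, offloaded with OpenACC.
 * copyin(x[0:n]): x is copied to the device before the loop.
 * copy(y[0:n]):   y is copied to the device and back to the host
 *                 after the loop has finished. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the directive and runs the loop on the host, so the same source serves as its own serial reference. The explicit data clauses are the starting point for the tuning of data movement mentioned above.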
|09:00-10:30||OpenACC Intro (GPU Introduction, OpenACC Basics (offloading), OpenACC Lab) [S. Wienke, RWTH]|
OpenACC Basics (data management) & OpenACC Advanced (heterogeneous computing)
|14:00-15:30||OpenACC Advanced (interoperability with CUDA & GPU libraries, loop schedules & launch configuration, maximizing global memory throughput)|
OpenACC Advanced (caching & tiling, multiple GPUs), comparison of OpenACC & OpenMP device constructs, OpenACC Lab