White Papers – Resource Management and Monitoring

On this page you will find PRACE White Papers related to Resource management and monitoring.

Title: Data Centre Infrastructure Monitoring

Authors: Norbert Meyer*
* Poznań Supercomputing and Networking Center, PSNC, Poland

Contributors: Andrzej Gośliński (PSNC), Radosław Januszewski (PSNC), Damian Kaliszan (PSNC), Ioannis Liabotis (GRNET), Jean-Philippe Nominé (CEA), François Robin (CEA), Gert Svensson (KTH, PDC), Torsten Wilde (LRZ), All PRACE partners contributing to the DCIM survey.

Abstract: Any data centre, especially an HPC centre, requires an advanced infrastructure which supports the efficient operation of computing and data resources. Usually, the supporting environment is of the same complexity as other parts of the HPC data centre. It includes chillers, coolers, pumps, valves, heat pumps, electrical distribution, UPSs, high- and low-voltage systems, dryers and air conditioning systems, flood, smoke and heat detectors, fire prevention systems and more. The variety of supporting equipment is very high, even higher than that of the IT infrastructure, which dictates the necessity to collect, integrate and monitor the instrumentation. In addition to monitoring, an inventory system should be part of each data centre. This report provides a summary of a DCIM survey collected from the most important HPC centres in Europe, together with an analysis of the controlling and monitoring software platforms available on the market and an assessment of the functionality most wanted from the users’ point of view. The analysis of requirements and potentially available functionality is summarised in a set of recommendations. Another critical issue is the policy and definition of the procedures to be implemented by the data centre owner and service provider to maintain the necessary Service Level Agreement (SLA); parts of the SLA should be reflected in the data centre infrastructure management. Apart from reliability and high availability, you need to consider minimizing maintenance and operating costs, and DCIM systems are very helpful for this purpose as well. The best practice information was presented at the “7th European Workshop on HPC Centre Infrastructures” organised in Garching (Germany) in April 2016. The recommendations and conclusions chapters describe the essence of what should be expected from a well-designed DCIM system.

Download paper: PDF

Title: Resource Scheduling Best Practice in Hybrid Clusters

Authors: C. Cavazzoni a, A. Federico b, D. Galetti a, G. Morelli b, A. Pieretti b
a CINECA, via Magnanelli 6/3, 40033 Casalecchio di Reno, Italy
b CINECA, via dei Tizii 6/b, 00185 Roma, Italy

Abstract: HPC green thinking implies the reduction of power consumption, which is at odds with the ever-growing demand for computational power. To circumvent this dilemma, several types of computing accelerators have been adopted. Using an accelerator means partial if not total code rewriting, with the aim of achieving a speed-up which would be difficult to attain with present CPU and hardware evolution. After an initial period of concern about this new technology, programmers have shown a growing interest in the field, and several of the most used scientific codes have undergone intense software restyling. Accelerators have introduced a new class of requests which need to be fulfilled by resource schedulers on hybrid clusters. Hence, exploring what schedulers can offer in terms of minimizing the effort and maximizing resource exploitation has become a fundamental issue. As the CINECA supercomputing center runs a new-generation hybrid cluster with two different accelerators, i.e. GPUs and MICs, it is involved in testing a resource scheduler, PBSPro, in order to put its cluster to the best possible use.

Download paper: PDF

Title: Topologically Aware Job Scheduling for SLURM

Authors: Seren Soner1, Can Ozturan1*
1Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: SLURM is a popular resource management system that is used on many supercomputers in the TOP500 list. In this work, we describe our new AUCSCHED3 SLURM scheduler plug-in that extends our earlier AUCSCHED2 plug-in with a capability to do topologically aware mappings of jobs on hierarchically interconnected systems like trees or fat trees. Our approach builds on our previous auction based scheduling algorithm of AUCSCHED2 and generates bids for topologically good mappings of jobs onto the resources. The priorities of the jobs are also adjusted slightly without changing the original priority ordering of jobs so as to favour topologically better candidate mappings. SLURM emulation results are presented for a heterogeneous 1024 node system which has 16 cores and 3 GPUs on each of its nodes. The results show that our heuristic generates better topological mappings than SLURM/Backfill. AUCSCHED3 is available at http://code.google.com/p/slurm-ipsched/.
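The topology-aware bid generation described above, preferring candidate node sets that span as few switches of the interconnect tree as possible, can be sketched as follows (a minimal illustration; the function and variable names are hypothetical and not taken from the AUCSCHED3 plug-in, which is written as a SLURM C plug-in):

```python
from itertools import combinations

def switches_spanned(nodes, node_to_switch):
    """Number of distinct leaf switches a candidate node set touches
    (fewer switches means a topologically more compact mapping)."""
    return len({node_to_switch[n] for n in nodes})

def generate_bids(free_nodes, need, node_to_switch, max_bids=3):
    """Enumerate candidate node sets of size `need` from the free nodes and
    keep the most topologically compact ones as bids for the auction."""
    candidates = combinations(sorted(free_nodes), need)
    scored = sorted(candidates, key=lambda c: switches_spanned(c, node_to_switch))
    return [list(c) for c in scored[:max_bids]]
```

For example, with nodes 0 and 1 on one leaf switch and nodes 2 and 3 on another, a 2-node job's best bid keeps both nodes under a single switch; a real scheduler would of course prune the candidate enumeration rather than list all combinations.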

Download paper: PDF

Title: Extending SLURM with Support for GPU Ranges

Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: The SLURM resource management system is used on many TOP500 supercomputers. In this work, we present enhancements that we added to our AUCSCHED heterogeneous CPU-GPU scheduler plug-in, whose first version was released in December 2012. In this new version, called AUCSCHED2, two enhancements are contributed. The first is the extension of SLURM to support GPU ranges. The current version of SLURM supports specification of node ranges but not of GPU ranges. Such a feature can be very useful to runtime auto-tuning applications and systems that can make use of a variable number of GPUs. The second enhancement involves the implementation of a new integer programming formulation in AUCSCHED2 that drastically reduces the number of variables. This allows faster solution and a larger number of bids to be generated. SLURM emulation results are presented for the heterogeneous 1408-node Tsubame supercomputer which has 12 cores and 3 GPUs on each of its nodes. AUCSCHED2 is available at http:/.

Download paper: PDF

Title: An Auction Based SLURM Scheduler for Heterogeneous Supercomputers and its Comparative Performance Study

Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: SLURM is a resource management system that is used on many TOP500 supercomputers. We present a heterogeneous CPU-GPU scheduler plug-in, called AUCSCHED, for SLURM that implements an auction based algorithm. In order to tune the topological mapping of jobs to resources, our plug-in determines at scheduling time, for each job, the best resource choices based on node contiguity from the available ones. Each of these choices is then expressed as a bid that a job makes in an auction. Our algorithm takes a window of jobs from the front of the job queue, generates multiple bids for available resources for each job, and solves an assignment problem that maximizes an objective function involving the priorities of jobs. We generate several CPU-GPU synthetic workloads and perform realistic SLURM emulation tests to compare the performance of our auction based scheduler with that of SLURM’s own backfill scheduler. In general, AUCSCHED achieves a few percentage points better utilization than the SLURM/BF plug-in, but topologically SLURM/BF leads to less fragmentation whereas AUCSCHED leads to less spread. SLURM’s plug-in as well as ours produce high utilizations, around 90%, when workloads are made up of jobs requesting no more than 1 GPU per node. On the other hand, when workloads contain jobs that request 2 GPUs per node, it is observed that the system utilization drops drastically to the 65-75% range both when our AUCSCHED and SLURM’s own plug-in are used. This points to the need for further study of scheduling jobs that utilize multiple GPU cards on nodes. Our plug-in, which builds on our earlier plug-in called IPSCHED, is available at: http:/.
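The core auction step described in the abstract, picking at most one bid per job in the window so that no two selected bids share resources while the total job priority is maximized, can be illustrated with a small brute-force solver (an illustrative sketch only; the plug-in itself solves this as an optimization problem at much larger scale, and all names here are hypothetical):

```python
from itertools import product

def best_assignment(jobs):
    """jobs: list of (priority, [bid, ...]) where each bid is a frozenset of
    node ids the job would occupy. Try every combination of one-bid-or-none
    per job and keep the non-conflicting combination with the highest total
    priority of the jobs that won a bid."""
    best_score, best_choice = 0, [None] * len(jobs)
    options = [[None] + bids for _, bids in jobs]
    for choice in product(*options):
        used, ok, score = set(), True, 0
        for (prio, _), bid in zip(jobs, choice):
            if bid is None:
                continue
            if used & bid:          # two winning bids claim the same node
                ok = False
                break
            used |= bid
            score += prio
        if ok and score > best_score:
            best_score, best_choice = score, list(choice)
    return best_score, best_choice
```

For example, a high-priority job bidding on nodes {1, 2} and a lower-priority job with alternative bids {2, 3} and {3, 4} can both be scheduled by steering the second job to its non-conflicting bid.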

Download paper: PDF

Title: Topology Aware Task-To-Processor Assignment

Authors: Reha Oguz Selvitopi, Ata Turk, Altay Guvenir, Cevdet Aykanat
Bilkent University, Computer Engineering Department, 06800 Ankara, Turkey

Abstract: Topology aware mapping has started to attract interest again with the development of supercomputers whose topologies consist of thousands of processors with large diameters. In such parallel architectures, it is possible to obtain performance improvements for the executed parallel programs via careful mapping of tasks to processors by considering properties of the underlying topology and the communication pattern of the mapped program. One of the most widely used metrics for capturing a parallel program’s communication overhead is the hop-bytes metric, which takes the processor topology into account, in contrast to the assumptions made by wormhole routing. In this work, we propose a KL-based iterative improvement heuristic for mapping tasks of a given program to the processors of the parallel architecture, where the objective is the reduction of the communication volume modeled with the hop-bytes metric. We assume that the communication pattern of the program is known beforehand and the processor topology information is available. The algorithm basically tries to improve a given initial mapping with a number of successive task swaps defined within a given processor neighborhood. We test our algorithm for different numbers of tasks and processors and demonstrate its results by comparing it to random mapping, which is widely used in recent supercomputers.
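The hop-bytes metric and the swap-based improvement idea described above can be sketched compactly (a simplified illustration, not the paper's algorithm: the real heuristic restricts swaps to a processor neighborhood and updates the cost incrementally rather than recomputing it):

```python
def hop_bytes(mapping, comm, dist):
    """Hop-bytes: sum over communicating task pairs (a, b) of the bytes they
    exchange times the hop distance between their assigned processors."""
    return sum(vol * dist[mapping[a]][mapping[b]] for (a, b), vol in comm.items())

def improve(mapping, comm, dist, sweeps=2):
    """Greedy KL-style pass: accept any task-pair swap that lowers hop-bytes.
    Costs are recomputed from scratch here purely for clarity."""
    mapping = list(mapping)
    n = len(mapping)
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                cur = hop_bytes(mapping, comm, dist)
                mapping[i], mapping[j] = mapping[j], mapping[i]
                if hop_bytes(mapping, comm, dist) >= cur:
                    mapping[i], mapping[j] = mapping[j], mapping[i]  # undo
    return mapping
```

On a 4-processor line topology (distance = hop count between positions) with two heavily communicating task pairs, a single sweep already moves each pair onto adjacent processors, halving the hop-bytes cost.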

Download paper: PDF

Title: Integer Programming Based Heterogeneous CPU-GPU Cluster Scheduler for SLURM Resource Manager

Authors: Seren Soner, Can Ozturan, Itir Karac
Computer Engineering Department, Bogazici University, Istanbul, Turkey

Abstract: We present an integer programming based heterogeneous CPU-GPU cluster scheduler for the widely used SLURM resource manager. Our scheduler algorithm takes windows of jobs and solves an allocation problem in which free CPU cores and GPU cards are allocated collectively to jobs so as to maximize some objective function. We perform realistic SLURM emulation tests using the Effective System Performance (ESP) workloads. The test results show that our scheduler produces better resource utilization and shorter average job waiting times. The SLURM scheduler plug-in that implements our algorithm is available at http:/.
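The window-based allocation problem described above can be written, in a deliberately simplified form (an illustration of the general shape, not the paper's exact formulation, which allocates individual cores and GPU cards per node), as a 0-1 integer program over the jobs in the window:

```latex
% Jobs j in the window have priority p_j, CPU-core demand c_j and GPU-card
% demand g_j; the binary variable x_j selects job j in this scheduling cycle.
\max \sum_j p_j x_j
\quad\text{s.t.}\quad
\sum_j c_j x_j \le C_{\mathrm{free}},\qquad
\sum_j g_j x_j \le G_{\mathrm{free}},\qquad
x_j \in \{0,1\}
```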

Download paper: PDF

Title: Design, Development and Improvement of Nagios System Monitoring for Large Clusters

Authors: Daniela Galetti, Federico Paladin
SuperComputing Applications and Innovation Dept., CINECA, Bologna, Italy

Abstract: This document describes the work of design, development and improvement of the Nagios monitoring system done at CINECA and used for the Tier-1 systems participating in the PRACE projects. Starting from the issues arising from the complexity of the HPC systems and the related monitoring activities, the targeted solutions and their implementation are explained. The most important aspects of the implementation and the specific issues related to HPC are described, with specific attention to exascale clusters.
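Nagios extends to new metrics through small check plugins that follow a fixed contract: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL (3 for UNKNOWN), with a one-line status message and optional performance data after a `|`. A minimal sketch of such a check (the metric, thresholds and names are illustrative, not from the CINECA setup):

```python
# Nagios plugin exit-code convention: 0=OK, 1=WARNING, 2=CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

def check_load(load, warn=8.0, crit=16.0):
    """Return (exit_code, status_line) in the standard Nagios plugin format.
    The text after '|' is performance data that Nagios can collect and graph."""
    if load >= crit:
        state, label = CRITICAL, "CRITICAL"
    elif load >= warn:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"LOAD {label} - load1={load:.2f}|load1={load:.2f};{warn};{crit}"
```

A wrapper script would print the status line and call `sys.exit()` with the returned code, letting the Nagios scheduler interpret the result.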

Download paper: PDF