...

Remote submission of IMAS HPC workflows can be challenging due to two main factors: connecting and communicating with heterogeneous supercomputers, and running the IMAS environment on a supercomputer within the user space. To address these issues, a solution combining a virtualized environment with IMAS installed and a remote submission system has been designed.

 

The use case considered is an IMAS developer working in an environment with IMAS installed who wants to run an IMAS workflow on a large supercomputer. The environments where the developer works will mainly be the Marconi Gateway partition and the ITER cluster, and occasionally a local user computer. From there, the workflow will be submitted to a remote supercomputer (mainly PRACE HPC facilities).

 

 


 

 

Fig. 1 describes the scheme of the methodology developed to submit remote workflows. Working on the Gateway, the ITER cluster or a local machine, we configure SUMI (SUbmission Manager for IMAS), which allows the submission of jobs to the remote queuing systems of supercomputers. These supercomputers run the IMAS workflow using a Docker image executed on top of uDocker.

This tutorial assumes that the user has a functional machine with a GNU/Linux distribution installed.

 


Figure 1. Scheme for the remote submission of IMAS workflows involving HPC codes.

 

Getting started

This tutorial describes the steps to submit an example uDocker IMAS workflow image from the Gateway or the ITER cluster to a remote supercomputer. To do so we will make use of uDocker and SUMI (SUbmission Manager for IMAS).

These tools work on different sides of the system, the local machine and the remote HPC system:

  • Connect from a local computer to a remote cluster and submit the workflow: SUMI
  • Bring the IMAS environment to heterogeneous supercomputer systems: uDocker image

The tutorial has been tested with the following machines.

  • ITER cluster @ITER
  • Marconi @Cineca
  • Marconi Gateway @Cineca
  • Eagle @PSNC

The tutorial follows these steps:

...

SUMI is a tool capable of submitting jobs to remote HPC clusters and of uploading and retrieving data from them.

Install SUMI

The code is available in the following link:

https://bitbucket.org/albertbsc/sumi/

New releases of the code can be found in the following link:

https://bitbucket.org/albertbsc/sumi/downloads/

The following subsections describe how to install it for Gateway and ITER cluster.

Install SUMI on Marconi Gateway

This subsection describes how to install SUMI on Marconi Gateway. If you want to install it on the ITER cluster, please skip to the following subsection.

In this tutorial the local computer used will be Marconi Gateway. Given that SUMI uses Python 2.7, we first load the corresponding module:

Code Block
languagebash
module load python/2.7.12
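A quick check that the expected interpreter is now active:

Code Block
languagebash
python --version   # should report Python 2.7.x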

SUMI depends on two powerful libraries to perform its tasks: Paramiko (data transfer) and SAGA (remote job submission). To install these dependencies we need to download Python libraries, but since we do not have root permissions we will make use of a virtual environment, which allows us to install the libraries locally. For this purpose we will use "virtualenv", which creates a Python environment in a local folder where we can install libraries. To set it up we run the following commands:

...
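The creation commands are elided above; a minimal sketch, assuming virtualenv is installed into the user space with pip and that the environment is named sumi-virtualenv (matching the activation command below), could be:

Code Block
languagebash
# Assumption: pip is available for the loaded Python 2.7 module
pip install --user virtualenv
# Create the environment; the name sumi-virtualenv matches the activation step below
python -m virtualenv sumi-virtualenv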

Code Block
languagebash
source sumi-virtualenv/bin/activate.csh

 

Our terminal prompt will now show the folder name in front of our username in the following way:

...

Once the dependencies have been installed, we can download and configure SUMI. To retrieve the code, run the following command:

Code Block
wget -qO- https://bitbucket.org/albertbsc/sumi/downloads/sumi-0.1.0.tar.gz | tar xvz

...

Code Block
languagebash
setenv PATH $PATH\:$HOME/sumi/bin

For Bash shells

Code Block
languagebash
export PATH=$PATH:$PWD/sumi/bin/
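To verify that the executable is now found in the PATH (assuming it is named sumi, as in the help output shown later):

Code Block
languagebash
command -v sumi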

SUMI requires two configuration files which contain the information about the jobs to be submitted and about the HPC cluster where we are going to submit them. For this we need to create the configuration folder and copy the configuration files jobs.conf and servers.conf from the sumi/conf/ directory.
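These are the same commands used later in the job-configuration section, repeated here for convenience:

Code Block
mkdir $HOME/.sumi
cp sumi/conf/*.conf $HOME/.sumi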

...

Now we are ready to run SUMI. Execute it with the "-h" option to see all the available options:

 

Code Block
$ sumi -h
usage: sumi.py [-h] [-r] [-u UP] [-d DOWN] [-m MACHINE [MACHINE ...]]
               [-j JOBS [JOBS ...]]

Submission Manager for IMAS
optional arguments:
  -h, --help            show this help message and exit
  -r, --run             Run the configured job on a specific cluster
  -u UP, --upload UP    Upload local file
  -d DOWN, --download DOWN
                        Download remote file
  -m MACHINE [MACHINE ...], --machine MACHINE [MACHINE ...]
                        List of machines were to submit
  -j JOBS [JOBS ...], --job JOBS [JOBS ...]
                        List of jobs to be submitted

...

Code Block
languagebash
module load python/2.7.15

SUMI depends on two powerful libraries to perform its tasks: Paramiko (data transfer) and SAGA (remote job submission). Since we do not have root permissions, we use the "--user" option of pip, which installs the Python libraries locally in the user space. To install the dependencies we run the following "pip" commands:

Code Block
pip install saga-python==0.50.01 --user
pip install paramiko --user
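A quick sanity check that the two libraries can be imported (assuming the import names saga and paramiko):

Code Block
languagebash
python -c "import saga, paramiko; print('dependencies OK')"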

Once the dependencies have been installed, we can download and configure SUMI. To retrieve the code, run the following command:

Code Block
wget -qO- https://bitbucket.org/albertbsc/sumi/downloads/sumi-0.1.0.tar.gz | tar xvz

...

SUMI requires two configuration files which contain the information about the jobs to be submitted and about the HPC cluster where we are going to submit them. For this we need to create the configuration folder and copy the configuration files jobs.conf and servers.conf from the sumi/conf/ directory.

...

Passwordless configuration with Marconi


Because SUMI uses SSH channels to submit jobs and to transfer data, we need a passwordless connection to avoid being asked for the password at every step. To achieve this, the targeted HPC machine needs to have the public key of our Gateway account in its list of allowed keys. If we do not have a ~/.ssh/id_rsa.pub file in our Gateway/ITER account, we have to generate our keys:

Code Block
languagebash
ssh-keygen


This will generate the ~/.ssh/id_rsa key files. To add our public key to the list of allowed connections we use the "ssh-copy-id" command, copying it from the Gateway to Marconi:

Code Block
languagebash
ssh-copy-id username@login.marconi.cineca.it

Now, if we establish an SSH connection, the prompt will not ask for any password and will give us a terminal inside Marconi.
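A quick way to test the passwordless setup, using the same login node as in the ssh-copy-id step above:

Code Block
languagebash
# Should print the remote hostname without asking for a password
ssh username@login.marconi.cineca.it hostname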

...

Known issues

When setting up a Docker image locally with uDocker, make sure that there is enough quota for your user. Otherwise it may crash or show untar-related problems such as the following when running the udocker load command:

    Error: failed to extract container:
    Error: loading failed
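Before loading the image it can help to check the space available in the home directory; a minimal sketch (the site-specific quota command may differ):

Code Block
languagebash
# Show free space on the filesystem holding the home directory
df -h $HOME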

...

The Docker image is available on the Gateway and can be used on any machine, including your own laptop:

 

Code Block
ls ~g2tomz/public/imas-installer*

...

Code Block
$HOME/.local/bin/udocker load -i imas-installer-20180921112143.tar.xz

$HOME/.local/bin/udocker create --name=imas imas-installer:20180921112143
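To check that the container was created, the udocker ps command lists the local containers (assuming the same $HOME/.local/bin installation path as above):

Code Block
$HOME/.local/bin/udocker ps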

...

 

Test image on Marconi

Once the image has been loaded we can interact with it and get a bash shell inside the container with the following command:

...
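Based on the sample output shown later in this section, the command to open a shell in the container named imas is:

Code Block
$HOME/.local/bin/udocker run imas /bin/bash

Inside the container we then load the IMAS environment and launch the workflow: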

Code Block
module load imas kepler
module load keplerdir
imasdb test
export USER=imas
kepler -runwf -nogui -user imas /home/imas/simple-workflow.xml

...

Running these commands inside the container terminal will start the workflow.


This means that we will be running a Kepler workflow on a PRACE machine which does not have the IMAS environment installed, but running inside the Docker image. This is a sample of the output:

 

Code Block
$HOME/.local/bin/udocker run imas /bin/bash
 
 ******************************************************************************
 *                                                                            *
 *               STARTING f3e7e0cb-ea21-3e9c-a826-7e5256354c57                *
 *                                                                            *
 ******************************************************************************
 executing: bash
f3e7e0cb$ module load imas kepler
f3e7e0cb$ module load keplerdir
f3e7e0cb$ imasdb test
f3e7e0cb$ export USER=imas
f3e7e0cb$ kepler -runwf -nogui -user imas /home/imas/simple-workflow.xml
The base dir is /home/imas/keplerdir/kepler/kepler
Kepler.run going to run.setMain(org.kepler.Kepler)
JVM Memory: min = 1G,  max = 8G, stack = 20m, maxPermGen = default
adding $CLASSPATH to RunClassPath: /usr/share/java/jaxfront/JAXFront Eclipse Example Project/lib/xercesImpl.jar:/usr/share/java/jaxfront/JAXFront Eclipse Example Project/lib/jaxfront-swing.jar:/usr/share/java/jaxfront/JAXFront Eclipse Example Project/lib/jaxfront-core.jar:/home/imas/imas/core/imas/3.20.0/ual/3.8.3/jar/imas.jar:/usr/share/java/saxon/saxon9he.jar:/usr/share/java/saxon/saxon9-test.jar:/usr/share/java/saxon/saxon9-xqj.jar
      [run] log4j.properties found in CLASSPATH: /home/imas/keplerdir/kepler/kepler/kepler-2.5/resources/log4j.properties
      [run] Initializing Configuration Manager.
      [run] Setting Java Properties.
      [run] Copying Module Files.
      [run] Initializing Module: core.
      [run] Initializing Module: gui.
      [run] Kepler Initializing...
      [run] Starting HSQL Server for hsqldb
      [run] INFO  (org.kepler.util.sql.HSQL:_getConnection:771) started HSQL server at jdbc:hsqldb:hsql://localhost:24131/hsqldb;filepath=hsqldb:file:/home/imas/.kepler/cache-2.5/cachedata/hsqldb
      [run] Starting HSQL Server for coreDB
      [run] INFO  (org.kepler.util.sql.HSQL:_getConnection:771) started HSQL server at jdbc:hsqldb:hsql://localhost:44781/coreDB;filepath=hsqldb:file:/home/imas/KeplerData/modules/core/db-2.5/coreDB
      [run] Debug execution mode
      [run]
      [run] Synchronized execution mode: true
      [run] Wait for Python to finish: true
      [run] Out idx chosen from first input IDS: true
      [run] Input IDSs slice mode: false
      [run] Output IDSs slice mode: false
      [run] creation of a temporary file...

 

Configure and submit workflow, then check output on GW

To configure a job we first create the configuration directory, copy the sample configuration files into it, and then edit them:

Code Block
mkdir $HOME/.sumi
cp sumi/conf/*.conf $HOME/.sumi

jobs.conf

The configuration file jobs.conf, located in the local directory $HOME/.sumi/, contains the configuration of the jobs to be run. The sample configuration file located at $SUMI_DIR/conf/jobs.conf has the following content:

Code Block
[test]
udocker = udocker.py
arguments =
cpus = 1
time = 1
threads_per_process = 1

servers.conf

The configuration file servers.conf, located in the local directory $HOME/.sumi/, contains the configuration of the servers to which SUMI will connect. The sample configuration file located at $SUMI_DIR/conf/servers.conf has the following content:


Code Block
[machine]
server = example.com
user = username
manager = slurm
protocol = ssh
upload_files =
upload_to =
download_files =
download_to =

To configure the login node of the remote supercomputer, just specify the login node address, your user name and the name of the resource manager; the accepted values are sge, slurm and pbs.

SUMI can upload and download files automatically before and after the execution. For this we assume a directory "mywf" in our local home directory containing our script.sh with all the instructions we want to run, as shown below:

Code Block
#!/bin/bash

module load imas kepler
module load keplerdir
imasdb test
export USER=imas
kepler -runwf -nogui -user imas /home/imas/simple-workflow.xml

Inside the "wf" directory we will also have a new version of our "simple-workflow.xml" file which will overwrite the existing one. These files will be copied inside the image which is contained in the directory ".udocker/containers/imas/ROOT/". Once our job has finished we want to copy the IDS files to our local Gateway/ITER. The following configuration is for a Gateway machine, but this only affect to the paths of "upload_files" and "download_to"

Code Block
[marconi]
server = login.marconi.cineca.it
user = USER
manager = slurm
protocol = ssh
upload_files = /afs/eufus.eu/g2itmdev/USER/mywf/*
upload_to = /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/
download_files = /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/*
download_to = /afs/eufus.eu/g2itmdev/user/USER/public/imasdb/test/3/0/

The job configuration is the following

Code Block
[test]
udocker = $HOME/.local/bin/udocker
arguments = run imas /bin/bash -l script.sh
cpus = 1
time = 20
threads_per_process = 1

Once the job has been configured we can run it using the following command

Code Block
sumi -r -j test -m marconi

 

The output will first show how the job is being configured and how the connection is set up

No Format
2018/12/19 17:09:47 INFO     SUMI: Starting
2018/12/19 17:09:47 INFO     SUMI: Reading local configuration
2018/12/19 17:09:47 INFO     Job: configuring
2018/12/19 17:09:52 INFO     SUMI: uploading files
2018/12/19 17:09:52 INFO     Connected (version 2.0, client OpenSSH_6.6.1)
2018/12/19 17:09:53 INFO     Authentication (publickey) successful!

SUMI then copies the files from the Gateway to Marconi:

 

No Format
2018/12/19 17:09:54 INFO     [chan 0] Opened sftp connection (server version 3)
2018/12/19 17:09:54 INFO     SUMI: scp /afs/eufus.eu/g2itmdev/user/USER/mywf/simple-workflow.xml   /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/
2018/12/19 17:09:54 INFO     SUMI: scp /afs/eufus.eu/g2itmdev/user/USER/mywf/script.sh   /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/
2018/12/19 17:09:54 INFO     [chan 0] sftp session closed.

 

The job is submitted to the remote queueing system of Marconi, and SUMI waits for it to finish:

No Format
2018/12/19 17:09:59 INFO     Job: starting
2018/12/19 17:09:59 INFO     Job: ID [slurm+ssh://login.marconi.cineca.it]-[3264873]
2018/12/19 17:10:03 INFO     Job: state Pending
2018/12/19 17:10:03 INFO     Job: waiting
2018/12/19 17:17:13 INFO     Job: State Done
2018/12/19 17:17:13 INFO     Job: Exitcode 0

Once the job has finished, SUMI retrieves the results and terminates:

No Format
2018/12/19 17:17:13 INFO     SUMI: downloading files
2018/12/19 17:17:13 INFO     Connected (version 2.0, client OpenSSH_6.6.1)
2018/12/19 17:17:14 INFO     Authentication (publickey) successful!
2018/12/19 17:17:14 INFO     [chan 1] Opened sftp connection (server version 3)
2018/12/19 17:17:14 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_10001.characteristics   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:23 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_10001.datafile   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:23 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_10001.tree   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:31 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_19999.characteristics   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:39 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_19999.datafile   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:39 INFO     SUMI: scp /marconi/home/userexternal/USER/.udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_19999.tree   /afs/eufus.eu/g2itmdev/user/user/public/imasdb/test/3/0/
2018/12/19 17:17:47 INFO     [chan 1] sftp session closed.
2018/12/19 17:17:47 INFO     SUMI: Done

 

Once we have the files we can check whether the results are correct by running idsdump. This will print the structure of the generated IDS, demonstrating that it was generated correctly:

Code Block
idsdump 1 1 pf_active

This checks that the whole process produced valid IDSs.

 

Run as MPI job

The goal is to submit HPC-demanding workflows to supercomputers; therefore, the image must support running MPI codes. The uDocker instructions describe how to install OpenMPI inside the image, but here the aim is to use the MPI libraries of the host. This is one of the main challenges when running MPI codes, because the MPI libraries and their configuration are optimized for the underlying system.

When binding the paths that allow the uDocker image to use the MPI libraries, it is important to export all the environment variables. This is because many MPI parameters are stored in environment variables and are then used to determine the behaviour of the MPI runtime. To tell udocker to inherit the host environment variables we use the parameter "--hostenv".

The following sections describe how to run MPI codes inside a uDocker image on different supercomputers.

Marconi

As mentioned before, Marconi is one of the targeted supercomputers for this project. In this example the environment loaded will be the Intel MPI library.

Because the binary uses the underlying environment and the host libraries are loaded, it can be compiled on the host system and then run inside the guest system:

Code Block
languagebash
module load intel/pe-xe-2018--binary
module load intelmpi/2018--binary
mpicc code.c -o binary

 

In the case of Marconi, running the command "module show" reveals where the Intel libraries are installed so that the image can be given access to that path; in this case the path is /cineca/prod/opt/compilers/intel. This path must be accessible inside uDocker to retrieve all the necessary data, so it has to be specified with the "-v" parameter, which mounts a host path inside the image. In the described case, to get a bash command line we run the following command:

Code Block
languagebash
udocker.py run --hostenv -v /cineca/prod/opt/compilers/intel/ -w /cineca/prod/opt/compilers/intel/  imas /bin/bash

 

MareNostrum IV

MareNostrum IV has a different setup but follows a similar approach. In this case the test also uses the Intel MPI library:

Code Block
languagebash
module load intel/2017.4
module load impi/2017.4 

 

The modules have to be loaded on the host system to define the environment variables. Once that is done we can launch a shell script which runs the MPI code inside the container in the following way:

 

Code Block
languagebash
udocker.py run --hostenv -v /gpfs/apps/MN4/INTEL/2017.4/compilers_and_libraries_2017.4.196/linux/ -v $HOME -w $HOME  imaswf /bin/bash -c "chmod +x launch_mm.sh; ./launch_mm.sh"

 

 
