Getting started
Submitting IMAS HPC workflows remotely can be problematic for two reasons: the target machine needs an IMAS environment installed, and there is a wide variety of supercomputers to support. To address these issues, an approach has been designed that combines a virtualized environment with IMAS installed and a remote submission system.
This tutorial describes the steps to submit an example uDocker IMAS workflow image to a remote supercomputer. To do so we will use uDocker and SUMI (SUbmission Manager for IMAS).
This tutorial assumes that the user has a working machine with a GNU/Linux distribution installed.
This tutorial has been tested on the following machines:
- Marconi @ CINECA
Marconi Gateway @ CINECA has some limitations regarding passwordless SSH setup due to Kerberos authentication and AFS. Being unable to set it up limits the capabilities of the libraries used by SUMI (Paramiko and SAGA Python). A fix is intended for the near future.
IMAS @ uDocker
We will use uDocker. uDocker is based on Docker but does not require root permissions. As a result, it lacks some features, such as TCP/IP sniffing and other actions that require root privileges. uDocker makes it possible to simulate an environment with root permissions (UID = 0) and install IMAS inside it.
The very first step is therefore installing uDocker. uDocker only needs to be set up once and can then be used many times.
Installing uDocker
Some supercomputers do not allow outgoing connections. In that case you may need to download the following files locally and copy them manually to the cluster you want to connect to.
- https://download.ncg.ingrid.pt/webdav/udocker/udocker-1.1.2.tar.gz
- https://raw.githubusercontent.com/indigo-dc/udocker/devel/udocker.py
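For clusters without outgoing connections, the download-then-copy step could be scripted roughly as follows. This is a minimal sketch, assuming curl and scp are available on the local machine. The function name stage_udocker_files, the login username@example.com and the remote directory are placeholders for illustration and are not part of uDocker or SUMI.

```shell
# Sketch: fetch the uDocker files locally, then copy them to the cluster.
# stage_udocker_files is a hypothetical helper name; the target and the
# remote directory passed to it are placeholders for your own cluster.
stage_udocker_files() {
    target="$1"       # e.g. username@example.com
    remote_dir="$2"   # existing directory on the cluster

    # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
    curl -fLO "https://download.ncg.ingrid.pt/webdav/udocker/udocker-1.1.2.tar.gz" || return 1
    curl -fLO "https://raw.githubusercontent.com/indigo-dc/udocker/devel/udocker.py" || return 1

    # Copy both downloaded files in a single scp call
    scp udocker-1.1.2.tar.gz udocker.py "$target:$remote_dir/"
}

# Usage (uncomment and adapt):
# stage_udocker_files username@example.com /path/on/cluster
```

The function stops at the first failed download, so a partial set of files is never copied to the cluster.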
To access the example workflow, copy the following IMAS image from Marconi Gateway to the target cluster.
- ~g2tomz/public/imas-installer-20180921112143.tar.xz
Once the files have been copied, edit the udocker.sh script: set the variables IMAS_IMAGE and UDOCKER_TARBALL, then run the script.
bash udocker.sh
This will install uDocker as well as create the IMAS container inside uDocker.
Known issues
When setting up a uDocker image locally, make sure that your user has enough quota. Otherwise the udocker load command may crash or report untar-related errors such as the following.
Error: failed to extract container:
Error: loading failed
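Because the load step fails when space runs out, a quick check before running udocker.sh can save time. The following is a rough sketch only: has_room_for is a hypothetical helper, the factor of 3 for the unpacked size is an arbitrary assumption, and df reports free space on the filesystem rather than your per-user quota, so the quota itself should still be checked with whatever tool your site provides.

```shell
# Sketch: check that a directory has room for an image before loading it.
# has_room_for is a hypothetical helper; the factor 3 is an assumed
# allowance for the unpacked image, not a value taken from uDocker.
has_room_for() {
    image="$1"   # path to the image tarball
    dest="$2"    # directory where the image will be unpacked

    # Size of the tarball in KiB
    need_kb=$(du -k "$image" | awk '{print $1}')
    # Free space of the filesystem holding dest, in KiB (POSIX output format)
    free_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')

    # Succeeds only if free space covers three times the tarball size
    [ "$free_kb" -ge $((need_kb * 3)) ]
}

# Usage (uncomment and adapt):
# has_room_for imas-installer-20180921112143.tar.xz "$HOME" || echo "not enough space"
```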
Setting up SUMI
SUMI (SUbmission Manager for IMAS) is a tool developed for ITER IMAS (Integrated Modelling & Analysis Suite). SUMI launches jobs remotely on HPC machines that have a uDocker installation, and it manages data transfer between the local machine and the remote file system. SUMI lets you configure connections to different supercomputers and the jobs to be launched on those machines.
SUMI must be installed locally to be able to submit jobs.
SUMI depends on two free software libraries, Paramiko and SAGA Python. Both work with Python 2.x, specifically versions 2.7.x and newer.
Downloading SUMI
Download the source code from Bitbucket:
https://bitbucket.org/albertbsc/sumi/src/master/
Export the variable $SUMI_DIR with the path to SUMI and include the bin directory in the PATH variable.
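# Point SUMI_DIR at the SUMI checkout and add its bin directory to PATH
export SUMI_DIR=/path/to/sumi
export PATH="$PATH:$SUMI_DIR/bin"
# Make sure the launcher is executable
chmod +x "$SUMI_DIR/bin/sumi"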
If you want these variables to persist, include the previous lines in your .bashrc file.
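Appending the lines to .bashrc by hand is easy to repeat by accident. The following is a minimal sketch that adds them only once. The name persist_sumi_env is a hypothetical helper, not part of SUMI, and /path/to/sumi remains a placeholder for your own checkout.

```shell
# Sketch: append the SUMI environment lines to a shell startup file once.
# persist_sumi_env is a hypothetical helper, not part of SUMI.
persist_sumi_env() {
    sumi_dir="$1"                   # path to the SUMI checkout
    rc_file="${2:-$HOME/.bashrc}"   # startup file, defaults to ~/.bashrc

    # Do nothing if a previous run already added the export
    if grep -qs '^export SUMI_DIR=' "$rc_file"; then
        return 0
    fi

    # Only sumi_dir is expanded now; $PATH and $SUMI_DIR stay literal
    # so that they are expanded each time the startup file is read.
    {
        printf 'export SUMI_DIR="%s"\n' "$sumi_dir"
        printf '%s\n' 'export PATH="$PATH:$SUMI_DIR/bin"'
    } >> "$rc_file"
}

# Usage (uncomment and adapt):
# persist_sumi_env /path/to/sumi
```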
Passwordless configuration
SAGA and Paramiko require a passwordless connection to work.
Copy your public key to the remote host
ssh-copy-id -i ~/.ssh/id_rsa.pub username@example.com
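Before running SUMI it can help to confirm that the key was accepted. The sketch below relies only on standard OpenSSH client options. The name check_passwordless is a hypothetical helper and username@example.com is a placeholder.

```shell
# Sketch: confirm that key-based login works without any prompt.
# check_passwordless is a hypothetical helper, not part of SUMI.
check_passwordless() {
    target="$1"   # e.g. username@example.com

    # BatchMode=yes makes ssh fail instead of asking for a password;
    # "true" is a no-op command executed on the remote host.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "passwordless login to $target works"
    else
        echo "passwordless login to $target failed" >&2
        return 1
    fi
}

# Usage (uncomment and adapt):
# check_passwordless username@example.com
```

On machines where key-based login cannot be set up, such as the Marconi Gateway case mentioned earlier, this check is expected to fail.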
jobs.conf
The configuration file jobs.conf, located in the local directory $HOME/.sumi/, contains the configuration of the jobs to be run. The sample configuration file at $SUMI_DIR/conf/jobs.conf has the following content.
[test]
udocker = udocker.py
arguments =
cpus = 1
time = 1
threads_per_process = 1
servers.conf
The configuration file servers.conf, located in the local directory $HOME/.sumi/, contains the configuration of the servers that SUMI will connect to. The sample configuration file at $SUMI_DIR/conf/servers.conf has the following content.
[machine]
server = example.com
user = username
manager = slurm
protocol = ssh
To configure the login node of your cluster, specify the login node address, your user name and the name of the resource manager. The accepted resource managers are sge, slurm and pbs.
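Assuming the sample files are meant as a starting point for your own configuration, they could be copied into place as sketched below. The name install_sumi_conf is a hypothetical helper, not part of SUMI. It expects SUMI_DIR to be exported as described earlier and leaves any existing file untouched.

```shell
# Sketch: copy the sample configuration files into $HOME/.sumi.
# install_sumi_conf is a hypothetical helper; it assumes SUMI_DIR is set
# and never overwrites a configuration file that already exists.
install_sumi_conf() {
    conf_dir="$HOME/.sumi"
    mkdir -p "$conf_dir" || return 1

    for name in jobs.conf servers.conf; do
        if [ -e "$conf_dir/$name" ]; then
            echo "keeping existing $conf_dir/$name"
        else
            cp "$SUMI_DIR/conf/$name" "$conf_dir/$name" || return 1
        fi
    done
}

# Usage (uncomment and adapt):
# install_sumi_conf
```

After copying, edit the files in $HOME/.sumi/ rather than the samples, so the originals stay available for reference.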
Remote execution
Several machines and several jobs can be configured. Every time we run SUMI we have to specify which supercomputer we are submitting to and which of the configured jobs we are submitting. Otherwise SUMI takes the conservative option and does not run anything. To specify the job and the machine, use the parameters -j and -m respectively. For example, to run a job named testjob on a machine named testmachine, execute the following command:
sumi -r -j testjob -m testmachine
Be aware that SUMI is case sensitive when specifying the names.
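Since a name that differs only in case will not match, it may be worth checking the names against the configuration files before submitting. The following is a small sketch, assuming each job or machine appears as a section header on its own line, as in the sample files. The name sumi_has_section is a hypothetical helper, not part of SUMI.

```shell
# Sketch: check that a job or machine name exists as a section header.
# sumi_has_section is a hypothetical helper, not part of SUMI; it only
# looks for a line that is exactly "[name]".
sumi_has_section() {
    conf_file="$1"
    name="$2"
    # -F: literal match, -x: whole line, -q: quiet; matching is case sensitive
    grep -Fxq "[$name]" "$conf_file"
}

# Usage (uncomment and adapt):
# sumi_has_section "$HOME/.sumi/jobs.conf" testjob &&
#     sumi_has_section "$HOME/.sumi/servers.conf" testmachine &&
#     sumi -r -j testjob -m testmachine
```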
Example case
If we have a container named "imas" and want to run a simple case that interacts with the container image, configure a job "test" with the following parameters.
[test]
udocker = udocker.py
arguments = run imas /bin/echo 'hello'
cpus = 1
time = 1
threads_per_process = 1
Once the job is configured, and your machine is configured with its specific parameters, run SUMI with the following command.
sumi -r -j test -m MACHINE_NAME
This will generate a STDOUT and a STDERR file on your remote cluster, with the output containing the image id and "hello".