
1. Getting started

Remote submission of IMAS HPC workflows can be challenging, given the requirement for an installed IMAS environment and the wide variety of available supercomputers. To address these issues, a solution combining a virtualized environment with IMAS installed and a remote submission system has been designed.

This tutorial describes the steps to submit an example uDocker IMAS workflow image to a remote supercomputer. To do so we will make use of uDocker and SUMI (SUbmission Manager for IMAS).

These work on different sides of the system, the local machine and the remote HPC system:

  • Connect from a local computer to a remote cluster to submit the workflow: SUMI
  • Bring the IMAS environment to heterogeneous supercomputer systems: uDocker image

This tutorial assumes that the user has a functional machine with a GNU/Linux distribution installed.

This tutorial has been tested on the following machines:

  • Marconi @Cineca
  • Marconi Gateway @Cineca
  • Eagle @PSNC

This tutorial follows these steps:

  1. Install SUMI on the Gateway
  2. Test connection with Marconi
  3. Install uDocker on Marconi
  4. Test image on Marconi
  5. Configure sample job
  6. Submit workflow
  7. Retrieve data
  8. Test output on GW

2. Install SUMI on the Gateway

SUMI is a tool capable of submitting jobs to remote HPC clusters and of uploading and retrieving data from them.

In this tutorial the local computer will be the Marconi Gateway. Given that SUMI requires Python 2.7, we first load the corresponding module:

module load python/2.7.12

SUMI depends on two powerful libraries to perform its tasks: Paramiko (data transfer) and SAGA (remote job submission). To install these dependencies we need to install Python libraries, but since we do not have root permissions we will use a virtual environment, which allows us to install them locally. For this purpose we will use "virtualenv", which creates a Python environment in a local folder where libraries can be installed without administrative rights. To set it up, run the following commands:

 

    mkdir sumi-virtualenv

    virtualenv sumi-virtualenv

Once the virtualenv folder has been configured, we can load the environment.

In the case of TCSH shells (as on the Gateway):

source sumi-virtualenv/bin/activate.csh

In the case of Bash shells (the most commonly used):

    source sumi-virtualenv/bin/activate

Our terminal prompt will now show the folder name in front of our username in the following way:

    [sumi-virtualenv] <g2user@s65

To install the dependencies, we can now run the "pip" command, which will install the Python libraries into our local virtualenv:

    pip install saga-python==0.50.01 paramiko

Once the dependencies have been installed, we can download and configure SUMI. To retrieve the code, clone the repository:

    git clone https://albertbsc@bitbucket.org/albertbsc/sumi.git

 

This will create a local folder named "sumi". To be able to run the software, we need to include it in the $PATH environment variable.

For TCSH shell systems (as on the Gateway):

setenv PATH $PATH\:$HOME/sumi/bin

For Bash shells

export PATH=$PATH:$PWD/sumi/bin/
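
To verify that the sumi bin directory is now part of $PATH, a quick check with standard shell commands (nothing SUMI-specific) is:

    # check that the sumi bin directory is now part of PATH
    echo $PATH | tr ':' '\n' | grep sumi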

SUMI requires two configuration files which contain the information about the jobs to be submitted and the HPC clusters where we are going to submit them. For this we need to create the configuration folder and copy the configuration files jobs.conf and servers.conf from the sumi/conf/ directory.

 

    mkdir $HOME/.sumi

    cp sumi/conf/*.conf $HOME/.sumi
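
As a quick sanity check (plain shell, nothing specific to SUMI), the configuration directory should now contain both files:

    ls $HOME/.sumi
    # expected: jobs.conf  servers.conf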

 

Now we are ready to run SUMI. Execute the option "-h" to see all the available options:

 

    sumi -h
    usage: sumi.py [-h] [-r] [-u UP] [-d DOWN] [-m MACHINE [MACHINE ...]]
                   [-j JOBS [JOBS ...]]

    Submission Manager for IMAS

    optional arguments:
      -h, --help            show this help message and exit
      -r, --run             Run the configured job on a specific cluster
      -u UP, --upload UP    Upload local file
      -d DOWN, --download DOWN
                            Download remote file
      -m MACHINE [MACHINE ...], --machine MACHINE [MACHINE ...]
                            List of machines were to submit
      -j JOBS [JOBS ...], --job JOBS [JOBS ...]
                            List of jobs to be submitted

 

 

With this, SUMI has been installed successfully.

 

3. Configure sample job and Marconi connection

To configure a job we have to edit the files copied into $HOME/.sumi.

3.1. jobs.conf

The configuration file jobs.conf located at local directory $HOME/.sumi/ contains the configuration for the jobs to be run. The sample configuration file located at $SUMI_DIR/conf/jobs.conf has the following content.

    [test]
    udocker = udocker.py
    arguments =
    cpus = 1
    time = 1
    threads_per_process = 1
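
As a rough guide (this is an interpretation based on the example in section 6.1, not an authoritative description of every field), the entries can be read as follows:

    [test]
    # path to (or name of) the udocker executable on the remote machine
    udocker = udocker.py
    # arguments passed to udocker, e.g. "run <container> <command>" (see section 6.1)
    arguments =
    # requested number of CPUs
    cpus = 1
    # requested wall time (units as expected by the resource manager; not stated here)
    time = 1
    # threads per process
    threads_per_process = 1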

3.2. servers.conf

The configuration file servers.conf located in the local directory $HOME/.sumi/ contains the configuration for the servers where SUMI will connect. The sample configuration file located at $SUMI_DIR/conf/servers.conf has the following content.


[machine]
server = example.com
user = username
manager = slurm
protocol = ssh
upload_files =
upload_to =
download_files =
download_to =

 

To configure the login node of your cluster, just specify the login node address, your user name and the name of the resource manager; the accepted values are sge, slurm and pbs.

SUMI allows uploading and downloading files automatically. For this we can assume a directory "mywf" in our remote Marconi home directory and another one in our local Gateway account, as well as a "mywfresults" folder on the Gateway. This results in the following configuration:

 

[machine]
server = login.marconi.cineca.it
user = my_username
manager = slurm
protocol = ssh
upload_files = /afs/eufus.eu/g2itmdev/user/my_username/mywf/*
upload_to = /marconi/home/userexternal/agutierr/mywf/
download_files = /marconi/home/userexternal/agutierr/mywf/test*
download_to = /afs/eufus.eu/g2itmdev/user/g2agutie/mywfresults/
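
The directories referenced above are assumed to exist on both sides; it is safest to create them beforehand, for example (paths as in the sample above, adjust to your own accounts):

    # on the Gateway
    mkdir -p ~/mywf ~/mywfresults
    # on Marconi
    mkdir -p ~/mywf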

 

4. Test connection with Marconi

To use SUMI we need a passwordless connection. This means that Marconi needs to have the public key of our Gateway account on its list of allowed keys. To copy our key we make use of the "ssh-copy-id" command:

ssh-copy-id username@login.marconi.cineca.it

Now, if we establish an SSH connection, the prompt will not ask for any password and will give us a terminal inside Marconi.

    ssh username@login.marconi.cineca.it
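
Note that ssh-copy-id assumes that an SSH key pair already exists in the Gateway account; if it does not, one can be generated first with the standard OpenSSH tool (accepting the defaults is fine for this purpose):

    ssh-keygen -t rsa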

 

 

4.1. Installing uDocker

We will make use of uDocker. uDocker is based on Docker but does not require root permissions. Therefore, it lacks some features, such as TCP/IP sniffing or other actions that require root privileges. uDocker will allow us to simulate an environment with root permissions (UID = 0) and install IMAS inside it.

Therefore, the very first step is installing uDocker. uDocker is set up once and used many times.

Some supercomputers do not allow outgoing connections. Because of this it may be necessary to download the following files locally and copy them manually to the cluster we want to connect to.


To access the example workflow, copy the following IMAS image from the Marconi Gateway to the targeted cluster (for example with scp, as sketched below).

  • ~g2tomz/public/imas-installer-20180921112143.tar.xz
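
For example, the image can be copied from the Gateway to the cluster with scp; the destination path below is only an illustration, so adjust the user name and target directory:

    scp ~g2tomz/public/imas-installer-20180921112143.tar.xz \
        username@login.marconi.cineca.it:~/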


Once the files have been copied, edit the setup_udocker.sh script, set the variables IMAS_IMAGE and UDOCKER_TARBALL, and then run the script.
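
For illustration only, the edited variables inside setup_udocker.sh might end up looking like this; both paths are hypothetical and depend on where the image and the udocker tarball were actually copied on the cluster:

    # illustrative values inside setup_udocker.sh (adjust to your own paths)
    IMAS_IMAGE=$HOME/imas-installer-20180921112143.tar.xz
    UDOCKER_TARBALL=$HOME/udocker.tar.gz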

    bash setup_udocker.sh

This script will install uDocker but it will not load the Docker image into it. For this, we need to run the following commands:

     $SUMI_UDOCKER_DIR/udocker.py load -i $IMAS_IMAGE

     $SUMI_UDOCKER_DIR/udocker.py create --name=imas imas-installer:20180529181447
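
To check that the load and create steps worked, udocker can list the available images and the created containers (these are standard udocker subcommands):

    $SUMI_UDOCKER_DIR/udocker.py images
    $SUMI_UDOCKER_DIR/udocker.py ps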

 

4.2. Known issues

When setting up a uDocker image, be sure that there is enough quota for your user. Otherwise it may crash or show untar-related problems such as the following when running the udocker load command:

    Error: failed to extract container:
    Error: loading failed
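
How to check the available space depends on the site; on many systems something like the following gives a first indication (these commands are generic, not specific to any of the clusters above):

    # available disk space in the home file system
    df -h $HOME
    # per-user quota, where the quota tool is available
    quota -s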

5. Setting up SUMI

SUMI (SUbmission Manager for IMAS) is a tool developed for ITER IMAS (Integrated Modelling & Analysis Suite). SUMI aims to launch jobs remotely to HPC machines with a uDocker installation and is able to manage data transfer between the local machine and the remote file system. SUMI allows configuring the connection to different supercomputers and configuring the jobs to be launched on those machines.

Installing SUMI locally is needed to be able to submit jobs remotely.


SUMI depends on two free software libraries, SAGA and Paramiko. Both work with Python 2.x, specifically version 2.7 or newer.

5.1. Downloading SUMI

Download the source code from BitBucket

https://bitbucket.org/albertbsc/sumi/src/master/

Export the variable $SUMI_DIR with the path to SUMI and include the bin directory in the PATH variable:


    export SUMI_DIR=/path/to/sumi
    export PATH=$PATH:$SUMI_DIR/bin
    chmod +x $SUMI_DIR/bin/sumi


If you want these variables to persist, include the previous lines in your .bashrc file.
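
For example, the lines can be appended to .bashrc once (assuming a Bash login shell; adjust the SUMI path):

    echo 'export SUMI_DIR=/path/to/sumi' >> ~/.bashrc
    echo 'export PATH=$PATH:$SUMI_DIR/bin' >> ~/.bashrc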

5.2. Passwordless configuration


SAGA and Paramiko require a passwordless connection to work.

Copy your public key to the remote host

    ssh-copy-id -i ~/.ssh/id_rsa.pub username@example.com

 

6. Remote execution


Different machines and different jobs can be specified. Every time we run SUMI we have to specify which supercomputer we are submitting to and which of the configured jobs we are going to submit; otherwise SUMI is conservative and does not run anything. To specify the job and the machine we use the parameters -j and -m respectively. For example, to run a job named testjob on a machine named testmachine we execute the following command:

    sumi -r -j testjob -m testmachine

Be aware that SUMI is case sensitive when specifying the names.
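
Since both -j and -m accept lists (see the help output in section 2), several configured jobs can be submitted to several machines in one call; the job and machine names below are placeholders:

    sumi -r -j job1 job2 -m marconi eagle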

 

6.1. Example case


If we have a container named "imas" and want to run a simple case interacting with the container image, we configure a job "test" with the following parameters.


    [test]
    udocker = udocker.py
    arguments = run imas /bin/echo 'hello'
    cpus = 1
    time = 1
    threads_per_process = 1


Once the job has been configured, and your machine has been configured with its specific parameters, run SUMI with the following command:

    sumi -r -j test -m MACHINE_NAME

This will generate an STDOUT and an STDERR file on your remote cluster containing the image id and "hello".
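
As an alternative to the automatic download configured in servers.conf, the help output in section 2 also shows a -d option for fetching a remote file explicitly; the path below is a placeholder, and the exact semantics (in particular where the file is placed locally) should be checked against the SUMI documentation:

    sumi -d /remote/path/to/output_file -m MACHINE_NAME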

 
