ANAKIN: Is it possible to learn this power?
PALPATINE: Not from a Jedi.

Star Wars: Revenge of the Sith

This tutorial is about working with IMAS in an isolated environment (a container), which you can instantiate at any time, and copy and reuse with little effort on different machines. After completing this tutorial, you will know about:

different virtualization paradigms,
using the Docker tool to instantiate and manage containers,
working with IMAS running in a container.

Virtualization and Containerization

Virtualization is the process of simulating logical resources (CPU, memory, storage) on top of physical ones. Usually, the term refers to running whole operating systems (often several at once) on the same set of physical resources. This provides two major benefits. First, the computing and storage power of the physical resources is usually better utilized by workloads coming from multiple virtual operating systems. Second, virtualization isolates the whole environment from the host and from other guest systems. This improves security and makes it easier to maintain quality services running in dedicated environments. Furthermore, as virtualization becomes more and more popular, standardized approaches are emerging, which improves interoperability and ease of use.

A related technique for deploying isolated environments is called containerization. In this approach, an application is packaged together with its own environment, and the host operating system runs it in isolation. Note that containers do not emulate the full operating system stack with hardware drivers; instead, they reuse what the host OS (in particular, its kernel) provides. This makes containers much more lightweight and quicker to start, and they introduce less overhead. At the same time, the benefits of virtualization remain: the same physical resources can host multiple running containers, each isolated from the rest with its own dedicated environment and settings.

Docker

Docker is the most popular containerization project. It established the technical details of the container image format, of the Dockerfile (a recipe describing how to create an image) and of the Docker tool used to manage images and running containers. It also made it possible to connect multiple containers in a well-defined network so that they cooperate towards a common goal. Docker now has several alternatives, but the majority of them support Docker formats and mimic the behavior of the Docker tool, as it has become a de facto standard.

Installation

Please follow the installation steps from the official Docker documentation.

Images

All running containers start from some base image. You can find lots of open source images at Docker Hub including basic OSes (ubuntu, centos), popular database engines (postgres, mysql) and others.

To download an image from Docker Hub, use: docker pull <image-name>
To list available images, use: docker images
To remove an image, use: docker rmi <image-name>
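The same image operations can also be scripted. The sketch below is a minimal Python wrapper, assuming the docker CLI is on your PATH; it asks docker images for one "repository:tag" per line through the --format flag so that the listing is easy to parse. The helper names docker and parse_image_list are illustrative only, not part of Docker.

```python
from __future__ import annotations

import shutil
import subprocess


def docker(*args: str) -> str:
    """Run a docker CLI command and return its standard output."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def parse_image_list(output: str) -> list[str]:
    """Turn the 'repository:tag' lines printed by docker images into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]
```

For example, docker("pull", "ubuntu") downloads the image, parse_image_list(docker("images", "--format", "{{.Repository}}:{{.Tag}}")) lists the local images, and docker("rmi", "ubuntu") removes it again.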

Example 1. Pulling an image

(Animation: docker.gif)

Containers

To run a container, use: docker run <image-name> [<command>]
To list running containers, use: docker ps (add -a to include stopped ones)
To copy a file between the host and a container, use: docker cp <container-id>:<path> <host-path> (swap the arguments to copy into the container)
To execute a command in a running container, use: docker exec <container-id> <command>
To remove a stopped container, use: docker rm <container-id>

Each command supports additional flags passed along with the main arguments. Check docker help <command> for more information.

The most frequently used flags are:

--name for docker run to specify a friendly name for the container
-i, --interactive and -t, --tty for docker run and docker exec when you want to work in an interactive shell inside the container
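To see how these flags combine on the command line, here is a small sketch that only builds the argument lists for docker run and docker exec; the function names are hypothetical helpers, and actually executing the lists (for example with subprocess.run) assumes Docker is installed.

```python
from __future__ import annotations


def docker_run_args(image: str, command: list[str] | None = None, *,
                    name: str | None = None,
                    interactive: bool = False) -> list[str]:
    """Build argv for `docker run [--name NAME] [-i -t] IMAGE [COMMAND...]`."""
    args = ["docker", "run"]
    if name is not None:
        args += ["--name", name]   # friendly name instead of a generated one
    if interactive:
        args += ["-i", "-t"]       # keep STDIN open and allocate a pseudo-TTY
    args.append(image)
    args += command or []          # optional command overriding the image default
    return args


def docker_exec_args(container: str, command: list[str], *,
                     interactive: bool = False) -> list[str]:
    """Build argv for `docker exec [-i -t] CONTAINER COMMAND...`."""
    args = ["docker", "exec"]
    if interactive:
        args += ["-i", "-t"]
    return args + [container, *command]
```

Run from a terminal, subprocess.run(docker_run_args("ubuntu", ["/bin/bash"], name="demo", interactive=True)) should give the same interactive shell as typing the command by hand. Flag options come before the image name because docker run treats everything after the image as the container's command.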

Example 2. Starting a container

(Animation: ubuntu.gif)

Exercises

Checking /etc/os-release of different containers

(Almost) every Linux distribution has a file named /etc/os-release with information about the platform
To see that the containers have indeed their own isolated environments, check the output of docker run <image-name> cat /etc/os-release for a few different image names: ubuntu, debian, postgres, mysql, alpine
In Docker Hub, each image reference is actually a combination of a name and a tag separated by a colon (the tag latest is used when none is given). Check again the contents of /etc/os-release for: ubuntu:xenial, ubuntu:bionic, centos:7, centos:8
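The name:tag convention from the exercise can be illustrated with a small parser. This is a sketch under simplifying assumptions: it ignores digests (@sha256:...), and it treats a colon that appears before the last "/" as part of a registry host:port rather than as a tag separator. The second helper builds one docker run command per image for the exercise; --rm is a standard docker run flag that deletes the container when it exits. Both function names are hypothetical.

```python
from __future__ import annotations


def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'name[:tag]' into (name, tag); the tag defaults to 'latest'."""
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        # the colon follows the last path component, so it separates the tag
        return ref[:last_colon], ref[last_colon + 1:]
    # no tag, or the only colon belongs to a registry such as localhost:5000
    return ref, "latest"


def os_release_commands(images: list[str]) -> list[list[str]]:
    """One `docker run --rm IMAGE cat /etc/os-release` command per image."""
    return [["docker", "run", "--rm", image, "cat", "/etc/os-release"]
            for image in images]
```

For instance, split_image_ref("ubuntu:bionic") gives ("ubuntu", "bionic"), while plain "ubuntu" resolves to ("ubuntu", "latest").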

Working interactively in a container

Run a new container with an interactive session: docker run -it ubuntu /bin/bash
You are now logged in as root. Update the APT cache: apt update
Install fortune and cowsay: apt install -y fortune cowsay
Run a few times: /usr/games/cowsay $(/usr/games/fortune)

Running a service in a container

Create a new directory on your computer: mkdir /tmp/docker-exercise
Add some HTML content, for example: echo '<img src="https://picsum.photos/200"/>' > /tmp/docker-exercise/index.html
Run a container with the --publish flag to forward network traffic from the host's port 8080 to the container's port 80: docker run --volume /tmp/docker-exercise/:/usr/local/apache2/htdocs/ --publish 8080:80 httpd:2.4
Open in a web browser: http://localhost:8080

Note that the --volume flag of docker run requires absolute paths, not relative ones.
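Because relative host paths are a common pitfall with --volume, a script can resolve the directory to an absolute path before building the command. The sketch below recreates the exercise in Python; prepare_site and httpd_run_args are hypothetical helper names, and the document root /usr/local/apache2/htdocs/ is taken from the httpd:2.4 command above.

```python
from __future__ import annotations

import os
from pathlib import Path

HTDOCS = "/usr/local/apache2/htdocs/"  # document root used in the command above


def prepare_site(directory: str | os.PathLike) -> Path:
    """Create the directory with a one-line index.html; return its absolute path."""
    site = Path(directory).resolve()   # --volume needs an absolute host path
    site.mkdir(parents=True, exist_ok=True)
    (site / "index.html").write_text('<img src="https://picsum.photos/200"/>\n')
    return site


def httpd_run_args(site: Path, host_port: int = 8080) -> list[str]:
    """Build the docker run command that serves `site` on localhost:host_port."""
    if not site.is_absolute():
        raise ValueError(f"--volume needs an absolute host path, got {site}")
    return ["docker", "run",
            "--volume", f"{site}/:{HTDOCS}",
            "--publish", f"{host_port}:80",
            "httpd:2.4"]
```

For example, httpd_run_args(prepare_site("/tmp/docker-exercise")) reproduces the command from the exercise; passing the list to subprocess.run starts the server, assuming Docker is available.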

Singularity

Singularity is an alternative way to run containers, often used for scientific computing on HPC systems. Whereas Docker excels at microservices, i.e., small, single-application containers with a single TCP port open to the public, Singularity aims to provide reproducible results embedded in a publicly available container image. Because of that, several details, such as image tagging and even the image format, differ between Singularity and Docker (a conversion tool is available). Additionally, Singularity has a different mode of operation that does not require superuser privileges, which makes it more suitable for installation on HPC systems. It also has native support for MPI and GPGPU calculations.

Availability

Singularity is installed on Marconi HPC and available there after executing: module load singularity

Instructions to install Singularity are available in the documentation.

Exercises

Start Singularity container

Search for Ubuntu image: singularity search ubuntu
Download the latest image: singularity pull library://library/default/ubuntu
Notice that the image is in fact a file downloaded to your local directory: ls -l ubuntu_latest.sif
Run the image: ./ubuntu_latest.sif
Notice that you are mapped to the same user as on the host machine: run whoami and id
Notice that you see your own $HOME directory from inside the container: ls -l
Check which system you are currently running: cat /etc/os-release

Note that while you are inside the container (ubuntu_latest.sif) run by Singularity, the system reports itself as

> cat /etc/os-release
NAME="Ubuntu"
VERSION="18.10 (Cosmic Cuttlefish)"
ID=ubuntu
ID_LIKE=debian
...
...

while a Marconi node presents itself as

> cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
...
...
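Instead of comparing the two outputs by eye, you can parse them. /etc/os-release consists of KEY=value lines with optionally quoted values, so the sketch below uses shlex to remove the quoting and skips anything that is not an assignment (such as the "..." lines above). parse_os_release and same_distribution are hypothetical helpers; comparing the ID field is one reasonable choice, not the only one.

```python
from __future__ import annotations

import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse /etc/os-release content into a dict, e.g. {'ID': 'ubuntu', ...}."""
    info: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and elided lines such as '...'
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips the shell-style quotes
        info[key] = parts[0] if parts else ""
    return info


def same_distribution(container_text: str, host_text: str) -> bool:
    """True if both outputs report the same distribution ID."""
    return (parse_os_release(container_text).get("ID")
            == parse_os_release(host_text).get("ID"))
```

On a Marconi node, the host side is simply Path("/etc/os-release").read_text(); the container side can be captured from singularity exec ubuntu_latest.sif cat /etc/os-release.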

uDocker

uDocker was created in the H2020 project INDIGO-DataCloud. It uses Docker images natively, but provides a userspace tool to run and manage containers. Because it does not require superuser privileges, it is also a good fit as a container tool for HPC systems. uDocker also supports MPI calculations and provides ways to access the host's GPUs.

Availability

uDocker is contained in a single Python file, making it easy to deploy in distributed systems. To install it on the Gateway, execute:

export WORK=/pfs/work/$USER
mkdir -p ~/.local/bin ${SCRATCH}
ln -s ${WORK}/udocker ~/.udocker
 
wget https://github.com/indigo-dc/udocker/releases/download/1.3.4/udocker-1.3.4.tar.gz
tar zxvf udocker-1.3.4.tar.gz -C ${WORK}/
chmod u+rx ${WORK}/udocker/udocker

# test the udocker executable
${WORK}/udocker/udocker --help

# install udocker's external tools and libraries
${WORK}/udocker/udocker install

The same steps can be repeated on other machines, including HPC systems or your local computer. Just adjust the WORK variable in the first line and make sure SCRATCH is defined (the second line uses it but does not set it). Note that SCRATCH will be used during import of new images and WORK will be used to store images and the files of all containers. Choose both locations wisely, as they may need to hold large amounts of data.
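Since both locations may have to hold large images, it can help to check how much free space is behind them before importing anything. This is a minimal sketch, assuming WORK and SCRATCH are exported as environment variables; storage_report is a hypothetical helper that returns None for a variable that is unset or does not point to an existing directory.

```python
from __future__ import annotations

import os
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def storage_report(variables: tuple[str, ...] = ("WORK", "SCRATCH")) -> dict[str, float | None]:
    """Free space behind each environment variable; None if unset or missing."""
    report: dict[str, float | None] = {}
    for var in variables:
        path = os.environ.get(var)
        report[var] = free_gib(path) if path and os.path.isdir(path) else None
    return report
```

Calling print(storage_report()) after the export lines shows, for example, whether ${WORK} has room for the IMAS images loaded later.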

If the ${WORK}/udocker/udocker command fails, try running it explicitly with python3:

module load itm-python/3.10
python3 ${WORK}/udocker/udocker --help
python3 ${WORK}/udocker/udocker install
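The fallback described above can be automated: try the script directly and, if that fails, retry with an explicit python3 in front. This is a sketch, assuming python3 is on PATH (for example after module load itm-python/3.10); the helper names are hypothetical. Note that it also retries when udocker itself exits with an error, which is harmless for idempotent commands such as --help.

```python
from __future__ import annotations

import os
import subprocess


def udocker_command(work: str, *args: str, via_python: bool = False) -> list[str]:
    """Build the argv for ${WORK}/udocker/udocker, optionally via python3."""
    script = os.path.join(work, "udocker", "udocker")
    prefix = ["python3"] if via_python else []
    return [*prefix, script, *args]


def run_udocker(work: str, *args: str) -> subprocess.CompletedProcess:
    """Run udocker directly; on failure, retry through python3."""
    try:
        return subprocess.run(udocker_command(work, *args), check=True)
    except (OSError, subprocess.CalledProcessError):
        # e.g. the shebang interpreter is missing or too old
        return subprocess.run(udocker_command(work, *args, via_python=True),
                              check=True)
```

Usage: run_udocker(os.environ["WORK"], "install").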

Exercises

Start uDocker container

Download the latest Ubuntu image: ${WORK}/udocker/udocker pull ubuntu
Verify available images: ${WORK}/udocker/udocker images
Create a container: ${WORK}/udocker/udocker create --name=udocker-ubuntu ubuntu
List containers: ${WORK}/udocker/udocker ps
Run an interactive shell: ${WORK}/udocker/udocker run udocker-ubuntu /bin/bash
Check the user: whoami
Check the operating system: cat /etc/os-release
Exit the container: exit
Check the operating system on the Gateway: cat /etc/os-release
Delete the container: ${WORK}/udocker/udocker rm udocker-ubuntu
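The non-interactive part of this exercise can be replayed as a script. In the sketch below, the interactive /bin/bash session is replaced by a single command passed to udocker run, which is an assumption about how you want to check the system rather than a step from the exercise. exercise_steps and run_steps are hypothetical helpers; the path to udocker is whatever ${WORK}/udocker/udocker expands to on your machine.

```python
from __future__ import annotations

import shlex
import subprocess


def exercise_steps(udocker: str, name: str = "udocker-ubuntu") -> list[list[str]]:
    """The exercise's udocker commands, in order, as argv lists."""
    return [
        [udocker, "pull", "ubuntu"],
        [udocker, "images"],
        [udocker, "create", f"--name={name}", "ubuntu"],
        [udocker, "ps"],
        # a command instead of /bin/bash makes the run non-interactive
        [udocker, "run", name, "cat", "/etc/os-release"],
        [udocker, "rm", name],
    ]


def run_steps(steps: list[list[str]]) -> None:
    """Echo and execute each command, stopping at the first failure."""
    for step in steps:
        print("$", shlex.join(step))
        subprocess.run(step, check=True)
```

For example: run_steps(exercise_steps(os.path.expandvars("${WORK}/udocker/udocker"))), with os imported.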

As with Singularity, you can spot the difference here as well. As long as you are inside the uDocker-based container, you will see

> cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
...
...

while a Gateway node reports itself as

> cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
...
...

IMAS image

The IMAS environment is available as a Docker image. There are two flavors of the image: batch and GUI. Each image has a predefined environment, so, unlike on the Gateway, you do not need to load any modules or call imasdb to get started. Both images are exported as archives and made available on the Gateway.

Loading images to uDocker

To load the images, please run:

${WORK}/udocker/udocker load -i ~g2tomz/public/imas-fc2k-latest.tar.xz
${WORK}/udocker/udocker load -i ~g2tomz/public/imas-gui-latest.tar.xz

Each command takes a little over a minute and shows no progress indication, so please wait until it finishes.
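Because the load commands print nothing while they work, a small wrapper that announces each archive and reports the elapsed time can be reassuring. This is a sketch; load_commands and load_all are hypothetical helpers. Since subprocess does not go through a shell, the ~g2tomz prefix is expanded explicitly with os.path.expanduser.

```python
from __future__ import annotations

import os
import subprocess
import time

ARCHIVES = [
    "~g2tomz/public/imas-fc2k-latest.tar.xz",
    "~g2tomz/public/imas-gui-latest.tar.xz",
]


def load_commands(udocker: str, archives: list[str] = ARCHIVES) -> list[list[str]]:
    """Build one `udocker load -i ARCHIVE` command per archive."""
    # no shell is involved, so expand ~user ourselves
    return [[udocker, "load", "-i", os.path.expanduser(a)] for a in archives]


def load_all(udocker: str) -> None:
    """Load every archive, printing how long each one took."""
    for cmd in load_commands(udocker):
        print("loading", cmd[-1], "(no progress output, please wait)")
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        print(f"done in {time.monotonic() - start:.0f} s")
```

Usage: load_all(os.path.expandvars("${WORK}/udocker/udocker")).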

Exercises

Running Python script to create IDSes

Make sure that the images are loaded: ${WORK}/udocker/udocker images
Create a container: ${WORK}/udocker/udocker create --name=imas imas/fc2k:3.24.0-4.2.0-2.5p4-3.0.5-4.6.5
Run an interactive shell: ${WORK}/udocker/udocker run imas /bin/bash

Prepare and run a Python script which creates the pf_active IDS in shot 1, run 1:

cat << 'EOF' > put_pf.py
import imas

if __name__ == '__main__':
    # shot 1, run 1; the last two arguments are the reference shot and run
    ids = imas.ids(1, 1, 1, 1)
    # create a new pulsefile for user 'imas', database 'test', major version '3'
    ids.create_env('imas', 'test', '3')
    ids.pf_active.ids_properties.comment = 'Test data'
    # 0 = heterogeneous time: each signal carries its own time vector
    ids.pf_active.ids_properties.homogeneous_time = 0
    ids.pf_active.coil.resize(2)
    ids.pf_active.coil[0].name = 'COIL 1A'
    ids.pf_active.coil[1].name = 'COIL 2B'
    # coil 0: 10 samples
    number = 10
    ids.pf_active.coil[0].current.data.resize(number)
    ids.pf_active.coil[0].current.time.resize(number)
    for i in range(number):
        ids.pf_active.coil[0].current.data[i] = 2 * i
        ids.pf_active.coil[0].current.time[i] = i
    # coil 1: 12 samples
    number = number + 2
    ids.pf_active.coil[1].current.data.resize(number)
    ids.pf_active.coil[1].current.time.resize(number)
    for i in range(number):
        ids.pf_active.coil[1].current.data[i] = 2 * i + 10
        ids.pf_active.coil[1].current.time[i] = i + number
    # write pf_active into the pulsefile and close it
    ids.pf_active.put()
    ids.close()
EOF
  
python put_pf.py

You can now exit the container shell. Note that the container is no longer executing anything, but it persists, and all its files are kept in ~/.udocker (the symbolic link to ${WORK}/udocker created during installation). Copy the generated pulsefile directly into your Gateway collection of pulsefiles:
cp ${WORK}/udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_10001.* ~/public/imasdb/test/3/0/
Verify that the IDS pf_active has been created: idsdump $USER test 3 1 1 pf_active
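The cp command above can also be written in Python, which makes it easy to reuse for other shots and runs, or to copy between two containers. This is a sketch under a loudly stated assumption: the file stem ids_<shot * 10000 + run> is inferred from the example above (shot 1, run 1 gives ids_10001) and is only handled here for runs below 10000. pulse_stem and copy_pulsefiles are hypothetical helpers.

```python
from __future__ import annotations

import glob
import os
import shutil


def pulse_stem(shot: int, run: int) -> str:
    """ASSUMPTION: pulsefile stem is ids_<shot*10000 + run> (run < 10000)."""
    if not 0 <= run < 10000:
        raise ValueError("this sketch only handles runs 0..9999")
    return f"ids_{shot * 10000 + run}"


def copy_pulsefiles(src_dir: str, dst_dir: str, shot: int, run: int) -> list[str]:
    """Copy every <stem>.* file from src_dir to dst_dir; return the new paths."""
    pattern = os.path.join(src_dir, pulse_stem(shot, run) + ".*")
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no pulsefiles match {pattern}")
    os.makedirs(dst_dir, exist_ok=True)  # create the target database directory
    return [shutil.copy2(f, dst_dir) for f in files]
```

With the paths from the cp command, the source would be os.path.expandvars("${WORK}/udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0") and the destination os.path.expanduser("~/public/imasdb/test/3/0").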

Running Kepler workflow in a container

Create container: ${WORK}/udocker/udocker create --name=imas-gui imas/gui:3.24.0-4.2.0-2.5p4-3.0.5-4.6.5
Run the default application (VNC server) with port mapping: ${WORK}/udocker/udocker run --publish 15901:5901 imas-gui
In another terminal, open a VNC viewer, connect to localhost:15901 and use imas as the password: vncviewer localhost:15901
You will see the Openbox desktop environment, with Kepler loading automatically (please wait until it is ready).
Design an example workflow which reads the pf_active IDS from shot 1 and run 1.
Run the workflow and notice that it fails because the required pulsefile is missing. This is because in the previous exercise you were running in the imas container, which is isolated from the imas-gui one.
You can now do one of the following:
1. Create the pulsefile again
  1. Run in another terminal in the Gateway: ${WORK}/udocker/udocker run imas-gui /bin/bash
  2. Repeat step 4 from the previous exercise (create and run put_pf.py)
2. Copy pulsefile between containers
  1. Run cp ${WORK}/udocker/containers/imas/ROOT/home/imas/public/imasdb/test/3/0/ids_10001.* ${WORK}/udocker/containers/imas-gui/ROOT/home/imas/public/imasdb/test/3/0/
Start the workflow again. This time it should find the pulsefile and read the pf_active IDS.
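The VNC server started with udocker run --publish 15901:5901 imas-gui can take a while to come up. Rather than retrying vncviewer by hand, a script can poll the mapped port until it accepts TCP connections. This is a minimal sketch; wait_for_port is a hypothetical helper, and a successful connection only shows that something is listening, not that Kepler has finished loading.

```python
from __future__ import annotations

import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # the server accepted the connection
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 15901) returns True once the VNC server listens, after which vncviewer localhost:15901 can be started.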