
1. Container image

  • Images are binary files containing all the data and metadata required to start a container
  • They can be built locally or downloaded from remote registries
  • The most common standard is the Docker image format

1.1. Docker image format

  • A Docker image saved to a file (e.g. with docker save) is a tar archive containing metadata and layers

  • Each layer consists of its own metadata and another tar archive with the set of changes the layer introduces

  • The top-level metadata file in the archive is manifest.json:

    [
      {
        "Config": "f63181f19b2fe819156dcb068b3b5bc036820bec7014c5f77277cfa341d4cb5e.json",
        "RepoTags": [
          "ubuntu:latest"
        ],
        "Layers": [
          "151ae8ef4f042fd5173fd2497f0a365b4413468163e7bd567146f29dcfea3517/layer.tar",
          "2872658e1abe34d0c7391abbc0848fdeddb456659e39511df0574fcfc8b7ad70/layer.tar",
          "2b83a9243dd8405d0811beeb14aeb797745b100e4538d056adb63fcc6b47c59f/layer.tar"
        ]
      }
    ]
    
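  • Its fields are:

    • Config -- path to the image configuration file (architecture, OS, environment, default command, etc.),
    • RepoTags -- the list of repository tags (name:tag) of the image,
    • Layers -- paths to the tar archives of the layers, ordered from the base layer to the topmost one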
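  • To see these fields in practice, the following Python sketch reads manifest.json from an archive created with docker save and prints the tags, the config path and the layers in order. It assumes the archive follows the layout shown above (recent Docker versions may list layer paths such as blobs/sha256/<hex> instead, which the sketch simply prints as-is); read_manifest and describe are hypothetical helper names, not part of any Docker tool.

```python
# Sketch: inspect the manifest.json of a `docker save` archive.
import json
import tarfile


def read_manifest(archive: tarfile.TarFile) -> list:
    """Parse manifest.json; it holds one entry per image in the archive."""
    member = archive.extractfile("manifest.json")  # KeyError if it is missing
    if member is None:  # present, but not a regular file
        raise ValueError("manifest.json is not a regular file")
    return json.load(member)


def describe(archive: tarfile.TarFile) -> None:
    """Print tags, config path and layer paths for every image in the archive."""
    for image in read_manifest(archive):
        print("tags:  ", ", ".join(image.get("RepoTags") or []))
        print("config:", image["Config"])
        # Layers are listed from the base layer up to the topmost layer.
        for number, path in enumerate(image["Layers"], start=1):
            print(f"layer {number}: {path}")
```

For example, after docker save ubuntu:latest -o ubuntu.tar, running `with tarfile.open("ubuntu.tar") as archive: describe(archive)` lists the tag, the config file and the layers of the saved image.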

1.2. OCI image format

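  • An alternative, vendor-neutral format is specified by the OCI (Open Container Initiative)

  • An OCI image layout is a directory, usually packed into a tar archive, containing metadata and layers; manifests, configuration and layers are all stored as blobs named after their digest (blobs/<algorithm>/<hex>), and the layers themselves are (typically gzip-compressed) tar archives

  • The entry-point metadata file is index.json, which references one or more manifests by digest: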

    {
      "schemaVersion": 2,
      "manifests": [
        {
          "mediaType": "application/vnd.oci.image.manifest.v1+json",
          "digest": "sha256:7ad481b55901a1b5472c0e1b3fbf0bf2867dc38feb6eb7a18cd310f00208e05c",
          "size": 658
        }
      ]
    }
    
  • The manifest references the configuration and the layers by media type, digest and size (the layers are listed from the base layer up):

    {
      "schemaVersion": 2,
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:10bdc2317d43a5421151e135881e172002c7d61e934de7e1e79df560a151f112",
        "size": 2427
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:f3f8f4bd7c131f4d967bc162207ab72c24f427915682f895eb4f793ad05d7e35",
          "size": 29989546
        },
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:0188b501936213b7cd0b5333245960781a8b035249cfa427fe9a229fe557c624",
          "size": 924
        },
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:db861e57845ea7ba52a2ac277abbdd8cd04bda5db69c49bf95be49d11e5a47e1",
          "size": 202
        }
      ]
    }
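  • Because every blob is addressed by its digest, a reader can both locate and verify it. The following Python sketch, assuming an unpacked OCI image layout directory and sha256 digests only, resolves a descriptor to blobs/sha256/<hex>, checks its size and hash, and walks from index.json to the manifest and then to the layer descriptors. It also assumes the first index entry is a single-platform image manifest rather than a nested index; all function names are hypothetical.

```python
# Sketch: resolve and verify blobs in an unpacked OCI image layout.
import hashlib
import json
from pathlib import Path


def blob_path(layout_dir: Path, digest: str) -> Path:
    """Blobs are stored as blobs/<algorithm>/<hex> inside the layout."""
    algorithm, _, hex_value = digest.partition(":")
    return layout_dir / "blobs" / algorithm / hex_value


def verify_blob(data: bytes, descriptor: dict) -> bool:
    """Check a blob against the digest and size recorded in its descriptor."""
    algorithm, _, expected = descriptor["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return len(data) == descriptor["size"] and hashlib.sha256(data).hexdigest() == expected


def read_json_blob(layout_dir: Path, descriptor: dict) -> dict:
    """Load a JSON blob (manifest or config) after verifying it."""
    data = blob_path(layout_dir, descriptor["digest"]).read_bytes()
    if not verify_blob(data, descriptor):
        raise ValueError(f"digest or size mismatch: {descriptor['digest']}")
    return json.loads(data)


def layer_descriptors(layout_dir: Path) -> list:
    """index.json -> manifest -> layer descriptors (base layer first)."""
    index = json.loads((layout_dir / "index.json").read_text())
    # Assumption: the first entry is a single-platform image manifest,
    # not a nested image index for a multi-platform image.
    manifest = read_json_blob(layout_dir, index["manifests"][0])
    return manifest["layers"]
```

Checking both size and digest is cheap and means a truncated or tampered blob is rejected before it is unpacked.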
    

1.3. Local vs remote

  • Both formats describe how an image is stored locally as files

  • When pulling an image from a remote registry, the client first fetches the list of layers, then checks which of them are already in its local cache and finally downloads only the missing ones

  • This allows layers to be shared between images that build on each other

  • For example:

    • Let's say image A is based on ubuntu
    • Image B extends A by adding a Python executable on top
    • Image C also extends A, but adds the Apache HTTP server instead
    • A user pulling image B for the first time downloads the layers of ubuntu, then those of A and finally those of B
    • When the same user later pulls image C, only the layers with the Apache HTTP server are downloaded, as all the others are still in the cache

Similarly, if both mybackend and myfrontend images are based on ubuntu, the ubuntu layers are downloaded and cached only once
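  • The pull logic described above can be modeled in a few lines of Python. This is a simplified sketch, not how the Docker client is implemented: the digests are hypothetical placeholders, the cache is a plain set and "downloading" only records the digest.

```python
# Sketch: a pull downloads only the layers missing from the local cache.


def layers_to_download(manifest_layers: list, cache: set) -> list:
    """Digests listed in the manifest that are not cached yet (manifest order)."""
    missing = []
    for digest in manifest_layers:
        if digest not in cache and digest not in missing:
            missing.append(digest)
    return missing


def pull(manifest_layers: list, cache: set) -> list:
    """Simulate a pull: fetch the missing layers and add them to the cache."""
    missing = layers_to_download(manifest_layers, cache)
    cache.update(missing)  # stands in for the actual download
    return missing


# Hypothetical layer digests for the ubuntu -> A -> {B, C} example.
UBUNTU = ["sha256:ubuntu-base"]
IMAGE_A = UBUNTU + ["sha256:a-layer"]
IMAGE_B = IMAGE_A + ["sha256:b-python"]
IMAGE_C = IMAGE_A + ["sha256:c-apache"]
```

Starting from an empty cache, pulling IMAGE_B fetches all three of its layers, while a subsequent pull of IMAGE_C fetches only sha256:c-apache.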

2. Building container images

  • The most straightforward approach is to use a Dockerfile
  • A Dockerfile is a text file with instructions describing the recipe to build the image
  • The first instruction is usually FROM <image>, which selects the base image
  • RUN <command> executes a command inside the image being built and saves the result as a new layer
  • CMD <command> configures the default command a container will run when it starts
  • EXPOSE <port> adds metadata about a port a service inside the image will listen on (Important! Whoever creates the container decides which ports to publish and how, e.g. with docker run -p 8080:80. EXPOSE alone publishes nothing and serves mainly as documentation.)
  • ENV <key>=<value> sets an environment variable, both for later build steps and for running containers
  • COPY <src> <dest> copies files from the build context (usually the directory passed to docker build) into the image
  • WORKDIR <dir> sets the working directory for subsequent instructions and for the running container
  • ARG <name>=<default> declares a build-time argument that can be overridden with docker build --build-arg <name>=<value>

2.1. Example

  • Contents of index.html:

    <h1>Hello World from Docker!</h1>
    
  • Contents of Dockerfile:

    FROM ubuntu
    ENV DEBIAN_FRONTEND=noninteractive
    COPY index.html /var/www/html/index.html
    RUN apt-get update -y
    RUN apt-get install -y apache2
    EXPOSE 80
    CMD ["/usr/sbin/apachectl", "-DFOREGROUND"]
    

3. Optimization

  • The layer system allows caching, but it also has consequences you need to be aware of

  • Scenario 1. Image A creates /bigfile, and image B, which extends A, deletes it. The deletion only masks the file: a container started from B will not see /bigfile, but A's layer containing it is still part of image B, so it still takes up space

  • Scenario 2. One of the layers contains secrets (passwords, unencrypted private keys, etc.) and a later layer deletes them. Even though the secrets are no longer visible in the container, anyone with the image can extract the tar archive of every layer and recover them

  • In a Dockerfile, every RUN, COPY and ADD instruction creates a separate layer, so you should usually:

    • Combine subsequent RUN commands:

      -RUN touch /test1
      -RUN date > /test2
      +RUN touch /test1 && date > /test2
      
    • In a combined RUN, make sure to delete all temporary files:

      -RUN apt-get update -y
      -RUN apt-get install -y git
      -RUN rm -rf /var/lib/apt/lists/*
      +RUN apt-get update -y \
      + && apt-get install -y git \
      + && rm -rf /var/lib/apt/lists/*
      
      -RUN curl URL > archive.tar
      -RUN tar xf archive.tar
      -RUN rm archive.tar
      +RUN curl -fsSL URL | tar -xf -
      
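  • The build itself also uses caching: an instruction is executed again only if it, the files it copies, or any earlier instruction has changed; otherwise the cached layer is reused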

  • Scenario 1:

    • Layer 1: Install Apache HTTP server
    • Layer 2: Copy index.html
  • Scenario 2:

    • Layer 1: Copy index.html
    • Layer 2: Install Apache HTTP server
  • Both scenarios create equivalent images, but if you change index.html, both layers are rebuilt in Scenario 2, while only the second one is rebuilt in Scenario 1

  • A good practice is to order instructions by how likely they are to change: the more likely an instruction is to change, the later it should appear
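  • The masking in Scenario 1 can be illustrated with a simplified Python model of layer stacking. It follows the OCI convention that a file deleted in an upper layer is recorded there as a whiteout entry named .wh.<name>; directory whiteouts and opaque directories are deliberately left out, and the paths and sizes are made up.

```python
# Sketch: simplified layer stacking with OCI-style whiteout files.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

WHITEOUT_PREFIX = ".wh."


@dataclass
class Layer:
    # path -> size in bytes; "/dir/.wh.name" records the deletion of "/dir/name"
    files: Dict[str, int] = field(default_factory=dict)


def whiteout_target(path: str) -> Optional[str]:
    """Return the path hidden by a whiteout entry, or None for a regular file."""
    directory, _, name = path.rpartition("/")
    if name.startswith(WHITEOUT_PREFIX):
        return f"{directory}/{name[len(WHITEOUT_PREFIX):]}"
    return None


def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """Files a container sees: layers applied from the base up."""
    view: Dict[str, int] = {}
    for layer in layers:
        # Whiteouts only hide files from lower layers, so apply them first.
        for path in layer.files:
            target = whiteout_target(path)
            if target is not None:
                view.pop(target, None)
        for path, size in layer.files.items():
            if whiteout_target(path) is None:
                view[path] = size
    return view


def stored_bytes(layers: List[Layer]) -> int:
    """Bytes shipped with the image: every layer is kept in full."""
    return sum(size for layer in layers for size in layer.files.values())
```

For example, with a = Layer({"/bigfile": 5_000_000}) and b = Layer({"/.wh.bigfile": 0}), visible_files([a, b]) is empty while stored_bytes([a, b]) is still 5000000, which is exactly the situation described in Scenario 1 (and, for secrets, in Scenario 2).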

3.1. Example after optimization

  • Contents of Dockerfile:

    FROM ubuntu
    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update -y \
        && apt-get install -y \
            apache2 \
        && rm -rf /var/lib/apt/lists/*
    EXPOSE 80
    CMD ["/usr/sbin/apachectl", "-DFOREGROUND"]
    COPY index.html /var/www/html/index.html
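  • Why this ordering pays off can be shown with a simplified model of the build cache, sketched below in Python: each step's cache key is derived from the key of the previous step, the instruction text and the content it consumes, so one changed step invalidates every step after it. This only approximates the real cache keys of BuildKit and the legacy builder, and the step tuples are illustrative.

```python
# Sketch: a chained cache key, so a change invalidates all later steps.
import hashlib


def cache_keys(steps: list) -> list:
    """steps: (instruction, consumed content) pairs, in Dockerfile order."""
    keys, parent = [], ""
    for instruction, content in steps:
        # Simplified model: the key depends on the parent key, the
        # instruction text and the content it consumes (e.g. copied files).
        parent = hashlib.sha256(f"{parent}\n{instruction}\n{content}".encode()).hexdigest()
        keys.append(parent)
    return keys


def steps_to_rebuild(old: list, new: list) -> int:
    """Number of steps whose key changed; assumes both builds have the same steps."""
    return sum(a != b for a, b in zip(cache_keys(old), cache_keys(new)))


INSTALL = ("RUN apt-get update -y && apt-get install -y apache2", "")
COPY_V1 = ("COPY index.html /var/www/html/index.html", "<h1>Hello World from Docker!</h1>")
COPY_V2 = ("COPY index.html /var/www/html/index.html", "<h1>Hello again!</h1>")
```

Editing index.html with COPY placed last (Scenario 1) gives steps_to_rebuild([INSTALL, COPY_V1], [INSTALL, COPY_V2]) == 1, while with COPY first (Scenario 2) both steps have to be rebuilt.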

4. Multi-stage Dockerfiles

  • Sometimes building an image first requires generating resources, e.g. compiling a Java project and packaging it as a JAR file

  • In such cases, neither the source code nor the tools required to process it are usually needed in the final image

  • To optimize the image, you could try performing the following in a single RUN command:

    • Transfer source codes
    • Install build-time dependencies (compilers, etc.)
    • Build everything
    • Transfer the generated resource to its final destination
    • Remove all intermediate files
  • This is a lot to be done in a single command, so it is prone to errors and hard to debug

  • To overcome this, you can use multi-stage building, which is like building multiple images simultaneously and freely transferring files between them

  • Each stage starts with its own FROM <image> AS <stage> command

  • You can copy files between images created in separate stages using COPY --from=<stage> syntax

4.1. Example

  • Contents of hello.go:

    package main
    import "fmt"
    func main() {
        fmt.Println("hello world")
    }
    
  • Contents of Dockerfile:

    FROM golang AS builder
    COPY hello.go hello.go
    RUN go build hello.go
    
    FROM ubuntu
    COPY --from=builder /go/hello /usr/bin/hello
    CMD ["/usr/bin/hello"]
    

5. BuildKit

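  • Starting with version 18.09, Docker is shipped with two build engines: the legacy one (the default at the time) and BuildKit; since Docker Engine 23.0, BuildKit is the default builder on Linux

  • To use BuildKit where it is not the default, set the environment variable DOCKER_BUILDKIT=1, e.g. DOCKER_BUILDKIT=1 docker build .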

  • The new engine has the following advantages:

    • Independent stages are built in parallel, and stages not needed for the requested target are skipped

    • You can pass private SSH keys to the build process and be sure they do not end up in any layer or metadata:

      RUN --mount=type=ssh <command>
      
      docker build --ssh default .
      
    • Similarly, you can pass any secrets to the build process:

      RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret
      
      docker build --secret id=mysecret,src=file.txt .
      
  • And the following disadvantage:

    • It is harder to debug problems

6. Debugging (legacy engine)

  • Many things might go wrong during Docker image preparation
  • When building with the legacy engine, each instruction in the Dockerfile produces an intermediate image, stored under a unique ID
  • If something goes wrong, you can instantiate an interactive container from the last layer that was built successfully and look for clues

6.1. Example

  • Contents of Dockerfile

    FROM alpine
    RUN date > /tmp/build-date.txt
    RUN cat /tmp/build-dat.txt > /tmp/final-date.txt
    
  • Results of docker build .:

    Sending build context to Docker daemon  2.048kB
    Step 1/3 : FROM alpine
     ---> 14119a10abf4
    Step 2/3 : RUN date > /tmp/build-date.txt
     ---> Running in 3fe480f490d7
    Removing intermediate container 3fe480f490d7
     ---> 7eadcff6ea01
    Step 3/3 : RUN cat /tmp/build-dat.txt > /tmp/final-date.txt
     ---> Running in 7cba04ecd3e3
    cat: can't open '/tmp/build-dat.txt': No such file or directory
    The command '/bin/sh -c cat /tmp/build-dat.txt > /tmp/final-date.txt' returned a non-zero code: 1
    
  • The result of FROM command is stored as 14119a10abf4

  • The result of the first RUN command is stored as 7eadcff6ea01

  • The second RUN fails, so you debug it by starting an interactive container (the -it switch):

    docker run -it 7eadcff6ea01
    
  • Now you can try to execute the command that failed: cat /tmp/build-dat.txt

  • Then figure out why it failed, what could cause it, etc.

7. Debugging (BuildKit)

  • With BuildKit, intermediate results are no longer saved as images after each instruction
  • To improve performance, BuildKit only exports an image for the stage being built (the last stage, or the one selected with --target)
  • To debug problems, you have to work around this and introduce an artificial stage boundary right before the failing instruction

7.1. Example

  • For the same Dockerfile as before, with BuildKit and the plain progress output (docker build --progress plain .) you will see:

    #2 [internal] load .dockerignore
    #2 sha256:28b059ecac284a33ba98daa285c6a068d86485b54afc2e67f18e2bd1640d871a
    #2 transferring context: 2B done
    #2 DONE 0.1s
    
    #1 [internal] load build definition from Dockerfile
    #1 sha256:cbd3d6400308afcf33c0910b894d2e44156fc4127a0db290d19df5a4e8eae37e
    #1 transferring dockerfile: 37B done
    #1 DONE 0.3s
    
    #3 [internal] load metadata for docker.io/library/alpine:latest
    #3 sha256:d4fb25f5b5c00defc20ce26f2efc4e288de8834ed5aa59dff877b495ba88fda6
    #3 DONE 0.0s
    
    #4 [1/3] FROM docker.io/library/alpine
    #4 sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7
    #4 CACHED
    
    #5 [2/3] RUN date > /tmp/build-date.txt
    #5 sha256:684d70446f71e64256b21f59555e6fedc1eac55780675519af54f9e174fd16e1
    #5 DONE 1.2s
    
    #6 [3/3] RUN cat /tmp/build-dat.txt > /tmp/final-date.txt
    #6 sha256:bde8a93c3b05727094dd3d24e010c506f451a33718cc07f32f4e6b1ccab0b645
    #6 0.925 cat: can't open '/tmp/build-dat.txt': No such file or directory
    #6 ERROR: executor failed running [/bin/sh -c cat /tmp/build-dat.txt > /tmp/final-date.txt]: exit code: 1
    ------
     > [3/3] RUN cat /tmp/build-dat.txt > /tmp/final-date.txt:
    ------
    executor failed running [/bin/sh -c cat /tmp/build-dat.txt > /tmp/final-date.txt]: exit code: 1
    
  • Although the output contains lines starting with sha256:..., they are not IDs of intermediate images you could run

  • To debug the problem as before, alter the Dockerfile by (1) naming the debugged stage if it is not named yet and (2) adding a FROM scratch line just before the instruction you need to debug:

    -FROM alpine
    +FROM alpine AS debug
     RUN date > /tmp/build-date.txt
    +FROM scratch
     RUN cat /tmp/build-dat.txt > /tmp/final-date.txt
    
  • Now you can tell BuildKit to build only the debug target:

    docker build --progress plain --target debug .
    
    #1 [internal] load build definition from Dockerfile
    #1 sha256:ffe58018ac4c453ce043471e51216f7528c1fe315a9a83fa1ad276df0ac9f8a6
    #1 transferring dockerfile: 157B done
    #1 DONE 0.1s
    
    #2 [internal] load .dockerignore
    #2 sha256:dda1a34cfb4f2eb8169d58953a937062de5c70a0ae78ca49118e49ea8279a7b7
    #2 transferring context: 2B done
    #2 DONE 0.2s
    
    #3 [internal] load metadata for docker.io/library/alpine:latest
    #3 sha256:d4fb25f5b5c00defc20ce26f2efc4e288de8834ed5aa59dff877b495ba88fda6
    #3 DONE 0.0s
    
    #4 [debug 1/2] FROM docker.io/library/alpine
    #4 sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7
    #4 DONE 0.0s
    
    #5 [debug 2/2] RUN date > /tmp/build-date.txt
    #5 sha256:684d70446f71e64256b21f59555e6fedc1eac55780675519af54f9e174fd16e1
    #5 CACHED
    
    #6 exporting to image
    #6 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
    #6 exporting layers
    #6 exporting layers 0.4s done
    #6 writing image sha256:0846d70d4f562fa95835da5893cb53beb7f623cf228f57460bd298a0f87da680 0.0s done
    #6 DONE 0.5s
    
  • Finally, start an interactive container from the written image using its hash (alternatively, pass -t <name> to docker build and run the image by name):

    docker run -it 0846d70d4f562fa95835da5893cb53beb7f623cf228f57460bd298a0f87da680
    

8. IMAS Docker

8.1. General remarks

  • The IMAS Docker build project is available at git.iter.org as IMEX/imas-container
  • It contains the ansible-container/, buildah/ and docker/ subdirectories
  • Only docker/ is actively developed
  • The Dockerfile requires BuildKit, as it uses --mount=type=ssh
  • This also means that to build IMAS Docker, you need a running SSH agent holding the private key that gives access to git.iter.org projects

8.2. build.sh

  • This Bash script accepts the following command line options:

    -f            disable cache (build everything from scratch)
    -u            build with UDA
    -c CPUs       number of CPUs [default=$(nproc)]
    -t target     build only one target
    
  • It defines three auxiliary Bash functions:

    • latest_git_tag url blacklisted returns the latest tag (in the sense of sort --version-sort) of the git repository at url. If blacklisted is given, that tag is ignored. For example, the UDA repository has the tag code_camp_cadarache, which should be ignored.
    • latest_stable_git_tag url works as above, but only considers tags containing the keyword stable (used for MDS+)
    • latest_released_git_tag url works as above, but only considers tags containing the keyword rel (used for Ant)
  • Next, in the build.sh script you can pin any of these tags to a specific value:

    tag_al=4.8.7                        # FIXME: 4.8.7 is used by ETS
    tag_ant=1.10.6                      # FIXME: later versions of ant fail to build...
    tag_blitz=
    tag_cmake=v3.20.0
    tag_dd=3.31.0                       # FIXME: 3.31.0 is used by ETS
    tag_fc2k=
    tag_hdf5=hdf5-1_12_0
    tag_installer=
    tag_kca=
    tag_kepler_installer=
    tag_kp=
    tag_lapack=
    tag_mdsplus=stable_release-7-96-15  # FIXME: later versions of mdsplus fail to build...
    tag_tigervnc=
    tag_uda=2.3.1                       # FIXME: uda/2.3.1 is known to work well with uda-plugins/1.2.0
    tag_uda_plugins=1.2.0               # FIXME: uda/2.3.1 is known to work well with uda-plugins/1.2.0
    ver_kb=
    
  • All empty values are resolved to the latest matching tag using the functions defined above
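  • The tag-selection logic of latest_git_tag and its variants can be re-expressed as the Python sketch below. It is not a port of build.sh: it takes a list of tag names instead of querying the repository, and version_key only approximates sort --version-sort by comparing digit runs numerically (GNU sort handles more special cases). The function names and the example tags, except code_camp_cadarache, are hypothetical.

```python
# Sketch: pick the latest tag, approximating `sort --version-sort`.
import re
from typing import Iterable, Optional


def version_key(tag: str) -> list:
    """Approximate version-sort key: digit runs compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so even positions are strings and odd positions are numbers.
    parts = re.split(r"(\d+)", tag)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def latest_tag(tags: Iterable[str], blacklisted: Optional[str] = None,
               keyword: Optional[str] = None) -> Optional[str]:
    """Latest tag, skipping `blacklisted` and, if given, tags without `keyword`."""
    candidates = [
        tag for tag in tags
        if tag != blacklisted and (keyword is None or keyword in tag)
    ]
    return max(candidates, key=version_key, default=None)
```

In this sketch, an unfiltered non-numeric tag such as code_camp_cadarache compares greater than any tag starting with a digit, which shows why such a tag has to be blacklisted; keyword="stable" and keyword="rel" mirror the two other variants.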

8.3. Dockerfile

  • There are 14 stages in the Dockerfile, all based on Ubuntu 18.04

    • common-builder has compilers and libraries for building Blitz++, HDF5 and MDS+. It installs CMake from GitHub, because the version in the Ubuntu 18.04 repository is too old

    • blitz-builder builds Blitz++ in /opt/blitz

    • hdf5-builder builds HDF5 in /opt/hdf5

    • mdsplus-builder builds MDS+ in /opt/mdsplus

    • imas-git-puller pulls all repositories from git and is the only stage that accesses the SSH keys

    • base contains a long list of applications, libraries and environment variables that will be used by any other imas/* image

    • base-devel adds the Intel compilers on top of base and uses them to compile BLAS and LAPACK

    • ual-devel does the following: (1) compiles UDA and UDA Plugins, (2) compiles IMAS without Fortran interface, (3) compiles Fortran interface, (4) installs IMAS, (5) builds UDA Plugins again

      • UDA has a cyclic dependency: IMAS requires UDA, while UDA Plugins require IMAS
      • The Fortran interface takes the longest to compile and requires the most RAM. If you have trouble building the image, reduce the number of parallel build jobs (see -c CPUs in the build.sh description)
    • kepler-devel adds Ant, JAXFront and Kepler

      • JAXFront is a licensed software and IMAS Docker uses the free edition
    • fc2k-devel adds FC2K

    • ual starts from base image and copies from ual-devel all things built previously (BLAS, LAPACK, UDA, IMAS)

      • This way the ual image is free of the *-devel software (i.e. the Intel compilers) and of the IMAS source code
    • kepler extends it and copies from kepler-devel

    • fc2k extends it and copies from fc2k-devel

    • gui extends it and adds XFCE4 and TigerVNC

  • There are 7 images produced:

    • imas/ual-devel
    • imas/kepler-devel
    • imas/fc2k-devel
    • imas/ual
    • imas/kepler
    • imas/fc2k
    • imas/gui

8.4. files/{base,ual,kepler,fc2k,gui}

  • Several files used by the Dockerfile are stored in the files/* directories

    • In files/base you can find blas-ifort.pc and lapack-ifort.pc files preconfigured for BLAS and LAPACK (they do not come with the upstream package, so they were crafted manually)

    • In files/kepler you can find modulefile for JAXFront (it also had to be crafted manually)

    • In files/{ual,kepler} you can find Makefile.Docker.Ubuntu which is a configuration file used by IMAS Installer or Kepler Installer

      • In these files you select what to build e.g. you can switch off gfortran compilation of Fortran interface
    • In files/fc2k you can find install_Docker.Ubuntu.xml and settings_Docker.Ubuntu.xml which control the installation and usage of FC2K

    • In files/{ual,kepler,fc2k} you can find docker-entrypoint.sh, which is set as the entrypoint of the corresponding images. These scripts load the necessary modules (e.g. IMAS or UDA) and run imasdb test

8.5. push.sh and save.sh

  • The Bash script push.sh tags all images with the registry prefix and pushes them to that registry
  • The Bash script save.sh saves the non-devel images as .tar.zst archives in the /tmp directory

