1. IMAS – introduction to basic concepts

1.1. Key IMAS element: the Data Model

Joint scientific exploitation of ITER by multiple teams requires a Data Model to name and communicate physical and technical information → ITER Physics Data Model
The Data Model provides information for data providers and data consumers on…
- What data exist?
- What are they called?
- How are they structured as seen by the user?
IMAS components use the ITER Physics Data Model to:
- Read & Write simulation results or experimental data
- Interface codes together

1.2. The ITER Physics Data Model

The ITER Physics Data Model:
- aims to be the main gateway to data for scientific exploitation, both for code interfacing and hands-on data browsing
- is the same for simulated and experimental data (same data structures)
- is device-generic → usable for ITER or any other fusion device
- has precise design rules for global homogeneity
- has a precise lifecycle procedure so that it can evolve and be jointly developed by multiple teams

1.3. Access Layer

API providing access methods (read/write) to an ITER physics Database based on the ITER Physics Data Model
Provided in Fortran, C++, Matlab, Java, Python
The only effort for using the Data Model is to map the input/output of your code to the Data Model and add some GET/PUT commands
The access methods write to a local database stored in your account
These local databases can be shared among users (for reading only) and can be accessed remotely
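The GET/PUT pattern described above can be sketched in plain Python. This is only a toy illustration: `ToyDatabase`, `put`, `get`, `my_equilibrium_code` and `map_to_data_model` are hypothetical names and are not the Access Layer API, and all values are made up. Only the idea of mapping a code's native output onto Data Model paths before writing it follows the text.

```python
import copy


class ToyDatabase:
    """Hypothetical in-memory stand-in for a local database (not the real Access Layer)."""

    def __init__(self):
        self._store = {}

    def put(self, ids_name, ids):
        # Store a deep copy so later changes in the code do not alter the database.
        self._store[ids_name] = copy.deepcopy(ids)

    def get(self, ids_name):
        # Return a copy of the stored structure.
        return copy.deepcopy(self._store[ids_name])


def my_equilibrium_code():
    """Pretend physics code returning its native output (values are made up)."""
    return {"psi_grid": [0.0, 0.5, 1.0], "tor_flux": [0.0, 0.2, 0.7]}


def map_to_data_model(native):
    """Map native variable names onto Data Model paths (nested dicts here)."""
    return {
        "time_slice": [
            {"profiles_1d": {"psi": native["psi_grid"], "phi": native["tor_flux"]}}
        ]
    }


db = ToyDatabase()
db.put("equilibrium", map_to_data_model(my_equilibrium_code()))
equilibrium = db.get("equilibrium")
print(equilibrium["time_slice"][0]["profiles_1d"]["phi"])
```

The physics code itself is untouched; only the mapping function and the two put/get calls are added around it.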

1.4. First use case: User or code accessing Data Base through Access Layer

1.5. Interface Data Structures (IDS) to couple codes

For Integrated Modelling, the Data Model also defines Interface Data Structures (IDS). These are structures within the Data Model that are used as standard interfaces between codes
Solves the N² problem (a large number of components from various ITER members is expected in IMAS)
The usage of the IDSs makes the coupling of codes straightforward if they are in the same programming language
The usage of the IDSs + AL allows coupling of codes even if they are not written in the same language
The usage of the IDSs does NOT constrain your choice of coupling method. Codes can be coupled as:
- Subroutines within a main program
- Executables within a script
- Components within a workflow engine
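To give a feel for the N² problem mentioned above, the short sketch below compares two counts under a common reading of that problem: if every code had its own input/output format, each ordered pair of codes would need a dedicated converter, whereas with a standard interface each code only needs one reader and one writer. The function names are illustrative, and the counting model is an assumption rather than a statement from the IMAS documentation.

```python
def pairwise_converters(n_codes):
    """Converters needed if each ordered pair of codes has its own interface."""
    return n_codes * (n_codes - 1)


def standard_interface_adapters(n_codes):
    """Adapters needed if every code reads and writes one common interface."""
    return 2 * n_codes


for n in (3, 10, 50):
    print(n, pairwise_converters(n), standard_interface_adapters(n))
```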

1.6. Second use case: codes coupled together directly (same language) or through AL (different languages)

1.7. Workflow Engine and Component Generator to facilitate the development of Integrated Modelling workflows

An Integrated Modelling simulation is described as a workflow with physics codes as components (modules)
The workflow engine allows users to
- Design the workflow
- Choose its components and tune their code-specific parameters
- Execute the workflow
The workflow engine will be used to help design sophisticated workflows (e.g. Plasma Reconstruction chain, fully modularized Transport Solver, …)
- It is intuitive enough to allow “mere users” to develop their own workflows
- It hides the complexity of code coupling, data transfer, remote job submission, …
- It allows sharing codes and workflows
- It allows coupling to the PCS Simulation Platform
Component generator: a user tool that turns an IDS-compliant physics code into a component of the workflow

1.8. Physics codes + Data Access wrapped into a workflow component

1.9. Workflow components coupled and executed within a workflow engine

1.10. The layered structure: from the physics solver to the launcher

The structure is layered so that functionalities are clearly separated
It is generic and independent of e.g. the launching script/workflow engine
The Physics solver part (dark green) is not changed and is not linked to the ITER Data Model. It may or may not use the ITER Data Model internally.
The architecture is identical to the case of a component called within a workflow engine
The Physics Subroutine can be directly reused to generate a workflow component – the IMAS Infrastructure provides a tool that generates the component (pink part) automatically
Exception: for codes handling massive amounts of data, Data Access is usually parallelised and must be done inside the physics_solver (no processor has enough memory to gather all data)

2. More details on the ITER Physics Data Model

2.1. Data Model: Interface Data Structure

The Data Model has a tree structure, for the sake of clarity
At the top level, a collection of modular structures representing
- Abstract physical quantities (e.g. distribution functions)
- Tokamak subsystems (e.g. PF systems)
These modular structures have the appropriate granularity for exchange in an IM workflow → they also represent standardised interfaces for communication between codes, named Interface Data Structure (IDS)
- Each has an “ids_properties” substructure (metadata + comments + timebase usage)
- Each has a “code” substructure (trace the code-specific parameters of the code that has generated this IDS)
- Each has a generic timebase (“time”)

2.2. Data Model: Occurrences

There can be multiple instances, or “occurrences” of a given IDS in a Database Entry (see 5.2) or used in an IMAS workflow. These occurrences can correspond to different methods for computing the physical quantities of the IDS, or to different functionalities in a workflow (e.g. store initial values, prescribed values, values at next time step, …).

By default, the IDS name without specification of the occurrence number (e.g. “equilibrium”) corresponds to occurrence “0”. IDS occurrences above the default value (occurrence “0”) are accessed by concatenating the name of the IDS with the occurrence number, with a “/” in between. For example “equilibrium/2” is the name of the occurrence number 2 of the equilibrium IDS. Note that “equilibrium/0” is not valid (temporary limitation).
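The naming rule above can be written as a small helper. `occurrence_name` is a hypothetical function written for this page, not part of the Access Layer; it only encodes the convention that occurrence 0 uses the bare IDS name and higher occurrences append “/” and the number.

```python
def occurrence_name(ids_name, occurrence=0):
    """Build the name used to address a given occurrence of an IDS."""
    if occurrence < 0:
        raise ValueError("occurrence number must be non-negative")
    if occurrence == 0:
        # "equilibrium/0" is not valid, so occurrence 0 uses the bare name.
        return ids_name
    return f"{ids_name}/{occurrence}"


print(occurrence_name("equilibrium"))
print(occurrence_name("equilibrium", 2))
```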

In the present implementation, there is a pre-set maximum number of occurrences of a given IDS usable in a Database Entry or in a workflow. This number is indicated in the documentation, in the “Max. occurrence number” column of the table listing the IDSs. This limitation should be removed in the future.

2.3. Data Model documentation

Dynamically generated
Open the documentation by typing: dd_doc

2.4. What is in the documentation?

List of all IDSs. For each of them, a detailed documentation:
Full path name: name of all variables of the IDSs, with their path in the structure. Replace “/” by the structure operator in a programming language, e.g. “%” in Fortran, “.” in C++, Matlab, Java, Python
Description
Definition
Units in []
In {}, whether it is STATIC (constant over a range of pulses, e.g. machine configuration), CONSTANT (constant over the pulse or the simulation), or DYNAMIC (time-dependent within the pulse or the simulation)
Data_Type: indicates whether it is a string, an integer or a real, and its dimension (0D, 1D, 2D, …)
Coordinates: for each dimension, the full path name to the related coordinate. If the dimension simply refers to a quantity not present in the Data Model, it is indicated as “1…N”
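The “replace / by the structure operator” rule can be illustrated with a few lines of Python. `to_language_syntax` and the `STRUCTURE_OPERATOR` table are hypothetical helpers written for this page; they only swap the separator and leave index notation such as “(:)” untouched, so the result still has to be adapted to the indexing syntax of the target language.

```python
# Structure operator per language, as listed in the documentation description.
STRUCTURE_OPERATOR = {
    "fortran": "%",
    "cpp": ".",
    "matlab": ".",
    "java": ".",
    "python": ".",
}


def to_language_syntax(ids_name, path, language):
    """Join the IDS name and a documentation path with the language's operator."""
    operator = STRUCTURE_OPERATOR[language.lower()]
    return operator.join([ids_name] + path.split("/"))


print(to_language_syntax("equilibrium", "time_slice(:)/profiles_1d/phi", "fortran"))
```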

2.5. Exercise: use the Data Model documentation

Go to the DM documentation and answer the following questions:
How many IDSs have been defined?
- Answer: 46
Where can I find the toroidal flux profile calculated by my equilibrium code?
- Answer: In the equilibrium IDS, search for “toroidal flux”, found at path time_slice(:)/profiles_1d/phi
What are its units?
- Answer: Wb
Does it vary during the pulse?
- Answer: Yes, it is dynamic
How many dimensions does it have?
- Answer: 1D (float)
What are its axes?
- Answer: time_slice(:)/profiles_1d/psi
Assume I have retrieved a full equilibrium structure in my Fortran program, what syntax would I use for this variable?
- Answer: equilibrium%time_slice(:)%profiles_1d%phi

2.6. Arrays of structure

Arrays of structures are used when a list of objects has nodes of different sizes, in order to avoid creating large sparse arrays
Two kinds of arrays of structure are distinguished:
- Case 1: The structure contains asynchronous nodes, e.g. PF coils may be acquired with different timebases. For example, pf_active/coil is a vector, written in Fortran as pf_active%coil(i1). For each coil, the current is a “data+time” structure, i.e. each coil current has its own timebase:
  - pf_active%coil(i1)%current%data(itime)
  - pf_active%coil(i1)%current%time(itime)
- These Case 1 AoS are used essentially in IDSs representing tokamak subsystems
- Case 2: The coordinate of the array of structure is a timebase. An index of the array of structure represents a time slice. As a consequence, the structure contains only dynamic and synchronous nodes, e.g. equilibrium/time_slice(itime). This time slice representation allows the size of the children to vary as a function of time (e.g. variable grid size).
- These Case 2 AoS are used essentially in IDSs representing abstract physical quantities.
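The two kinds of arrays of structures can be mimicked with plain Python dataclasses. This is a sketch only: the class names are invented for this page, the field names follow the paths quoted above (pf_active/coil/current and equilibrium/time_slice/profiles_1d), and all numbers are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Signal:
    """A "data+time" structure: the signal carries its own timebase."""
    data: List[float] = field(default_factory=list)
    time: List[float] = field(default_factory=list)


@dataclass
class Coil:
    current: Signal = field(default_factory=Signal)


@dataclass
class PfActive:
    # Case 1: array of structures whose elements are asynchronous.
    coil: List[Coil] = field(default_factory=list)


@dataclass
class Profiles1D:
    psi: List[float] = field(default_factory=list)
    phi: List[float] = field(default_factory=list)


@dataclass
class TimeSlice:
    profiles_1d: Profiles1D = field(default_factory=Profiles1D)


@dataclass
class Equilibrium:
    # Case 2: time_slice[itime] corresponds to time[itime].
    time: List[float] = field(default_factory=list)
    time_slice: List[TimeSlice] = field(default_factory=list)


# Two coils sampled on different timebases (values are made up).
pf_active = PfActive(coil=[
    Coil(Signal(data=[1.0, 1.1, 1.2], time=[0.0, 0.5, 1.0])),
    Coil(Signal(data=[2.0, 2.5], time=[0.0, 1.0])),
])

# Two time slices whose grids have different sizes.
equilibrium = Equilibrium(
    time=[0.0, 1.0],
    time_slice=[
        TimeSlice(Profiles1D(psi=[0.0, 1.0], phi=[0.0, 0.3])),
        TimeSlice(Profiles1D(psi=[0.0, 0.5, 1.0], phi=[0.0, 0.1, 0.3])),
    ],
)

print([len(c.current.time) for c in pf_active.coil])
print([len(s.profiles_1d.psi) for s in equilibrium.time_slice])
```

In both cases a single rectangular array would have to be padded to the largest size, which is what the array-of-structures layout avoids.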

2.7. IDS can be used in two different ways: Homogeneous timebase or not

The Data Model provides the flexibility that every node has its own timebase. This is mandatory to represent experimental data as it has been acquired.
- pf_active%coil(i1)%current%data(itime)
- pf_active%coil(i1)%current%time(itime)
However, a frequent use case is that a code will provide its output IDS(s) on a unique timebase
Therefore there is a simplifying option to use an IDS structure with a homogeneous timebase, i.e. that will apply to all dynamic nodes of the IDS. This timebase is located at the top level of every IDS
- pf_active%time
The ids_properties/homogeneous_time flag tells whether the IDS has been written with a homogeneous (unique) timebase (1) or not (0). In the latter case, the coordinates documentation indicates where the timebase of each node is located in the structure.
The code writing the IDS is responsible for setting this flag and filling the appropriate time coordinate(s)
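A reader of an IDS therefore has to check the flag before deciding which timebase applies. The sketch below shows this for the PF coil current, with the IDS modelled as nested dictionaries; `coil_current_timebase` is a hypothetical helper and the values are made up.

```python
def coil_current_timebase(pf_active, i1):
    """Return the timebase that applies to the current of coil i1."""
    if pf_active["ids_properties"]["homogeneous_time"] == 1:
        # Homogeneous case: one timebase at the top level of the IDS.
        return pf_active["time"]
    # Non-homogeneous case: each coil current carries its own timebase.
    return pf_active["coil"][i1]["current"]["time"]


homogeneous = {
    "ids_properties": {"homogeneous_time": 1},
    "time": [0.0, 1.0, 2.0],
    "coil": [{"current": {"data": [5.0, 6.0, 7.0], "time": []}}],
}
per_node = {
    "ids_properties": {"homogeneous_time": 0},
    "time": [],
    "coil": [{"current": {"data": [5.0, 6.0], "time": [0.0, 2.0]}}],
}

print(coil_current_timebase(homogeneous, 0))
print(coil_current_timebase(per_node, 0))
```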

2.8. IDS and time slices

An IDS potentially contains many time slices, possibly on different timebases
Because time slices are used frequently in workflows, the Access Layer supports time slicing operations
- GET_SLICE returns an IDS with all time dimensions of size 1 (thus representing a single "time slice"). Dynamic signals are interpolated (different options available)
- PUT_SLICE appends the content of an IDS variable (with all time dimensions of size 1) to an IDS stored on disk. This allows accumulating time slices in an IDS progressively during a time loop
More options can be added in the future
In a Kepler workflow, only the reference of the IDS is circulating, so operations can be performed on this IDS either in SLICE mode or in FULL mode (applies to the full IDS with all time slices)
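The behaviour of the two slicing operations can be imitated on a single dynamic signal. The functions below are a simplified sketch, not the Access Layer implementation: the “IDS” is reduced to one `time` list and one `data` list, and linear interpolation is assumed as one of the possible interpolation options.

```python
from bisect import bisect_right


def get_slice(ids, t):
    """Return a one-slice copy of `ids`, linearly interpolated at time t."""
    time, data = ids["time"], ids["data"]
    if not time[0] <= t <= time[-1]:
        raise ValueError("requested time is outside the stored timebase")
    hi = min(bisect_right(time, t), len(time) - 1)
    lo = max(hi - 1, 0)
    if time[hi] == time[lo]:
        value = data[lo]
    else:
        weight = (t - time[lo]) / (time[hi] - time[lo])
        value = data[lo] + weight * (data[hi] - data[lo])
    return {"time": [t], "data": [value]}


def put_slice(stored, one_slice):
    """Append a one-slice structure to the stored (accumulating) structure."""
    if len(one_slice["time"]) != 1 or len(one_slice["data"]) != 1:
        raise ValueError("put_slice expects all time dimensions of size 1")
    stored["time"].append(one_slice["time"][0])
    stored["data"].append(one_slice["data"][0])


# Accumulate slices progressively during a time loop (values are made up).
stored = {"time": [], "data": []}
for step in range(3):
    put_slice(stored, {"time": [float(step)], "data": [10.0 * step]})

print(stored)
print(get_slice(stored, 0.5))
```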

2.9. Data Entries

A Data Entry is a collection of IDSs, potentially all of them
Multiple occurrences of a given IDS can co-exist, e.g. multiple equilibria calculated by different codes / assumptions
A Data Entry is defined by:
- IMAS version
- User name
- Machine name
- Pulse number
- Run number
The recommended usage for a Simulation is that
- The simulation starts by reading data from an Input Data Entry (which can be that of another User)
- During the simulation, intermediate results are stored in a temporary “work” Entry (another Run number)
- During or at the end of the run, the results intended to be archived are written to an Output Data Entry (Run number)
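The five identifiers of a Data Entry, and the recommended input/work/output pattern, can be pictured as follows. `DataEntry` is a hypothetical dataclass written for this page (it is not an Access Layer class), and the user names, pulse and run numbers are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataEntry:
    """The five identifiers that define a Data Entry."""
    imas_version: str
    user: str
    machine: str
    pulse: int
    run: int


# Input entry belonging to another user (all values are illustrative).
input_entry = DataEntry(imas_version="3", user="alice", machine="test", pulse=22, run=1)

# Temporary work entry: same pulse, own account, another run number.
work_entry = replace(input_entry, user="bob", run=2)

# Output entry for the results intended to be archived.
output_entry = replace(work_entry, run=3)

for entry in (input_entry, work_entry, output_entry):
    print(entry)
```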

2.10. Where to find your local IMAS data repository?

Answer #1: you do not care because the Access Layer will know where to find it
Answer #2: you care if you want to test that your program has indeed written something
ls -gtr ~/public/imasdb/test/3/mdsplus/0
- test - machine name
- 3 - IMAS version (major)
- 0 - Additional folder level to store RUN numbers beyond 10000

The file names are:
- ids_PulseRun.tree
- ids_PulseRun.datafile
- ids_PulseRun.characteristics
Where Pulse is the pulse number and Run is the 4 rightmost digits of the run number of the Data Entry.
Example: PULSE 22, RUN 2 consists of 3 files:
- ids_220002.tree
- ids_220002.datafile
- ids_220002.characteristics
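The naming rule can be checked with a short helper. `entry_file_names` follows the rule stated above (pulse number followed by the 4 rightmost digits of the run number). `run_folder` rests on an assumption that the text does not spell out, namely that the extra folder level is the run number divided by 10000. Both function names are hypothetical.

```python
def entry_file_names(pulse, run):
    """File names of a Data Entry: pulse number + 4 rightmost digits of the run."""
    stem = f"ids_{pulse}{run % 10000:04d}"
    return [f"{stem}.{ext}" for ext in ("tree", "datafile", "characteristics")]


def run_folder(run):
    """Folder level for the run number (assumption: integer part of run / 10000)."""
    return str(run // 10000)


print(run_folder(2), entry_file_names(22, 2))
```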

2.11. Data Dictionary Lifecycle

An IMAS DD version is defined by 3 levels of revisions named M.N.i (Major.Minor.Micro), for example 3.16.0
The degree of evolution flexibility depends on the lifecycle status of the considered part of the DD:
- “Alpha” parts of the DD may freely evolve (through micro revisions)
- “Active” parts of the DD may only evolve through backward compatible changes (minor revision) – or trigger a major revision
- “Obsolescent” parts of the DD are kept for temporary backward compatibility of modules but might not exist in the next major revision
JIRA trackers / GIT pull requests provide an effective way to ask for DD evolutions and public release of a new DD version
The IMAS installer provides a mechanism to install and work with any private development branch of the DD
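As a small illustration of the M.N.i numbering, the sketch below parses a DD version string and reports which revision level differs between two versions. `DDVersion`, `parse_dd_version` and `revision_level` are hypothetical helpers, not part of the IMAS tooling, and the second version number is made up.

```python
from typing import NamedTuple


class DDVersion(NamedTuple):
    major: int
    minor: int
    micro: int


def parse_dd_version(text):
    """Parse a "Major.Minor.Micro" string such as "3.16.0"."""
    major, minor, micro = (int(part) for part in text.split("."))
    return DDVersion(major, minor, micro)


def revision_level(old, new):
    """Return the highest revision level that differs between two versions."""
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    if new.micro != old.micro:
        return "micro"
    return "none"


current = parse_dd_version("3.16.0")
print(current, revision_level(current, parse_dd_version("3.17.0")))
```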

3. Conclusion

This introduction presented the IMAS basic concepts and some details on the ITER Physics Data Model
Other training sections will guide you for interfacing codes, generating workflow components, running workflows
The Physics Data Model User Guide is the technical reference for using IMAS and the Access Layer. You can find it (and much other information) on https://confluence.iter.org/display/IMP
In case of any question, raise it on https://jira.iter.org
