1. IMAS – introduction to basic concepts
1.1. Key IMAS element: the Data Model
- Joint scientific exploitation of ITER by multiple teams requires a Data Model to name and communicate physical and technical information → ITER Physics Data Model
- The Data Model provides information for data providers and data consumers on…
- What data exist ?
- What are they called ?
- How are they structured as seen by the user ?
- IMAS components use the ITER Physics Data Model to:
- Read & Write simulation results or experimental data
- Interface codes together
1.2. The ITER Physics Data Model
- Aims at being the main gate to data for scientific exploitation, both for code interfacing and hands-on data browsing
- is unique for simulated and experimental data (same data structures)
- is device-generic → usable for ITER or any other fusion device
- has precise design rules for global homogeneity
- has a precise lifecycle procedure so that it can evolve and be jointly developed by multiple teams
1.3. Access Layer
- API providing access methods (read/write) to an ITER physics Database based on the ITER Physics Data Model
- Provided in Fortran, C++, Matlab, Java, Python
- The only effort for using the Data Model is to map the input/output of your code to the Data Model and add some GET/PUT commands
- The access methods write to a local database stored in your account
- These local databases can be shared among users (for reading only) and can be accessed remotely
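The mapping effort described above can be illustrated with a minimal sketch. It does not use the real Access Layer: `ToyDatabase`, `put`, `get`, `my_code` and the field names `coil_current` and `doubled_current` are hypothetical stand-ins that only mimic the read/write pattern, and the IDS is reduced to a plain dictionary.

```python
import copy


class ToyDatabase:
    """Hypothetical in-memory stand-in for a local database (not the real Access Layer)."""

    def __init__(self):
        self._store = {}

    def put(self, ids_name, ids):
        # Store a deep copy so later changes in the code do not alter the database.
        self._store[ids_name] = copy.deepcopy(ids)

    def get(self, ids_name):
        # Return a copy so the caller cannot modify the stored data.
        return copy.deepcopy(self._store[ids_name])


def my_code(current):
    """Placeholder physics code with its own native input/output variables."""
    return [2.0 * value for value in current]


db = ToyDatabase()
db.put("pf_active", {"time": [0.0, 1.0], "coil_current": [10.0, 12.0]})

# GET: map the stored structure to the native input of the code.
ids_in = db.get("pf_active")
result = my_code(ids_in["coil_current"])

# PUT: map the native output back to a structure and write it.
db.put("my_output", {"time": ids_in["time"], "doubled_current": result})
```

The physics routine itself only sees native lists; the only additions around it are the two mapping steps and the GET/PUT calls.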
1.4. First use case: User or code accessing Data Base through Access Layer
1.5. Interface Data Structures (IDS) to couple codes
- For Integrated Modelling, the Data Model also defines Interface Data Structures (IDS). These are structures within the Data Model that are used as standard interfaces between codes
- Solves the N² problem (large number of components from various ITER members expected in IMAS)
- The usage of the IDSs makes the coupling of codes straightforward if they are in the same programming language
- The usage of the IDSs + AL allows coupling of codes even if they are not written in the same language
- The usage of the IDSs does NOT constrain your choice of coupling method. Codes can be coupled as:
- Subroutines within a main program
- Executables within a script
- Components within a workflow engine
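As a rough illustration of the N² problem, the sketch below compares the number of interfaces to write with and without shared interface structures. It assumes the usual reading of the term: without a standard, each ordered pair of codes needs its own converter, whereas with IDSs each code needs one mapping in and one mapping out. The function names are hypothetical.

```python
def pairwise_converters(n_codes):
    # One dedicated converter per ordered pair of distinct codes.
    return n_codes * (n_codes - 1)


def shared_interface_mappings(n_codes):
    # Each code maps its input from, and its output to, the shared structures.
    return 2 * n_codes


for n in (3, 10, 50):
    print(n, pairwise_converters(n), shared_interface_mappings(n))
```

The first count grows quadratically with the number of codes, the second only linearly.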
1.6. Second use case: codes coupled together directly (same language) or through AL (different languages)
1.7. Workflow Engine and Component Generator to facilitate the development of Integrated Modelling workflows
- An Integrated Modelling simulation is described as a workflow with physics codes as components (modules)
- The workflow engine allows users to
- Design the workflow
- Choose its components and tune their code-specific parameters
- Execute the workflow
- The workflow engine will be used to help design sophisticated workflows (e.g. Plasma Reconstruction chain, fully modularized Transport Solver, …)
- It is intuitive enough to allow “mere users” to develop their own workflows
- It hides the complexity of code coupling, data transfer, remote job submission, …
- It allows sharing codes and workflows
- It allows coupling to the PCS Simulation Platform
- Component generator: a user tool that turns an IDS-compliant physics code into a component of the workflow
1.8. Physics codes + Data Access wrapped into a workflow component
1.9. Workflow components coupled and executed within a workflow engine
1.10. The layered structure: from the physics solver to the launcher
- The structure is layered so that functionalities are clearly separated
- It is generic and independent of e.g. the launching script/workflow engine
- The Physics solver part (dark green) is unchanged and is not linked to the ITER Data Model. It may or may not use the ITER Data Model internally.
- The architecture is identical to the case of a component called within a workflow engine
- The Physics Subroutine can be directly reused to generate a workflow component – the IMAS Infrastructure provides a tool that generates the component (pink part) automatically
- Exception: for codes handling massive amounts of data, Data Access is usually parallelised and must be done inside the physics_solver (no processor has enough memory to gather all data)
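The layering described above can be sketched as three nested functions. This is only an illustration under simplifying assumptions: IDSs are plain dictionaries, the database is a dictionary standing in for Access Layer reads and writes, and the names `launcher`, `factor` and the `data` field are hypothetical.

```python
def physics_solver(values, factor):
    """Innermost layer: native variables only, no knowledge of the Data Model."""
    return [factor * v for v in values]


def physics_subroutine(ids_in, code_parameters):
    """Middle layer: IDS in, IDS out. Maps IDS fields to the solver's native arguments."""
    native_out = physics_solver(ids_in["data"], code_parameters["factor"])
    return {"time": list(ids_in["time"]), "data": native_out}


def launcher(database, code_parameters):
    """Outer layer: data access plus the call (a script or a workflow component)."""
    ids_in = database["input"]        # stands in for a read through the Access Layer
    ids_out = physics_subroutine(ids_in, code_parameters)
    database["output"] = ids_out      # stands in for a write through the Access Layer
    return ids_out


database = {"input": {"time": [0.0, 0.5], "data": [1.0, 3.0]}}
launcher(database, {"factor": 2.0})
```

Because `physics_subroutine` only exchanges IDS-like structures, the same function could be called from a different outer layer without touching the solver.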
2. More details on the ITER Physics Data Model
2.1. Data Model: Interface Data Structure
- The Data Model has a tree structure, for the sake of clarity
- At the top level, a collection of modular structures representing
- Abstract physical quantities (e.g. distribution functions)
- Tokamak subsystems (e.g. PF systems)
- These modular structures have the appropriate granularity for exchange in an IM workflow → they also represent standardised interfaces for communication between codes, named Interface Data Structure (IDS)
- Each has an “ids_properties” substructure (metadata + comments + timebase usage)
- Each has a “code” substructure (trace the code-specific parameters of the code that has generated this IDS)
- Each has a generic timebase (“time”)
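A minimal sketch of the three elements common to every IDS, written as Python dataclasses. Only `ids_properties`, `code` and `time` come from the text above; the class names and the inner fields `comment`, `name` and `parameters` are assumptions made for illustration and may not match the actual Data Model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IdsProperties:
    comment: str = ""           # free-text annotation (hypothetical field name)
    homogeneous_time: int = 1   # 1: single timebase for the IDS, 0: per-node timebases


@dataclass
class Code:
    name: str = ""              # hypothetical field name
    parameters: str = ""        # code-specific parameters used to produce this IDS


@dataclass
class ToyIds:
    ids_properties: IdsProperties = field(default_factory=IdsProperties)
    code: Code = field(default_factory=Code)
    time: List[float] = field(default_factory=list)   # generic timebase


ids = ToyIds()
ids.ids_properties.comment = "toy example"
ids.code.name = "my_solver"
ids.time = [0.0, 0.1, 0.2]
```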
2.2. Data Model documentation
Dynamically generated
Open the documentation by typing: dd_doc
2.3. What is in the documentation ?
- List of all IDSs. For each of them, a detailed documentation:
- Full path name: name of all variables of the IDSs, with their path in the structure. Replace “/” by the structure operator in a programming language, e.g. “%” in Fortran, “.” in C++, Matlab, Java, Python
- Description
- Definition
- Units in []
- In {}, whether it is STATIC (constant over a range of pulses, e.g. machine configuration), CONSTANT (constant over the pulse or the simulation), or DYNAMIC (time-dependent within the pulse or the simulation)
- Data_Type: indicates whether it is a string, an integer or a real, and its dimension (0D, 1D, 2D, …)
- Coordinates: for each dimension, the full path name to the related coordinate. If the dimension simply refers to a quantity not present in the Data Model, it is indicated as “1…N”
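The path-to-syntax rule given above can be sketched as a small helper. The function and dictionary names are hypothetical; the operators are those listed in the documentation description. Array indices such as (i1) or (itime) are not part of the documented path and still have to be added by hand.

```python
STRUCTURE_OPERATOR = {
    "fortran": "%",
    "cpp": ".",
    "matlab": ".",
    "java": ".",
    "python": ".",
}


def to_language_syntax(path, language):
    """Replace '/' in a documentation path by the language's structure operator."""
    return path.replace("/", STRUCTURE_OPERATOR[language])


print(to_language_syntax("pf_active/coil/current/data", "fortran"))
print(to_language_syntax("pf_active/coil/current/data", "python"))
```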
2.4. Exercise: use the Data Model documentation
- Go to the DM documentation and answer the following questions:
- How many IDSs have been defined ?
- Where can I find the toroidal flux profile calculated by my equilibrium code ?
- What are its units ?
- Does it vary during the pulse ?
- How many dimensions does it have ?
- What are its axes ?
- Assume I have retrieved a full equilibrium structure in my Fortran program, what syntax would I use for this variable ?
2.5. Arrays of structure
- Arrays of structures are used when a list of objects has nodes of different sizes, in order to avoid creating large sparse arrays
- Two kinds of arrays of structure are distinguished:
- Case 1: The structure contains asynchronous nodes, e.g. PF coils may be acquired with different timebases. For example, pf_active/coil is a vector, in Fortran: pf_active%coil(i1). For each coil, the current is a “data+time” structure, i.e. each coil current has its own timebase:
- pf_active%coil(i1)%current%data(itime)
- pf_active%coil(i1)%current%time(itime)
- These Case 1 AoS are used essentially in IDSs representing tokamak subsystems
- Case 2: The coordinate of the array of structure is a timebase. An index of the array of structure represents a time slice. As a consequence, the structure contains only dynamic and synchronous nodes, e.g. equilibrium/time_slice(itime). This time slice representation allows the size of the children to vary as a function of time (e.g. variable grid size).
- These Case 2 AoS are used essentially in IDSs representing abstract physical quantities.
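The two kinds of arrays of structure can be sketched with Python dataclasses. This is an illustration only: the class names and the `grid` child node are hypothetical, and the numbers are made up to show the differing sizes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Signal:
    data: List[float] = field(default_factory=list)
    time: List[float] = field(default_factory=list)


@dataclass
class Coil:
    current: Signal = field(default_factory=Signal)


@dataclass
class TimeSlice:
    grid: List[float] = field(default_factory=list)   # hypothetical child node


# Case 1: one element per object, each carrying its own timebase.
coils = [
    Coil(Signal(data=[1.0, 2.0, 3.0], time=[0.0, 0.1, 0.2])),
    Coil(Signal(data=[5.0, 6.0], time=[0.0, 0.5])),
]

# Case 2: one element per time slice; children may change size between slices.
time = [0.0, 1.0]
time_slice = [
    TimeSlice(grid=[0.0, 0.5, 1.0]),
    TimeSlice(grid=[0.0, 0.25, 0.5, 0.75, 1.0]),
]
```

In Case 1 a rectangular (coil, time) array would need padding for the shorter signal; in Case 2 the array index itself is the time index, so no per-node time vector is needed.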
2.6. IDS can be used in two different ways: Homogeneous timebase or not
- The Data Model provides the flexibility that every node has its own timebase. This is mandatory to represent experimental data as it has been acquired.
- pf_active%coil(i1)%current%data(itime)
- pf_active%coil(i1)%current%time(itime)
- However, a frequent use case is that a code will provide its output IDS(s) on a unique timebase
- Therefore there is a simplifying option to use an IDS structure with a homogeneous timebase, i.e. that will apply to all dynamic nodes of the IDS. This timebase is located at the top level of every IDS
- pf_active%time
- The ids_properties/homogeneous_time flag tells whether the IDS has been written with a homogeneous (unique) timebase (1) or not (0). In the latter case, the coordinates documentation indicates where the timebase is located in the structure.
- The code writing the IDS is responsible for setting this flag and filling the appropriate time coordinate(s)
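A reader of an IDS would typically branch on the flag before picking a time vector. The sketch below shows that logic for the coil current example, assuming the IDS is represented as nested dictionaries; the function name is hypothetical and this is not an Access Layer call.

```python
def timebase_of_coil_current(pf_active, i1):
    """Return the time vector that applies to the current of coil i1."""
    if pf_active["ids_properties"]["homogeneous_time"] == 1:
        # Homogeneous case: the top-level timebase applies to all dynamic nodes.
        return pf_active["time"]
    # Otherwise each coil current carries its own timebase.
    return pf_active["coil"][i1]["current"]["time"]


homogeneous = {
    "ids_properties": {"homogeneous_time": 1},
    "time": [0.0, 1.0, 2.0],
    "coil": [{"current": {"data": [1.0, 1.5, 2.0], "time": []}}],
}
heterogeneous = {
    "ids_properties": {"homogeneous_time": 0},
    "time": [],
    "coil": [{"current": {"data": [1.0, 2.0], "time": [0.0, 0.5]}}],
}

print(timebase_of_coil_current(homogeneous, 0))
print(timebase_of_coil_current(heterogeneous, 0))
```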
2.7. Data Entries
- A Data Entry is a collection of potentially all IDSs
- Multiple occurrences of a given IDS can co-exist, e.g. multiple equilibria calculated by different codes / assumptions
- A Data Entry is defined by:
- IMAS version
- User name
- Machine name
- Pulse number
- Run number
- The recommended usage for a Simulation is that
- The simulation starts by reading data from an Input Data Entry (which can be that of another User)
- During the simulation, intermediate results are stored in a temporary “work” Entry (another Run number)
- During or at the end of the run, the results intended to be archived are written to an Output Data Entry (Run number)
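The five identifiers of a Data Entry and the recommended input / work / output pattern can be sketched as follows. The class, its field names and all values (version string, user names, machine name, pulse and run numbers) are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataEntry:
    imas_version: str
    user: str
    machine: str
    pulse: int
    run: int


# Input entry, possibly owned by another user (read only).
input_entry = DataEntry("3", "another_user", "iter", 1000, 1)

# Temporary work entry in my own account: same pulse, another run number.
work_entry = replace(input_entry, user="me", run=2)

# Output entry for the results to be archived: yet another run number.
output_entry = replace(work_entry, run=3)
```

Keeping the three entries distinct means the input is never overwritten and intermediate results can be discarded without affecting the archived output.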