Actor execution and debug modes

A Kepler actor can be run in two main execution modes, JNI and 'standalone', each with a related debug mode.

Execution modes

JNI mode

User code is run directly from the actor, within the Kepler process, using the Java Native Interface (JNI) mechanism.

Standalone mode

When 'standalone' mode is set, Kepler runs the actor as an independent system process and only waits for it to finish and produce its results.

'Standalone' mode uses more system resources (it is a separate system process), but running an actor in this mode may solve memory issues caused by insufficient memory available to the JVM or by memory overwriting (user code is run in a separate memory space).

In more detail, when an actor is fired in 'standalone' mode:

All actor files (user library, wrapper, standalone.exe) are copied to the ~/public/KEPLEREXECUTION/<actor_dir> folder
An input.txt file is created (it contains all actor inputs, such as CPO/IDS meta descriptions, strings, primitive values, XML parameter strings, etc.)
A standalone C/Fortran executable (<actor_name>.exe) is run. It:
- reads input data from input.txt
- calls the actor wrapper (defined in FortranWrapper.f90)
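
The three steps above can be sketched in Python as follows. This is only an illustration of the sequence (stage files, write input.txt, run and wait), not the code Kepler or FC2K actually uses. The function name run_standalone, its parameters, and the one-value-per-line layout of input.txt are assumptions made for the example.

```python
import shutil
import subprocess
from pathlib import Path


def run_standalone(actor_files, inputs, exec_dir, command):
    """Stage actor files, write input.txt, run the executable and wait for it."""
    exec_dir = Path(exec_dir)
    exec_dir.mkdir(parents=True, exist_ok=True)

    # 1. Copy the actor files (user library, wrapper, executable) to the execution folder.
    for src in actor_files:
        shutil.copy2(src, exec_dir / Path(src).name)

    # 2. Write the actor inputs to input.txt.
    #    One value per line is a hypothetical layout; the real format is defined by FC2K.
    (exec_dir / "input.txt").write_text("".join(f"{value}\n" for value in inputs))

    # 3. Run the executable as a separate process and wait for it to finish.
    result = subprocess.run(command, cwd=exec_dir, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Because the executable runs in its own process, a crash or memory corruption in user code shows up here as a non-zero return code instead of bringing down the caller.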

Please note that 'standalone' mode differs from JNI execution:

The actor is run separately (e.g. no preceding actors in the workflow can affect its execution)
User code is run from a C/Fortran binary (in the real case, within a workflow, it is run through JNI)
It is run with different memory settings than in a real workflow execution

Debugging a user code

Sequential codes

User code can be debugged with a debugger of choice, in a way that corresponds to the execution modes described above.

JNI/Attach - the debugger attaches to a running Kepler process:
- The user can debug what REALLY happens in the workflow, including JNI calls, the influence of previous actors, etc.
- The process cannot be restarted. Stopping or killing the process being debugged kills the JVM.
Standalone - the debugger is started as a separate process that 'owns' the executable (<actor_name>.exe):
- The code being debugged runs in a somewhat 'artificial' environment that differs from the (usually used) JNI mode of an actor
- No preceding actors in the workflow can affect its execution, so errors related to memory issues usually cannot be reproduced
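
The difference between the two debug modes comes down to how the debugger is started. The sketch below builds the command line for each case, assuming gdb is the chosen debugger (the text above does not prescribe one). The helper debugger_command and its arguments are hypothetical; `-p <pid>` is the gdb option for attaching to a running process.

```python
def debugger_command(mode, target, debugger="gdb"):
    """Return the argv list that starts the debugger for the given mode.

    mode:   "attach" (JNI) or "standalone"
    target: PID of the running Kepler process, or path to the actor executable
    """
    if mode == "attach":
        # gdb -p <pid> attaches to an already running process (here: the Kepler JVM).
        return [debugger, "-p", str(target)]
    if mode == "standalone":
        # The debugger starts and owns the actor executable as a new process.
        return [debugger, str(target)]
    raise ValueError(f"unknown debug mode: {mode!r}")
```

In attach mode the debugged process is the JVM itself, which is why killing it ends the Kepler session; in standalone mode the executable can be restarted from within the debugger.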

MPI codes

Debugging MPI codes is available only from the command line. The user should go to the actor folder (~/public/KEPLEREXECUTION/<actor_dir>) and run 'mpiexec' with the appropriate debug options.

An example:

mpiexec <debugger switch>  -np 2 ./<actor_name>_exe

Unfortunately, not only do the switch values differ between MPI implementations, but the Intel and GNU versions of mpiexec also behave differently: under the Intel one it is impossible to restart an application that has already finished.

MPI Vendor	TotalView	GDB
Intel	-tv	-gdb
GNU	-tv (deprecated), --debug --debugger totalview	--debug --debugger gdb
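
As an illustration, the table can be turned into a small lookup that assembles the mpiexec command line. This is a sketch only: the helper name mpiexec_debug_command is hypothetical, the GNU switches are read as `--debug --debugger <name>` (without the space after the dashes), and the non-deprecated TotalView form is used.

```python
# Debugger switches from the table above, keyed by (MPI vendor, debugger).
DEBUG_SWITCHES = {
    ("intel", "totalview"): ["-tv"],
    ("intel", "gdb"): ["-gdb"],
    ("gnu", "totalview"): ["--debug", "--debugger", "totalview"],
    ("gnu", "gdb"): ["--debug", "--debugger", "gdb"],
}


def mpiexec_debug_command(vendor, debugger, executable, nprocs=2):
    """Build the mpiexec command line for debugging an MPI actor."""
    try:
        switches = DEBUG_SWITCHES[(vendor.lower(), debugger.lower())]
    except KeyError:
        raise ValueError(f"no debug switch known for {vendor}/{debugger}") from None
    return ["mpiexec", *switches, "-np", str(nprocs), executable]
```

The resulting list can be printed for copy and paste, or passed to a process launcher from inside the actor folder.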

Actor cache

The idea is to add a little complexity to the FC2K-generated wrappers: an optional new port (enabled by a choice in the GUI) that takes 0 or 1 as input.

In the case of a 1, the physics code would be called in the usual way, but the result would also be "remembered" internally by the wrapper.
In the case of a 0, the physics code would not be called, but the result remembered by the wrapper would be replayed.

This would allow for cleaner, less complicated workflows and a great deal more flexibility in changing the frequency with which codes are called.

[Figure: ActorCache.jpg]

An additional Boolean port, useCachedResults, has been added to actors generated by FC2K.

Cache OFF

Port useCachedResults is not connected or it is connected but set to false.

Actor execution:

User subroutine is called
No operations on cache are performed

Cache ON

Port useCachedResults is connected and set to true.

Actor execution:

First run of an actor after setting useCachedResults to true
- Cache is empty so user subroutine is executed to produce data
- Output parameters are stored in cache (memory)
Next run of an actor
- User subroutine is not executed
- Output parameters are read from cache (memory)
- Regardless of how many times the actor is fired (e.g. the iteration number), exactly the same results are returned
Setting useCachedResults again to false invalidates the cache
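
The cache behaviour described above can be modelled in a few lines of Python. This is a minimal sketch of the semantics only, not the FC2K wrapper code; the class CachedActor and its fire method are hypothetical names, and the cache is a plain in-memory attribute.

```python
class CachedActor:
    """Minimal model of the useCachedResults port; not the FC2K wrapper itself."""

    def __init__(self, user_subroutine):
        self._user_subroutine = user_subroutine
        self._cache = None
        self._has_cache = False

    def fire(self, inputs, use_cached_results=False):
        if not use_cached_results:
            # Cache OFF: call the user code and invalidate any stored result.
            self._cache, self._has_cache = None, False
            return self._user_subroutine(inputs)
        if not self._has_cache:
            # Cache ON, first run: execute the user code and remember the outputs.
            self._cache = self._user_subroutine(inputs)
            self._has_cache = True
        # Cache ON, later runs: replay the stored outputs without calling the user code.
        return self._cache
```

Note that while the cache is on, the inputs of later firings are ignored: the actor returns whatever was produced by the first cached run.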

Sandbox

"Sandbox" - a directory in which the actor will be run. Before the user code wrapped by an FC2K-generated actor is executed, the current directory is changed to the "sandbox"; after the actor finishes, the current directory is switched back to its previous value. The name (path) of the "sandbox" directory is created automatically or specified by the user in the actor configuration dialog.

The actor uses an existing directory, or creates it if it does not exist. All directories that are created automatically or that have a user-specified relative path are created under <SANDBOX_ROOT> ($ITMSCRATCH/KEPLER_SANDBOX on the Gateway).
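
The directory switch described above (enter the sandbox, run the code, switch back) can be sketched as a Python context manager. The name sandbox is hypothetical, and the sketch covers only the change of directory, not the lifetime or clean-up options.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sandbox(path):
    """Run the enclosed code with `path` as the current directory."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)  # use an existing directory or create it
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)  # switch back even if the user code fails
```

Any relative file names used by the code inside the `with sandbox(...)` block then resolve inside the sandbox directory.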

FC2K settings

[Figure: FC2K-Settings.png]

"Use sandbox" - enables / disables "sandbox"

"Sandbox" disabled

Actor behavior is unchanged compared to previous versions
A temporary directory is created in "Standalone", "Batch", "MPI" or "Debug-standalone" execution modes ($HOME/public/KEPLEREXECUTION/<actor_name>_<timestamp>)

"Sandbox" enabled:

the actor uses the sandbox
sandbox parameters are shown in the actor configuration dialog (see next section)

Actor configuration dialog

[Figure: ActorParameters-Sandbox.png]

Run in sandbox
- Values: TRUE/FALSE
- Defines whether the application may be run in any directory or only in the specified one ("sandbox")
- Default value: FALSE
Sandbox lifetime
- Values: "Actor execution", "Workflow execution"
- Defines whether the sandbox directory should be accessible only for a given execution of a particular actor ("Actor execution") or during the whole run of the workflow ("Workflow execution")
- Default value: "Actor execution"
Clean up sandbox
- Values: TRUE/FALSE
- Determines whether the content of the sandbox should be cleaned up before:
  - Every execution of the actor (if lifetime is set to "Actor execution")
  - The first execution of the actor in the workflow (if lifetime is set to "Workflow execution")
- Default value: TRUE
- If the "Clean up sandbox" option is selected, the whole content of the directory is deleted
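
The interaction between "Clean up sandbox" and "Sandbox lifetime" can be written as a small decision function. This is a sketch of the rule stated above; the function name should_clean_up and the zero-based execution_index argument are assumptions made for the example.

```python
def should_clean_up(clean_up, lifetime, execution_index):
    """Decide whether the sandbox content is wiped before this execution.

    execution_index: 0 for the first execution of the actor in the workflow
    """
    if not clean_up:
        return False
    if lifetime == "Actor execution":
        return True  # before every execution of the actor
    if lifetime == "Workflow execution":
        return execution_index == 0  # only before the first execution
    raise ValueError(f"unknown sandbox lifetime: {lifetime!r}")
```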

Sandbox directory path

Text field
Empty field:
- Default value: <SANDBOX_ROOT>/<UNIQUE_ACTOR_INSTANCE_NAME>_<PROCESS_ID>
- The name is unique to a given actor instance, in case there are several instances of one actor in the workflow
- The name is unique to a given running Kepler instance, in case several Kepler instances are running in parallel

User specified value

It may only be relative to <SANDBOX_ROOT> - a directory with the user-specified name will be created under <SANDBOX_ROOT> (if it does not exist)
FC2K performs no action on the provided name (i.e. it is used "as is", without any changes to make it unique etc.)
The user-specified name may contain system environment variables

A directory within <SANDBOX_ROOT> may be a link to any other existing directory. This allows directories outside the sandbox to be used. (Please do this responsibly: there is a risk of data loss if the "Clean up sandbox" or "Delete sandbox" option is selected.)
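
The rules for the sandbox directory path can be sketched as follows. The function resolve_sandbox_path and its parameters are hypothetical, and rejecting absolute paths with an error is an assumption: the text only says the value must be relative to <SANDBOX_ROOT>.

```python
import os
from pathlib import Path


def resolve_sandbox_path(sandbox_root, user_value, actor_instance_name, pid=None):
    """Return the sandbox directory for an actor."""
    root = Path(sandbox_root)
    if not user_value:
        # Empty field: <SANDBOX_ROOT>/<UNIQUE_ACTOR_INSTANCE_NAME>_<PROCESS_ID>
        pid = os.getpid() if pid is None else pid
        return root / f"{actor_instance_name}_{pid}"
    # User-specified value: expand environment variables, otherwise use the name as it is.
    relative = Path(os.path.expandvars(user_value))
    if relative.is_absolute():
        raise ValueError("sandbox path must be relative to the sandbox root")
    return root / relative
```

Because a user-specified name is not made unique, two actors (or two Kepler sessions) given the same name share one directory, which is worth keeping in mind together with the clean-up options.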

Delete sandbox
- Values: TRUE/FALSE
- Determines whether the sandbox directory should be cleared:
  - when the actor finishes (if lifetime is set to "Actor execution")
  - when the workflow finishes (if lifetime is set to "Workflow execution")
- Default value: TRUE

'Dummy' actors

When porting a workflow to a new platform, or to a new data version, it often happens that some subset of the actors is not immediately available in the new environment. Rather than building a new workflow with these actors removed, and then having to rebuild the workflow as and when actors become available, the user can temporarily replace the missing actors with a generic dummy actor which:

has the same number and types of input and output ports as the missing actor
returns gracefully with an error if ever activated

FC2K settings

Actor generation:

- the user does not have to provide a library containing physics code (but may do so)

- all other actor data has to be specified (as if it were a "regular" actor)

- no C/F wrappers will be generated (only Java/Python code)

Runtime actions:

- user code will not be called (everything is handled by the Java actor, without calling wrappers etc.)

- the actor will return immediately with an ERROR (msg: "Actor <name> should not be called")
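
The runtime behaviour of a dummy actor can be illustrated with a short Python model. This is a sketch of the idea, not the generated Java/Python actor code; the class DummyActor, its constructor arguments, and the use of RuntimeError to signal the error are assumptions.

```python
class DummyActor:
    """Stand-in with the same ports as the missing actor; fails if fired."""

    def __init__(self, name, input_ports, output_ports):
        self.name = name
        # Port declarations (name -> type) mirror the actor being replaced.
        self.input_ports = dict(input_ports)
        self.output_ports = dict(output_ports)

    def fire(self, **inputs):
        # No wrapper or user code is called; the actor returns with an error at once.
        raise RuntimeError(f"Actor {self.name} should not be called")
```

Since the ports match the real actor, the workflow wiring stays valid and only the branches that actually fire the dummy actor fail.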

Replacing a "dummy" by "regular" actor:

- The user opens the FC2K actor.xml project
- The "Create 'dummy' actor" checkbox should be unchecked
- The user specifies the libraries with physics code
- The actor is regenerated
- A fully functional "regular" actor is created

This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under grant agreement No 633053. The scientific work is published for the realization of the international project co-financed by Polish Ministry of Science and Higher Education in 2019 and 2020 from financial resources of the program entitled "PMW"; Agreement No. 5040/H2020/Euratom/2019/2 and 5142/H2020-Euratom/2020/2.