1. Actor execution and debug modes
A Kepler actor can be run in two main execution modes, JNI and 'standalone', each with a corresponding debug mode.
1.1. Execution modes
1.1.1. JNI mode
User code is run directly from the actor, within the Kepler process, using the Java Native Interface (JNI) mechanism.
1.1.2. Standalone mode
When 'standalone' mode is set, Kepler runs the actor as an independent system process and only waits for it to finish and return its results.
'Standalone' mode uses more system resources (it is a separate system process), but running an actor in this mode may solve memory issues caused by insufficient memory available to the JVM or by memory being overwritten (user code runs in a separate memory space).
In more detail, when an actor is fired in 'standalone' mode:
- All actor files (user library, wrapper, standalone.exe) are copied to the ~/public/KEPLEREXECUTION/<actor_dir> folder
- An input.txt file is created (it contains all actor inputs, such as CPO/IDS meta descriptions, strings, primitive values, XML parameter strings, etc.)
- A standalone C/Fortran executable (<actor_name>.exe) is run. It:
  - reads input data from input.txt
  - calls the actor wrapper (defined in FortranWrapper.f90)
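The steps above can be sketched roughly as follows. This is only an illustration in Python, not the code Kepler or FC2K actually runs: the function name run_standalone, the one-value-per-line layout of input.txt and the assumption that the executable takes no command-line arguments are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def run_standalone(actor_files, exec_dir, executable, inputs):
    """Hypothetical sketch of a 'standalone' actor run.

    actor_files -- paths of the user library, wrapper and executable
    exec_dir    -- stands in for ~/public/KEPLEREXECUTION/<actor_dir>
    executable  -- file name of the standalone executable inside exec_dir
    inputs      -- actor input values to be written to input.txt
    """
    # Use an absolute path so the executable is found after the cwd change.
    exec_dir = Path(exec_dir).expanduser().resolve()
    exec_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: copy all actor files into the execution folder.
    for actor_file in actor_files:
        shutil.copy2(actor_file, exec_dir)

    # Step 2: write the actor inputs to input.txt.
    # One value per line is an assumed layout, not the real file format.
    (exec_dir / "input.txt").write_text(
        "".join(f"{value}\n" for value in inputs)
    )

    # Step 3: run the executable as a separate system process and wait
    # for it to finish; it reads input.txt from its working directory.
    result = subprocess.run(
        [str(exec_dir / executable)],
        cwd=exec_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Because the user code lives in its own process, a crash or memory corruption in it cannot take the JVM down; the caller only sees a return code and the produced output.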
Please note that 'standalone' mode is different from JNI execution:
- The actor is run separately (i.e. no preceding actors in the workflow can affect its execution)
- User code is run from a C/Fortran binary (in a regular workflow run it is called through JNI)
- It is run with different memory settings than in a regular workflow execution
1.2. Debugging a user code
1.2.1. Sequential codes
User code can be debugged with a debugger of choice, in a way that corresponds to the execution modes described above.
- JNI/Attach - the debugger attaches to a running Kepler process:
  - The user can debug what really happens in the workflow, including JNI calls, the influence of preceding actors, etc.
  - The process cannot be restarted. Stopping or killing the process being debugged kills the JVM.
- Standalone - the debugger 'owning' the executable (<actor_name>.exe) is started as a separate process:
  - The code being debugged runs in a somewhat 'artificial' environment that differs from the (usually used) JNI mode of an actor
  - No preceding actors in the workflow can affect its execution, so errors related to memory issues usually cannot be reproduced
1.2.2. MPI codes
Debugging MPI codes is available only from the command line. The user should go to the actor folder (~/public/KEPLEREXECUTION/<actor_dir>) and run 'mpiexec' with the appropriate debug options.
An example:
mpiexec <debugger switch> -np 2 ./<actor_name>_exe
Unfortunately, not only do the switch values differ between MPI implementations, but the Intel and GNU versions of mpiexec also behave differently: under the Intel one it is impossible to restart an application that has already finished.
MPI Vendor | TotalView | GDB |
---|---|---|
Intel | -tv | -gdb |
GNU | -tv (deprecated) or --debug --debugger totalview | --debug --debugger gdb |
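As a convenience, the table can be turned into a small helper that assembles the mpiexec command line. This is only a sketch: the function name mpiexec_debug_command is hypothetical, the switches are copied from the table (reading the GNU entries as the --debug and --debugger options), and nothing is assumed about mpiexec beyond the example command shown earlier.

```python
# Debugger switches copied from the table above, keyed by
# (MPI vendor, debugger). The deprecated GNU "-tv" switch is left out.
DEBUG_SWITCHES = {
    ("intel", "totalview"): ["-tv"],
    ("intel", "gdb"): ["-gdb"],
    ("gnu", "totalview"): ["--debug", "--debugger", "totalview"],
    ("gnu", "gdb"): ["--debug", "--debugger", "gdb"],
}


def mpiexec_debug_command(vendor, debugger, nprocs, executable):
    """Build: mpiexec <debugger switch> -np <nprocs> <executable>."""
    key = (vendor.lower(), debugger.lower())
    if key not in DEBUG_SWITCHES:
        raise ValueError(f"no debugger switch known for {vendor}/{debugger}")
    return ["mpiexec", *DEBUG_SWITCHES[key], "-np", str(nprocs), executable]


# Example: print the command for two processes under GDB with Intel MPI.
# "./my_actor_exe" is a placeholder for the actor executable.
print(" ".join(mpiexec_debug_command("Intel", "GDB", 2, "./my_actor_exe")))
```

The returned list can be passed directly to a process launcher once the current directory is the actor folder.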
2. Actor cache
An additional Boolean port, useCachedResults, has been added to actors generated by FC2K.
2.1. Cache OFF
Port useCachedResults is not connected or it is connected but set to false.
Actor execution:
- The user subroutine is called
- No operations on cache are performed
2.2. Cache ON
Port useCachedResults is connected and set to true.
Actor execution:
- First run of the actor after setting useCachedResults to true
  - The cache is empty, so the user subroutine is executed to produce data
  - Output parameters are stored in the cache (memory)
- Subsequent runs of the actor
  - The user subroutine is not executed
  - Output parameters are read from the cache (memory)
  - Regardless of how many times the actor is fired (e.g. the iteration number), exactly the same results are returned
- Setting useCachedResults to false again invalidates the cache
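The cache behaviour described in this section can be modelled with a few lines of Python. This is a simplified illustration, not the generated actor code: the class CachedActor, its fire method and the use_cached_results argument (standing in for the useCachedResults port) are hypothetical names.

```python
class CachedActor:
    """Hypothetical model of the useCachedResults behaviour."""

    def __init__(self, user_subroutine):
        self._user_subroutine = user_subroutine
        self._cache = None          # in-memory copy of the outputs
        self._cache_valid = False   # separate flag: outputs may be None

    def fire(self, inputs, use_cached_results=False):
        if not use_cached_results:
            # Cache OFF: the user subroutine is always called. Switching
            # the port back to false also invalidates stored results.
            self._cache = None
            self._cache_valid = False
            return self._user_subroutine(inputs)

        if not self._cache_valid:
            # First run with the cache ON: compute and store the outputs.
            self._cache = self._user_subroutine(inputs)
            self._cache_valid = True

        # Later runs: return the stored outputs without calling user code.
        return self._cache
```

Note that with the cache ON the inputs of later firings are ignored, which is why the same results are returned on every iteration.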
3. Sandbox
The "sandbox" is a directory in which the actor is run. Before the user code wrapped by an FC2K-generated actor is executed, the current directory is changed to the sandbox; after the actor finishes, the current directory is switched back to its previous value. The name (path) of the sandbox directory is either created automatically or specified by the user in the actor configuration dialog.
The actor uses an existing directory, or creates it if it does not exist. All directories that are created automatically or have a user-specified relative path are created under <SANDBOX_ROOT> ($ITMSCRATCH/KEPLER_SANDBOX on the Gateway).
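The "change directory, run, switch back" behaviour can be illustrated with a small context manager. This is a sketch under the assumption that the sandbox path is already known; the name in_sandbox is hypothetical and is not part of FC2K.

```python
import os
from contextlib import contextmanager


@contextmanager
def in_sandbox(path):
    """Run a block of code with the sandbox as the current directory."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)   # use the directory or create it
    os.chdir(path)
    try:
        yield path
    finally:
        # Switch back even if the user code raised an error.
        os.chdir(previous)
```

A caller would wrap the user code in "with in_sandbox(sandbox_path):" so that any relative file names used by that code resolve inside the sandbox.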
3.1. FC2K settings
"Use sandbox" - enables / disables the sandbox
"Sandbox" disabled:
- Actor behavior is unchanged compared to previous versions
- A temporary directory is created in the "Standalone", "Batch", "MPI" or "Debug-standalone" execution modes ($HOME/public/KEPLEREXECUTION/<actor_name>_<timestamp>)
"Sandbox" enabled:
- The actor uses the sandbox
- The sandbox parameters are shown in the actor configuration dialog (see the next section)
3.2. Actor configuration dialog
- Run in sandbox
  - Values: TRUE/FALSE
  - Defines whether the application can be run in any directory or only in the specified one (the "sandbox")
  - Default value: FALSE
- Sandbox lifetime
  - Values: "Actor execution", "Workflow execution"
  - Defines whether the sandbox directory should be accessible only for a given execution of a particular actor ("Actor execution") or during the whole run of the workflow ("Workflow execution")
  - Default value: "Actor execution"
- Clean up sandbox
  - Values: TRUE/FALSE
  - Determines whether the content of the sandbox should be cleaned up before:
    - every execution of the actor (if lifetime is set to "Actor execution")
    - the first execution of the actor in the workflow (if lifetime is set to "Workflow execution")
  - Default value: TRUE
  - If the "Clean up sandbox" option is selected, the whole content of the directory is deleted
- Sandbox directory path
  - Text field
  - Empty field:
    - Default value: <SANDBOX_ROOT>/<UNIQUE_ACTOR_INSTANCE_NAME>_<PROCESS_ID>
    - The name is unique to a given instance of an actor, in case there are several instances of one actor in the workflow
    - The name is unique to a given instance of a running Kepler, in case several instances of Kepler are running in parallel
  - User-specified value:
    - It may only be relative to <SANDBOX_ROOT>: a directory with the user-specified name will be created under <SANDBOX_ROOT> (if it does not exist)
    - FC2K performs no action on the provided name (i.e. it is used "as is", without any changes to make it unique, etc.)
    - The user-specified name may contain system environment variables
  - A directory within <SANDBOX_ROOT> can be a link to any other existing directory. This allows directories outside the sandbox to be used. Please do this responsibly: there is a risk of data loss if the "Clean up sandbox" or "Delete sandbox" option is selected.
- Delete sandbox
  - Values: TRUE/FALSE
  - Determines whether the sandbox directory should be cleared:
    - when the actor finishes (if lifetime is set to "Actor execution")
    - when the workflow finishes (if lifetime is set to "Workflow execution")
  - Default value: TRUE
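The rules for the sandbox directory path and for "Clean up sandbox" can be sketched as below. The function names are hypothetical, and two details are assumptions rather than documented behaviour: environment variables are expanded with os.path.expandvars, and an absolute user-specified path is rejected with an error.

```python
import os
import shutil


def resolve_sandbox_path(sandbox_root, user_value, actor_instance_name, pid=None):
    """Return the sandbox directory for one actor instance."""
    if not user_value:
        # Empty field: <SANDBOX_ROOT>/<UNIQUE_ACTOR_INSTANCE_NAME>_<PROCESS_ID>
        pid = os.getpid() if pid is None else pid
        return os.path.join(sandbox_root, f"{actor_instance_name}_{pid}")

    # User-specified value: expand environment variables (assumed to be
    # done this way) and otherwise use the name as it is.
    expanded = os.path.expandvars(user_value)
    if os.path.isabs(expanded):
        # Assumption: absolute paths are refused, since the path may
        # only be relative to the sandbox root.
        raise ValueError("sandbox path must be relative to the sandbox root")
    return os.path.join(sandbox_root, expanded)


def clean_up_sandbox(path):
    """Delete the whole content of the sandbox but keep the directory."""
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        if os.path.isdir(entry) and not os.path.islink(entry):
            shutil.rmtree(entry)
        else:
            os.remove(entry)   # plain files and links
```

In this sketch a link inside the sandbox is removed without following it, but if the sandbox directory itself is a link to another directory, the content of that directory is deleted.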
4. 'Dummy' actors
When porting a workflow to a new platform, or to a new data version, it often happens that some of the actors are not immediately available in the new environment. Rather than building a new workflow with these actors removed, and then having to rebuild the workflow as and when the actors become available, the user can temporarily replace each missing actor with a generic dummy actor which:
- has the same number and types of input and output ports as the missing actor
- returns gracefully with an error if ever activated
4.1. FC2K settings
4.2. Actor generation:
- the user does not have to provide a library containing physics code (but may do so)
- all other actor data has to be specified (as if it were a "regular" actor)
- no C/Fortran wrappers will be generated (only Java/Python code)
4.3. Runtime actions:
- user code will not be called (everything is handled by the Java actor, without calling wrappers, etc.)
- the actor will return immediately with an ERROR (msg: "Actor <name> should not be called")
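The runtime behaviour of a dummy actor can be pictured with the following Python model. It is not the generated Java code: the class names are hypothetical, and raising an exception is just one possible way of "returning with an ERROR".

```python
class DummyActorError(RuntimeError):
    """Error reported when a dummy actor is activated."""


class DummyActor:
    """Hypothetical model: same ports as the real actor, no user code."""

    def __init__(self, name, input_ports, output_ports):
        self.name = name
        # Port name -> type, mirroring the missing "regular" actor so the
        # workflow connections stay valid.
        self.input_ports = dict(input_ports)
        self.output_ports = dict(output_ports)

    def fire(self, **inputs):
        # No wrapper and no user code are called; fail immediately.
        raise DummyActorError(f"Actor {self.name} should not be called")
```

Because the ports match those of the missing actor, the dummy can later be swapped for the regular actor without rewiring the workflow.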
4.4. Replacing a "dummy" by "regular" actor:
- The user opens the FC2K actor.xml project
- The "Create 'dummy' actor" checkbox should be unchecked
- The user specifies the libraries with physics code
- The user regenerates the actor
- A fully functional "regular" actor is created
Acknowledgement
This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under grant agreement No 633053. The scientific work is published for the realization of the international project co-financed by the Polish Ministry of Science and Higher Education in 2019 and 2020 from financial resources of the program entitled "PMW"; Agreement No. 5040/H2020/Euratom/2019/2 and 5142/H2020-Euratom/2020/2.