The Bulk Data Transfer service provides an efficient and reliable solution for transferring datasets into the cloud computing environments of EOSC EU Node and transferring them back to end-user premises and repositories. The Bulk Data Transfer service of LOT2 is an infrastructure-oriented service for massive data transfers, intended to support high-volume transfers between distant sites. By leveraging the service, data can be moved directly to the backend storage of the EOSC EU Node infrastructure, from where they can be accessed by VMs and containers in the cloud computing platform.
The Bulk Data Transfer service allows users to move data from outside EOSC EU Node to the storage back-end of the LOT2 compute services. Once transferred, the data can be accessed by applications and services running in the OpenStack-based virtual compute infrastructure and the OKD-based container platform.
Various use cases can be implemented with the Bulk Data Transfer service, including:
The current implementation of the Bulk Data Transfer service for EOSC EU Node is based on the File Transfer Service (FTS). Other implementations, based on cloud-native protocols such as S3, are to be provided in the future.
FTS (File Transfer Service) is used across projects that deal with large volumes of scientific data that have to be moved around geographically distributed data storage infrastructures.
FTS has been designed and developed at CERN (https://github.com/cern-fts). Historically, its main application was to automate the transfer of large data volumes (in the range of petabytes) within large collaborations. For that purpose, FTS supports third-party transfers, using GridFTP among other protocols, as well as transfer status monitoring, transfer restarts and built-in transfer optimisation.
The main added value of the Bulk Data Transfer service is the performance and reliability of data transfers.
The FTS-based implementation uses the GridFTP protocol, which enables multi-stream (parallel TCP) data transport that helps to overcome the negative impact of network latency on long-distance links. FTS also takes care of completing the data transfer tasks defined by users: it monitors their progress and status and can restart failed transfers if needed.
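The effect of latency can be illustrated with a simplified model (an assumption for illustration only: each TCP stream is limited by a fixed window W, there is no packet loss, and C is the link capacity). A single stream can have at most one window of unacknowledged data in flight per round-trip time, so its throughput is bounded by W/RTT, while N parallel streams raise that bound roughly N-fold until the link capacity is reached. The second line works this through for illustrative values W = 1 MB and RTT = 40 ms; these are not measurements of EOSC EU Node links, and the number of streams actually used is configured by the service.

```latex
R_{1} \le \frac{W}{\mathrm{RTT}}, \qquad
R_{N} \le \min\!\left( N \cdot \frac{W}{\mathrm{RTT}},\; C \right)

R_{1} \le \frac{8 \times 10^{6}\,\mathrm{bit}}{0.04\,\mathrm{s}} = 2 \times 10^{8}\,\mathrm{bit/s} = 200\,\mathrm{Mbit/s},
\qquad
R_{8} \le 8 \times 200\,\mathrm{Mbit/s} = 1.6\,\mathrm{Gbit/s} \quad (\text{if } C \ge 1.6\,\mathrm{Gbit/s})
```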
While delivering specialised functionality used in large-scale data management projects, the Bulk Data Transfer service can also be integrated into generic research and business workflows that involve large data transfers. FTS can be integrated with today's typical cloud computing platforms, and EOSC EU Node provides such an integration.
Data transfers are organised into transfer jobs (tasks). A job specification includes, as a minimum, the source and target locations (URLs) of the transfer.
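As an illustration of a minimal job specification, the FTS3 REST API accepts a job as a JSON document whose "files" list holds entries with "sources" and "destinations" URL lists. The sketch below uses a hypothetical helper, make_fts_job_json (not part of FTS), that prints such a document for one source/target pair; it assumes the URLs need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of FTS): print a minimal FTS3 REST job
# description with one source URL ($1) and one destination URL ($2).
# Assumes the URLs contain no characters that need JSON escaping
# (double quotes, backslashes, control characters).
make_fts_job_json() {
    printf '{"files": [{"sources": ["%s"], "destinations": ["%s"]}]}\n' "$1" "$2"
}
```

For example, make_fts_job_json "$IN" "$OUT" > job.json writes the description to a file, which could then be posted to the jobs resource of an FTS3rest endpoint, e.g. curl --cert "$X509_USER_PROXY" --key "$X509_USER_PROXY" -H 'Content-Type: application/json' --data @job.json https://fts.eu-1.datatransfer.open-science-cloud.ec.europa.eu:8446/jobs, assuming X.509 proxy authentication. The CLI tools described below build and submit an equivalent job for you.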
Managed Bulk Data Transfer jobs can be triggered and monitored using any of the FTS servers available in the EOSC EU Node installation, namely the servers at PSNC and Safespring. The FTS servers configured for EOSC EU Node are listed in the table below.
Site | Component | hostname *) | Port # | URL |
PSNC | FTS3rest |  | 8446 | https://fts.eu-1.datatransfer.open-science-cloud.ec.europa.eu:8446/ |
PSNC | FTSmon |  | 8449 | https://fts.eu-1.datatransfer.open-science-cloud.ec.europa.eu:8449/ |
Safespring | FTS3rest | fts02.staging.eosc.safedc.services | 8446 | https://fts.eu-2.datatransfer.open-science-cloud.ec.europa.eu:8446/ |
Safespring | FTSmon | fts02.staging.eosc.safedc.services | 8449 | https://fts.eu-2.datatransfer.open-science-cloud.ec.europa.eu:8449/ |
*) NOTE: the list temporarily includes the staging installation; the final version of the documentation will list the production host names.
Users can interact with the service either through the FTS CLI tools, which contact the API endpoint of the relevant FTS server, or directly through the REST API of the FTS servers.
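For direct use of the REST API, a quick way to check that a client credential is accepted is the whoami resource of the FTS3 REST interface. The sketch below assumes an X.509 proxy certificate in $X509_USER_PROXY and grid CA certificates in /etc/grid-security/certificates; both are typical grid-client defaults, not values confirmed for EOSC EU Node. The helper names fts_url and fts_whoami are hypothetical.

```shell
#!/bin/sh
# Join an FTS REST endpoint (listed with a trailing slash in the table)
# and a resource name without producing a double slash.
fts_url() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Query the whoami resource of an FTS server ($1 = endpoint URL).
# Assumptions: X.509 proxy certificate in $X509_USER_PROXY and the
# grid CA certificates in /etc/grid-security/certificates.
fts_whoami() {
    curl --silent --show-error --fail \
        --cert "$X509_USER_PROXY" --key "$X509_USER_PROXY" \
        --capath /etc/grid-security/certificates \
        "$(fts_url "$1" whoami)"
}
```

For example, fts_whoami https://fts.eu-2.datatransfer.open-science-cloud.ec.europa.eu:8446/ should return a JSON description of the caller's identity as seen by the Safespring FTS server.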
The GridFTP servers of EOSC EU Node are listed in the table below, together with their host names and URLs. These URLs are used to specify the target or the source of managed transfer jobs when transferring data into or out of EOSC EU Node, respectively.
Site | Component | hostname *) | Port # | URL to be used in the FTS CLI tools |
PSNC | GridFTP server | gridftp01.eu-1.datatransfer.open-science-cloud.ec.europa.eu | 2811 | gsiftp://gridftp01.eu-1.datatransfer.open-science-cloud.ec.europa.eu:2811 |
PSNC | GridFTP server | gridftp02.eu-1.datatransfer.open-science-cloud.ec.europa.eu | 2811 | gsiftp://gridftp02.eu-1.datatransfer.open-science-cloud.ec.europa.eu:2811 |
Safespring | GridFTP server | gridftp03.eu-2.datatransfer.open-science-cloud.ec.europa.eu | 2811 | gsiftp://gridftp03.eu-2.datatransfer.open-science-cloud.ec.europa.eu:2811 |
Safespring | GridFTP server | gridftp04.eu-2.datatransfer.open-science-cloud.ec.europa.eu | 2811 | gsiftp://gridftp04.eu-2.datatransfer.open-science-cloud.ec.europa.eu:2811 |
*) NOTE: the list temporarily includes the staging installation; the final version of the documentation will list the production host names.
Note that users of the FTS service typically interact only with the FTS server and not directly with the GridFTP servers. It is the FTS server that triggers and supervises the third-party transfers performed by the GridFTP servers.
EOSC EU Node includes two FTS servers, one at each location, for increased reliability. GridFTP servers are also deployed at both sites, which enables efficient data transfer to either site. For instance, transferring data to or from PSNC requires using a GridFTP server running at PSNC; similarly, staging data into the Safespring compute infrastructure requires using the GridFTP servers running there.
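The site rule above can be expressed as a small lookup that turns a site name and a file path into a gsiftp:// URL on that site's GridFTP server. This is a minimal sketch: gridftp_url is a hypothetical helper, and it uses only the first GridFTP server of each site from the GridFTP server table (gridftp02 and gridftp04 could be used equally).

```shell
#!/bin/sh
# Hypothetical helper: print a gsiftp:// URL for path $2 on a GridFTP
# server of site $1 (PSNC or Safespring). Only the first server of each
# site is used here, which is a simplification.
gridftp_url() {
    local host
    case "$1" in
        PSNC)       host=gridftp01.eu-1.datatransfer.open-science-cloud.ec.europa.eu ;;
        Safespring) host=gridftp03.eu-2.datatransfer.open-science-cloud.ec.europa.eu ;;
        *) echo "unknown site: $1" >&2; return 1 ;;
    esac
    # Strip one leading slash from the path so the URL has a single "/".
    printf 'gsiftp://%s:2811/%s\n' "$host" "${2#/}"
}
```

A transfer from PSNC into Safespring would then use IN=$(gridftp_url PSNC /data/output/file.bin) and OUT=$(gridftp_url Safespring /data/output/file.bin).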
This section of the user guide focuses on using the FTS CLI tools for interacting with the Bulk Data Transfer service of EOSC EU Node. The Web interface of FTS is not supported in EOSC EU Node.
If the CLI client is to be used with the service, the following minimum requirements must be met:
To initiate an FTS-managed transfer, use the following CLI command:
/bin/fts-rest-transfer-submit --verbose "$IN" "$OUT"
where $IN and $OUT are the source and target URLs, each consisting of a GridFTP server URL (see the GridFTP server table above) followed by the path of the data to be transferred.
An example command is presented below:
/bin/fts-rest-transfer-submit --verbose gsiftp://gridftp01.staging.eosc.pcss.pl:2811/data/output/testfile_1G_2.bin gsiftp://gridftp03.staging.eosc.safedc.services:2811/data/output/testfile_1G_2.bin
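The submission step can be wrapped in a small function that also selects the FTS server explicitly. This is an illustrative sketch: it assumes the FTS3 REST CLI is on the PATH and that its -s option sets the FTS endpoint (as described in the FTS CLI documentation); submit_transfer, FTS_ENDPOINT and DRY_RUN are hypothetical names introduced here, with DRY_RUN only printing the command instead of running it.

```shell
#!/bin/sh
# Submit one FTS-managed transfer from source URL $1 to target URL $2.
# FTS_ENDPOINT selects the FTS server (default: the PSNC FTS3rest endpoint
# from the table). If DRY_RUN is non-empty, the command is printed, not run.
submit_transfer() {
    set -- fts-rest-transfer-submit --verbose \
        -s "${FTS_ENDPOINT:-https://fts.eu-1.datatransfer.open-science-cloud.ec.europa.eu:8446/}" \
        "$1" "$2"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Setting FTS_ENDPOINT to the Safespring FTS3rest URL submits the same job through the Safespring server instead, which is one way to use the second FTS server when the first one is unavailable.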
Transfers can be monitored in several ways. The FTS CLI tools can list the transfers initiated by a user along with their status information. In addition, the FTSmon monitoring console of FTS gives a graphical overview of the transfer jobs handled by the FTS server.
Use the following CLI command to monitor the transfers initiated by a user:
/bin/fts-rest-transfer-list | grep -Ei "Request ID|ACTIVE|Status"
The command lists the transfers initiated by the user along with their state; the grep filter keeps only the lines containing "Request ID", "ACTIVE" or "Status" (case-insensitive). A detailed explanation of the command output is provided in the tool documentation.
Example output of the command can be seen below:
/bin/fts-rest-transfer-list | grep -Ei "Request ID|ACTIVE|Status"
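While fts-rest-transfer-list gives an overview, a single job can be followed by polling its status until it reaches a terminal state. The sketch below assumes that fts-rest-transfer-status -s ENDPOINT JOB_ID prints a line of the form "Status: STATE" (as suggested by the Status pattern in the command above; check the exact format in the CLI documentation) and treats FINISHED, FINISHEDDIRTY, FAILED and CANCELED as terminal FTS job states. The function names are hypothetical.

```shell
#!/bin/sh
# Print the job state found in status output read from stdin.
# Assumption: the output contains a line of the form "Status: <STATE>".
job_state() {
    sed -n 's/^[[:space:]]*Status:[[:space:]]*\([A-Z]*\).*/\1/p' | head -n 1
}

# Succeed if the given FTS job state is terminal (the job is over).
is_terminal_state() {
    case "$1" in
        FINISHED|FINISHEDDIRTY|FAILED|CANCELED) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll job $2 on FTS endpoint $1 every 30 seconds until it reaches a
# terminal state, then print that state. Fails if a status query fails.
wait_for_job() {
    local out state
    while true; do
        out="$(fts-rest-transfer-status -s "$1" "$2")" || return 1
        state="$(printf '%s\n' "$out" | job_state)"
        if is_terminal_state "$state"; then
            printf '%s\n' "$state"
            return 0
        fi
        sleep 30
    done
}
```

Keeping the parsing in job_state separate from the polling loop makes it easy to adapt if the actual CLI output differs from the assumed "Status:" line.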
In addition to the FTS CLI, the FTSmon module can be used to get a graphical overview of the transfer jobs handled by the FTS servers. FTSmon monitoring consoles are available for the FTS servers running in EOSC EU Node; the URLs where these consoles can be reached are included in the FTS server table above.
An example view of the FTSmon console is presented in the picture below.
The picture shows one active transfer between two GridFTP servers at PSNC ("Active: 1"), as well as the status and statistics of other jobs that are or were running between GridFTP server pairs.
Note that FTSmon presents only aggregated information on the transfers between particular pairs of servers. Further details can be examined by navigating to the task status and statistics pages of the web interface, while detailed information on individual jobs is available to users through the CLI monitoring tool (see the previous subsection).
Detailed instructions on using FTS and GridFTP can be found in the documentation of these products, linked below.
CLI tools documentation: https://fts3-docs.web.cern.ch/fts3-docs/docs/cli.html
GridFTP clients documentation: https://gridcf.org/gct-docs/6.0/gridftp/user/index.html#gridftp-user-quickstart