...

The Bulk Data Transfer service provides an efficient and reliable way to transfer datasets into the cloud computing environments of the EOSC EU Node and back to end-user premises and repositories. The Bulk Data Transfer service of LOT2 is an infrastructure-oriented massive data transfer service intended to support high-volume data transfers between distant sites. With this service, data can be moved directly to the backend storage of the EOSC EU Node infrastructure, where it can be accessed by VMs and containers in the cloud computing platform.

The Bulk Data Transfer service allows users to move data from outside the EOSC EU Node to the data storage back-end of the LOT2 compute services. Once transferred, the data can be accessed by applications and services running in the OpenStack-based virtual compute infrastructure and the OKD-based container platform.

Bulk Data Transfer use-cases

...

The current implementation of the Bulk Data Transfer service for the EOSC EU Node is based on the CERN-developed File Transfer Service (FTS). Other implementations based on cloud-native protocols such as S3 are being considered for the future.

What is FTS

FTS is the File Transfer Service, designed and developed at CERN (https://github.com/cern-fts). It is used across projects that deal with large volumes of scientific data that have to be moved around geographically distributed data storage infrastructure.

Historically, the main application of FTS was to automate the transfer of large data volumes (in the range of petabytes) within large collaborations. For that purpose, FTS supports third-party transfers (using GridFTP, among other protocols), as well as transfer status monitoring, transfer restarts and built-in transfer optimisation.

...

The FTS-based implementation uses the GridFTP protocol to enable multi-threaded data transport, which helps to overcome the negative impact of network latency on long-distance links. Moreover, FTS takes care of completing the data transfer tasks defined by users: it monitors their progress and status and can restart failed transfers if needed.

While it delivers specialised functionality that is applied in large-scale data management projects, the Bulk Data Transfer service can also be integrated into generic research and business workflows that involve large data transport and compute. It is possible to integrate FTS with today's typical cloud computing platforms, and the EOSC EU Node provides such an integration.

How to use FTS

Data transfers are organised into data transfer tasks (jobs). A job specification includes, at a minimum, the source and target locations (URLs) of the data transfer. It may also include extra parameters.
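As an illustration of such a job specification, the following minimal Python sketch assembles a job description with one source URL, one target URL and optional extra parameters. The JSON layout ("files", "sources", "destinations", "params") and the parameter names "overwrite" and "retry" reflect our understanding of the FTS3 REST submission format and should be verified against the FTS documentation. The helper name build_transfer_job, the external source host and the file paths are hypothetical.

```python
import json


def build_transfer_job(source_url, destination_url, **params):
    """Return a job description with a single source/destination pair.

    The dictionary layout is an assumption based on the FTS3 REST
    submission format; check it against the FTS documentation.
    """
    if not source_url or not destination_url:
        raise ValueError("both a source and a destination URL are required")
    job = {
        "files": [
            {"sources": [source_url], "destinations": [destination_url]}
        ]
    }
    if params:
        # Extra parameters are optional and passed through unchanged.
        job["params"] = dict(params)
    return job


job = build_transfer_job(
    # Hypothetical external GridFTP source and file path.
    "gsiftp://source.example.org:2811/data/input.dat",
    # Target on a GridFTP server of the EOSC EU Node; the path is hypothetical.
    "gsiftp://gridftp01.staging.eosc.pcss.pl:2811/data/input.dat",
    overwrite=True,
    retry=3,
)
print(json.dumps(job, indent=2))
```

Only the two URLs are mandatory in this sketch; when no keyword arguments are given, the "params" key is omitted entirely.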

Managed Bulk Data Transfer jobs can be triggered and monitored using any of the FTS servers available in the EOSC EU Node installation, including the servers at PSNC and Safespring. The FTS servers configured for the EOSC EU Node are listed in the table below.

| Site | Component | hostname *) | Port number | URL |
|---|---|---|---|---|
| PSNC | FTS3rest | fts01.staging.eosc.pcss.pl | 8446 | https://fts01.staging.eosc.pcss.pl:844 |
| PSNC | FTSmon | fts01.staging.eosc.pcss.pl | 8449 | https://fts01.staging.eosc.pcss.pl:8449/fts3/ftsmon/ |
| Safespring | FTS3rest | fts02.staging.eosc.safedc.services | 8446 | https://fts02.staging.eosc.safedc.services:844 |
| Safespring | FTSmon | fts02.staging.eosc.safedc.services | 8449 | https://fts02.staging.eosc.safedc.services:8449/fts3/ftsmon/ |

*) NOTE: the list temporarily includes the staging installation; the final version of the documentation will list the production host names.

Interaction with the service is possible either through the CLI tools, which contact the API endpoint of the relevant FTS server, or directly through the REST API of the FTS servers. This user guide focuses on using the FTS CLI tools to interact with the Bulk Data Transfer service of the EOSC EU Node.
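For readers who prefer the REST API, the sketch below shows how a request to one of the FTS3rest endpoints might be prepared in Python. It only builds the request object and does not send it. The endpoint is formed from the host name and port number listed for the FTS3rest component; the "/whoami" resource is, to our knowledge, part of the FTS3 REST API, but this should be checked against the FTS documentation. The helper names are hypothetical, and authentication (which the server will require) is deliberately not shown.

```python
import json
import urllib.request

# Host names and port numbers of the FTS3rest components (staging installation).
FTS_REST_SERVERS = {
    "PSNC": ("fts01.staging.eosc.pcss.pl", 8446),
    "Safespring": ("fts02.staging.eosc.safedc.services", 8446),
}


def rest_endpoint(site):
    """Return the base URL of the FTS REST API for the given site."""
    host, port = FTS_REST_SERVERS[site]
    return f"https://{host}:{port}"


def build_request(site, path, body=None):
    """Prepare, but do not send, a request to the FTS REST API.

    A request with a body is prepared as POST, otherwise as GET.
    Authentication is out of scope of this sketch.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        rest_endpoint(site) + path,
        data=data,
        method="POST" if data is not None else "GET",
    )
    request.add_header("Accept", "application/json")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


whoami = build_request("PSNC", "/whoami")
print(whoami.get_method(), whoami.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before contacting a server.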

Third-party transfers are actually performed by GridFTP servers, which are instantiated at both sites to enable efficient data transfer to either site. The table below lists the host names of the GridFTP servers in the EOSC EU Node together with their URLs. These URLs can be used to specify the target or the source of managed transfer jobs when transferring data into or out of the EOSC EU Node, respectively.

| Site | Component | hostname *) | Port number | URL to be used in the FTS CLI tools |
|---|---|---|---|---|
| PSNC | GridFTP server | gridftp01.staging.eosc.pcss.pl | 2811 | gsiftp://gridftp01.staging.eosc.pcss.pl:2811 |
| PSNC | GridFTP server | gridftp02.staging.eosc.pcss.pl | 2811 | gsiftp://gridftp02.staging.eosc.pcss.pl:2811 |
| Safespring | GridFTP server | gridftp01.staging.eosc.safedc.services | 2811 | gsiftp://gridftp01.staging.eosc.safedc.services:2811 |
| Safespring | GridFTP server | gridftp02.staging.eosc.safedc.services | 2811 | gsiftp://gridftp02.staging.eosc.safedc.services:2811 |

*) NOTE: the list temporarily includes the staging installation; the final version of the documentation will list the production host names.

Please note that a user of the FTS service typically interacts only with the FTS server and does not interact with the GridFTP servers directly. It is the FTS server that triggers and supervises the third-party transfers, which are performed directly by the GridFTP servers.

The EOSC EU Node includes two FTS servers, one per location, for increased reliability. GridFTP servers are also instantiated at both sites, which enables efficient data transfer to either site. For instance, transferring data to or from PSNC requires using the GridFTP servers running at PSNC. Similarly, staging data into the Safespring compute infrastructure requires using the GridFTP servers running there.
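The site-specific choice of GridFTP server described above can be sketched as a small Python helper that returns a transfer URL on a GridFTP server of the requested site. The host names and port number are those of the staging installation listed in this guide. The helper name gridftp_url, the index argument for choosing between the two servers of a site, and the example path are hypothetical.

```python
GRIDFTP_PORT = 2811

# GridFTP host names per site (staging installation).
GRIDFTP_HOSTS = {
    "PSNC": [
        "gridftp01.staging.eosc.pcss.pl",
        "gridftp02.staging.eosc.pcss.pl",
    ],
    "Safespring": [
        "gridftp01.staging.eosc.safedc.services",
        "gridftp02.staging.eosc.safedc.services",
    ],
}


def gridftp_url(site, path, index=0):
    """Return a gsiftp:// URL on a GridFTP server located at the given site.

    `index` selects one of the servers of that site and wraps around,
    so any non-negative integer is accepted.
    """
    if site not in GRIDFTP_HOSTS:
        raise ValueError(f"unknown site: {site!r}")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    hosts = GRIDFTP_HOSTS[site]
    host = hosts[index % len(hosts)]
    return f"gsiftp://{host}:{GRIDFTP_PORT}{path}"


# Data destined for the Safespring compute infrastructure goes to a
# Safespring GridFTP server; the path is a hypothetical example.
print(gridftp_url("Safespring", "/data/input.dat"))
```

A URL produced this way can serve as the source or the target of a managed transfer job, depending on the direction of the transfer.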

Using FTS

This section of the user guide focuses on using the FTS CLI tools to interact with the Bulk Data Transfer service of the EOSC EU Node.
The Web interface of FTS is not supported in the EOSC EU Node.

...