Fusion using AWS Batch and S3 object storage
Fusion simplifies and improves the efficiency of running Nextflow pipelines with AWS Batch in several ways:
- No need to use the AWS CLI tool for copying data to and from S3.
- No need to create a custom AMI or custom containers to include the AWS CLI tool.
- Fusion uses an efficient data transfer and caching algorithm that provides much faster throughput compared to the AWS CLI and does not require a local copy of the data files.
- By replacing the AWS CLI with a native API client, data transfer is much more robust at scale.
A minimal pipeline configuration looks like the following:
process.executor = 'awsbatch'
process.queue = '<YOUR AWS BATCH QUEUE>'
aws.region = '<YOUR AWS REGION>'
fusion.enabled = true
wave.enabled = true
In the above snippet, replace <YOUR AWS BATCH QUEUE> and <YOUR AWS REGION> with the AWS Batch queue and AWS region of your choice, then save it to a file named nextflow.config in the pipeline launch directory.
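For example, assuming a hypothetical queue named nf-queue in the us-east-1 region, the completed configuration would look like this:
process.executor = 'awsbatch'
process.queue = 'nf-queue'
aws.region = 'us-east-1'
fusion.enabled = true
wave.enabled = true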
Then launch the pipeline execution with the usual run command:
nextflow run <YOUR PIPELINE SCRIPT> -w s3://<YOUR-BUCKET>/work
Replace <YOUR PIPELINE SCRIPT> with the URI of your pipeline Git repository and <YOUR-BUCKET> with an S3 bucket of your choice.
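For example, to run the nextflow-io/rnaseq-nf demo pipeline using a hypothetical bucket named my-nf-bucket:
nextflow run https://github.com/nextflow-io/rnaseq-nf -w s3://my-nf-bucket/work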
To achieve the best performance, make sure to set up an SSD volume as the temporary directory. See the SSD storage section for details.
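As a minimal sketch, an EC2 launch template user data script can format an instance NVMe disk and mount it as the temporary directory; the device name /dev/nvme1n1 below is an assumption and varies by instance type:
#!/bin/bash
# Hypothetical device name; verify the NVMe device on your chosen instance type
mkfs.ext4 /dev/nvme1n1
mount /dev/nvme1n1 /tmp
chmod 1777 /tmp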