Archive Migration through Workflow Automation

N. Podhorszki, B. Ludäscher, and S. Klasky (USA)


Scientific workflow, data transfer, software tools, distributed application


The Center for Plasma Edge Simulation project aims to automate the tedious tasks of simulation monitoring, data archival, and coupling of simulation codes using the Kepler scientific workflow environment. The technology has been successfully applied to migrate a 10 TB combustion data archive from NERSC to ORNL, a task for which no other automated solution was available. This paper describes the workflow, which migrates large files from mass storage systems using external tools and temporary staging to disk. It executes the stages in a pipeline-parallel fashion, parallelizes file transfers, and uses special checkpointing so that the workflow is restartable and can retry operations that failed earlier. The advantage of such a workflow over specialized data migration services is its independence from specific systems: it can be adapted by configuring which external tools it uses. Its advantages over scripts are robust execution (handling of failures and timeouts) and efficiency (parallelization wherever possible).
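The pattern described in the abstract (staging, parallel transfer, and checkpointing for restart) can be illustrated with a minimal Python sketch. This is not the Kepler workflow from the paper. The names `Checkpoint`, `run_step`, and `migrate`, the JSON-lines checkpoint file, and the retry count are all assumptions made for illustration. The callables `stage_fn` and `transfer_fn` stand in for whatever external tools a site would configure. Timeout handling is omitted for brevity.

```python
import json
import os
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class Checkpoint:
    """Hypothetical log of finished (stage, file) pairs, one JSON line each."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.done = set()
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    if line.strip():
                        self.done.add(tuple(json.loads(line)))

    def is_done(self, stage, name):
        with self.lock:
            return (stage, name) in self.done

    def mark(self, stage, name):
        with self.lock:
            self.done.add((stage, name))
            with open(self.path, "a") as f:
                f.write(json.dumps([stage, name]) + "\n")


def run_step(ckpt, stage, name, action, retries=2):
    """Run action(name) unless it is checkpointed; retry on any exception."""
    if ckpt.is_done(stage, name):
        return True
    for _ in range(retries + 1):
        try:
            action(name)
        except Exception:
            continue
        ckpt.mark(stage, name)
        return True
    return False


def migrate(files, stage_fn, transfer_fn, ckpt_path, transfer_workers=4):
    """Stage files sequentially while already staged files transfer in parallel.

    Assumes a staged file stays on disk until its transfer has succeeded.
    Returns the sorted names of files that still failed after all retries.
    """
    ckpt = Checkpoint(ckpt_path)
    staged = queue.Queue()
    failed = []

    def stager():
        for name in files:
            if run_step(ckpt, "stage", name, stage_fn):
                staged.put(name)  # hand over to the transfer stage
            else:
                failed.append(name)
        staged.put(None)  # sentinel: no more files

    def transfer(name):
        if not run_step(ckpt, "transfer", name, transfer_fn):
            failed.append(name)

    producer = threading.Thread(target=stager)
    producer.start()
    with ThreadPoolExecutor(max_workers=transfer_workers) as pool:
        while True:
            name = staged.get()
            if name is None:
                break
            pool.submit(transfer, name)
    producer.join()
    return sorted(failed)
```

Calling `migrate` a second time with the same checkpoint path skips every step already recorded and repeats only the operations that failed, which is the restart behavior the abstract refers to.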
