Filesystems and data management

There are four main filesystems to be familiar with on Pawsey. Acacia is cloud object storage while the other three form part of the Setonix physical disk space:

| Filesystem | Default quota | Default inodes* | Intended use |
|---|---|---|---|
| /home | 1 GB per user | 10K per user | User configuration files and ssh keys |
| /scratch/<pawsey-project-id> | 1 PB per project | 1M per user | Compute job input and output (high performance) |
| /software/projects/<pawsey-project-id> | 256 GB per project | 100K per user | Self-installed tools and containers |
| Acacia | 100 GB per user + 1 TB per project | NA | Persistent storage (not backed up) |

*Each file or directory consumes one inode, so the inode quota effectively limits the number of files you can create.
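To see how close you are to these quotas, the standard Lustre `lfs quota` command can be queried per user or per group. A minimal sketch (the project id below is a placeholder; substitute your own):

```shell
#!/bin/bash
# Sketch: check block and inode usage against the quotas in the table above.
# Uses Lustre's standard `lfs quota`; the project id is a placeholder.
project=courses01   # hypothetical Pawsey project id -- substitute your own

# Per-user usage and inode counts on the Lustre filesystems:
lfs quota -u "$USER" /scratch
lfs quota -u "$USER" /software

# Project-wide usage on /scratch:
lfs quota -g "$project" /scratch
```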

/home

USyd users familiar with Artemis and/or NCI Gadi may be accustomed to the 10 GB /home quota per user. The more stringent hard limit of 1 GB per user is more than offset by the 100 GB persistent storage per user provided on Acacia and the 256 GB per project space on /software.

Use of /home should be restricted to configuration files and ssh keys. Workflow scripts, self-installed tools, Singularity containers, R packages and the like will quickly exceed the /home quota; store these on /software/projects/<pawsey-project-id> instead. Input and output files should make appropriate use of /scratch and Acacia.

Tip

Regularly check for and clean unneeded files from /home which may be generated by software as temporary or cache files. These are often stored in hidden directories (their name starts with a dot). VS Code and R packages are common culprits!
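One quick way to find these hidden space-eaters is to size every dotfile and dot-directory in your home directory, sorted so the largest appear last:

```shell
# List hidden files and directories in $HOME by size, largest last.
# The glob .[!.]* matches dotfiles while skipping "." and "..".
du -sh "$HOME"/.[!.]* 2>/dev/null | sort -h
```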

/scratch

Pawsey has a generous upper limit of 1 petabyte (PB) per project on /scratch. To sustain this, the file retention period on /scratch is 21 days. There is no quarantine period for files purged from /scratch, so users are encouraged to actively incorporate data staging between /scratch and Acacia into their workflows and to regularly back up to USyd RDS. Note that the 21-day count commences from when a file was last read or modified, not necessarily from when it was created.

Important

Do not use touch to attempt to circumvent the Pawsey purge policy! This will degrade performance for all users and potentially result in your Pawsey access being revoked.


Pawsey encourages all users to actively manage their /scratch utilisation by monitoring their usage and using munlink to remove large numbers of files, rather than waiting for the purge policy to clean up for you. Active management of this shared, finite resource ensures optimal performance of the filesystem, leading to faster execution times and a better experience for all users.
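A sketch of the munlink pattern, deleting files first and then cleaning up the emptied directory tree (the path is a placeholder for one of your own finished run directories):

```shell
#!/bin/bash
# Sketch: remove a finished run directory from /scratch using Lustre's
# munlink, which unlinks files with less metadata overhead than rm.
# The project id and path are placeholders -- substitute your own.
target=/scratch/courses01/$USER/finished_run   # hypothetical path

# Unlink every file, then delete the now-empty directories (depth-first).
find "$target" -type f -print0 | xargs -0 munlink
find "$target" -type d -empty -delete
```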

/software

Pawsey has numerous tools installed globally as modules. Tools not globally available can be self-installed into /software/projects/<pawsey-project-ID>. By providing a dedicated space for custom tools, Pawsey is meeting a common need among HPC users: a space where all members of a project can store shared tools and workflow files, without being subjected to purge or isolated within a user’s /home.
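Globally installed tools are discovered and loaded with the usual environment-module commands (the module name below is illustrative only):

```shell
# Sketch: working with globally installed modules on an HPC system.
module avail          # list modules available to load
module load gcc       # load a module (name is illustrative)
module list           # show currently loaded modules
```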

Software on Setonix will be covered in more detail in the Software section.

Acacia

Acacia is Pawsey’s warm tier object storage cluster. It is a performant and highly scalable resource, but differs from traditional filesystems by arranging data as objects in a bucket, rather than files in a folder hierarchy.

Each user is provided with 100 GB of private Acacia storage, and each project with 1 TB. Pawsey will consider increases above 1 TB based on genuine need; please contact help@pawsey.org.au. For large and ongoing Acacia requirements, please contact SIH to discuss options.

Acacia is not readable from or writable to by the compute nodes. This means that input data stored on Acacia must first be copied ("staged") to /scratch before being read by a compute job. Output written by the job to /scratch may likewise be staged back to Acacia. These transfers occur in separate jobs, but can be easily chained through job dependencies following Pawsey's template scripts for a staged workflow.
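The chaining itself uses standard Slurm job dependencies. A minimal sketch, where the three batch scripts are placeholders for your own stage-in, compute, and stage-out scripts:

```shell
#!/bin/bash
# Sketch: chain stage-in, compute, and stage-out jobs with Slurm
# dependencies. The script names are placeholders for your own batch
# scripts (e.g. ones that move data between Acacia and /scratch).

stage_in=$(sbatch --parsable stage_in.sh)
compute=$(sbatch --parsable --dependency=afterok:${stage_in} compute.sh)
sbatch --dependency=afterok:${compute} stage_out.sh
```

The `--dependency=afterok` flag ensures each step runs only if the previous job completed successfully, so a failed stage-in will not trigger the compute job.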

Because Acacia provides an S3-compatible interface, it offers the added benefits of sharing data with collaborators and mounting buckets locally to expedite transfer between Pawsey and your local computer and/or USyd RDS. This will be covered in the section on data transfer.
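As a taste of what S3 compatibility enables, an rclone transfer from Acacia to a local machine looks like the following sketch. It assumes a remote named "acacia" has already been set up with your S3 access keys via `rclone config`; the bucket and paths are placeholders:

```shell
# Sketch: copy results from an Acacia bucket to the current machine with
# rclone. Assumes an S3 remote named "acacia" configured beforehand;
# bucket and directory names are placeholders.
rclone copy acacia:my-bucket/run-01/ ./run-01/ --progress
```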

For further information on using Acacia, please refer to Pawsey’s Acacia user guide.

Further reading