How is the system set up?

NCI provides computational and data-storage resources to researchers and postgraduate students at Australian higher education institutions and Australian public sector research organisations.

This section will point you to the right sections of the NCI documentation and user guides to get you started on the Gadi HPC system.

Overview of NCI Gadi HPC

NCI Gadi is one of Australia’s most powerful supercomputers, designed to support advanced computational research.

Compute
- Nodes: 4,962
- Processors: Intel Sapphire Rapids, Cascade Lake, Skylake, and Broadwell CPUs
- GPUs: NVIDIA V100 and DGX A100 GPUs
- Performance: over 10 petaflops of peak performance

Storage
- Disk drives: 7,200 4-Terabyte hard disks in 120 NetApp disk arrays
- Capacity: 20 Petabytes of total usable capacity
- Performance: 980 Gigabytes per second maximum performance

Filesystems
- Total capacity: approximately 90 Petabytes
- Global Lustre filesystems: five, with an aggregate I/O performance of around 450 Gigabytes per second
- IO Intensive platform: a dedicated filesystem using 576 2-Terabyte NVMe drives, achieving around 960 Gigabytes per second cumulative performance

Archival storage
- Capacity: over 70 Petabytes of archival project data stored in state-of-the-art magnetic tape libraries

Networking
- Interconnect: 100-gigabit network links connecting high-performance computing with high-performance data

Cloud systems
- Nirin Cloud: a high-availability and high-capacity zone integrated with Gadi and NCI’s multi-Petabyte national research data collections, comprising Intel Broadwell and Sandy Bridge processors and NVIDIA K80 GPUs

Conditions of use

All users of NCI agree that they will keep themselves informed of, and comply with, all relevant legislation and The Australian National University policies and rules.

All users must acknowledge and understand that a breach of these will result not only in a loss of access to NCI resources, but may also expose the user to Federal criminal prosecution, including fines and/or gaol, as legislated under the relevant Acts.

Key components of the Gadi HPC system

See the Gadi Resources Guide for a detailed explanation of the following.

Computing nodes

There are three types of nodes you will interact with on Gadi: login nodes, compute nodes, and data mover nodes. These are described below.

Navigating the system

This section contains some example commands to run on the Gadi HPC system. To generalise across multiple groups we have used <project> in place of a specific project code. When running these commands, please replace <project> with your project code.

E.g. if your project code is aa00, you would run:

cd /scratch/aa00

Login nodes

These nodes are the gateway through which users access the resources of the HPC cluster. They are where you log in to Gadi, move around the filesystem, submit jobs to the scheduler, and do small tasks like viewing the contents of a file.
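For example, assuming your NCI username is aa1234, you would connect to a login node from your local machine with:

ssh aa1234@gadi.nci.org.au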

Compute nodes

These nodes are the workhorses of any HPC system. They are dedicated to executing the computational tasks delegated by the job scheduler when you submit a job. Gadi has various types of compute nodes with different hardware, built for different purposes. Depending on the resource requirements of your job (e.g. high memory, GPUs) and the queue you specify, your job will be sent to a specific type of compute node. You can find a breakdown of their technical specifications here.
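As a minimal sketch, you could request an interactive session on a compute node like this (the resource values are illustrative only, and the storage directive should list the filesystems your job needs):

qsub -I -q normal -P <project> -l walltime=01:00:00 -l ncpus=4 -l mem=16GB -l storage=scratch/<project>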

‼️ Pay Attention ‼️

Compute nodes on Gadi don’t currently have access to the external internet. If any tasks within a submitted job on a compute node need to access the internet, they will fail. These tasks should instead be run separately on the copyq queue, which uses the data mover nodes, or through an ARE job.
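A minimal sketch of a copyq job is shown below; the resource values, script name, and download URL are placeholders, not prescriptions.

#!/bin/bash
#PBS -q copyq
#PBS -P <project>
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=02:00:00
#PBS -l storage=scratch/<project>
#PBS -l wd

# copyq jobs run on the data mover nodes, which do have internet access
wget -P /scratch/<project> https://example.org/reference_data.tar.gz

Save this as, for example, download.pbs and submit it with qsub download.pbs.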

Data mover nodes

These nodes are designed specifically for fast data movement. You can use these nodes to transfer files to and from Gadi at high-speed. Steps outlined here. A script for moving data between USyd RDS and Gadi is provided in /g/data/scripts and explained in the following section, transferring data.
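For example, assuming your NCI username is aa1234, a transfer from your local machine to your project’s /scratch space via the data mover nodes might look like this (run from your local machine; the directory names are placeholders):

rsync -avP my_data/ aa1234@gadi-dm.nci.org.au:/scratch/<project>/my_data/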

Filesystems

$HOME

When you first log in to Gadi, you’ll be placed in your personal $HOME directory (i.e. /home/555/aa1234). You are the only person who can access this directory. No work should be done in here, but you may wish to install things like custom R or Python libraries here. It is backed up, but you have a 10 GiB storage limit.
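As a small sketch, a user-level Python package install lands under $HOME by default (the module name and package are illustrative; pick a specific version with module avail python3 if needed):

module load python3
pip install --user numpy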

You can navigate back here at any point if required by running:

cd ~

/scratch

All Gadi projects have a dedicated /scratch allocation that is only accessible to members of your project. It is intended only for active work on big files, not for long-term storage. It is not backed up, and any files not accessed for 100 days will be purged from the system, so be sure to back up your work to RDS. Your /scratch space contains a directory for each user (denoted by their Gadi username); however, you can organise things however you wish here.

You can navigate to your /scratch space by running:

cd /scratch/<project>
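To keep an eye on how much of your project’s /scratch allocation is in use, you can check your storage quotas (the lquota utility is assumed here; the NCI storage documentation covers it in more detail):

lquota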

/g/data

Some Gadi projects have a dedicated /g/data allocation that is only accessible to members of that group. This is intended for long-term storage of large data. It is not backed up though, so ensure you transfer all important files back to RDS. If you are unsure whether your project has a /g/data allocation, you can check by running:

cd /g/data/<project>

/g/data/if89

You can also access a communally maintained bioinformatics software project at /g/data/if89. Users currently have to request access to if89, but it is a good place to find bioinformatics software previously installed by others. To request access to if89:

  1. Log into MyNCI
  2. Navigate to Projects and Groups
  3. Search for if89 and request access
  4. Click join.

Access needs to be manually approved. If you are experiencing delays of >24 hours, please contact SIH.
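Once your access is approved, software in if89 is typically made visible by adding its module directory to your module path. A minimal sketch (the exact module path is an assumption; check the if89 documentation for the current location):

module use /g/data/if89/apps/modulefiles
module avail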

/apps

This directory is accessible to all Gadi users. It is a read-only system containing centrally installed software applications and their module files. You can check what software is installed here:

ls /apps
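You can also search the module system directly for a specific tool, e.g.:

module avail samtools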

You can use any software that is installed here by first loading the module file, e.g.:

module load samtools

Then run the tool as per its user guide, e.g.:

samtools view -H sample.bam

Queues

Like on Artemis, the job scheduler is PBSPro; however, it is implemented in a slightly different way. To run jobs on Gadi, users submit to a specific queue on a corresponding node type. The queue and node you choose will depend on the types of resources your job needs. The pipelines your group uses have already been configured to run on specific queues.

For custom PBS scripts, you can work out which queue to run your job on by checking the NCI queue documentation and queue limits explainer. Most jobs will be suitable for the normal or normalbw queues. The normal queues have the most nodes available for your jobs, and using them leaves the specialised queues free for the users and jobs that genuinely require them. Express queues are designed to support work that needs a faster turnaround, but are charged at a correspondingly higher service unit rate.
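As a minimal sketch of a custom PBS script targeting the normal queue (the job name, resource requests, file paths, and script name are illustrative only; adjust them and the storage directive to suit your job):

#!/bin/bash
#PBS -q normal
#PBS -P <project>
#PBS -N align_sample
#PBS -l ncpus=12
#PBS -l mem=48GB
#PBS -l walltime=04:00:00
#PBS -l storage=scratch/<project>
#PBS -l wd

# Load the software you need, then run the task
module load samtools
samtools view -H /scratch/<project>/sample.bam

Submit the script with qsub and monitor it with qstat:

qsub align_sample.pbs
qstat -u $USER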

All materials copyright Sydney Informatics Hub, University of Sydney