What does the system expect of its users?
- The do’s and don’ts of using Gadi
- SIH Intro to Gadi HPC tutorial
- Pro tips for bioinformatics on HPC webinar
What is high performance computing?
High performance computing (HPC) refers to the use of parallel processing techniques to solve complex computational problems efficiently. HPC systems, like Gadi, consist of clusters of interconnected computers, each equipped with multiple processors and large amounts of memory. These systems are designed to handle massive datasets and perform computations at speeds far beyond those achievable on a personal computer.
Why do we need HPC for bioinformatics?
In bioinformatics, researchers deal with massive datasets generated by technologies such as next-generation sequencing (genomics, transcriptomics) and mass spectrometry (proteomics). Analysing these datasets requires computationally intensive tasks such as sequence alignment, genome assembly, and statistical analysis. HPC systems provide the computational power and memory resources necessary to process these datasets efficiently.
Expectations
Gadi is a shared resource and its efficient use not only ensures fair access for all users but also helps minimise the environmental impact of high-performance computing, as systems like Gadi consume significant energy resources. When you are using a system like Gadi, there are potentially hundreds of other users accessing the system at the same time as you. For Gadi to remain efficient and usable, everyone needs to be courteous and use the system with consideration for others.
Here are some tips to help you be a good citizen of the HPC community:
1. Use job queues appropriately
Gadi runs the PBS Pro job scheduler, which manages the allocation of resources to users. When you submit a job, it is placed in a queue and will run when the requested resources become available. Unlike on Artemis, where your job is allocated to a suitable queue based on your resource request, Gadi users need to explicitly request that their job be sent to a specific queue. It is important to pick a job queue that is appropriate for your job.
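As a minimal sketch of what an explicit queue request can look like, the job script below names its queue with `#PBS -q`. The project code `ab12`, the queue name `normal`, and all resource values are placeholders chosen for illustration; substitute the ones that suit your project and job.

```shell
#!/bin/bash
# Hypothetical job script: the project code (ab12), queue, and resource
# values below are placeholders, not recommendations.
#PBS -P ab12
#PBS -q normal
#PBS -l ncpus=2
#PBS -l mem=8GB
#PBS -l walltime=01:00:00
#PBS -l storage=scratch/ab12
#PBS -l wd

# PBS sets PBS_QUEUE inside a running job; fall back to a label so the
# script can also be run outside the scheduler.
queue_label="${PBS_QUEUE:-not-under-PBS}"
echo "Running in queue: ${queue_label}"
```

If the script were saved as `job.sh` (a hypothetical file name), it would be submitted with `qsub job.sh`.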
2. Responsibly manage your data
/scratch is not a safe place for long-term data storage. Files that have not been accessed in 100 days are subject to NCI’s clean-up policy. If you have a /g/data allocation, this is a better place to store your data whilst working on Gadi. Once you have finished your analysis, it is best practice to move your data to a more permanent storage solution, like RDS.
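To get a rough idea of which files are approaching that limit, you could list files by last access time with `find`. This is only a sketch: the function name `list_stale_files` is made up for this example, and `find -atime` counts whole days, so treat the result as approximate rather than as a statement of what NCI's policy will act on.

```shell
#!/bin/bash
# List regular files under a directory whose last access was more than
# N days ago (default 100).
# Usage: list_stale_files <directory> [days]
list_stale_files() {
    dir="$1"
    days="${2:-100}"
    find "$dir" -type f -atime +"$days" -print
}
```

For example, `list_stale_files /scratch/ab12/$USER` would check a personal scratch directory, where `ab12` stands in for your project code.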
3. Don’t request more resources than you need
Don’t request resources that you won’t need. Doing so only holds up your job and other users’ jobs, and wastes your service unit allocation. The PBS scheduler will generally find time for 2 CPUs faster than for 4 CPUs, so in the interest of speed, be efficient. Given the bursty nature of some jobs, it can be hard to know what resources a tool needs. We suggest the following:
- Step 1: Consult the software documentation
- Often, developers will outline the minimum amount of RAM (memory) required and whether a tool is multi-threaded (e.g. uses >1 CPU or GPU)
- Step 2: Run a test job using our Gadi benchmarking tool
- This will give you a good idea of which resources, and how much of each, you need to request for your main job.
- Step 3: Ask for help
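One common way to judge whether a test job's CPU request was well sized is CPU efficiency: CPU time used divided by walltime used multiplied by the number of CPUs requested. The sketch below assumes you have read those three values from your job log and that the times are in `HH:MM:SS` form; the function name `cpu_efficiency` is made up for this example and is not part of any Gadi tool.

```shell
#!/bin/bash
# Print CPU efficiency as a whole-number percentage.
# Usage: cpu_efficiency <cpu_time HH:MM:SS> <walltime HH:MM:SS> <ncpus>
# Assumes walltime and ncpus are non-zero.
cpu_efficiency() {
    awk -v cput="$1" -v wall="$2" -v ncpus="$3" '
        # Convert HH:MM:SS to seconds.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        BEGIN { printf "%.0f\n", 100 * secs(cput) / (secs(wall) * ncpus) }'
}

# 1.5 CPU-hours over 1 hour of walltime on 2 CPUs
cpu_efficiency 01:30:00 01:00:00 2   # prints 75
```

A value well below 100 suggests that the job could not keep all of its requested CPUs busy, and that a smaller request may be worth testing.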
4. Keep track of your resource usage
- Monitor your jobs
- Monitor your project allocation
- What does a job cost?
- Why are my jobs not running?
Running jobs on Gadi requires users to have sufficient compute hours available. These compute hours are granted to projects rather than directly to users. It is important to communicate with your project team to ensure you are not using more than your fair share of resources. You can monitor your project’s usage by running:
nci_project -P <project> -v
If you are consistently overusing resources, you may need to look into optimising your workloads and/or requesting more resources from NCI. Get in touch with SIH to discuss your options.
At completion, your project is only charged the SU actually consumed by the job (i.e. based on walltime used, not walltime requested). Like Artemis, Gadi produces PBS logs. The “.o” job log will report the compute used (similar to the Artemis “.o” and “usage” logs combined).
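As a rough illustration of how walltime used drives the charge, the sketch below multiplies a charge rate by the number of CPUs and the hours of walltime used. The charge rate is passed in as a parameter because it depends on the queue; the value 2 in the example is an assumption for illustration only, and the calculation assumes the job is charged on CPUs rather than on memory. The function name `estimate_su` is made up for this example.

```shell
#!/bin/bash
# Estimate service units as: rate x CPUs x walltime used (in hours).
# Usage: estimate_su <rate SU per CPU-hour> <ncpus> <walltime HH:MM:SS>
# The rate is an assumed input; check the charge rate for your queue.
estimate_su() {
    awk -v rate="$1" -v ncpus="$2" -v wall="$3" '
        BEGIN {
            split(wall, p, ":")
            hours = (p[1] * 3600 + p[2] * 60 + p[3]) / 3600
            printf "%.2f\n", rate * ncpus * hours
        }'
}

# 4 CPUs for 1 h 30 min at an assumed rate of 2 SU per CPU-hour
estimate_su 2 4 01:30:00   # prints 12.00
```

The figure reported in your “.o” job log remains the authoritative record of what was charged.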
As with KSU (thousands of service units), each project is assigned a finite amount of disk space and a finite number of inodes (index nodes, which can be likened to the total number of files and folders). You MUST monitor your disk and inode usage, and this can be done with the command:
lquota
which shows disk resource availability for every project you are a member of.
It is important to have an understanding of how much output your job will create, and ensure that you can remain within quotas/limits. Jobs can fail with “disk quota exceeded” messages.
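To estimate how much a job writes, you could measure the output directory of a small test run before scaling up. The sketch below counts files and folders under a directory as a rough proxy for inode usage; the function name `count_entries` is made up for this example, and the count includes the directory itself.

```shell
#!/bin/bash
# Count files and directories under a path (including the path itself),
# as a rough proxy for the number of inodes it uses.
# Note: file names containing newlines would be over-counted.
# Usage: count_entries <directory>
count_entries() {
    find "$1" | wc -l | tr -d ' '
}
```

Running `du -sh` on the same directory reports its total size on disk, and the two numbers together can be compared against the quotas that `lquota` reports.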
All materials copyright Sydney Informatics Hub, University of Sydney