What does the system expect of its users?
Gadi is a shared resource and its efficient use not only ensures fair access for all users but also helps minimise the environmental impact of high-performance computing, as systems like Gadi consume significant energy resources. When you are using a system like Gadi, there are potentially hundreds of other users accessing the system at the same time as you. For Gadi to remain efficient and usable, everyone needs to be courteous and use the system with consideration for others.
To help you be a good citizen of the NCI HPC community, please review the do’s and don’ts of using Gadi as well as the general tips below.
Use job queues appropriately
Gadi runs a PBSpro job scheduler that manages the allocation of resources to users. When you submit a job, it is placed in a queue and will run when the requested resources become available. Unlike on Artemis where your job is allocated to a suitable queue based on your resource request, Gadi users need to explicitly request their job is sent to a specific queue. It is important for you to pick a job queue that is appropriate for your job.
Responsibly manage your data
/scratch
is not a safe space for long term data storage. If it has not been accessed in 100 days, it will be subjected to NCI’s clean up policy. If you have a /g/data
allocation, this is a better place to store data that you will regularly need as input for your jobs. While gdata is not subjected to purge like scratch, it is not backed up. You should back up all input, job scripts and important output files to RDS. Please follow the data transfer between Gadi and RDS guide for the best ways to do this.
Don’t request more resources than you need
Don’t request resources that you won’t need, it will only result in your job and other users jobs being held up, and you wasting your service unit allocation. It can be hard to know what resources a tool needs, and this can vary on different hardware. We suggest the following:
- Step 1: Consult the software documentation
- Often, developers will outline the minimum amount of RAM (memory) and whether a tool is multi-threaded (e.g. use >1 CPU or GPU)
- Step 2: Run a test job using our Gadi benchmarking tool
- This will give you a good idea of how much resources you need to request for your main job.
- Step 3: Ask for help
Keep track of your resource usage
Running jobs on gadi requires users to have sufficient service units (compute hours) available. It is also important to monitor your use of physical disk space and inodes on scratch, home and gdata. Please see the accounting section or the NCI pages below for more details:
Don’t misuse the login nodes
Login nodes are for logging in to the system, basic file and directory navigation commands, submitting jobs etc. Login nodes are not for large data transfers, compute tasks, excessive qstat queries, or submitting jobs via high-iteration for loops. Doing so will overload these nodes, causing a slow-down and frustration for all users. NCI monitors login node traffic and inappropriate use will be targeted. Please see the sections on data transfer and parallel jobs for recommended strategies for these tasks.