Artemis as a Remote Server
Linux background
Artemis runs on a Linux operating system (currently CentOS v6.10). Becoming familiar with Linux commands can be quite challenging for users who are accustomed to graphical systems such as Windows. A basic knowledge of Linux is preferable for this course; a recommended introduction to the basics is the Software Carpentry course.
Using Linux involves interacting with the system through commands entered in a command-line interface (CLI). Commands tend to follow the pattern:
```
command [options] [arguments]
```
For instance, running `ls -ltr ~` in your local terminal lists all files in the home directory. `ls` is the command. The options request long format (`-l`), sorted by modification time (`-t`) in reverse order, so the most recently modified files appear at the bottom (`-r`). The argument is `~`, which is shorthand for the home directory.
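As a sketch, the output of that command on a hypothetical home directory might look like this (the username, group and file names below are illustrative only):

```bash
$ ls -ltr ~
total 12
drwxr-xr-x 2 abcd1234 Training 4096 Mar  3 09:15 scripts
-rw-r--r-- 1 abcd1234 Training  512 Mar 10 14:02 notes.txt
-rw-r--r-- 1 abcd1234 Training 2048 Mar 12 16:47 results.csv
```

Because of `-r`, the most recently modified file (`results.csv`) is listed last.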
`nano` is the text editor we will be using to create and adjust files, and it is preinstalled on Artemis. Using `nano` will be demonstrated.
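As a quick preview, you open (or create) a file by passing its name to `nano`; the file name below is just an example:

```bash
# Open myscript.pbs in nano, creating it if it does not yet exist
nano myscript.pbs
# Inside nano: type your changes, then Ctrl+O to save and Ctrl+X to exit
```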
The Scheduler
On Artemis, PBS Pro is the scheduler that manages compute resources and jobs (what people want to run). The scheduler manages available capacity and the competing jobs wanting to access its resources via a queue. Once a job is submitted to the scheduler (via the `qsub` command), it is placed in the queue until spare capacity is found to run it.
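As a minimal sketch of the submission workflow (the script name is a placeholder):

```bash
# Submit a PBS script to the scheduler; PBS prints the assigned job ID
qsub myscript.pbs

# List your own queued and running jobs
qstat -u "$USER"
```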
PBS scripts are how we communicate our requirements to the scheduler. PBS declarations are the syntax for doing so; each is a combination of a directive, a flag and options. Every line starting with `#PBS` is a PBS declaration, and the scheduler will try to interpret its meaning. An example of a PBS declaration is:
```
#PBS -l select=1:ncpus=8:mem=16gb
```
This specifies a resource request with the flag `-l`, consisting of 1 node (`select=1`), 8 CPUs (`ncpus=8`) and 16 GB of RAM (`mem=16gb`).
A series of PBS declarations are housed within a PBS script.
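As an illustration, a minimal PBS script might look like the sketch below; the project name, job name, email address and commands are placeholders to adapt to your own work:

```bash
#!/bin/bash
#PBS -P MYPROJECT               # project short name (placeholder)
#PBS -N example_job             # job name, no spaces
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=02:00:00       # hh:mm:ss
#PBS -M your.name@example.com   # address for email notifications
#PBS -m abe                     # notify on abort, begin and end

# Commands below run on the compute node once resources are granted
cd "$PBS_O_WORKDIR"             # move to the directory the job was submitted from
echo "Running on $(hostname)"
```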
Resource requests are usually a combination of:

- `select`: the number of compute nodes (most often 1 unless MPI is being used)
- `ncpus`: the number of CPU cores
- `mem`: the amount of RAM
- `ngpus`: the number of GPUs
- `walltime`: the length of time all these resources will be made available to the requesting job
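For instance, a hypothetical request combining these resources, including a single GPU, might look like this (the values are illustrative and should respect Artemis's queue limits):

```bash
#PBS -l select=1:ncpus=4:ngpus=1:mem=32gb
#PBS -l walltime=12:00:00
```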
To highlight walltime: you only have this collection of resources for a defined duration. Once you have used your allocated walltime, the scheduler will stop your task regardless of whether your job has finished.
Generally, the larger the resource request, the longer your job will likely wait in the queue before running, depending on the spare capacity of the HPC and on competing jobs. The scheduler controls the queue.
Common PBS flags are:

Option | Description | Notes |
---|---|---|
`-P` | Project short name | Required directive on Artemis |
`-N` | Job name | Name it whatever you like; no spaces |
`-l` | Resource request | A combination of resources as mentioned above |
`-q` | Job queue | Defaults to `defaultQ`. Use `dtq` for data transfer only |
`-M` | Email address | Notifications can be sent by email on certain events |
`-m` | Email options: `abe` | Receive notification on (a)bort, (b)egin, or (e)nd of jobs |
`-I` | Interactive mode | Opens a shell terminal with access to the requested resources |
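For example, `-I` can be combined with a project and a resource request to open an interactive shell on a compute node (the project name is a placeholder):

```bash
# Request an interactive session: 2 CPUs, 8 GB of RAM, for one hour
qsub -I -P MYPROJECT -l select=1:ncpus=2:mem=8gb -l walltime=01:00:00
```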
Other HPC systems might use other schedulers that require a different syntax (e.g. SLURM), but the concepts are the same.
Login vs Compute Nodes
Once you log on to Artemis you are placed on a login node. Avoid running scripts there, as you won't be able to access the full resources available. Submitting a PBS script to the scheduler ensures your job is transferred from the login nodes to the compute nodes, which have access to larger resources.
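Putting this together, a typical workflow is to log in and then hand work to the compute nodes via the scheduler rather than running it on the login node directly; the username below is a placeholder, and the login address is an assumption based on Artemis's usual documentation:

```bash
# Log in to Artemis (this lands you on a login node)
ssh abcd1234@hpc.sydney.edu.au

# Edit and submit from the login node; the job itself runs on a compute node
nano myscript.pbs
qsub myscript.pbs
```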