More ways to automate

Overview

Teaching: 20 min
Exercises: 20 min
Questions
  • Further ways to automate jobs, with arrays and scripts

Objectives
  • Grow your toolkit of automation methods

Automating with Bash scripts

This episode explores using Bash functions and scripting to automate, or ‘batch’, computations on Artemis.

The PBS job name as a variable

In the previous Episode we used the PBS_ARRAY_INDEX variable as a tool to run slightly different jobs using the same PBS script. This works because the array index of a subjob is an arbitrary variable that nothing in the PBS execution process depends on, so we can use it for our needs.

Another such arbitrary variable is the PBS job name, the human-readable name we have been giving our jobs so that we could more easily keep track of them. The key here is that we don’t need to set the job name via a PBS directive in a PBS script – we can also just pass it to PBS as an option in the call to qsub.

To see this in action, navigate to the Povray directory, and open single_image.pbs in your preferred editor.

cd ../Povray

nano single_image.pbs
Figure: single_image.pbs open in nano.


Note that there is no job name directive in this script – no #PBS -N Name line. However, the script does refer to a variable PBS_JOBNAME, which PBS creates for us when we run qsub. This script enters a folder called $PBS_JOBNAME and then uses the file $PBS_JOBNAME.pov as input.
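As a rough sketch of what this means in practice, the snippet below mimics how a script could build its folder and input file names from the job name. The fallback values (castle and the current directory) and the variable names render_dir and input_file are assumptions so that it can be tried outside PBS; inside a real job, PBS sets PBS_JOBNAME and PBS_O_WORKDIR itself.

```shell
#!/bin/bash
# Outside a PBS job these variables are unset, so fall back to
# illustrative values (assumptions, for trying this on a login node).
PBS_JOBNAME=${PBS_JOBNAME:-castle}
PBS_O_WORKDIR=${PBS_O_WORKDIR:-$PWD}

render_dir="$PBS_O_WORKDIR/$PBS_JOBNAME"   # folder the script would enter
input_file="$PBS_JOBNAME.pov"              # scene file it would render

echo "Would enter:  $render_dir"
echo "Would render: $input_file"
```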

What values will $PBS_JOBNAME need to take?

Answer

ls */*.pov
castle/castle.pov      glass/glass.pov              plants/plants.pov
escargot/escargot.pov  plants/exgrass3.pov          snow/snow.pov
fridge/fridge.pov      plants/plants_demo_pano.pov

The values $PBS_JOBNAME should take are: castle, escargot, fridge, glass, plants, and snow.

What would happen if we included exgrass3 or plants_demo_pano?

Make any needed changes to the single_image.pbs script, and submit it using the first of the names above, passed to the qsub command with -N:

qsub -N castle single_image.pbs

Monitor your job as usual, and when it is done check that it was successful. In addition to the log files and Exit Status: 0, there should now be a .png image file created in the castle directory.
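If you would rather check for the rendered image from the command line, something like the following could be used. It is only a sketch: check_render is a hypothetical helper, not part of the lesson files, and it assumes the output is written to NAME/NAME.png.

```shell
#!/bin/bash
# check_render NAME: report whether NAME/NAME.png exists and is non-empty.
# (check_render is a hypothetical helper, not part of the lesson files.)
check_render() {
    local name=$1
    if [ -s "$name/$name.png" ]; then
        echo "OK: $name/$name.png"
    else
        echo "MISSING: $name/$name.png"
        return 1
    fi
}

check_render castle || echo "castle has not rendered (yet)"
```

The -s test is true only for files that exist and are not empty, so a zero-byte image left behind by a failed render is also reported as missing.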

If you have enabled X Window forwarding (i.e. you used ssh -X, or X-Win32 on Windows), then you should be able to display the image. Use the display command from the ImageMagick image processing suite:

module load imagemagick
display castle/castle.png &
Figure: the image castle.png displayed in ImageMagick and served by XQuartz.


FOR loops

Now that we have convinced ourselves this works, we really don’t have to manually run qsub for each image we wish to render. In fact, we can very easily write a Bash script to loop over the images and run qsub for us.

An example of looping in Bash can be seen in the script loop.sh. Display its contents:

cat loop.sh
[jdar4135@login2 Povray]$ cat loop.sh
#! /bin/bash

# Iterate over list of words/strings
words=(One Two Three '4 on the floor')
for i in "${words[@]}"
do
	echo The string is $i
done
printf "\n"

# Iterate over range of letters
for i in {a..e}
do
   echo The letter is $i
done
printf "\n"

# Iterate over range of numbers
for i in {1..4}
do
   echo The number is $i
done
printf "\n"

# Iterate over range of non-sequential numbers
weeks=(2 3 {5..10})
for i in "${weeks[@]}"
do
	echo The week is $i
done
printf "\n"

# Iterate over range of floating point numbers between 2 and 3 with a step value of 0.1
for i in $(seq 2.0 0.1 3.0)
do
	echo The decimal is $i
done
printf "\n"

Run this script with bash. What does it do? Note the syntax of each loop construct:

for VAR in VAR_LIST
do
  FOO BAR
done

This is why we call these ‘FOR’ loops: they iterate over a list of values, running the loop body once for each value the variable takes.


Now, let’s return to our single_image.pbs script. Can you write a FOR loop to submit a job for each of the images we identified earlier?

Answer

#!/bin/bash

images=(castle escargot fridge glass plants snow)

for image in "${images[@]}"
do
    qsub -N "$image" single_image.pbs
done

Did you remember to include the #!/bin/bash ‘hashbang’ to let the OS know what language your script is in?
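Before submitting for real, it can be reassuring to preview the commands such a loop would run. A minimal sketch, assuming the same list of image names; the commands array is an illustrative name, and nothing is sent to the scheduler:

```shell
#!/bin/bash
images=(castle escargot fridge glass plants snow)

# Collect the commands instead of running them, so nothing is submitted.
commands=()
for image in "${images[@]}"
do
    commands+=("qsub -N $image single_image.pbs")
done

# Print one would-be command per line
printf '%s\n' "${commands[@]}"
```

Once the printed commands look right, the loop body can be swapped for the real qsub call.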

In the examples above, a few different Bash elements are used: the ‘brace’ sequence expansion {start..end..step}; the seq command inside a command substitution $(..); and a Bash array VAR=(A B C ..).
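These three elements can be tried side by side. The sketch below is illustrative only; the variable names are made up for the example, and the stepped brace expansion needs Bash 4 or later.

```shell
#!/bin/bash
# Brace expansion with a step (Bash 4 or later): {start..end..step}
odds=( {1..9..2} )
echo "Brace expansion: ${odds[*]}"

# seq inside a command substitution: seq FIRST INCREMENT LAST
evens=( $(seq 2 2 10) )
echo "seq: ${evens[*]}"

# A Bash array literal, with a count of its elements
names=(A B C)
echo "Array of ${#names[@]} elements: ${names[*]}"
```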


Have a look at the script povray.sh. Does the loop there match the one you wrote? What Bash features are used?

cat povray.sh

Make any required changes to povray.sh and run it with bash povray.sh. This runs qsub on each iteration of the loop and so submits all the jobs. When the jobs have completed, view each of the images to make sure they were rendered.

Note that the syntax used to retrieve an element from a Bash array of length N is ${VAR[i]}, where i runs from 0 to N-1 (i.e. Bash uses ‘zero-indexing’). The entire array can be accessed with ${VAR[*]} or ${VAR[@]}, and a range of m elements starting at index j with ${VAR[*]:j:m} (see Note 1).
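A short sketch of this indexing and slicing syntax, using the image names from this episode as the example array:

```shell
#!/bin/bash
images=(castle escargot fridge glass plants snow)

echo "Number of elements: ${#images[@]}"
echo "First element:      ${images[0]}"
echo "Last element:       ${images[${#images[@]}-1]}"
echo "Two from index 1:   ${images[*]:1:2}"
echo "From index 4 on:    ${images[*]:4}"
echo "First three:        ${images[*]::3}"
```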


Indexing Bash arrays

FOR loops are very handy, and generally efficient, structures in scripting languages like Bash. However, for submitting a large number of jobs to the PBS scheduler an array job is preferable: it is less work for the scheduler to manage, and it is also easier for you to keep track of!

Have a look again at single_image.pbs:

cat single_image.pbs
[jdar4135@login2 Povray]$ cat single_image.pbs
#!/bin/bash

#PBS -P Training
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=0:05:00
#PBS -q small-express

module load povray

cd $PBS_O_WORKDIR/$PBS_JOBNAME
povray res $PBS_JOBNAME.pov

How could you adapt this script to run as an array job? What could you replace $PBS_JOBNAME with?
How would you set it for each array index?
(Hint: Look back at the FOR loop you wrote above)


Extra hint (only if you’re stuck!)

What would the following Bash script do?

#!/bin/bash
images=(castle escargot fridge glass plants snow)

for i in {0..5}
do
    echo ${images[i]}
done

Answer

#!/bin/bash

#PBS -P Training
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=0:05:00
#PBS -q small-express
#PBS -J 0-5

module load povray

images=(castle escargot fridge glass plants snow)

image=${images[$PBS_ARRAY_INDEX]}

cd $PBS_O_WORKDIR/$image
povray res $image.pov

Did you remember that Bash arrays index from 0?
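To check the mapping from array index to image name without submitting anything, the index values can be mimicked in an ordinary loop. This is a sketch only: image_for_index is a hypothetical helper, and setting PBS_ARRAY_INDEX by hand simply stands in for what PBS would do for each subjob.

```shell
#!/bin/bash
images=(castle escargot fridge glass plants snow)

# image_for_index IDX: print the image name for array index IDX,
# or fail if IDX is outside the array (hypothetical helper).
image_for_index() {
    local idx=$1
    if (( idx < 0 || idx >= ${#images[@]} )); then
        echo "No image for index $idx" >&2
        return 1
    fi
    echo "${images[idx]}"
}

# Mimic the index values PBS would assign with '#PBS -J 0-5'
for PBS_ARRAY_INDEX in {0..5}
do
    echo "Subjob $PBS_ARRAY_INDEX -> $(image_for_index "$PBS_ARRAY_INDEX")"
done
```

The range check guards against a -J range that runs past the end of the array, which would otherwise give an empty image name.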


Aside: If you had lots of images in named directories to process, you could use globbing or the find command to get a list of directory names for your ‘images’ Bash array variable. E.g., with globbing (using * wildcards) you could write:

images=(`echo */ | xargs -n1 basename`)

The */ wildcard expands to all the directories in the current folder (the trailing / selects directories only). We then pass the results to the basename command to remove that trailing /. xargs does this, taking the names from the pipe, and the -n1 flag tells it to pass only one directory name to basename at a time.

Other solutions to this little problem might be

ls -d */ | xargs -n1 basename
find . -maxdepth 1 -mindepth 1 -type d -exec basename {} \;
find . | egrep -o '(\w+)\/\1\.pov' | xargs -n1 dirname


(That last one will actually only match directories containing .pov files with the same name, so it’s a bit safer than the others!)
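The same ‘safer’ behaviour can also be sketched in Bash alone, with a glob and a file test instead of find and egrep. The function name list_images is an assumption made for this example, and mapfile needs Bash 4 or later.

```shell
#!/bin/bash
# list_images: print each directory NAME in the current folder that
# contains a file NAME/NAME.pov (hypothetical helper name).
list_images() {
    local dir name
    for dir in */
    do
        name=${dir%/}                       # strip the trailing slash
        if [ -f "$name/$name.pov" ]; then
            echo "$name"
        fi
    done
}

# mapfile (Bash 4 or later) reads one array element per output line
mapfile -t images < <(list_images)
echo "Found ${#images[@]} image(s): ${images[*]}"
```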


Further exercises

1. povray array job with a config file

Our solution above using a PBS array job was pretty neat, if I do say so myself. However, it may not prove to be very flexible.

Write another PBS script using an array job to render all of the images in the Povray example, but this time use a config file instead. Look back at the config file examples in the previous Episode if you need a reminder!

Solution

First we need a config file. Let’s call it ex1.config:

# ArrayID Image
1 castle
2 escargot
3 fridge
4 glass
5 plants
6 snow

Now we need a PBS script. Open single_image.pbs in your preferred editor, but don’t forget to save it as a new file! Call it ex1.pbs (in nano, type the new name on the entry line when you press Ctrl+o):

#!/bin/bash
#PBS -P Training
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=0:05:00
#PBS -q small-express
#PBS -J 1-6

module load povray

cd $PBS_O_WORKDIR
config=ex1.config

image=$(awk -v taskID=$PBS_ARRAY_INDEX '$1==taskID {print $2}' $config)

cd $image
povray res $image.pov

Did you remember to set the indexing to match your config file?
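The awk lookup can be tested on its own before it goes into a PBS script. The sketch below writes a shortened copy of the config file to a temporary location; lookup_image is a hypothetical helper name used only for this example.

```shell
#!/bin/bash
# Write a small copy of the config file to a temporary location
config=$(mktemp)
cat > "$config" <<'EOF'
# ArrayID Image
1 castle
2 escargot
3 fridge
EOF

# lookup_image ID: print the image name on the line whose first field is ID
lookup_image() {
    awk -v taskID="$1" '$1 == taskID {print $2}' "$config"
}

for id in 1 2 3
do
    echo "Index $id -> $(lookup_image "$id")"
done
```

An index with no matching line gives an empty result, which is a quick way to spot a -J range that does not match the config file.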

If you like, submit this script to Artemis with qsub. Did it work?


2. povray array job with extra parameters

Having a config file makes it easier to add extra parameters or options. Suppose you wanted to render the Povray example images at different resolutions. The povray command takes the flags -W and -H to set the width and height of the rendered image.

Adapt your solution to Exercise 1 above to render the images in the following sizes:

Image     Width  Height
Castle    480    360
Escargot  600    400
Fridge    768    576
Glass     1024   768
Plants    480    360
Snow      320    120

Solution

First we need a config file. Let’s call it ex2.config:

# ArrayID Image Width Height
1 castle 480 360
2 escargot 600 400
3 fridge 768 576
4 glass 1024 768
5 plants 480 360
6 snow 320 120

Now we need a PBS script. Open ex1.pbs in your preferred editor, but don’t forget to save it as a new file! Call it ex2.pbs (in nano, type the new name on the entry line when you press Ctrl+o):

#!/bin/bash
#PBS -P Training
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=0:05:00
#PBS -q small-express
#PBS -J 1-6

module load povray

cd $PBS_O_WORKDIR
config=ex2.config

image=$(awk -v taskID=$PBS_ARRAY_INDEX '$1==taskID {print $2}' $config)
width=$(awk -v taskID=$PBS_ARRAY_INDEX '$1==taskID {print $3}' $config)
height=$(awk -v taskID=$PBS_ARRAY_INDEX '$1==taskID {print $4}' $config)

cd $image
povray -W$width -H$height $image.pov

Did you remember to select the right config file?
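As a variation, the three awk calls could be folded into one, with read splitting the result into variables. This is a sketch of an alternative rather than the lesson’s solution: it uses a shortened temporary config file, sets PBS_ARRAY_INDEX by hand as a stand-in for PBS, and only prints the povray command instead of running it.

```shell
#!/bin/bash
# Shortened copy of the config file, written to a temporary location
config=$(mktemp)
cat > "$config" <<'EOF'
# ArrayID Image Width Height
1 castle 480 360
2 escargot 600 400
EOF

PBS_ARRAY_INDEX=2   # stand-in value; PBS sets this inside a real array job

# One awk call prints all three fields; read splits them into variables
read -r image width height < <(
    awk -v taskID="$PBS_ARRAY_INDEX" '$1 == taskID {print $2, $3, $4}' "$config"
)

# Print the command rather than running it
echo "povray -W$width -H$height $image.pov"
```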

If you like, submit this script to Artemis with qsub. Did the images render at the correct resolutions?



Notes
1. In this construction only one of j or m is required: ${VAR[*]:j} retrieves the element at index j and everything after it, whilst ${VAR[*]::m} retrieves the first m elements (up to index m-1, as indexing starts at 0).



Key Points

  • The PBS_JOBNAME variable can also be used to batch analyses

  • Array jobs can replace FOR loops in PBS scripts