Saturday, February 2, 2008

Simple BASH-based Queue System

I'm rather new to the entire Linux shell scripting world. After a brief look into BASH (Bourne-Again Shell), the default shell used in Ubuntu which I have installed on my office machine, I set about on my next mini-project.


Whilst I have seen many examples of BASH in use (like drawing ASCII circles, etc.), I thought that I should develop something seemingly useful (seeing that the office machine isn't used for anything particularly useful, except experimenting and Microsoft Office on the Windows partition for those that absolutely insist on sending attachments in MS Office formats). Anyway (I digress), I planned to create a workhorse machine which would perform all my MATLAB simulations, etc. automatically. Actually, it'd be a cheap (and slimmed-down) version of the University's grid service, Iceberg.


So with some sticky-back plastic, I set about sorting out SSH (Secure Shell) availability on Ubuntu such that I could "dial in" to the office machine, submit a job and log-out (SSH will allow you to establish a remote shell and, with the X-server, you can also receive graphics/windows allowing, essentially, a complete remote working solution). With SSH set for a single (restricted) account, I altered the account's home directory access control such that the queue manager account would have read and write access to files (otherwise, no tasks will ever be started, let alone completed). So with SSH working and an account for working remotely, it was now possible to build the foundations of the queue manager.


Submitting Tasks/Jobs


The first step was to establish a directory where tasks could be queued and where the script files belonging to the queue system could be stored. I chose /home/queue/ so as not to cause any grief to the already-fudged system files and directories I had been experimenting with on previous occasions. The first problem to overcome was how to create these task files such that the queue manager could execute each one consecutively. The simplest solution was to save every task as a script file within the common queue directory. To overcome the problem of multiple tasks with the same name, each task/job was given a unique Job ID which would, in effect, also act as a receipt. The resulting submitjob script file was created (again, my apologies for the formatting issues):


#!/bin/bash
#
# Used to submit jobs to the Queue Manager
#
# Version: 0.1 (24th October 2007)
# Author: Andrew
#

# Generate a job ID
jobid="`date +%y%m%d%H%M%S`-$RANDOM"

# Check to see if a parameter has been given
if [ $# != 1 ]
then
echo ' ERROR: Job to submit not specified'
echo ' '
echo " Usage: `basename $0` [file to submit]"
echo ' '
exit 1
fi

# Check to see if the file does not exist
if [ ! -e "$1" ]
then
echo " ERROR: The file \"$1\" does not exist."
exit 1
fi

# Rename the current file
mv "$1" "${jobid}.job"

if [ $? -eq 0 ]
then
# Copy the job file
cp "${jobid}.job" "${QUEUEPATH}"
if [ $? -ne 0 ]
then
# Error copying to queue path
echo " ERROR: Unable to copy ${jobid}.job to ${QUEUEPATH}"
exit 1
fi
else
echo " ERROR: Unable to create new job file with ID: ${jobid}"
exit 1
fi

echo ' '
echo " You have been assigned job ID ${jobid}"
echo ' ************************************'
echo ' '
echo ' Please make a note of your job ID as you will need this'
echo ' to stop your job and/or know when your job has finished'
echo ' '
echo ' As a reminder, the job submission file has been renamed'
echo " to the corresponding job ID ${jobid}.job. This file"
echo ' can be safely deleted as the job is now in the queue.'
echo ' '


Note the use of the global variable QUEUEPATH. This is set to the queue path whenever a Terminal session is created, which means the path can be updated or altered at a later date without the need to change all the source files. Because the scripts build file names by appending directly to it (for example ${QUEUEPATH}quit.job), the value needs to end in a trailing slash. To distinguish job files from other types of files, I skilfully appended the file extension .job to all task files.
So now we've submitted a job file, we'd preferably want to be able to do some things with it, so we move on to the queue manager itself.
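As a rough sketch of the two ideas above (and not one of the original scripts), the snippet below shows how QUEUEPATH might be set with a guaranteed trailing slash, and how a job could be copied into the queue under a temporary name and then renamed, so that anything looking for *.job files never sees a half-copied one. The helper names normalise_queuepath and enqueue_job are hypothetical, and the /home/queue default is simply the directory mentioned earlier.

```shell
#!/bin/bash
# Sketch only: normalise_queuepath and enqueue_job are hypothetical helpers,
# not part of the original scripts.

# Append a trailing slash if one is missing, so that "${QUEUEPATH}quit.job"
# always expands to a file inside the queue directory.
normalise_queuepath() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Copy job file $1 into queue directory $2 (ending in a slash) under ID $3.
# The copy is written under a temporary name that does not end in .job and is
# then renamed within the same directory.
enqueue_job() {
    local tmp="${2}.${3}.tmp"
    cp -- "$1" "$tmp" && mv -- "$tmp" "${2}${3}.job"
}

# Assumed default location, taken from the text above. A line like this could
# live in ~/.bashrc so that every Terminal session picks it up.
QUEUEPATH=$(normalise_queuepath "${QUEUEPATH:-/home/queue}")
export QUEUEPATH
```

The rename step relies on both names being in the same directory, which is why the temporary file is created inside the queue directory rather than in /tmp.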


The Queue Manager


The queue manager's job, as its name implies, is to manage jobs by checking for new script files to execute and... well, execute them. To get a fair idea of what to run, a simple directory listing would do the job which could then be recorded to a text file. If the file was empty, it should act idle and await a task (without hogging processor cycles so as not to hinder other unproductive work) whereas if there were entries within the list of job files, it should execute them consecutively. This immediately screams the need for a loop of some description and, as we don't know in advance how many files there will be, we'd need to use a while...do...done loop. This should also ring some alarm bells as we'd need some kind of condition (other than Ctrl + C) to terminate the queue manager. Whilst I could invest time in creating an intelligent solution to this problem, I opted for another file creation system (namely a file called quit.job which would terminate the loop should the queue manager see this file).
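The polling idea can also be sketched without the intermediate text file. What follows is a minimal sketch under a couple of assumptions, not the manager itself: next_job and poll_once are hypothetical helpers, the queue directory is passed in with its trailing slash, and the shell's sorted expansion of *.job is relied upon to put the oldest timestamped job ID first. The stop file quit.job is skipped so that it is never mistaken for work.

```shell
#!/bin/bash
# Sketch only: next_job and poll_once are hypothetical helpers.

# Print the name of the first queued job in directory $1 (which must end in a
# slash). Returns non-zero and prints nothing if the queue is empty.
next_job() {
    local f
    for f in "$1"*.job; do
        [ -f "$f" ] || continue                    # unmatched glob, or not a regular file
        [ "${f##*/}" = "quit.job" ] && continue    # the stop signal is not a job
        printf '%s\n' "${f##*/}"
        return 0
    done
    return 1
}

# One polling step: report the next job if there is one, otherwise sleep for
# $2 seconds (default 2) so that the loop does not spin.
poll_once() {
    local job
    if job=$(next_job "$1"); then
        echo "would run: $job"
    else
        sleep "${2:-2}"
    fi
}
```

A real manager would call something like poll_once inside its while loop and replace the echo with the actual job execution.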


Anyone could then technically create a file called quit.job and terminate the queue manager (i.e. sabotage... although "who?" remains a good question and only fuels my paranoia), so the owner would need to be determined. Luckily, Linux has a utility called whoami which, unlike the Jackie Chan movie, determines who's the current user (great for those that forget who they are... like the Jackie Chan movie). BASH also builds an ownership check into its conditional expressions: the test -O filename.ext is true when the file exists and is owned by the user running the script, which makes the task of identification extremely easy. Additionally, notifying the owner when a task completes (and even when it starts) can be done through the sendEmail program. This script file is included below (I have, however, altered the email address and SMTP mail server fields to minimise the risk of junk mail and a very unhappy University):



#!/bin/bash
#
# Basic queue manager for running jobs remotely.
#
# Version: 0.1 (24th October 2007)
# Author: Andrew
#

# Ensure a command to quit doesn't already exist
rm -f ${QUEUEPATH}quit.job

echo ' Queue Manager Initialising...'
echo ' '
echo ' Entering looping state...'
while
true # Loop forever
do
# Check to see if there are jobs
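# Match only names ending in .job, and skip quit.job so the stop signal is never run as a job
ls ${QUEUEPATH} | grep '\.job$' | grep -v '^quit\.job$' > ${QUEUEPATH}contents.txt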
nextjob=`head -1 ${QUEUEPATH}contents.txt`
# Then check to see if there's something in this variable. If not, then there's no job to run
if [[ ! -z $nextjob ]]
then
# There's a job to run
echo "Job ${nextjob%%.job} Started"

# Extract notification information
notification=`fgrep "#$" "${QUEUEPATH}$nextjob" | tr "[:upper:]" "[:lower:]" | fgrep "notify" | head -1 | tr -d " " | cut -c10-11`

if [[ ! -z $notification ]]
then
# Extract email information
emailadd=`fgrep "#$" "${QUEUEPATH}$nextjob" | tr "[:upper:]" "[:lower:]" | fgrep "email" | head -1 | tr -d " " | cut -c9-50`

# Check to see if a notification should be sent at the start of the job
if [[ ( $notification = 1 || $notification = 3 ) && ! -z $emailadd ]]
then
sendEmail -f "my.email@ddress" -t "$emailadd" -u "Job ${nextjob%%.job} Started" -m "Hi,\n\nAs requested, this message is to inform you that job ${nextjob%%.job} started at `date`\n\nRegards,\n\nAndrew" -s "smtp.mail.server.address" -q
fi
fi

# Run the job in a new bash shell
bash "${QUEUEPATH}${nextjob}"

# Job complete tell user

if [[ ( $notification = 2 || $notification = 3 ) && ! -z $emailadd ]]
then
sendEmail -f "my.email@ddress" -t "$emailadd" -u "Job ${nextjob%%.job} Completed" -m "Hi,\n\nAs requested, this message is to inform you that job ${nextjob%%.job} completed at `date`\n\nRegards,\n\nAndrew" -s "smtp.mail.server.address" -q
fi

# Now Job is complete. Remove the file
rm -f "${QUEUEPATH}${nextjob}"
echo ' '
echo "Job ${nextjob%%.job} Finished"

else
# There's no job to run

# In order to save the poor computer, sleep for 2 seconds
sleep 2
fi

# Need to check for the quit.job file and
# confirm that the owner is whoami
if [[ -e ${QUEUEPATH}quit.job ]]
then
if [[ -O ${QUEUEPATH}quit.job ]]
then
break
else
echo " Owner of file different to `whoami`. Removing file"
rm -f ${QUEUEPATH}quit.job
fi
fi
done

# If outside the main loop, the quit file exists.
# Need to delete the file and then inform user of
# the quit command
echo ' Quit file detected. The Queue Manager is shutting down...'
# Delete the file
rm -f ${QUEUEPATH}quit.job
echo ' Clean-up complete. Queue Manager finished.'


So there you have it. One queue manager completed. You may notice some rather odd goings-on with the email address system: the queue manager parses header lines (those beginning with #$) from the task script file, which carry information such as the email address and what type of notifications to send (when starting and/or finishing a job). I have also set a default time of 2 seconds before checking for new items in the queue should there be no jobs to execute in order to minimise CPU cycle wastage.
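To make the header parsing a little less mysterious, here is a small sketch of the same idea as a single function. It is an assumption-laden simplification rather than the manager's own code: job_directive is a hypothetical helper, and the email address and MATLAB path in the demo are placeholders. Like the pipeline in the manager, it strips spaces and lower-cases the line before reading the value.

```shell
#!/bin/bash
# Sketch only: job_directive is a hypothetical helper.

# Print the value of the first "#$ key=value" line in job file $1 for key $2.
# Spaces are removed and the line is lower-cased before matching.
job_directive() {
    grep -F '#$' "$1" \
        | tr -d ' ' \
        | tr '[:upper:]' '[:lower:]' \
        | sed -n 's/^#\$'"$2"'=//p' \
        | head -n 1
}

# Demo: a job file in the format the manager expects (placeholder values).
demo=$(mktemp)
cat > "$demo" <<'EOF'
#$ email=someone@example.ac.uk
#$ notify=3
matlabjob "/home/user/sim.m"
EOF
job_directive "$demo" notify   # prints 3
job_directive "$demo" email    # prints someone@example.ac.uk
rm -f "$demo"
```

One side effect worth knowing about is that the whole address is lower-cased, which is harmless for the domain but could matter for a mail server that treats the part before the @ as case-sensitive.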


MATLAB?


As I mentioned earlier, my intention is to use the office machine as a workhorse that would, predominantly, run consecutive MATLAB simulations. MATLAB has the ability to run under a Terminal window (handy for those that dislike the memory hungry Java GUI in favour of the trusty (not to mention stable) Terminal window). This is possible thanks to the arguments -nojvm -nosplash -nodisplay which seem pretty self-explanatory. One thing to note is that, as the GUI is essentially disabled, no windows will appear (so forget using the plot feature in this mode - you're better off outputting the contents to other tools like Python's matplotlib library which, in my opinion, is far more customisable (and, hence, more powerful?) compared to MATLAB's inbuilt plot function). So, courtesy of the Iceberg team, I give you the matlabjob script:


#!/bin/bash
#
# For running matlab jobs
#
help()
{
echo ' Usage: matlabjob matlab_script_file [output_file] '
echo ' '
echo ' This command runs the MATLAB script file in'
echo ' non-interactive, non-graphical mode'
echo ' '
echo ' if the output_file is not specified, output is directed to stdout'
}
if test -z "$1"
then
help
exit 1
else
if test -f "$1"
then
if test -z "$2"
then
matlab -nojvm -nosplash -nodisplay < "$1"
else
matlab -nojvm -nosplash -nodisplay < "$1" > "$2"
fi
else
echo "$1" "is not a file"
exit 1
fi
fi


Create a job file


Creating a job file that conforms with the specification required by the queue manager is important and so, to facilitate the user in creating a compliant script file, a wizard was created. Now, whenever I hear wizard, I normally shudder (no, not childhood trauma as you'd expect) and this is mainly down to the fact that most wizards you see nowadays try to do everything for you (including mess up everything - I need no help in this area). In order to please the end user (i.e. myself and perhaps a few other people in the office I could force to use this system), I developed a wizard that could create a script file with minimum intervention. Yes, there are better ways to do this kind of thing (like create a GUI, take up memory and crash your system), but I wanted a complete BASH solution to my queuing woes (after all, isn't it just the British that queue? (I suppose mainly out of necessity really)). As you're probably wondering what the hell I'm talking/writing/typing about, here's the code:



#!/bin/bash
#
# This creates a job file which can be read by the queue manager
#
# Version: 0.1 (27th October 2007)
# Author: Andrew
#

# Custom function for extending path names
tolongpath() {
if [[ "$1" = /* ]]
then
echo "$1"
else
echo "`pwd`/$1"
fi
}

# Set version
version="0.1"

if [[ $# = 1 ]]
then
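# There's only one argument
# Check to see if the file has the .m extension
if [[ "$1" != *.m ]]
then
# The parameter doesn't seem to be an m-file. It's probably the job file name
outputfile="$1"
else
# An m-file was given without a job file name, so use the default
outputfile="myjob.job"
fi
elif [[ $# -gt 2 ]]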
then
echo 'ERROR: Too many parameters'
echo ' '
echo "USAGE: `basename $0` [matlab_script_to_run.m] [job_file.job]"
echo " NOTE: Both parameters are optional"
echo ' '
exit 2
else
# Check for a second parameter
if [ -z "$2" ]
then
outputfile="myjob.job"
else
outputfile="$2"
fi
fi

echo '# This file was created using the interactive tool' > $outputfile
echo "# called `basename $0`, version $version" >> $outputfile
echo "# Creation date: `date`" >> $outputfile
echo '# ' >> $outputfile
echo ' ' >> $outputfile

emailnotify="preq"
# Loop until we get a desired answer
while [[ "$emailnotify" != n* && "$emailnotify" != y* ]]
do
echo -en "Do you wish to enable email notification for your job? [y/n]: "
read emailnotify
emailnotify=`echo $emailnotify | tr '[:upper:]' '[:lower:]'`
done

# If email notification is required
if [[ "$emailnotify" == y* ]]
then
# Now perform email check:
while [[ "$emailnotify" != *@* || ( "$emailnotify" != *.co* &&
"$emailnotify" != *.ac.uk ) ]]
do
echo ' '
echo -e "Please enter a valid email address to which the"
echo -en "notifications will be sent to: "
read emailnotify
done
# Append this email address to the output file
echo "#$ email=$emailnotify" >> $outputfile

emailnotify=10
while [[ $emailnotify -lt 1 || $emailnotify -gt 3 ]]
do
echo ' '
echo 'Would you like notification when the job...'
echo '1. Commences'
echo '2. Completes'
echo '3. Commences and Completes'
echo ' '
echo -en "[Please select 1, 2 or 3]: "
read emailnotify
done

# Now add this to the job file
echo "#$ notify=$emailnotify" >> $outputfile
fi

# End of email section
# From here, we deal with just the job that will be carried out

# Automatic mode...

if [[ "$1" = *.m ]]
then
echo ' '
echo 'MATLAB file detected as input variable. Automatically configuring setup file...'
if [[ -e $1 && -f $1 ]]
then
# Always store an absolute path, as the queue manager runs from a different directory
echo "matlabjob \"$(tolongpath "$1")\"" >> $outputfile
echo ' '
echo "Job file created successfully. You can now submit the job file \"$outputfile\""

# # Change the permissions within the directory
# $pathtochange=`readlink -f "$1"`
# $pathtochange=`dirname "$pathtochange"`
# chmod -R -f o+rwx $pathtochange/*
exit 0
else
echo "Could not find MATLAB file \"`pwd`/$1\"."
echo "Please ensure this file exists"
# Remove the output file as it's no longer required due to the failure
rm -f $outputfile
exit 2
fi
fi

# Manual mode...

# Ask the user if their job is MATLAB based
ismatlab="preq"
# Loop until we get a desired answer
while [[ "$ismatlab" != n* && "$ismatlab" != y* && "$ismatlab" != a* ]]
do
echo ' '
echo -en "Is the job MATLAB based (you need to have prepared a m-file)? [y/n/(a)bort]: "
read ismatlab
ismatlab=`echo $ismatlab | tr '[:upper:]' '[:lower:]'`
done

# Check input
if [[ "$ismatlab" = a* ]]
then
# Request to abort received
exit 1
elif [[ "$ismatlab" = y* ]]
then
# MATLAB will be used

# Tell the user the shortcut that can be used - currently only available for .m files
echo "HINT: You can use \"`basename $0` [matlab_script_to_run.m] [job_file.job]\""
echo " to quickly create a Job file. You will then only be prompted for"
echo " email notifications."
echo ' '

ismatlab="preq"
# Loop until we get a desired answer
while [[ "$ismatlab" != n* && "$ismatlab" != y* ]]
do
echo ' '
echo -en "Do you wish to take any MATLAB output to a file (for debugging)? [y/n]: "
read ismatlab
ismatlab=`echo $ismatlab | tr '[:upper:]' '[:lower:]'`
done
if [[ "$ismatlab" = y* ]]
then
# Take the name of the filename
echo ' '
echo -e "Please enter the name of the file you wish to save the output to: "
read logoutput
fi

# Now to get the MATLAB script from the user
# Keep asking until the name ends in .m and refers to an existing regular file
while [[ "$matlabfile" != *.m || ! -f "$matlabfile" ]]
do
echo ' '
echo -e "Please enter the filename of the script (including the .m extension): "
read matlabfile
done

# Now check the path
matlabfile=`tolongpath "$matlabfile"`

# Check to see if an output should be taken and write the necessary line
if [[ -z $logoutput ]]
then
# No log output
echo "matlabjob \"$matlabfile\"" >> $outputfile
else
logoutput=`tolongpath "$logoutput"`
echo "matlabjob \"$matlabfile\" \"$logoutput\"" >> $outputfile
fi

# # Change the permissions within the directory
# $pathtochange=`readlink -f "$matlabfile"`
# $pathtochange=`dirname "$pathtochange"`
# chmod -R -f o+rwx $pathtochange/*

# Finally, there's no need to continue running the script, so exit
exit 0
fi
# Ask the user whether a script file will run

isscript="preq"
# Loop until we get a desired answer
while [[ "$isscript" != n* && "$isscript" != y* ]]
do
echo -en "Has your job been created as a BASH-compatible script file? [y/n]: "
read isscript
isscript=`echo $isscript | tr '[:upper:]' '[:lower:]'`
done

# If a BASH script is to be run
if [[ "$isscript" == y* ]]
then
# A BASH script will be run. Which script?
while [[ ! -e $scriptfile || -d $scriptfile ]]
do
echo ' '
echo -e "Please enter the filename of the script (including the extension, if any): "
read scriptfile
done

# We now have a script file. It should be checked, but, on good faith,
# we assume file is okay

# Check the path and should it be a relative path, put it into its longer context
scriptfile=`tolongpath "$scriptfile"`

# Now add the line to the job file. There may be other arguments
# that could be added to the job file to customise the bash window
echo "bash \"$scriptfile\"" >> $outputfile

else
# No script will run. Nothing left to do
echo 'Create Job script finished. Nothing else to do.'
echo 'No job information was created - just a header.'
echo "Job file created as $outputfile. Add commands to the end of this file"
fi


So there you have it. The createjob script has been designed with MATLAB scripts in mind (although it will run other types of script files) in order to make that process easier (otherwise, users would be required to enter the corresponding MATLAB arguments and, should they forget, my office machine will have opened instances of MATLAB in GUI mode which could seriously cause some problems).
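The wizard's tolongpath function simply prefixes the current directory, which leaves any ./ or ../ segments in place. As a possible alternative (a sketch only, with abspath being a hypothetical name that does not appear in the scripts above), the directory part can be resolved by changing into it inside a subshell and asking for the physical path:

```shell
#!/bin/bash
# Sketch only: abspath is a hypothetical alternative to tolongpath.

# Print an absolute path for file $1. The directory part is resolved by
# changing into it inside a command substitution (a subshell), so the caller's
# working directory is untouched. Fails if the directory does not exist; the
# file itself does not have to exist.
abspath() {
    local dir base phys
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    phys=$(cd -- "$dir" >/dev/null 2>&1 && pwd -P) || return 1
    printf '%s/%s\n' "${phys%/}" "$base"
}
```

This matters for a queue because the manager runs each job from its own working directory and account, so any relative path written into a job file would point at the wrong place.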


Other files


There exist other files that enable the remote user to view items within the queue and delete items from the queue, in addition to the utility that ends the queue manager. I won't go into detail on any of them as they're simple script files, so I'll only say that they exist (and work).
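For the curious, helpers of this kind could look something like the sketch below. These are hypothetical stand-ins written for illustration, not the actual scripts: the function names are invented, and each one takes the queue directory (with its trailing slash) as the first argument. The last function uses the same -O ownership test that the queue manager applies to quit.job.

```shell
#!/bin/bash
# Sketch only: hypothetical stand-ins for the scripts that are not shown.
# $1 is always the queue directory, ending in a slash.

# List queued job IDs, one per line, oldest first.
list_jobs() {
    local f
    for f in "$1"*.job; do
        [ -f "$f" ] || continue
        [ "${f##*/}" = "quit.job" ] && continue
        f=${f##*/}
        printf '%s\n' "${f%.job}"
    done
}

# Remove the job with ID $2 from the queue; fails if it is not queued.
delete_job() {
    [ -f "${1}${2}.job" ] && rm -f -- "${1}${2}.job"
}

# Ask the manager to stop by creating an empty quit.job.
request_quit() {
    : > "${1}quit.job"
}

# True only if quit.job exists and is owned by the user running this shell.
quit_requested() {
    [ -e "${1}quit.job" ] && [ -O "${1}quit.job" ]
}
```

Note that with this arrangement only the account running the queue manager can stop it, since a quit.job created by anyone else fails the ownership test.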


Conclusion


Current statistics show that nobody is using it and I haven't any simulations to run just yet... not only that but Iceberg was recently upgraded such that the potential user(s) I had in mind will want to use the more powerful resources available to them... that is until I create a multi-CPU queue system which will happen as soon as I get my hands on a redundant computer within the office...


I should also mention the fact that the above has undergone a few revisions which aren't shown here - they're relatively minor changes, but they have made the "experience" slightly better. Everything you see above was designed and implemented over the course of three days (I make that clear otherwise it would appear no academic work is done throughout my work week - a fair comment to make). Additionally, as with all small print, I take no responsibility for any damage to your office computer (or mine, for that matter) should you go about implementing the above.


1 comment:

Pavel Patrin said...

Another tasks queue written in bash:
https://github.com/pavelpat/yastq