The SIXPAC
by Robert Amodeo
9/26/2003
The SIXPAC is a "mini-cluster" of six Dell Optiplex GX110 machines, each running at 933 MHz with 512MB
of memory. These machines run the current version of SuSE Linux, 8.2. They are faster
than our current base of Solaris SPARC machines, which support the main UNIX-based features of
the Math network. Although our SPARC servers are capable of executing codes, they are used mainly
for UNIX system and network functions (handling mail, for example).
The sixpac machines are better suited to running codes; we have therefore set up a dedicated
batch submission application for the sixpac, which continually hands submitted codes to
whichever nodes are free and available. The Sun Grid Engine (SGE) is used as
the background batch submitter, but it is run transparently through scripts we have
written to handle concerns specific to the Math system.
Running jobs on the SIXPAC
From any server on the MathNet, you can run your "compiled" jobs (C, C++, F77, F90, etc.). You
simply submit the job to the SIXPAC, and it will parcel out the job to any of the six machines
that is currently not running any jobs.
Assume that your script or program is in a subdirectory of your home directory called, let's
say, 'mydir'. You would simply type:
sixpac mydir
and then a queueing script (called job.q) will start in your UNIX window. The script will guide
you through a series of questions for submitting your job. First and foremost, you should "build"
your command script to run the job. This can be done by selecting "b" for build. Since you
specified the directory where your file is located as 'mydir', it will look in 'mydir' for that
file. Afterwards, it will create the command script for you.
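For reference, here is a minimal sketch of the kind of script you might build and submit. The contents of hello.sh are an assumption for illustration only (the actual file used in the sample run is not reproduced in this document), and say_hello is a made-up helper name.

```shell
#!/bin/sh
# hello.sh: hypothetical test job for the SIXPAC (contents assumed for illustration).
# Whatever the script writes to standard output is what the queueing
# script describes as going to the 'jobname'.output file.

say_hello() {
    # Report which node ran the job and when.
    echo "Hello from $(uname -n) at $(date)"
}

say_hello
```

A trivial job like this is a convenient way to confirm that the build and submit steps work before you queue a long-running code.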
If, instead of using the batch submission (queueing) procedure, you log in directly to one of the
sixpac machines, you can also run jobs that way. However, you run the risk of overloading a machine
that could already be running jobs. In addition, if you are running a job there, the sixpac queueing
system will overlook that machine when submitting a new job (as long as your job is running on it).
That is, others who want to submit to the sixpac may not be apprised of the status of the available
machines if you are running your job outside of the queueing system.
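If you do log in to a node directly, it may help to check how busy it is first. The following is only a sketch: it assumes a Linux node that provides /proc/loadavg, the function name node_looks_idle is made up, and the 0.5 threshold is an arbitrary example rather than a site policy.

```shell
#!/bin/sh
# Sketch: check the 1-minute load average before starting a job by hand.
# Assumes a Linux node with /proc/loadavg; the 0.5 threshold is an arbitrary example.

node_looks_idle() {
    # $1 = load file (defaults to /proc/loadavg), $2 = threshold (defaults to 0.5)
    # Succeeds (exit status 0) when the first field is below the threshold.
    awk -v max="${2:-0.5}" '{ exit !($1 < max) }' "${1:-/proc/loadavg}"
}

if node_looks_idle; then
    echo "Load is low; the node appears free."
else
    echo "Load is high; the node may already be running a job."
fi
```

This is not a substitute for the queueing system, which remains the recommended way to run jobs.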
Reviewing job queue status on the SIXPAC
To review the job queue status on the SIXPAC, simply type:
sixque
Then, you will see a screen which displays the jobs running on the various SIXPAC machines. It will
look something like this:
job-ID  prior  name        user  state  submit/start at      queue      master  ja-task-ID
------------------------------------------------------------------------------------------
    21      0  hello.sh.c  ra    t      09/26/2003 15:47:48  guiness.q  MASTER
To interpret this: the job-ID is 21, the user is ra, the file submitted is hello.sh.cmd (shown shortened to
hello.sh.c in the name column), the time of submission is 15:47:48 (3:47 PM) on 9/26, and the
job ran on queue 'guiness.q' (one of the SIXPAC machines).
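If you want to pick these fields out of a status line in a script, a sketch along the following lines may help. It assumes the columns are whitespace-separated and appear in the order shown in the sample above; parse_queue_line is a made-up name, not part of the sixpac tools.

```shell
#!/bin/sh
# Sketch: split one line of sixque output into labeled fields.
# Assumes whitespace-separated columns in the order shown in the sample listing.

parse_queue_line() {
    # $1 = one job line copied from the status listing
    printf '%s\n' "$1" | awk '{
        printf "job-ID=%s name=%s user=%s state=%s date=%s time=%s queue=%s\n",
               $1, $3, $4, $5, $6, $7, $8
    }'
}

parse_queue_line "21 0 hello.sh.c ra t 09/26/2003 15:47:48 guiness.q MASTER"
```

For a job that is still waiting, the queue column is blank in the listing, so the queue field comes out empty.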
Sample job run on the SIXPAC
The following is the sequence of screens that appears in your command line window after running 'sixpac mydir':
Notice to users of the job.queue script:
The output for SGE jobs generated by the job.queue script
will be written to two files:
'jobname'.joblog will contain the output from the 'jobname'.cmd script.
'jobname'.output will contain the output from the program or script being executed.
Enter to continue.
Functions (acceptable abbreviations are shown in CAPS)
Menu: Display this menu
Build: Build a SGE .cmd file for Serial
Submit: Submit a SGE .cmd file for execution
STatus: Display the status of SGE jobs for ra
SYsstat: Display the status of SGE jobs for the system
Hold: Hold a SGE job
RELease: Release a SGE job that is held
RESet: Reset the priority of a SGE job
Cancel: Cancel a SGE job
Quit: Exit this script
Command: b (<-- for build)
Enter the name of the program or script to be executed
: hello.sh (<-- for example)
Checking for duplicate queue control files.
You already have a "/net/tupelo/h1/maint/ra/hello/hello.sh.cmd" file.
Do you want to remove this file and continue (y or n)?
'default n': y (<-- for example)
Enter any arguments for the hello.sh program or script (default none):
The "hello.sh.cmd" file has been built.
Would you like to submit it (y or n)?
: y (<-- for example)
Checking for duplicate output files.
You already have a "/net/tupelo/h1/maint/ra/hello/hello.sh.joblog" file.
You already have a "/net/tupelo/h1/maint/ra/hello/hello.sh.output" file.
Do you want to overwrite these files and continue (y or n)?
'default n': y (<-- for example)
your job 19 ("hello.sh.cmd") has been submitted
Current SGE job status for ra
job-ID  prior  name        user  state  submit/start at      queue      master  ja-task-ID
------------------------------------------------------------------------------------------
    19      0  hello.sh.c  ra    qw     09/25/2003 12:00:05
Enter to continue.
BRIEF EXPLANATION OF WHAT WENT ON ABOVE.
1) You ran 'sixpac mydir'.
2) It launched a notice screen; you hit enter
3) It gave you a menu; you selected 'b' to build the command script
4) It asked for the name of the file to run (e.g., hello.sh)
5) It found that you already have a command script; it asked to remove it; you answered yes
6) It asked you to enter any arguments. You had none, and hit enter
7) It told you the command file was built; it asked you to submit it; you answered yes
8) It checked for duplicate files; it asked to overwrite them; you answered yes
   (please note, each output file is NOW tagged with the job ID number at the end).
9) It submitted your job, and gave you a message indicating the status in the queue
10) If you hit enter again, you'll return to the main menu, where you can hit 'q' to quit
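Once the job has finished, you may want to locate the files it left behind. The sketch below lists the .joblog and .output files for a given job name; because the output files are tagged with the job ID at the end, it matches any trailing suffix. The exact form of that suffix is an assumption here, and list_job_files is a made-up name.

```shell
#!/bin/sh
# Sketch: list the log and output files a job left behind.
# The trailing * allows for a job ID suffix; the exact suffix format is an assumption.

list_job_files() {
    # $1 = directory the job was run from, $2 = job name (e.g., hello.sh)
    for f in "$1/$2".joblog* "$1/$2".output*; do
        # An unmatched pattern is left as-is by the shell, so test that the file exists.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

list_job_files "$HOME/mydir" hello.sh
```

If nothing is printed, either the job has not produced output yet or the files are in a different directory.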
FURTHER NOTES
1) If you just run 'sixpac' instead of 'sixpac mydir' (or whatever you call your subdirectory),
   it will just look in your home directory (i.e., ~username).
2) If you have already built your command script and just want to run it (perhaps with different
   input only), type 's' for submit instead of 'b' for build. When you 'submit' a job, you
   still need only type the filename (e.g., hello.sh), and NOT the command script name hello.sh.cmd.
3) You will receive 2 emails at your address regarding the job: the first tells you when the job
   started, and the second gives full details of the job (runtime, completion time, etc.).
   Samples of these emails are below.
The EMAILS that you will receive regarding the job you just ran
Subject: Job 19 (hello.sh.cmd) Started
Job 19 (hello.sh.cmd) Started
User = ra
Queue = bud.q
Host = bud.math.ucla.edu
Start Time = 09/25/2003 12:00:13
Subject: Job 19 (hello.sh.cmd) Complete
Job 19 (hello.sh.cmd) Complete
User = ra
Queue = bud.q
Host = bud.math.ucla.edu
Start Time = 09/25/2003 12:00:13
End Time = 09/25/2003 12:00:13
User Time = 00:00:00
System Time = 00:00:00
Wallclock Time = 00:00:00
CPU = NA
Max vmem = NA
Exit Status = 0
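If you save one of these messages to a file, a small helper can pull out a single field such as the exit status. This is a sketch that assumes the "Name = value" layout shown above; job_field and the temporary file are made up for the example.

```shell
#!/bin/sh
# Sketch: pull one "Name = value" field out of a saved job notification.
# Assumes the message body was saved to a text file in the layout shown above.

job_field() {
    # $1 = field name (e.g., "Exit Status"), $2 = file containing the message
    awk -F ' = ' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Demonstration using a few lines copied from the sample "Complete" message.
msg=$(mktemp)
cat > "$msg" <<'EOF'
Job 19 (hello.sh.cmd) Complete
User = ra
Queue = bud.q
Exit Status = 0
EOF
job_field "Exit Status" "$msg"   # prints 0
rm -f "$msg"
```

An exit status of 0 conventionally means the program finished without reporting an error.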