Systems Analysis - Batch Processing Systems
Academic Computing Services, The University Of Sheffield
Stuart Herbert (S.Herbert@sheffield.ac.uk)
Document copyright ©. All rights reserved.
Abstract
This document is a report to evaluate and contrast freely available
batch processing systems. 4D/NQS, a commercial system from Silicon
Graphics [3], is used in passing for comparison purposes.
Introduction
JISC, as part of its New Technology Initiative, has funded a one
year post to supply and support batch processing systems for UNIX
for the UK HE community.
As part of this process, we have located, and examined, existing
freeware products, seeking to determine which, if any, we should
support in the future.
There are two principal systems:
- NQS, derived from MDQS, has been sold commercially by Sterling
Software, and several derived systems are freely available via
the Internet, using ftp(1).
- DQS, written apparently from scratch at the Florida State
University, is the main alternative available via the Internet,
using ftp(1). DNQS is an older product from the same author.
Both types of system provide essentially the same ability - to
make use of idle workstations by running jobs deferred by users.
Both present the same mechanism for doing so - the queue, into which
requests are placed, and from which they are taken and executed at
some later time.
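The queue mechanism described above can be illustrated with a minimal
Python sketch. It is not taken from NQS or DQS: the names `BatchQueue`,
`submit` and `run_next` are hypothetical, and a real batch system would
execute requests on other machines rather than calling a function
in-process.

```python
from collections import deque


class BatchQueue:
    """A toy first-in, first-out queue of deferred requests (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()

    def submit(self, request_name, action):
        # Placing a request in the queue does not run it.
        self._pending.append((request_name, action))

    def pending(self):
        # Names of the requests still waiting, oldest first.
        return [name for name, _ in self._pending]

    def run_next(self):
        # Called "at some later time", e.g. when a workstation is idle.
        if not self._pending:
            return None
        name, action = self._pending.popleft()
        return name, action()


if __name__ == "__main__":
    queue = BatchQueue("batch")
    queue.submit("sum", lambda: sum(range(10)))
    queue.submit("shout", lambda: "done".upper())
    print(queue.pending())
    print(queue.run_next())
```

The point of the sketch is only the separation between submitting a
request and executing it; scheduling policy, priorities and limits are
deliberately left out.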
The Software
Condor
Condor is a batch system developed from scratch at the University of
Wisconsin (WISC) in the
States. As most UK HE sites which expressed an opinion [9] stated
that they use some form of NQS, and none mentioned using Condor, it
was decided not to evaluate this software.
However, we do intend to track its development with a view to
incorporating any desirable features into our own development.
If anyone else is interested in looking at Condor, the contact
address appears to be
> mike@cs.wisc.edu
In addition, I am informed that Condor has been adopted by IBM as
LoadLeveler for IBM systems.
DQS v3.1.1
Please note that DQS is described as a BETA quality system, and that
evaluation was a paper-based exercise due to unresolved problems
with the software.
Details
- Author :
Tom Green, Florida State University (dqs@scri.fsu.edu)
- Location :
ftp://ftp.scri.fsu.edu/pub/dqs/DQS_3_1_1.tar.gz
Installation
Installation and configuration were not achieved.
- The documentation is supplied as PostScript files. Sites
without PostScript printing facilities can install GNU's Ghostscript
and Ghostview utilities to read it. All documentation also has TeX
source supplied.
- The `make config' target alters /etc/services automatically if the
invoker is the superuser. Invoke `make config' as a normal user,
and then update NIS+ by hand.
- The `make config' target asks about options which are not
documented in the installation instructions.
- The supplied Imakefiles have illegal comment statements in them
(old-style #'s). These need to be changed to XCOMM or to C-style
comments.
- The X Windows programs include a source-distribution of Xaw3d, a
3D replacement for the Athena widget set. The header file `At.h'
attempts to override the prototype for strcasecmp() and
strncasecmp(). This problem also occurs in various source files.
- The installation document includes no help or information on
configuring DQS - one has to refer to the man pages for that,
making configuration a trial-and-error process. (The
documentation states that configuration is covered in depth; this
is definitely not the case.)
- Contrary to what the documentation says, one must configure the
queues before qmaster will run. Unfortunately, configuration
apparently cannot be achieved without a running qmaster daemon.
At this point, it was decided that further effort with DQS was not
appropriate.
(Addendum: we later learned, thanks to help from the authors, what
our problem with DQS was. The qmaster daemon was silently failing
because it could not locate the two sockets it wanted by name - the
names given in the documentation are incorrect.)
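A failure of this kind (a daemon unable to resolve its service names)
can be checked for before the daemon is started. The sketch below uses
Python's standard `socket.getservbyname()` call; it is my own
illustration, not part of DQS. The function name `missing_services` is
hypothetical, and the service names shown are placeholders only, since
the names DQS actually expects are not given here.

```python
import socket


def missing_services(names, protocol="tcp", lookup=socket.getservbyname):
    """Return the service names that cannot be resolved to a port."""
    missing = []
    for name in names:
        try:
            lookup(name, protocol)
        except OSError:
            # Raised when the name is not known to the services database.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder names only: substitute those your installation expects.
    wanted = ["example_qmaster", "example_execd"]
    for name in missing_services(wanted):
        print(f"service '{name}' is not defined for tcp")
```

Reporting the unresolved name, rather than exiting silently, is what
would have made the problem obvious.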
Comparison With 4D/NQS
- DQS has a single point of failure - the qmaster daemon. NQS has
no such problem.
- DQS allows information about available resources (such as PVM) to
be attached to queues, allowing for very detailed configuration.
- The DQS source includes runtime sanity checks and an outline
traceback facility which could be modified to provide even better
traceback.
- DQS does not seem to support any of the OS-based limits such as
CPU usage et al. At least, there is no mention whatsoever in the
documentation about support for these limits.
Future Work
- POSIXfy the source code, where it is not already compliant. This
will simplify maintenance in the future.
- Write a detailed, and complete, installation guide, perhaps
supplying a better means of configuration.
- Look into minimising, or removing, the single qmaster daemon as a
potential point of failure.
- Enhance the included debugging mechanisms, to provide a nested
traceback facility.
- Update the error reporting so that messages are not misleading.
In addition, the following points should be considered.
- Introduce Extended Hungarian Notation, to improve readability and
therefore maintainability.
Summary
DQS is an actively maintained product, which provides a highly
configurable environment in which to work. The design of DQS
includes a major weakness, the single qmaster daemon on which
everything depends - given the unreliability of certain versions of
UNIX, this must be a cause for concern.
If DQS is not selected as the most suitable product, the
implementation of some of its features into the selected product
must be strongly considered.
MCC NQS
Details
- Author :
Phil Stringer, Manchester Computing Centre (P.Stringer@mcc.ac.uk)
- Location :
ftp://vpx.mcc.ac.uk/pub/mcc.solaris.gz
Installation
The code installed without undue problems on Solaris 2.
Comparison With 4D/NQS
- 4D/NQS does not appear to have any equivalent to the nmapmgr(1)
program of MCC NQS.
- MCC NQS does not include 4D/NQS's multi-processor support, or the
Non-Degradable Priority support.
- 4D/NQS does not support the passing of the group id (gid) -
support for this can be found in MCC NQS.
- MCC NQS otherwise appears to be comparable in functionality to
4D/NQS.
Future Work
- The various options added to MCC NQS are currently done using
external perl scripts; they need to be integrated into the
standard utilities (this does not affect the functionality).
- Documentation is restricted to manual pages, plus an installation
guide. Tutorial documentation for end users would be beneficial.
- The code requires restructuring to improve portability.
- Multi-processor support, for systems such as IRIX, needs adding.
- IRIX's Non-Degradable Priority needs supporting.
Summary
MCC NQS is not a suitable product to base future development on.
- MCC NQS is not a product intended for use at other sites - work
is required to bring MCC NQS up to the level of the other
distributions before new features can be added.
- Overall, MCC NQS does not compare favourably to Monsanto NQS on
the feature set. While the missing features may not be of great
importance to most sites, this is still an issue.
Monsanto NQS v3.36
Details
- Author :
Jon Roman, Monsanto Software (jrroma@beaker.monsanto.com)
- Location :
ftp://wuarchive.wustl.edu/pub/Nqs/unix/monsanto-nqs-3.36.tar.gz
Installation
Installation and configuration were achieved with only three
difficulties.
- The code failed to compile, apparently due to a compiler error.
This was easily solved by splitting the problematic line into two
lines on each occasion.
- Due to an error in the Makefile, the daemons did not correctly
link with the NQS library, and instead linked with Solaris' UCB
library. This took time to locate, with help from the author.
- The INSTALL documentation is unclear on installing NQS on a
cluster of machines.
With these dealt with, Monsanto NQS appeared to work as documented.
Comparison With 4D/NQS
- Both systems introduce the daemons `netdaemon', `logdaemon', and
`nqsdaemon'.
- 4D/NQS does not appear to have any equivalent to the nmapmgr(1)
utility of Monsanto.
- Monsanto includes the Non-Degradable Priority support of 4D/NQS,
but does not support 4D/NQS's ability to bind queues to
processors.
- Otherwise, on paper, Monsanto NQS is a complete superset of
4D/NQS, supporting all the utilities (and their switches) of
4D/NQS.
Future Work
Aside from feature lists arising from the UK HE community, the
following work is recommended should Monsanto be adopted as the
supported system.
- POSIXfy the source code where possible. This will require a
large effort, but will make porting to future platforms an easier
task.
- Restructure the code, into smaller, discrete units. This will
enable any future maintainers to find their way around more easily
than at present, and should ease maintenance.
- Add multi-processor support for all operating systems which
provide a suitable API, such as IRIX.
- Implement a new error-reporting mechanism, which details the
origin of any reported errors and warnings, as well as their
cause. Tracking errors down is much simplified if the software
reports the file, and line number, which triggered the error.
- Compile-time configuration and installation can be greatly
improved - better documentation, and a configure script will ease
the process significantly.
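As an illustration of the error-reporting recommendation above, the
following Python sketch tags each message with the file, line number
and function which raised it. It is a sketch of the idea only: the
names `report_error` and `open_spool` are hypothetical and do not come
from the NQS sources. In C, the same information is available through
the standard `__FILE__` and `__LINE__` macros.

```python
import inspect
import os


def report_error(message):
    """Format an error message tagged with the caller's file and line."""
    frame = inspect.currentframe().f_back  # the frame that called us
    filename = os.path.basename(frame.f_code.co_filename)
    location = f"{filename}:{frame.f_lineno}"
    return f"{location}: in {frame.f_code.co_name}(): {message}"


def open_spool(path):
    # Hypothetical caller, used only to show what a report looks like.
    if not os.path.isdir(path):
        return report_error(f"spool directory '{path}' does not exist")
    return None


if __name__ == "__main__":
    print(open_spool("/no/such/spool/directory"))
```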
In addition, the following points should be considered.
- Introduce Extended Hungarian Notation throughout. Coupled with
the restructuring of the code mentioned above, this will greatly
enhance the readability (and therefore the understandability) of
the code. The amount of time this will save future maintainers
cannot be overstated.
- Requests should be represented using text based messages rather
than the current binary mechanisms. This will make
cross-platform interoperability easier to maintain, and simplify
support for future platforms, 64-bit and beyond.
- Kernel-based limits can be emulated in software for systems which
don't support them in the kernel. This emulation should be added
to make all platforms nominally equivalent.
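To show what is meant by text-based request messages, here is a
minimal Python sketch which encodes a request as `key=value` lines and
decodes it again. The format and the field names (`queue`,
`cpu_limit_seconds`) are assumptions made for the example; they are not
the NQS wire format. Because numbers travel as decimal text, neither
word size nor byte order matters to the receiver.

```python
def encode_request(fields):
    """Encode a request as sorted 'key=value' lines of plain text."""
    lines = []
    for key in sorted(fields):
        value = str(fields[key])
        if "\n" in value or "\n" in key or "=" in key:
            raise ValueError(f"cannot encode field {key!r}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def decode_request(text):
    """Decode text produced by encode_request into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


if __name__ == "__main__":
    # A limit too large for a 32-bit integer is no problem as text.
    wire = encode_request({"queue": "batch", "cpu_limit_seconds": 2**40})
    print(wire, end="")
    print(decode_request(wire))
```

Note that the decoder returns every value as a string; converting
fields back to numbers is left to the receiver, which knows what size
of integer it can hold.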
Summary
Monsanto NQS is a product which works. It provides a superset of
the features found in CERN NQS, and in 4D/NQS (almost). Future work
needs to concentrate on improving the portability, and
maintainability of the code, plus implementation of any wish-lists
from the UK HE community.
Sterling NQS v2.3
NOTE that this evaluation of Sterling NQS [7], [8] is purely a
paper-based exercise, as our site does not run Sterling NQS.
Details
- Author :
Sterling Software.
- License :
Appears to be on a per-processor basis.
- Platforms (NQS):
AIX/6000, AIX/ESA, AIX/370, HP 9000: 700 or 800 series with HPUX,
SunOS, Solaris, IRIX, Amdahl systems with UTS.
- Platforms (NQS/Exec):
AIX/6000, HPUX on HP 9000: 700 and 800 series, IRIX, SunOS,
Solaris
- Port Used :
1701 recommended, 607 if communicating with Cray NQS. In
addition, NQS/Exec uses 1702, 1703.
- RPC (NQS/Exec only):
100078
Installation
According to the manual [7], installation is a matter of extracting
the NQS file from the installation tape, and then expanding that
into the NQS spool directory. From there, installation appears
comparable to Monsanto NQS.
Comparison With 4D/NQS
- Sterling NQS has no support for the Non-Degradable Priority or
binding queues to processors.
- 4D/NQS does not appear to have any equivalent to the nmapmgr(1)
utility of Monsanto.
- Otherwise, Sterling NQS on paper claims to be comparable to
4D/NQS.
Summary
Sterling NQS is a commercial product. Its inclusion of a
MOTIF-based front-end is a welcome new feature, but otherwise there
is nothing here which goes further than Monsanto NQS.
When I attempted to contact Sterling over pricing and licensing, I
was informed that the request had been passed on to their UK office,
and nothing more was heard. In light of this, I personally could
not recommend Sterling NQS to sites where batch processing was
mission-critical.
Feature Comparison
Table Of Features
The table below is a comprehensive list of features for DQS,
Monsanto NQS (Mon), Sterling NQS (Ste) and MCC NQS (MCC). Please
note that the feature lists
for DQS and Sterling NQS come from the documentation alone, and are
therefore not necessarily accurate or complete.
> ------------------------------------------------------------------
> Feature | DQS | Mon | Ste | MCC |
> ------------------------------------------------------------------
> | | | | |
> Aborting Running Jobs | | x | x | x |
> AFS Support | x | | | |
> Broadcast message when job begins | | x | | 1 |
> Broadcast message when job ends | | x | | 2 |
> Cells | x | | | |
> Charging usage to accounts | x | | | |
> DFS Support | x | | | |
> Default maximum priorities on a per | | | | x |
> user basis | | | | |
> Default maximum priorities on a per | | | | |
> group basis | | | | |
> Deletion of pending requests | x | x | x | x |
> Deletion of the submitted script | | x | | |
> Device support | | x | x | x |
> Display spooled output | x | x | x | x |
> (on remote machines ?) | | | | x |
> Don't echo job_id to stdout | x | | | |
> Enable/disable queues | x | x | x | x |
> Epilogue scripts | | | | x |
> Execute request after a given time | x | x | x | x |
> and/or date | | | | |
> Export all environment variables | x | x | x | x |
> Extended display for executing | | | | x |
> jobs | | | | |
> Forms (for printing support) | | x | x | x |
> Gid support | | | | |
> Leave stderr on the machine where | | x | x | x |
> the job executed | | | | |
> Leave stdout on the machine where | | x | x | x |
> the job executed | | | | |
> Limit access to users/groups | x | x | x | x |
> Limit CPU time | | x | x | x |
> Limit corefile size | | x | x | x |
> Limit data segment size | | x | x | x |
> Limit total memory usage | | x | x | x |
> Limit permanent file size | | x | x | x |
> Limit temporary file size | | x | x | x |
> Limit stack size | | x | x | x |
> Limit working set size | | x | x | x |
> Limit number of print copies | | x | x | x |
> Limit size of a print file | | x | | |
> Limit the number of jobs executed | | x | x | x |
> simultaneously per queue | | | | |
> Limit the number of jobs executed | | x | x | x |
> simultaneously per user | | | | |
> Load balancing support | x | x | x | x |
> Lock the daemon into memory | | x | x | x |
> Machine database | x | x | x | x |
> Mailing of spooled output on | | x | | |
> completion | | | | |
> Mapping of users and groups from | | x | x | x |
> machine to machine | | | | |
> (by both name and uid together) | | | | x |
> Mark job as re-runable | x | x | x | |
> Mark job as not re-runable | x | x | x | |
> Manager status for users | x | x | x | x |
> Message appending to error and | | x | | |
> output log files | | | | |
> Modify pending requests | x | x | x | x |
> Move requests from queue to queue | | x | x | x |
> MPI support | x | | | |
> Multiple machine support | x | x | x | x |
> Name a request | x | x | x | |
> Nice levels for requests | | x | x | x |
> Non-degrading priority support | | x | | |
> Operator status for users | x | x | x | x |
> P4 support | x | | | |
> Place/remove holds on jobs | x | x | | x |
> Print request to stdout only | x | | | |
> Printing via device queues | | x | x | x |
> Process accounting support | | x | | |
> Prologue scripts | | | | x |
> Queue complexes | | x | x | x |
> Resource Information | x | | | |
> Run several versions simultaneously | | | | x |
> Running log file | x | x | x | x |
> Runtime debugging info | x | x | x | x |
> Satisfy resource requests | x | | | |
> Send extra signals to jobs | x | | | |
> Send mail to specified user | x | x | x | x |
> Send mail when request starts | x | x | x | x |
> Send mail when request ends | x | x | x | x |
> Send mail when request is suspended | x | | | |
> Send mail when request is aborted | x | | | |
> Send mail when request is restarted | | x | | |
> Send mail when request is transferred | | x | | |
> to destination machine | | | | |
> Sort queues according to request | | | | x |
> priority | | | | |
> Specify current working directory | x | | | |
> for a request | | | | |
> Specify a list of queues to submit to | x | | | |
> Specify environment variables to use | x | | | |
> Specify path to spool stderr to | x | x | x | x |
> Specify path to spool stdout to | x | x | x | x |
> Specify reauthentication interval | x | | | |
> Specify request priority | x | x | x | x |
> Specify the shell to use | x | x | x | x |
> Spool stderr & stdout as one output | x | x | x | x |
> Spooled device support | | x | x | x |
> Staged upgrades | | x | | |
> Suspend/resume pending requests | x | x | x | x |
> Status information on request | x | x | x | x |
> X Windows user interface | x | | x | |
> ------------------------------------------------------------------
AFS/DFS
The Andrew File System (AFS) is sold by Transarc, having been
developed at CMU in the States. DFS is a further development of
AFS, to be used by the COSE initiative.
Cells
As I understand it from DQS's documentation, a cell is a collection
of heterogeneous (not necessarily clustered) machines. By placing a
new workstation (for example) in a cell, it becomes available to DQS
to distribute requests to.
Print Request To Stdout Only
DQS will print the request to stdout, and will not actually add the
request to any queue.
Resource Information / Satisfy Resource Requests
Under DQS, you can attach information (called complexes) to
cells/queues, which state whether software such as PVM is available
on the machine served by that particular queue.
This then allows the user, who is submitting the request, to specify
a set of required resources, and to demand that DQS find a
queue or cell which has all those resources available.
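The matching of requested resources against those attached to queues
can be sketched as a simple subset test. This is my own illustration of
the idea in Python, not DQS code; the function name `find_queue` and
the queue and resource names are hypothetical.

```python
def find_queue(queues, required):
    """Return the name of the first queue offering every required resource.

    `queues` maps a queue name to the resources attached to it.
    """
    needed = set(required)
    for name, resources in queues.items():
        if needed <= set(resources):  # every needed resource is present
            return name
    return None


if __name__ == "__main__":
    queues = {"fast": {"pvm"}, "big": {"pvm", "mpi"}}
    print(find_queue(queues, ["pvm", "mpi"]))
    print(find_queue(queues, ["afs"]))
```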
Specify A List Of Queues To Submit To
Under DQS, the user can give a list of queues, and DQS works from
left to right to find a suitable queue to put the request into.
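The left-to-right search can be sketched as below. What makes a queue
"suitable" is not stated in the documentation I have seen, so the
sketch takes that test as a parameter; `choose_queue`, `is_suitable`
and the queue names are hypothetical.

```python
def choose_queue(candidates, is_suitable):
    """Return the first candidate, reading left to right, that is suitable."""
    for name in candidates:
        if is_suitable(name):
            return name
    return None


if __name__ == "__main__":
    # Assumption for the example: a queue is suitable if it is enabled.
    enabled = {"night", "weekend"}
    print(choose_queue(["express", "night", "weekend"], lambda q: q in enabled))
```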
Send Extra Signals To Jobs
DQS can send the signals SIGUSR1 and SIGUSR2 before SIGSTOP and
SIGKILL respectively.
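The pairing of warning and final signals can be sketched in Python as
follows. This is an illustration only, and runs on POSIX systems: the
name `signal_with_warning` is hypothetical, and the length of the pause
between the two signals is an assumption, as the interval DQS uses is
not stated here.

```python
import os
import signal
import time

# Warning signal delivered before each final signal, as described above.
WARNING_FOR = {
    signal.SIGSTOP: signal.SIGUSR1,
    signal.SIGKILL: signal.SIGUSR2,
}


def signal_with_warning(pid, final, grace=5.0, kill=os.kill, sleep=time.sleep):
    """Send the warning signal, wait `grace` seconds, then send `final`."""
    kill(pid, WARNING_FOR[final])
    sleep(grace)  # assumed pause, giving the job time to react
    kill(pid, final)


if __name__ == "__main__":
    # Dry run: record what would be sent instead of signalling a process.
    sent = []
    signal_with_warning(
        12345,
        signal.SIGKILL,
        grace=0,
        kill=lambda pid, sig: sent.append((pid, sig)),
        sleep=lambda seconds: None,
    )
    print(sent)
```

The value of the warning is that, unlike SIGSTOP and SIGKILL, the two
user signals can be caught, so a job can save its state before it is
stopped or killed.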
Specify Reauthentication Interval
The DQS documentation does not make it clear where the
authentication information comes from.
Platforms
The table below lists the platforms on which the software claims to
be available.
> ------------------------------------------------------------------
> Platform | DQS | Mon-NQS | Sterl-NQS |
> ------------------------------------------------------------------
> | | | |
> AIX 3.2 | Y | | |
> AIX/370 | | | Y |
> AIX/6000 | | | Y |
> AIX/ESA | | | Y |
> HPUX | | Y | Y |
> IRIX 4 | Y | Y | Y |
> IRIX 5 | Y | Y | Y |
> Linux | Y | Y | |
> NeXT 3.2 | Y | | |
> OSF/1 | Y | Y | |
> Solaris 2 | Y | Y | Y |
> SunOS 4 | Y | Y | Y |
> ULTRIX | Y | Y | |
> UTS | | | Y |
> ------------------------------------------------------------------
DQS and Monsanto NQS required modification to successfully compile
and run on Solaris 2.3.
Monsanto NQS's support for Linux is a third-party patch which I have
integrated into the main source tree.
Conclusions
On evaluation, Monsanto-NQS is considered to be the most suitable
system on which to base future development.
- It is a superset of all other freely-available versions of NQS,
and appears more complete than Sterling NQS is reported to be.
- While the design is not perfect, unlike DQS the design does not
introduce a major weakness, and is therefore easier to build on.
- The Monsanto sources can be freely distributed in modified form.
With DQS, possible ventures into commercial exploitation will
first require negotiation with Florida State University.
- Monsanto NQS is a simpler system than DQS, which can only make it
easier to build new features on top, and so allow more to be
achieved in the time available.
References
- [2] The GNU General Public License (Version 1), Free Software
Foundation, 1989
The license agreement for Monsanto NQS is designed to ensure that
all users of the software are granted specific rights, which
cannot be removed from them.
Note that you are free to relicense any GPL'd software under any
later version of the GPL, at your choice.
- [3] 4D/NQS, Silicon Graphics Inc.
Until 1994, Silicon Graphics Inc. provided their own version of
NQS, derived from the freely available source. Recently, they
withdrew support for the product, offering discounts on licenses
of Sterling NQS for the existing customer base.
- [4] Implementation of ISO/IEC 9899 : 1990; Programming Languages -
C, ISBN 0-580-19572-4.
- [5] Announcing CERN-NQS-2.4, Christian Boissat, CERN, 30 August
1993.
- [6] Using NQS, the Unix Batch System, Manchester Computing
Centre, 1st Edition, 20th August 1993.
Supposedly available on-line via gopher, but attempts to use this
to obtain a later copy were unsuccessful - Manchester's gopher
server was unable to locate the machine which hosts any such
document.
- [7] Sterling NQS(tm) and Sterling NQS/Exec(tm) System
Administration Guide, Sterling Software, no date.
One of two sources on which the evaluation of Sterling NQS
(above) is based, this manual is well organised, and written in a
clear but complete style.
- [8] Sterling NQS User's Guide, Sterling Software, 1993
The second of two sources on which the evaluation of Sterling NQS
(above) is based, this user guide contains little more than
clearer descriptions of the utilities available to users.
- [9] Report - Batch Processing Systems In The UK HE Community,
Stuart Herbert, Academic Computing Services, University of
Sheffield, 1994