Systems Analysis - Batch Processing Systems

Academic Computing Services, The University Of Sheffield

Stuart Herbert (S.Herbert@sheffield.ac.uk)

Document copyright ©. All rights reserved.


Abstract

This document is a report to evaluate and contrast freely available batch processing systems. 4D/NQS, a commercial system from Silicon Graphics [3], is used in passing for comparison purposes.



Introduction


JISC, as part of its New Technology Initiative, has funded a one year post to supply and support batch processing systems for UNIX for the UK HE community.

As part of this process, we have located, and examined, existing freeware products, seeking to determine which, if any, we should support in the future.

There are two principal systems:

  • NQS, derived from MDQS, has been sold commercially by Sterling Software, and several derived systems are freely available via the Internet, using ftp(1).

  • DQS, written apparently from scratch at the Florida State University, is the main alternative available via the Internet, using ftp(1). DNQS is an older product from the same author.

Both types of system provide essentially the same ability - to make use of idle workstations by running jobs deferred by users. Both present the same mechanism for doing so - the queue, into which requests are placed, and from which they are taken and executed at some later time.
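
As a rough illustration of the queue mechanism described above, the following Python sketch places requests into a queue and executes them later. The names `BatchQueue`, `submit` and `run_pending` are hypothetical and are not taken from NQS or DQS; the real systems keep requests on disk and can dispatch them to other machines.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class BatchQueue:
    """A toy in-memory queue of deferred requests (hypothetical, not NQS/DQS code)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pending: Deque[Tuple[int, Callable[[], object]]] = deque()
        self._next_id = 1

    def submit(self, job: Callable[[], object]) -> int:
        """Place a request in the queue and return its request id."""
        request_id = self._next_id
        self._next_id += 1
        self._pending.append((request_id, job))
        return request_id

    def run_pending(self) -> List[Tuple[int, object]]:
        """Take requests from the queue in submission order and execute them."""
        results = []
        while self._pending:
            request_id, job = self._pending.popleft()
            results.append((request_id, job()))
        return results
```

Submitting is cheap and returns at once; the work itself only happens when `run_pending` is called, which stands in for the moment a machine becomes idle.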


The Software


Condor

Condor is a batch system developed from scratch at WISC in the States. As most UK HE sites which expressed an opinion [9] stated that they use some form of NQS, and none mentioned using Condor, it was decided not to evaluate this software.

However, we do intend to track its development with a view to incorporating any desirable features into our own development.

If anyone else is interested in looking at Condor, the contact address appears to be

> mike@cs.wisc.edu

In addition, I am informed that Condor has been adopted as IBM LoadLeveler for IBM systems.


DQS v3.1.1

Please note that DQS is described as a BETA quality system, and that evaluation was a paper-based exercise due to unresolved problems with the software.


Details

  • Author :

    Tom Green, Florida State University (dqs@scri.fsu.edu)

  • Location :

    ftp://ftp.scri.fsu.edu/pub/dqs/DQS_3_1_1.tar.gz

  • Uploaded :

    July 7th, 1994

  • License :

    Freely distributable. Any derived products must state that they are derived from DQS, and a license fee is payable for commercial distribution.

  • Ports Used :

    607, 608

  • Source Size :

    211,179 lines

  • Copyright : Florida State University


Installation

Installation and configuration were not achieved.

  • The documentation is supplied as Postscript files. For sites without Postscript printing facilities, install GNU's Ghostscript and Ghostview utilities. All documentation also has TeX source supplied.

  • The `make config' target alters /etc/services automatically if the invoker is the superuser. Invoke `make config' as a normal user, and then update NIS+ by hand.

  • The `make config' target asks about options which are not documented in the installation instructions.

  • The supplied Imakefiles have illegal comment statements in them (old-style #'s). These need to be changed to XCOMM or to C-style comments.

  • The X Windows programs include a source-distribution of Xaw3d, a 3D replacement for the Athena widget set. The header file `At.h' attempts to override the prototype for strcasecmp() and strncasecmp(). This problem also occurs in various source files.

  • The installation document includes no help or information on configuring DQS - one has to refer to the man pages for that, making configuration a trial-and-error process. (The documentation states that configuration is covered in depth; this is definitely not the case.)

  • Contrary to what the documentation says, one must configure the queues before qmaster will run. Unfortunately, configuration apparently cannot be achieved without a running qmaster daemon.

At this point, it was decided that further effort with DQS was not appropriate.

(Addendum: we later learned, thanks to help from the authors, what our problem with DQS was. The qmaster daemon was silently failing because it could not locate the two sockets it wanted by name - the names given in the documentation are incorrect.)
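
The failure described in the addendum is the kind that an explicit startup check would surface. The Python sketch below is only an illustration of that idea, not DQS code: it looks up a list of service names in the services database and raises an error naming any that are missing, instead of failing silently. The function name `resolve_service_ports` is hypothetical, and the service names to pass in are whatever the site has actually registered.

```python
import socket
from typing import Dict, Iterable


def resolve_service_ports(names: Iterable[str], protocol: str = "tcp") -> Dict[str, int]:
    """Look up each service name and fail loudly if any of them is missing."""
    ports = {}
    missing = []
    for name in names:
        try:
            ports[name] = socket.getservbyname(name, protocol)
        except OSError:
            # getservbyname raises OSError when the name is not registered.
            missing.append(name)
    if missing:
        raise LookupError(
            "service name(s) not found in the services database: " + ", ".join(missing)
        )
    return ports
```

A daemon that ran a check like this before doing anything else would report the wrongly named services immediately, rather than leaving the administrator to guess.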


Comparison With 4D/NQS

  • DQS has a single point of failure - the qmaster daemon. NQS has no such problem.

  • DQS allows information about available resources (such as PVM) to be attached to queues, allowing for very detailed configuration.

  • The DQS source includes runtime sanity checks and an outline traceback facility which could be modified to provide even better traceback.

  • DQS does not seem to support any of the OS-based limits such as CPU usage et al. At least, there is no mention whatsoever in the documentation about support for these limits.


Future Work

  • POSIXfy the source code, where it is not already compliant. This will simplify maintenance in the future.

  • Write a detailed, and complete, installation guide, perhaps supplying a better means of configuration.

  • Look into minimising, or removing, the single qmaster daemon as a potential point of failure.

  • Enhance the included debugging mechanisms, to provide a nested traceback facility.

  • Update the error reporting so that messages are not misleading.

In addition, the following points should be considered.

  • Introduce Extended Hungarian Notation, to improve readability and therefore maintainability.


Summary

DQS is an actively maintained product, which provides a highly configurable environment in which to work. The design of DQS includes a major weakness, the single qmaster daemon as a single point of failure - given the unreliability of certain versions of UNIX, this must be a cause for concern.

If DQS is not selected as the most suitable product, the implementation of some of its features into the selected product must be strongly considered.


MCC NQS


Details

  • Author :

    Phil Stringer, Manchester Computing Centre (P.Stringer@mcc.ac.uk)

  • Location :

    ftp://vpx.mcc.ac.uk/pub/mcc.solaris.gz

  • Uploaded :

    10th November 1994

  • License :

    MCC NQS is derived from CERN NQS 2.2 and so falls under CERN's license (see above).

  • Port Used :

    607

  • Source Size :

    179,824 lines

  • Copyright :

    Copyright holder is CERN - MCC extensions are copyright MCC.


Installation

The code installed without undue problems on Solaris 2.


Comparison With 4D/NQS

  • 4D/NQS does not appear to have any equivalent to the nmapmgr(1) program of MCC NQS.

  • MCC NQS does not include 4D/NQS's multi-processor support, or the Non-Degradable Priority support.

  • 4D/NQS does not support the passing of the group id (gid) - support for this can be found in MCC NQS.

  • MCC NQS otherwise appears to be comparable to the functionality of 4D/NQS.


Future Work

  • The various options added to MCC NQS are currently done using external perl scripts; they need to be integrated into the standard utilities (this does not affect the functionality).

  • Documentation is restricted to manual pages, plus an installation guide. Tutorial documentation for end users would be beneficial.

  • The code requires restructuring to improve portability.

  • Multi-processor support, for systems such as IRIX, needs adding.

  • IRIX's Non Degradable Priority needs supporting.


Summary

MCC NQS is not a suitable product to base future development on.

  • MCC NQS is not a product intended for use at other sites - work is required to bring MCC NQS up to the level of the other distributions before new features can be added.

  • Overall, MCC NQS does not compare favourably to Monsanto NQS on the feature set. While the missing features may not be of great importance to most sites, this is still an issue.


Monsanto NQS v3.36


Details

  • Uploaded :

    6th September 1994

  • License :

    GNU General Public License, v1 [2]

  • Port Used :

    607

  • Source Size :

    143,032 lines

  • Copyright : Copyright holder is the author.


Installation

Installation and configuration were achieved with only three difficulties.

  • The code failed to compile, apparently due to a compiler error. This was easily solved by splitting the problematic line into two lines on each occasion.

  • Due to an error in the Makefile, the daemons did not correctly link with the NQS library, and instead linked with Solaris' UCB library. This took time to locate, with help from the author.

  • The INSTALL documentation is unclear on installing NQS on a cluster of machines.

With these dealt with, Monsanto NQS appeared to work as documented.


Comparison With 4D/NQS

  • Both systems introduce the daemons `netdaemon', `logdaemon', and `nqsdaemon'.

  • 4D/NQS does not appear to have any equivalent to the nmapmgr(1) utility of Monsanto.

  • Monsanto includes the Non-Degradable Priority support of 4D/NQS, but does not support 4D/NQS's ability to bind queues to processors.

  • Otherwise, on paper, Monsanto NQS is a complete superset of 4D/NQS, supporting all the utilities (and their switches) of 4D/NQS.


Future Work

Aside from feature lists arising from the UK HE community, the following work is recommended should Monsanto be adopted as the supported system.

  • POSIXfy the source code where possible. This will require a large effort, but will make porting to future platforms an easier task.

  • Restructure the code into smaller, discrete units. This will enable any future maintainers to find their way around more easily than at present, and should ease maintenance.

  • Add multi-processor support for all operating systems which provide a suitable API, such as IRIX.

  • Implement a new error-reporting mechanism, which details the origin of any reported errors and warnings, as well as their cause. Tracking errors down is much simplified if the software reports the file, and line number, which triggered the error.

  • Compile-time configuration and installation can be greatly improved - better documentation, and a configure script will ease the process significantly.
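
The error-reporting recommendation above can be illustrated with a small sketch. NQS itself is not written in Python, so this is only a picture of the idea: a hypothetical helper, `report_error`, that tags each message with the file name and line number of the code that reported it.

```python
import inspect
import os


def report_error(message: str) -> str:
    """Format an error message tagged with the caller's file name and line number."""
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    if caller is None:
        # Some Python implementations do not expose stack frames.
        location = "<unknown>"
    else:
        filename = os.path.basename(caller.f_code.co_filename)
        location = f"{filename}:{caller.f_lineno}"
    return f"{location}: error: {message}"
```

The caller does not have to supply its own location, so existing error messages gain an origin without being rewritten one by one.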

In addition, the following points should be considered.

  • Introduce Extended Hungarian Notation throughout. Coupled with the restructuring of the code mentioned above, this will greatly enhance the readability (and therefore the understandability) of the code. The amount of time this will save future maintainers cannot be overstated.

  • Requests should be represented using text-based messages rather than the current binary mechanisms. This will make cross-platform interoperability easier to maintain, and simplify support for future platforms, 64-bit and beyond.

  • Kernel-based limits can be emulated in software for systems which don't support them in the kernel. This emulation should be added to make all platforms nominally equivalent.
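
To illustrate the point about text-based messages, the sketch below encodes a request as plain `key=value` lines. The format and the names `encode_request` and `decode_request` are assumptions made for this example only; they are not the NQS wire format. Because numbers travel as decimal text, the message does not depend on the word size or byte order of either machine.

```python
from typing import Dict


def encode_request(fields: Dict[str, object]) -> bytes:
    """Encode a request as ASCII 'key=value' lines, one field per line."""
    lines = []
    for key, value in sorted(fields.items()):
        text = str(value)
        if "\n" in key or "=" in key or "\n" in text:
            raise ValueError(f"field {key!r} cannot be encoded on one line")
        lines.append(f"{key}={text}")
    return ("\n".join(lines) + "\n").encode("ascii")


def decode_request(data: bytes) -> Dict[str, str]:
    """Decode the text form back into a dictionary of string fields."""
    fields = {}
    for line in data.decode("ascii").split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        key, _, value = line.partition("=")
        fields[key] = value
    return fields
```

The receiving side gets every value back as a string and converts it to whatever local type it needs, which is where the independence from platform word size comes from.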


Summary

Monsanto NQS is a product which works. It provides a superset of the features found in CERN NQS, and in 4D/NQS (almost). Future work needs to concentrate on improving the portability, and maintainability of the code, plus implementation of any wish-lists from the UK HE community.


Sterling NQS v2.3

NOTE that this evaluation of Sterling NQS [7], [8] is purely a paper-based exercise, as our site does not run Sterling NQS.


Details

  • Author :

    Sterling Software.

  • License :

    Appears to be on a per-processor basis.

  • Platforms (NQS):

    AIX/6000, AIX/ESA, AIX/370, HP 9000: 700 or 800 series with HPUX, SunOS, Solaris, IRIX, Amdahl systems with UTS.

  • Platforms (NQS/Exec):

    AIX/6000, HPUX on HP 9000: 700 and 800 series, IRIX, SunOS, Solaris

  • Port Used :

    1701 recommended, 607 if communicating with Cray NQS. In addition, NQS/Exec uses 1702, 1703.

  • RPC (NQS/Exec only):

    100078


Installation

According to the manual [7], installation is a matter of extracting the NQS file from the installation tape, and then expanding that into the NQS spool directory. From there, installation appears comparable to Monsanto NQS.


Comparison With 4D/NQS

  • Sterling NQS has no support for the Non-Degradable Priority or binding queues to processors.

  • 4D/NQS does not appear to have any equivalent to the nmapmgr(1) utility of Monsanto.

  • Otherwise, Sterling NQS on paper claims to be comparable to 4D/NQS.


Summary

Sterling NQS is a commercial product. Its inclusion of a MOTIF-based front-end is a welcome new feature, but otherwise there is nothing here which goes further than Monsanto NQS.

When I attempted to contact Sterling over pricing and licensing, I was informed that the request had been passed on to their UK office, and nothing more was heard. In light of this, I personally could not recommend Sterling NQS to sites where batch processing was mission-critical.


Feature Comparison


Table Of Features

The table below is a comprehensive list of features for DQS, Monsanto NQS, Sterling NQS and MCC NQS. Please note that the feature lists for DQS and Sterling NQS come from the documentation alone, and are therefore not necessarily accurate or complete.

> ------------------------------------------------------------------
> Feature				| DQS | Mon | Ste | MCC |
> ------------------------------------------------------------------
>					|     |	    |	  |     |
> Aborting Running Jobs			|     |  x  |  x  |  x  |
> AFS Support				|  x  |     |     |     |
> Broadcast message when job begins	|     |  x  |     |  1  |
> Broadcast message when job ends	|     |  x  |     |  2  |
> Cells					|  x  |     |     |     |
> Charging usage to accounts		|  x  |     |     |     |
> DFS Support				|  x  |     |     |     |
> Default maximum priorities on a per	|     |     |     |  x  |
>   user basis				|     |     |     |     |
> Default maximum priorities on a per	|     |     |     |     |
>   group basis				|     |     |     |     |
> Deletion of pending requests		|  x  |  x  |  x  |  x  |
> Deletion of the submitted script	|     |  x  |     |     |
> Device support			|     |  x  |  x  |  x  |
> Display spooled output		|  x  |  x  |  x  |  x  |
> (on remote machines ?)		|     |     |     |  x  |
> Don't echo job_id to stdout		|  x  |     |     |     |
> Enable/disable queues			|  x  |  x  |  x  |  x  |
> Epilogue scripts			|     |     |     |  x  |
> Execute request after a given time    |  x  |  x  |  x  |  x  |
>   and/or date				|     |     |     |     |
> Export all environment variables	|  x  |  x  |  x  |  x  |
> Extended display for executing	|     |     |     |  x  |
>   jobs				|     |     |     |     |
> Forms (for printing support)		|     |  x  |  x  |  x  |
> Gid support				|     |     |     |     |
> Leave stderr on the machine where	|     |  x  |  x  |  x  |
>   the job executed			|     |	    |     |     |
> Leave stdout on the machine where	|     |	 x  |  x  |  x  |
>   the job executed			|     |	    |     |     |
> Limit access to users/groups		|  x  |  x  |  x  |  x  |
> Limit CPU time			|     |  x  |  x  |  x  |
> Limit corefile size			|     |  x  |  x  |  x  |
> Limit data segment size		|     |  x  |  x  |  x  |
> Limit total memory usage		|     |  x  |  x  |  x  |
> Limit permanent file size		|     |  x  |  x  |  x  |
> Limit temporary file size		|     |  x  |  x  |  x  |
> Limit stack size			|     |  x  |  x  |  x  |
> Limit working set size		|     |  x  |  x  |  x  |
> Limit number of print copies		|     |  x  |  x  |  x  |
> Limit size of a print file		|     |  x  |     |     |
> Limit the number of jobs executed	|     |  x  |  x  |  x  |
>  simultaneously per queue		|     |     |     |     |
> Limit the number of jobs executed	|     |  x  |  x  |  x  |
>  simultaneously per user		|     |     |     |     |
> Load balancing support		|  x  |  x  |  x  |  x  |
> Lock the daemon into memory		|     |  x  |  x  |  x  |
> Machine database			|  x  |  x  |  x  |  x  |
> Mailing of spooled output on		|     |  x  |     |     |
>   completion				|     |     |     |     |
> Mapping of users and groups from	|     |  x  |  x  |  x  |
>   machine to machine			|     |     |     |     |
>   (by both name and uid together)	|     |     |     |  x  |
> Mark job as re-runable		|  x  |  x  |  x  |     |
> Mark job as not re-runable		|  x  |  x  |  x  |     |
> Manager status for users		|  x  |  x  |  x  |  x  |
> Message appending to error and	|     |  x  |     |     |
>   output log files			|     |     |     |     |
> Modify pending requests		|  x  |  x  |  x  |  x  |
> Move requests from queue to queue	|     |  x  |  x  |  x  |
> MPI support				|  x  |     |     |     |
> Multiple machine support		|  x  |  x  |  x  |  x  |
> Name a request			|  x  |  x  |  x  |     |
> Nice levels for requests		|     |  x  |  x  |  x  |
> Non-degrading priority support	|     |  x  |     |     |
> Operator status for users		|  x  |  x  |  x  |  x  |
> P4 support				|  x  |     |     |     |
> Place/remove holds on jobs		|  x  |  x  |     |  x  |
> Print request to stdout only		|  x  |     |     |     |
> Printing via device queues		|     |  x  |  x  |  x  |
> Process accounting support		|     |  x  |     |     |
> Prologue scripts			|     |     |     |  x  |
> Queue complexes			|     |  x  |  x  |  x  |
> Resource Information			|  x  |     |     |     |
> Run several versions simultaneously	|     |     |     |  x  |
> Running log file			|  x  |  x  |  x  |  x  |
> Runtime debugging info		|  x  |  x  |  x  |  x  |
> Satisfy resource requests		|  x  |     |     |     |
> Send extra signals to jobs		|  x  |	    |     |     |
> Send mail to specified user		|  x  |  x  |  x  |  x  |
> Send mail when request starts		|  x  |  x  |  x  |  x  |
> Send mail when request ends		|  x  |  x  |  x  |  x  |
> Send mail when request is suspended	|  x  |	    |     |     |
> Send mail when request is aborted	|  x  |	    |     |     |
> Send mail when request is restarted	|     |  x  |     |     |
> Send mail when request is transferred	|     |  x  |     |     |
>   to destination machine		|     |	    |     |     |
> Sort queues according to request      |     |     |     |  x  |
>   priority				|     |     |     |     |
> Specify current working directory	|  x  |	    |     |     |
>   for a request			|     |	    |     |     |
> Specify a list of queues to submit to |  x  |	    |     |     |
> Specify environment variables to use	|  x  |	    |     |     |
> Specify path to spool stderr to	|  x  |	 x  |  x  |  x  |
> Specify path to spool stdout to	|  x  |	 x  |  x  |  x  |
> Specify reauthentication interval	|  x  |	    |     |     |
> Specify request priority		|  x  |  x  |  x  |  x  |
> Specify the shell to use		|  x  |  x  |  x  |  x  |
> Spool stderr & stdout as one output	|  x  |  x  |  x  |  x  |
> Spooled device support		|     |  x  |  x  |  x  |
> Staged upgrades			|     |	 x  |     |     |
> Suspend/resume pending requests	|  x  |  x  |  x  |  x  |
> Status information on request		|  x  |  x  |  x  |  x  |
> X Windows user interface		|  x  |	    |  x  |     |
> ------------------------------------------------------------------


AFS/DFS

The Andrew File System (AFS) is sold by Transarc, having been developed at CMU in the States. DFS is a further development of AFS, to be used by the COSE initiative.


Cells

As I understand it from DQS's documentation, a cell is a collection of heterogeneous (not necessarily clustered) machines. By placing a new workstation (for example) in a cell, it becomes available to DQS to distribute requests to.


Print Request To Stdout Only

DQS will print the request to stdout, and will not actually add the request to any queue.


Resource Information / Satisfy Resource Requests

Under DQS, you can attach information (called complexes) to cells/queues, which state whether software such as PVM is available on the machine served by that particular queue.

This then allows the user, who is submitting the request, to specify a set of required resources, and to demand that DQS find a queue/cell/whatever which has all those resources available.
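
The matching step can be pictured as a subset test. The Python sketch below is my own illustration under that assumption, not DQS's algorithm: each queue carries a set of resource names, and a request is satisfied by any queue whose set contains everything the user asked for. The function name and the queue and resource names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def queues_satisfying(
    queue_resources: Dict[str, Set[str]], required: Iterable[str]
) -> List[str]:
    """Return the names of queues whose resources include every required item."""
    needed = set(required)
    return sorted(name for name, have in queue_resources.items() if needed <= have)
```

For example, with a queue `fast` offering `pvm` and `fortran`, and a queue `slow` offering only `fortran`, a request that requires `pvm` matches `fast` alone.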


Specify A List Of Queues To Submit To

Under DQS, the user can give a list of queues, and DQS works from left to right to find a suitable queue to put the request into.
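
A minimal sketch of the left-to-right search follows. What makes a queue "suitable" is not stated in the documentation I have seen, so the test is passed in as a function; `pick_queue` and `is_suitable` are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def pick_queue(
    candidates: Sequence[str], is_suitable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first queue, scanning left to right, that is suitable."""
    for name in candidates:
        if is_suitable(name):
            return name
    return None  # no queue in the list was suitable
```

The order of the list therefore acts as the user's order of preference.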


Send Extra Signals To Jobs

DQS can send the signals SIGUSR1 and SIGUSR2 before SIGSTOP and SIGKILL respectively.
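
The pairing of warning and final signals can be sketched as below (UNIX only). This is an illustration rather than DQS code: the grace period between the two signals and the injectable `send` function are assumptions of the sketch, since the documentation does not say how long DQS waits.

```python
import signal
import time
from typing import Callable

# Warning signal sent ahead of each final signal, as described in the text.
WARNING_BEFORE = {
    signal.SIGSTOP: signal.SIGUSR1,
    signal.SIGKILL: signal.SIGUSR2,
}


def signal_with_warning(
    pid: int,
    final_signal: int,
    send: Callable[[int, int], None],
    grace_seconds: float = 0.0,
) -> None:
    """Send the warning signal, wait for the grace period, then send the final signal."""
    warning = WARNING_BEFORE.get(final_signal)
    if warning is not None:
        send(pid, warning)
        time.sleep(grace_seconds)  # assumed delay so the job can react
    send(pid, final_signal)
```

Passing `os.kill` as `send` delivers real signals. A job that installs handlers for SIGUSR1 and SIGUSR2 then has a chance to checkpoint or tidy up before it is stopped or killed, which is useful because SIGSTOP and SIGKILL themselves cannot be caught.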


Specify Reauthentication Interval

The DQS documentation does not make clear where the authentication information comes from.


Platforms

The table below lists the platforms on which the software claims to be available.

> ------------------------------------------------------------------
> Platform				| DQS | Mon-NQS | Sterl-NQS
> ------------------------------------------------------------------
>					|     |		|	   |
> AIX 3.2				|  Y  |		|	   |
> AIX/370				|     |         |     Y    |
> AIX/6000				|     |		|     Y    |
> AIX/ESA				|     |		|     Y    |
> HPUX					|     |    Y	|     Y    |
> IRIX 4				|  Y  |    Y	|     Y    |
> IRIX 5				|  Y  |    Y	|     Y    |
> Linux					|  Y  |    Y	|          |
> NeXT 3.2				|  Y  |		|          |
> OSF/1					|  Y  |    Y	|          |
> Solaris 2				|  Y  |    Y	|     Y    |
> SunOS 4				|  Y  |    Y	|     Y    |
> ULTRIX				|  Y  |    Y	|          |
> UTS					|     |		|     Y    |
> ------------------------------------------------------------------

DQS and Monsanto NQS required modification to compile and run successfully on Solaris 2.3.

Monsanto NQS's support for Linux is a third-party patch which I have integrated into the main source tree.


Conclusions


On evaluation, Monsanto-NQS is considered to be the most suitable system on which to base future development.

  • It is a superset of all other freely-available versions of NQS, and appears more complete than Sterling NQS is reported to be.

  • While the design is not perfect, unlike DQS the design does not introduce a major weakness, and is therefore easier to build on.

  • The Monsanto sources can be freely distributed in modified form. With DQS, possible ventures into commercial exploitation will first require negotiation with Florida State University.

  • Monsanto NQS is a simpler system than DQS, which can only make it easier to build new features on top, and so allow more to be achieved in the time available.


References


  • [1] The Network Queueing System, Brent A. Kingsbury, Sterling Software - preliminary draft - (29/04/92).

    Included with Monsanto NQS, this paper documents the original NQS system, and will prove a useful tool for identifying the divergences introduced by other distributions.

  • [2] The GNU General Public License (Version 1), Free Software Foundation, 1989

    The license agreement for Monsanto NQS is designed to ensure that all users of the software are granted specific rights, which cannot be removed from them.

    Note that software released under a given version of the GPL "or any later version" may be redistributed under any later version of the GPL, at your choice.

  • [3] 4D/NQS, Silicon Graphics Inc.

    Until 1994, Silicon Graphics Inc. provided their own version of NQS, derived from the freely available source. Recently, they withdrew support for the product, offering discounts on licenses of Sterling NQS for the existing customer base.

  • [4] Implementation of ISO/IEC 9899 : 1990; Programming Languages - C, ISBN 0-580-19572-4.

  • [5] Announcing CERN-NQS-2.4, Christian Boissat, CERN, 30 August 1993.

  • [6] Using NQS, the Unix Batch System, Manchester Computing Centre, 1st Edition, 20th August 1993.

    Supposedly available on-line via gopher, but attempts to use this to obtain a later copy were unsuccessful - Manchester's gopher server was unable to locate the machine which hosts any such document.

  • [7] Sterling NQS(tm) and Sterling NQS/Exec(tm) System Administration Guide, Sterling Software, no date.

    One of two sources on which the evaluation of Sterling NQS (above) is based, this manual is well organised, and written in a clear but complete style.

  • [8] Sterling NQS User's Guide, Sterling Software, 1993

    The second of two sources on which the evaluation of Sterling NQS (above) is based, this user guide contains little more than clearer descriptions of the utilities available to users.

  • [9] Report - Batch Processing Systems In The UK HE Community, Stuart Herbert, Academic Computing Services, University of Sheffield, 1994


