This is www.gnqs.org, The Home Of Batch Processing




Progress Report (November 1994)

Academic Computing Services, University of Sheffield

Stuart Herbert (S.Herbert@Sheffield.ac.uk)

Document copyright ©. All rights reserved.


Abstract

As part of the JISC New Technologies Initiative, the University of Sheffield is funded for one year to supply and support batch processing systems to the UK Higher Educational Community.




Introduction


About This Progress Report

This report documents the progress achieved in the months of October and November, 1994, on work related to batch processing systems.


Excerpt From The Funding Bid

The following subsection, taken from the successful funding bid, lists the major aims and objectives of the project which were outstanding at the start of October, 1994.


Aims And Objectives

The main objective is to help sites match their users' demand for computer power to the available equipment through the use of distributed batch systems. The systems themselves are currently available, but there is little or no independent information about their suitability. Getting to know the products well enough to understand them without training is a very time-consuming, and hence expensive, task. Specific aims for the year of funding would be as follows:

  • to implement and evaluate commercial and public domain distributed batch systems and in particular NQS and DQS;

  • to provide a report, comparing the systems' utility;

  • to provide support on implementation, configuration and use through the setting up and monitoring of a list on Mailbase;

  • to provide a training course on selected systems at which the systems will be described and information on implementation and configuration will be given;

  • to provide packaged releases for popular systems and in particular Sun Solaris 1 and Sun Solaris 2;

  • to provide simple end user documentation on selected systems (to augment the inevitably terse manual pages).


Activities


Preamble

The first two months of the project have concentrated on laying the foundations on which to build. Much of this is inevitably intangible, such as building connections with people both inside and outside Sheffield, but much more can be easily identified.

Below you will find a summary of all the activities which have been undertaken in the last two months. Full versions of the papers mentioned here are available via our World-Wide Web service.


NOTE

Items marked with a (*) are activities which (perhaps) do not fall directly under the initial funding bid, but which we believe are important to the project.

Tom, we would appreciate it if you could indicate whether you consider these activities to be suitable or not.


Batch Processing At Other UK Sites

A questionnaire, intended to identify the current practice and the needs of other UK sites, was sent out on the 14th October, 1994. The replies have been studied, and the following data extracted to date.

  • 36 UK sites have replied to the questionnaire. Of these, 28 are being kept informed of the progress of this project (via email).

  • 11 sites currently run NQS-class software. They are relatively happy with NQS, but would like to see work done to add a proper scheduler to improve the service NQS provides.

    6 of these sites use 4D/NQS, by Silicon Graphics. 4D/NQS has not been supported, or updated, in some time, and the binaries available to sites will not work with the latest release of Silicon Graphics' UNIX (IRIX 6.0).

  • 7 sites run some version of DQS. One site is a new installation, two sites are actively looking for a replacement, and one site is actually running the direct ancestor of DQS.

  • Silicon Graphics' IRIX is the version of UNIX most widely used for batch processing, followed by Sun Microsystems' Solaris 2. Together, these two versions of UNIX account for nearly half of the existing installations of batch processing systems.

    The other platforms are AIX, Alliant, Convex, Cray, Fujitsu, HPUX, OSF/1, SunOS 4, Ultrix.

  • From a range of features, sites consider the core functionality, user documentation, and ease of configuration to be the most important aspects of a batch processing system. Other features, such as user interfaces for administration purposes, are rated low by comparison.

  • Sites which do not perform batch processing cite `NO DEMAND' as the reason why. Local experience has shown that if a service is provided, the demand will materialise.

  • Sites are unwilling to spend significant amounts of money purchasing commercial batch processing systems.

In summary, we believe that other UK sites want some form of NQS software, which is easy to configure, does a good job of scheduling submitted requests, and includes good user documentation.


Evaluation Of Batch Processing Systems

Much of October 1994 was spent looking at a number of available batch processing systems, comparing them against 4D/NQS by Silicon Graphics.

  • Condor (commercially available as IBM LoadLeveler) was examined but not evaluated. According to the accompanying documentation, Condor requires applications to have code included to assist it.

  • DQS v3.1.1, from Florida State University, was evaluated. This evaluation was paper-based due to unresolved problems with the software (a patch to at least ensure it compiles on Solaris 2.3 was forwarded to the authors by us).

    DQS has some very good ideas on managing heterogeneous clusters of workstations, but unfortunately the implementation has created a very confusing system. In addition, DQS does not provide support for essential limits (such as CPU time and memory usage) which are normally available from the underlying operating system.

    Documentation is poor to the point of being useless - major topics such as configuration are conspicuous by their absence.

  • Manchester Computer Centre have put work into extending NQS, but in some respects they have been re-inventing the wheel because their work is based on a (relatively) old version of NQS. What has been written works well - this was the only software which compiled `out-of-the-box' - but it was never intended to be publicly released.

  • Monsanto-NQS is the most comprehensively maintained version of NQS which is freely available. Its feature set is a superset of other versions (most notably including work initially by CERN and Boeing), and its documentation is of a higher standard than other versions. In addition, the code has been well commented.

  • Sterling NQS is the main commercial option open to sites urgently requiring an alternative to 4D/NQS. The evaluation was a paper-based exercise as this site does not run Sterling NQS.

    The main problem with Sterling NQS is that it does not support the special processing features of IRIX - non-degradable priority and processor sets - and unlike freely available versions of NQS, such support cannot be added by ourselves.

As a result of this evaluation, we believe that Monsanto NQS is the best batch processing system which is currently freely available.

We have still to look at a commercial system, Network Queueing Environment, which has been developed by Cray.
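To illustrate the kind of operating system limit referred to in the DQS evaluation above, the following is a minimal sketch of how a batch system could cap the CPU time of a job before starting it, assuming a UNIX which provides the getrlimit() and setrlimit() calls. The function name limit_cpu_seconds is hypothetical, and the code is not taken from NQS, DQS, or any other system evaluated here.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper (not from any evaluated system): cap the CPU time,
 * in seconds, of the calling process. A batch daemon would typically call
 * something like this in the child process, just before starting the job.
 * Returns 0 on success, -1 on failure. */
static int limit_cpu_seconds(rlim_t seconds)
{
    struct rlimit rl;

    assert(seconds > 0);

    /* Read the current limits so that the hard limit is left untouched. */
    if (getrlimit(RLIMIT_CPU, &rl) != 0)
        return -1;

    /* The soft limit may never exceed the hard limit, so clamp it. */
    if (rl.rlim_max != RLIM_INFINITY && seconds > rl.rlim_max)
        seconds = rl.rlim_max;
    rl.rlim_cur = seconds;

    return setrlimit(RLIMIT_CPU, &rl);
}
```

Calling limit_cpu_seconds(60) would ask the operating system to send SIGXCPU to the process once it has consumed 60 seconds of CPU time. Memory usage can be limited in the same way with a different resource constant, where the platform provides one.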


System Administrator Support

A major part of the project is to provide support for batch processing systems to other UK sites. The `Project Aims' state that we intended to do this through the Mailbase service.


Mailbase

Due to problems at Mailbase's end, the email support was not available until the 9th November 1994. There are currently three electronic mailing lists:

  • NQS-Announce (moderated)

    This list is used to keep other UK sites informed of the progress of this project. It carries all announcements relevant to this project.

  • NQS-Support (unmoderated)

    This list is intended to be the first place for people to turn to for help about installing, configuring, or just using batch processing systems.

  • NQS-Developers

    This list carries discussion about changes needed to batch processing systems. Currently, all those subscribed to the list have written actual code for batch processing systems, providing a high-calibre, and highly-experienced, resource for consultation on technical issues.

We have sought to encourage interested parties to contact us through the mailing lists (which we monitor daily) rather than to contact us directly. This will hopefully serve to build a user community which can help sustain work in NQS once this project is complete.


World Wide Web (*)

We have turned to the World Wide Web as an excellent means of providing a large set of information about our work to all interested parties.

All completed reports can be accessed via the Web, which provides a better way to get at reports than having to download the files using ftp.

Draft copies of reports are also made available, usually as they are written, providing the user community with the opportunity to comment on every step of the project. The feedback from this process has already been of great help to us.

There is also a searchable database on the Web, which will grow over time, allowing administrators to search by subject or by keyword on topics of interest. The main source of data will be conversations held on the mailing lists.

Finally, the daily Project Diary is also accessible via WWW. This is an unusual move, but it is intended to promote a sense of open development, allowing anyone to read (and comment on) what's going through my head.

All documentation is updated automatically on a daily basis, and ASCII versions of all documentation are also available.


FTP Site

We have set aside space on ftp.shef.ac.uk where the very latest version of NQS is stored - our `NQS Archive' is mirrored by other sites to ensure that the entire NQS user community benefits from one single release which includes all the optional extras (*).

The contents of our NQS archive are currently:

  • v3.36 of Monsanto-NQS

    This is the version of NQS which we believe to be the most suitable to support.

  • Upgrades to v3.36.4 of Monsanto-NQS

    We have released four upgrades to NQS so far (more details below).

  • ASCII versions of all the documentation available via WWW.

  • Unsupported tools (*)

    A collection of numerous small tools developed to aid the main project work. These are freely available, but we don't have the time to offer help and support for them.


Installation and Configuration via SuperJANET (*)

To help encourage new installations of NQS at sites where resources are not available for the job, we have decided to offer to install and configure NQS at any UK HE site, via the SuperJANET network.

We have publicised this service amongst all interested parties (via the NQS-Announce mailing list), and it will also be publicised to sites in general via a forthcoming article in Engineering Computing News. Further publicity about this service in particular, and this project in general, is required.


Training

To help promote this work, and provide information about the configuration of NQS, we are attending the December Silicon Graphics SIG meeting at Leicester. At the time of writing, we have not determined the content of our presentation material, but we expect to be able to use this material as the basis of a more rigorous training package in the future.


Source Code

After consultation with UK Higher Education sites, and our evaluation of freely available batch processing systems, we believe that Monsanto NQS forms the best basis from which to proceed.

Monsanto Software, based in the USA, have licensed their code under the Free Software Foundation's CopyLeft, which allows anyone to release new versions of the software.

They no longer work on NQS. This project is now responsible for co-ordinating new releases of NQS. (*)

There have been four such releases during October and (mainly) November.

  • v3.36.1

    Fixed support for Solaris 2. Done by Stuart.

  • v3.36.2

    Included support for Linux, a high-quality and highly popular version of UNIX for IBM PCs (and clones). Done by Stuart, based on original work done by Dr. Karsten Steffens.

  • v3.36.3

    Fixed some silly errors in the accounting support. Done by Michael Hamilton.

  • v3.36.4

    More fixes to the accounting, plus support for the new IRIX 6.0 release, support for processor sets, and inclusion of ANSI prototyping. Done by John Roman and Dave Safford.

At the moment, major changes to the code have been supplied from other institutions around the world, which we have merged into the main code and tested (where possible).

NQS is over 140,000 lines of code, and time is currently being spent on reading this source code to learn what it does and how, so that we can then perform any necessary bug fixes or other modifications to a high standard in a short period of time.

Time has also been spent looking at how to improve the quality of the source code. NQS is an old piece of software, dating back to 1985, and major modifications, such as adding a scheduler, may prove significantly easier if some restructuring of the NQS source code has been done first (*).

To date, the following improvements are being considered (*):

  • Update the code to be compliant with the POSIX.1 standard.

    POSIX was not around when NQS was first written, and as a result, porting NQS to new versions of UNIX is major work. By ensuring POSIX compliance, future porting (even in years to come) will prove to be significantly easier than it currently is.

  • Sweep the code to improve modularity

    We have been informed by other institutions that the NQS source code is not highly modular (our examination of the source code is not complete at this time). For example, email support is scattered through a number of unrelated areas, rather than being implemented in one place.

    Improved modularity will reduce the size of the source code, and improve consistency of behaviour throughout. It will also reduce the impact of future changes to the source code, by localising such changes.

  • Runtime debugging

    NQS currently has a (very) crude runtime debugging system, which allows the developer to see what is going on. A complete replacement, which has a proper stack-based tracing mechanism, can be quickly used (especially at remote sites) to at least locate the point of failure - a single-step debugger can then be applied to the locality to provide an exact diagnosis.

    In addition, one of the major problems encountered during the evaluation of freely available source code was that programs always failed without an error message. Even where the program did generate error messages, these messages were never related to the failure itself. A complete debugging system would do away with this problem.

  • Network protocols

    NQS is not secure, although we have yet to determine to what extent the current protocols can be abused. This area requires looking at.

    The NQS protocols also rely on integers and longs having a particular fixed size under C. As the C standard guarantees only a minimum range for these types, and not an exact size, future architectures may prove unable to run NQS. This area requires further examination.
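The integer size problem described under `Network protocols' above is usually avoided by writing each value onto the network as a fixed number of octets in a fixed order, rather than copying the in-memory representation of an int or a long. The following is a minimal sketch of that technique. The function names put_u32 and get_u32 are hypothetical, and this is not the encoding which NQS currently uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not taken from the NQS source: encode and decode
 * an unsigned 32-bit quantity as exactly four octets, most significant
 * octet first. The result does not depend on sizeof(int), sizeof(long),
 * or the byte order of the host. An unsigned long is used to hold the
 * value because C guarantees that it is at least 32 bits wide. */
static void put_u32(unsigned char out[4], unsigned long value)
{
    assert(out != NULL);
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}

static unsigned long get_u32(const unsigned char in[4])
{
    assert(in != NULL);
    return ((unsigned long)in[0] << 24) |
           ((unsigned long)in[1] << 16) |
           ((unsigned long)in[2] << 8)  |
            (unsigned long)in[3];
}
```

A sender would call put_u32() on a four-octet buffer and write that buffer to the connection. The receiver reads four octets and calls get_u32(), and both ends agree on the value even if one machine has 32-bit longs and the other has 64-bit longs.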


Summary


Aims And Objectives


Goals Achieved

The following goals from the Funding Bid have been satisfied.

  • to provide support on implementation, configuration and use through the setting up and monitoring of a list on Mailbase;

In addition, the following goals which did not feature in the Funding Bid have been satisfied.

  • consultation of other UK Higher Education sites to determine their needs to help ensure that we meet those needs;


Goals Worked On

The following goals from the Funding Bid have been worked on.

  • to implement and evaluate commercial and public domain distributed batch systems and in particular NQS and DQS;

  • to provide a report, comparing the systems' utility;

  • to provide a training course on selected systems at which the systems will be described and information on implementation and configuration will be given;

  • to provide packaged releases for popular systems and in particular Sun Solaris 1 and Sun Solaris 2;


Goals Not Worked On

  • to provide simple end user documentation on selected systems (to augment the inevitably terse manual pages).


Expected Work - December and January 1994/1995

We believe that the following work should be undertaken during the next two months of the project (*):

  • Complete the examination of the NQS source code.

  • Make all necessary structural changes to the source code.

  • Bring the source code up to POSIX compliance.

In addition, these existing commitments should also continue to be honoured.

  • Monitor traffic on the Mailbase mailing lists, and provide whatever information/assistance is required.

  • Oversee bug fix releases of Monsanto NQS (*).


