This is www.gnqs.org, The Home Of Batch Processing



Changes To Generic NQS v3.50.0

June 1996

Stuart Herbert (S.Herbert@sheffield.ac.uk)

Document copyright ©. All rights reserved.


Abstract

This document contains a summary of the changes made to produce Generic NQS v3.50.0.


Introduction


This is a summary of changes to Generic NQS 3.50.0.

We are most grateful for the contributions made by other individuals and organisations.


About This Release


Purpose

Generic NQS 3.50.0 features some major internal changes, which are aimed at further addressing problems in the following key areas:

  • Portability
  • Robustness
  • Ease of installation

Generic NQS 3.50.0 is the `reference' release for the purposes of our funded work; the `Official Manual Set' will be based upon this release.

We recommend that all sites running Monsanto NQS, or Generic NQS v3.4x, should upgrade to this version. There will be no support for any previous versions of Monsanto-NQS or Generic NQS available from us.


Compatibility

This release features a number of changes which affect upgrading existing installations of Generic NQS. If you are installing Generic NQS for the first time, you can skip this section.

  • All message logging (and debugging output) now goes via the syslog mechanism. Generic NQS currently uses `local0' as the facility it logs to (this is configurable at compile-time). The old NQS logdaemon (and its logfile) are no longer used or supported.

    System administrators will have to ensure that their syslog daemon does not discard messages which come from the `local0' facility.

  • Pipe queues and queue complexes have been modified so that a pipe queue can be a member of a queue complex. Sites which use either pipe queues or queue complexes will have to remove their existing NQS installations, and re-install from scratch.

    This is an unfortunate side-effect of the way NQS stores its own information. Hopefully, this will be the last file format change for a very long time.

  • Support for resource limits in terms of words, kilo-words, mega-words, and giga-words, has been removed.

    The size of a `word' varies so much from host to host that a meaningful comparison is not possible.

  • The number of inodes used to hold transaction state information has been doubled.

    The side effect of this change is that any requests which have been queued will be corrupted when you install this version. Our advice is to use pipe queues to syphon queued requests to another NQS node, shut down NQS, upgrade to this release, and then use pipe queues to move the requests back onto this machine (courtesy of Thomas Eifert).

My apologies for these incompatibilities; they are the result of important changes to Generic NQS, and could not be avoided.


Project Conclusion

With the release of Generic NQS 3.50.0, the work under JISC grant NTI/48.2 has come to an end. This does not mean that this will be the last release of Generic NQS.

The University of Sheffield has agreed to continue to host the Generic NQS World-Wide Web site, and FTP site, at least until the end of June, 1997. In addition, they will continue to maintain the existing Mailbase mailing lists.

Stuart Herbert is leaving the University for a job in industry. However, thanks to the loan of equipment from the University, he will continue to act as world-wide maintainer of Generic NQS and its related technologies. This will be done in his spare time, so he will not have anywhere near as much time to spend on Generic NQS as he did when he was employed full-time to support it. Please bear with us for a month or two until things settle into a new routine.


Administrators: New Features


Installation Now Done By SETUP

Installation of Generic NQS is now a matter of running the new `SETUP' script. This shell script will guide the administrator through configuring, compiling, and installing the Generic NQS software; the administrator should not need to edit any Makefiles any more.

SETUP runs a number of automatic tests to determine compile-time constants which were previously supplied by the Makefiles. One of these tests determines what type of computer you are installing Generic NQS on.

This software is undergoing testing.

All platforms are affected.

This code was contributed by Stu.
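
As a rough illustration of the kind of test SETUP performs when it works out what type of computer it is running on, the sketch below maps `uname'-style system and release strings to platform families used in the tables in this document. It is written in Python rather than shell; the function name `classify_platform' and the mapping table are assumptions made for illustration, only the platform family (not its version) is reported, and none of this reproduces SETUP's actual logic.

```python
# Hypothetical mapping from `uname -s` output to platform families.
# Only a few well-known system names are listed; others yield None.
KNOWN_SYSTEMS = {
    "AIX": "AIX",
    "HP-UX": "HPUX",
    "IRIX": "IRIX",
    "IRIX64": "IRIX",
    "Linux": "LINUX",
    "OSF1": "OSF/1",
    "ULTRIX": "ULTRIX",
}


def classify_platform(system, release):
    """Return a platform family for the given uname strings, or None."""
    if system == "SunOS":
        # SunOS 5.x is sold as Solaris 2.x; SunOS 4.x is plain SunOS 4.
        if release.startswith("5."):
            return "SOLARIS 2"
        if release.startswith("4."):
            return "SUNOS 4"
        return None
    return KNOWN_SYSTEMS.get(system)
```

On a live system the two arguments could be taken from `platform.system()' and `platform.release()' in Python's standard library.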


Cluster-Wide Dynamic Scheduling Added

Generic NQS v3.50.0 can now perform dynamic scheduling on the pipe queues which are used to perform load-balancing across a cluster.

To use dynamic scheduling, add all of the pipe queues on your scheduling host to a queue complex, and set the user_limit for that queue complex to something suitable. Users who submit more jobs than the complex's user_limit will find their excess jobs being deferred in preference to jobs from other users, as is the case with dynamic scheduling across batch queues.

This code has not been tested.

All platforms are affected.

This code was contributed by Stu, and is based on Dave Safford's dynamic scheduling for batch queues.
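
The effect of a complex-wide user_limit described above can be pictured with the following sketch. It is not the Generic NQS scheduler; it only assumes that each pending request has an owner, that requests are considered in queue order, and that a request is deferred once its owner already has user_limit requests active in the complex. The function and parameter names are hypothetical.

```python
from collections import Counter


def order_requests(pending, running, user_limit):
    """Split pending (owner, request_id) pairs into (eligible, deferred).

    `running` lists the owners of requests already active in the complex.
    A request is deferred once its owner has reached `user_limit`.
    """
    active = Counter(running)
    eligible, deferred = [], []
    for owner, request_id in pending:
        if active[owner] < user_limit:
            active[owner] += 1
            eligible.append((owner, request_id))
        else:
            deferred.append((owner, request_id))
    return eligible, deferred
```

With a user_limit of 2, a user who queues three requests ahead of another user's single request keeps the first two eligible; the third is deferred and the other user's request is considered before it.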


Processor Set Support For Digital UNIX Added

If this feature is enabled, Generic NQS will place all processes for any batch queue `x' into the processor set of the same name. As Digital UNIX uses integers for processor set names, to make use of this feature you will have to use numeric names for your batch queues.

These changes have not been tested.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [ ] HPUX 8			[ ] HPUX 9
> [ ] HPUX 10			[ ] IRIX 5
> [ ] IRIX 6			[ ] LINUX
> [ ] NCR			[x] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

Code contributed by Stuart Herbert (S.Herbert@sheffield.ac.uk)
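
The naming rule above (a batch queue can only be matched to a processor set if its name is a number) can be expressed as a small check. This is an illustrative sketch only; the function name is hypothetical and it says nothing about how Generic NQS itself performs the lookup.

```python
def processor_set_for_queue(queue_name):
    """Return the processor set number for a batch queue, or None."""
    # Only plain decimal names such as "2" can match a processor set.
    if queue_name.isascii() and queue_name.isdigit():
        return int(queue_name)
    return None
```

A queue called `2' would map to processor set 2, whereas a queue called `batch' would not map to any processor set.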


Prologue/Epilogue Scripts Feature Added

If this feature is enabled, Generic NQS will attempt to execute `NQS_LIBEXE/nqs.prologue' directly before running a batch request, and `NQS_LIBEXE/nqs.epilogue' directly after running a batch request. Both programs run as the root user, and are passed the name of the current queue as their only argument.

One possible use of this facility is support for restricted processors on IRIX; the prologue script could ensure that, for a given queue, a given processor has been restricted (using mpadmin), and the epilogue script could unrestrict the processor once the request has run to completion.

This code has not been tested.

All platforms affected.

Code contributed by Stu.
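
The calling convention described above (prologue, then the request, then epilogue, each hook receiving the queue name as its only argument) is sketched below. This is an assumption-laden illustration, not the Generic NQS implementation: the value given for NQS_LIBEXE is a made-up path, the helper names are hypothetical, a missing or non-executable hook is simply skipped, and the switch between root and the request's owner is left out entirely.

```python
import os
import subprocess

# Hypothetical location; the real NQS_LIBEXE is chosen when NQS is built.
NQS_LIBEXE = "/usr/local/lib/nqs"


def hook_command(libexe, name, queue_name):
    """Return [path, queue_name] for an executable hook, else None."""
    path = os.path.join(libexe, name)
    return [path, queue_name] if os.access(path, os.X_OK) else None


def run_with_hooks(queue_name, job_argv, libexe=NQS_LIBEXE,
                   run=subprocess.call):
    """Run prologue, request and epilogue in order; return what was run."""
    steps = [
        hook_command(libexe, "nqs.prologue", queue_name),
        list(job_argv),
        hook_command(libexe, "nqs.epilogue", queue_name),
    ]
    executed = [argv for argv in steps if argv is not None]
    for argv in executed:
        run(argv)
    return executed
```

The `run' parameter exists so that the order of calls can be inspected without executing anything, for example by passing the `append' method of a list.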


Support For Dynix Added

Generic NQS should now compile, and operate, out of the box on Dynix/Ptx v4.1.3, running on Sequent 5000 hardware.

This change has been tested.


Support For HP-UX v10 Updated

Changes have been made to support Generic NQS on platforms running the HP-UX 10 operating system.

These changes have not been tested.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [ ] HPUX 8			[ ] HPUX 9
> [x] HPUX 10			[ ] IRIX 5
> [ ] IRIX 6			[ ] LINUX
> [ ] NCR			[ ] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

Code contributed by Holger Busse (busse@chemie.fu-berlin.de)


Administrators: Changes To Existing Features


Debugging/Message Logging Replaced

The old `logdaemon' used for logging messages has been removed. Generic NQS now logs all messages through the syslog system daemon, using the `local0' facility.

I recommend that you update your syslog.conf file so that all messages from `local0' are logged to a single file. I have the following entry in my syslog.conf file (on Linux) :

>  local0.*		/var/log/nqs

This code has been tested.

All platforms are affected.

This code was contributed by Stu.
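
One way to check that a syslog.conf entry such as the one above is routing `local0' messages is to send a test message to that facility by hand. The sketch below does this with Python's standard `syslog' module; the identifier `nqs-test' and the message text are assumptions chosen for the example, and are not what Generic NQS itself logs.

```python
import syslog


def send_test_message(text="Generic NQS syslog test", ident="nqs-test"):
    """Log one informational message to the local0 facility."""
    syslog.openlog(ident, syslog.LOG_PID, syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_INFO, text)
    syslog.closelog()


if __name__ == "__main__":
    send_test_message()
```

If the syslog daemon has been told to re-read its configuration, the message should then appear in whichever file `local0' is directed to.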


Debugging Levels Rationalised

While I was removing the logdaemon, I sorted out the messages that Generic NQS produces at each debugging level.

  • Level 0: Fatal errors only.
  • Level 1: Level 0 + information messages.
  • Level 2: Level 1 + high-priority debugging messages.
  • Level 3: Level 2 + medium-priority debugging messages.
  • Level 4: Level 3 + low-priority debugging messages.
  • Level 5: Level 4 + temporary debugging (trace) messages.

Levels 6-10 are reserved for future use. If you set the debugging level above `5', the behaviour is `unspecified' (i.e. it'll most likely crash rather horribly), so please don't do it.

Upon installation, Generic NQS defaults to level 1. You can change the debugging level using the `qmgr set debug' command.

This code has been tested.

All platforms are affected.

This code was contributed by Stu.
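
Because each debugging level includes everything from the levels below it, deciding whether a message is emitted comes down to a single comparison. The sketch below illustrates this; the level names are invented for the example (the document only numbers the levels), and the range check simply mirrors the advice not to go above level 5.

```python
from enum import IntEnum


class DebugLevel(IntEnum):
    FATAL = 0          # fatal errors only
    INFO = 1           # + information messages (the default)
    DEBUG_HIGH = 2     # + high-priority debugging messages
    DEBUG_MEDIUM = 3   # + medium-priority debugging messages
    DEBUG_LOW = 4      # + low-priority debugging messages
    TRACE = 5          # + temporary debugging (trace) messages


def should_log(message_level, current_level=DebugLevel.INFO):
    """Return True if a message of `message_level` is emitted."""
    if not 0 <= current_level <= DebugLevel.TRACE:
        raise ValueError("debugging level must be between 0 and 5")
    return message_level <= current_level
```

At the default level 1, fatal errors and information messages are emitted while every class of debugging message is suppressed.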


More Machine ID's Supported

Nmapmgr has been adapted to work with up to 1,000 machine IDs in the machine ID database.

This code has not been tested.

Code contributed by Stu, thanks to Mark Loveridge.


Pipe Queues Can Now Be Part Of Queue Complexes

Pipe queues can now be members of queue complexes. This change means that any existing installation of Generic NQS which uses pipe queues will have to be removed, because of file format incompatibilities.

This code has not been tested.

All platforms are affected.

This code was contributed by Stu.


Support For Word-Size Limits Removed

Resource limits could previously be specified in terms of words, kilo-words, mega-words and giga-words. Support for these units has been removed.

This code has not been tested; requests which contain one of these units may cause NQS to crash. Further work is anticipated before this change is complete.

All platforms are affected.

This code was contributed by Stu.
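
To see why word-based limits are hard to compare, note that one mega-word is 4 megabytes on a machine with 32-bit words but 8 megabytes on a machine with 64-bit words. The sketch below shows one way a front end could accept byte-based limits and refuse word-based ones. It is an illustration only: the unit suffixes (`b', `kb', `mb', `gb', `w', `kw', `mw', `gw'), the default of bytes when no unit is given, and the restriction to whole numbers are all assumptions, and this is not the parser used by Generic NQS.

```python
BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
WORD_UNITS = {"w", "kw", "mw", "gw"}


def parse_limit(text):
    """Convert a limit such as '10mb' to bytes; reject word-based units."""
    value = text.strip().lower()
    # Everything before the trailing letters is the numeric part.
    digits = value.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = value[len(digits):] or "b"
    if unit in WORD_UNITS:
        raise ValueError("word-based units are not supported: " + text)
    if unit not in BYTE_UNITS or not digits.isdigit():
        raise ValueError("unrecognised limit: " + text)
    return int(digits) * BYTE_UNITS[unit]
```

Under these assumptions `10mb' becomes 10485760 bytes, while `10mw' is refused with an error rather than being interpreted.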


Staging Support For Pre-Releases

Generic NQS now understands the difference between a pre-release, and a final release, and can upgrade from 3.50.0.1 (v3.50.0, pre-release #1) to 3.50.0 (full release of v3.50.0).

This code has not been tested.

All platforms are affected.

This code was contributed by Stu.
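
The numbering scheme above means that a four-part version such as 3.50.0.1 must sort before the three-part version 3.50.0, which a naive component-by-component comparison gets wrong. A minimal sketch of an ordering that respects this is shown below; the function names are hypothetical and this is not the comparison code used by Generic NQS.

```python
def version_key(version):
    """Sort key in which 'X.Y.Z.N' (pre-release N) precedes final 'X.Y.Z'."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) == 3:
        return (*parts, 1, 0)             # final release sorts last
    if len(parts) == 4:
        return (*parts[:3], 0, parts[3])  # pre-releases sort by number
    raise ValueError("expected X.Y.Z or X.Y.Z.N: " + version)


def is_upgrade(installed, candidate):
    """Return True if `candidate` is newer than `installed`."""
    return version_key(candidate) > version_key(installed)
```

With this key, moving from 3.50.0.1 to 3.50.0 counts as an upgrade, while moving from 3.50.0 to 3.50.0.2 does not.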


Administrators: Fixes For Problems


Compilation Fix On HP-UX v8

Generic NQS v3.40.2 did not compile on HP-UX v8; compilation failed in `shoqbydesc.c'. This has been fixed.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [x] HPUX 8			[ ] HPUX 9
> [ ] HPUX 10			[ ] IRIX 5
> [ ] IRIX 6			[ ] LINUX
> [ ] NCR			[ ] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

This code has been tested.

Contributed by Michael Andrews.


Compilation Problems On HP-UX v9

Generic NQS v3.40.2 did not compile on HP-UX v9; compilation failed during `netdaemon.c'. This has been fixed.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [x] HPUX 8			[x] HPUX 9
> [ ] HPUX 10			[ ] IRIX 5
> [ ] IRIX 6			[ ] LINUX
> [ ] NCR			[ ] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

Code contributed by Stu.


Compilation Problems On IRIX

Generic NQS v3.40.2 did not compile on IRIX 6; compilation failed during qmgr because of a remaining reference to some SGI-specific memory debugging code. This has been fixed.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [ ] HPUX 8			[ ] HPUX 9
> [ ] HPUX 10			[x] IRIX 5
> [x] IRIX 6			[ ] LINUX
> [ ] NCR			[ ] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

Code contributed by Stu.


Jobs Stuck In The Arriving State

I've had a number of reports of cases where a job has been moved from one machine to another; the job is deleted from the original machine, and gets stuck in the arriving state on the remote host. This has happened on a wide range of platforms.

Code in the machine id database library has been replaced in order to attempt to resolve this problem.

This code has not been tested.

All platforms are affected.

This code has been contributed by Stu, based on a contribution by Mark Whidby (M.Whidby@mcc.ac.uk).


Incorrect Resource Limits

All versions of Generic NQS from v3.40 onwards did not correctly support the limits which an administrator can use to prevent their users from running completely amok on a host. The main symptom was that file and memory resource limits were reported correctly by Generic NQS, but were significantly smaller when actual NQS jobs were executing.

The implementation of resource limits has been completely re-written from scratch; the new implementation should cure the problem. UNICOS users will, however, have to wait for the next pre-release before their resource limits are correctly supported.

This code has been partially tested.

All platforms are affected.

This code was contributed by Stu; the UNICOS support is based on the previous support written by Dave Safford.


Device Queues And Queue Complexes

Previous versions of Generic NQS (also Monsanto NQS and CERN NQS, and presumably anything derived from the original COSMIC NQS source code ...) did not correctly handle situations where device queues were members of one or more queue complexes, and either such a device queue or such a queue complex was deleted. The result of deleting such a device queue, or such a queue complex, was `unspecified'; most likely, the NQS software would crash intermittently.

The implementation of device queues and queue complexes has been completed, so that all operations on device queues and queue complexes are safe.

This code has not been tested.

All platforms are affected.

This code has been contributed by Stu.


Memory Leak On HP-UX

Code in libnqs/queues/all-systems/shoqbydesc.c for HP-UX did not release memory it malloc()ed. This has been fixed.

Platforms affected :

> [ ] AIX 3			[ ] AIX 4
> [ ] DYNIX/PTX			[ ] FUJITSU
> [x] HPUX 8			[x] HPUX 9
> [x] HPUX 10			[ ] IRIX 5
> [ ] IRIX 6			[ ] LINUX
> [ ] NCR			[ ] OSF/1
> [ ] SOLARIS 2			[ ] SUNOS 4
> [ ] ULTRIX			[ ] UNICOS

This code has not been tested.

Code contributed by Stu, based on points raised by Michael Andrews.


Incorrect TMPDIR Support

The code which created the TMPDIR temporary working directory neglected to set the permissions so that users could actually write into the directory. Fixed.

All platforms are affected.

Contributed by Stu.
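
The kind of fix described above can be pictured as follows: a daemon that creates the working directory on the user's behalf must also hand the directory over to that user and set its mode explicitly. This is a sketch under stated assumptions, not the Generic NQS code; the directory naming scheme (`nqs.' plus a request number) and the choice of mode 0700 are made up for the example.

```python
import os


def make_job_tmpdir(parent, request_id, uid, gid):
    """Create a private working directory owned by the request's user."""
    path = os.path.join(parent, "nqs." + str(request_id))
    os.mkdir(path, 0o700)
    os.chown(path, uid, gid)  # hand the directory over to the user
    os.chmod(path, 0o700)     # set the mode explicitly, whatever the umask
    return path
```

The returned path is what would then be exported to the request as TMPDIR.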


Updated Transaction Handling Code

Previous versions of Generic NQS (and before it Monsanto-NQS and COSMIC NQS) used inodes in order to store state information in non-volatile memory. Unfortunately, the original implementation assumed that a 32-bit value would fit into an inode's mtime field; this is not the case on a number of platforms.

Our original intention was to remove the use of inodes completely; however, this approach requires more time than we can afford to spare at this stage of the project.

Instead, the transaction handling code has been updated to use twice as many inodes, and to store 16-bit values into each inode. This allows the existing transaction mechanisms to work unmodified on all known UNIX platforms, and still leaves open the option of completely replacing this part of NQS in the future.

This code has been tested.

All platforms are affected.

Contributed by Stu.
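
The idea of spreading one 32-bit value over two inodes can be sketched as follows. The split and join functions are plain arithmetic; the `store' and `load' helpers assume, for illustration only, that each 16-bit half is kept in the modification time of its own file, which is how the original 32-bit scheme is described above but is not confirmed for the new code. All names are hypothetical.

```python
import os


def split32(value):
    """Split an unsigned 32-bit value into (high, low) 16-bit halves."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return value >> 16, value & 0xFFFF


def join32(high, low):
    """Rebuild the 32-bit value from its two 16-bit halves."""
    return (high << 16) | low


def store(paths, value):
    """Record each half in the modification time of its own file."""
    for path, half in zip(paths, split32(value)):
        os.utime(path, (half, half))


def load(paths):
    """Read the two halves back and combine them."""
    high, low = (int(os.stat(path).st_mtime) for path in paths)
    return join32(high, low)
```

Here `paths' is a pair of existing files, the first holding the high half and the second the low half; since neither half exceeds 65535, each fits comfortably in a timestamp field on any platform.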


List Of Changes For Developers


New Source Tree Layout

All of the source code for Generic NQS 3.50.0 can now be found in the `Source-Tree' sub-directory. I have broken Generic NQS up into its individual components, one directory per component. The new source tree layout allows for platform-specific code to be moved into separate sub-directories, although this has not yet been done.

A full description of the source tree layout, and its management through the SETUP software, will be included in the Developer's Manual I'll shortly be working on; in the meantime, if there is anything you want to know, drop me an email.


Library Re-organisations

I've made some changes to the libraries built as part of Generic NQS :

  • The old npsn_compat stuff has been moved into its own libnmap space. Header files are now included from <libnmap/>. The nmapmgr program itself has been moved into its own directory.

  • The old lib stuff has now been moved into its own libnqs space. Header files are now included from <libnqs/>. You MUST include <libnqs/license.h> before including any other header files.

    I've also broken down the library into groups related by common functionality. The `unwelcome' group contains functions which require radical modification or removal.

  • The filename handling stuff (originally contributed by Boeing) has been moved into a separate libnqs_a library. This code was used by both libnmap and libnqs, and needs to go into a separate library to allow things to link correctly.

  • A new library has been added - libsal, the `System Abstraction Layer'. Future portability work for GNQS will be to hide ALL platform-specific code in this library; the idea is that porting GNQS will become a task of porting libsal to new architectures.

There is plenty more to do with the libraries. For example, the prototypes for libnqs and all of the binaries are stored in libnqs; they should be broken up so that the prototypes are near the functions they relate to (see libsal for what I'd like to achieve).


Other Credits


In addition to the changes listed above, the following people contributed fixes for problems introduced during the pre-release testing programme for Generic NQS 3.50.0.

  • Michael Andrews
  • Ulrich Bernhard
  • Mark Loveridge

(I'm sure this list should be longer - Stu. My apologies to anyone I've forgotten to mention.)


