Features Of Generic NQS
June 1996
Stuart Herbert (S.Herbert@sheffield.ac.uk)
Document copyright ©. All rights reserved.
Abstract
This paper looks at a number of the features which set Generic NQS
apart from competing products. For more exhaustive information
about the capabilities of Generic NQS, please refer to the Official
Manual Set, available from the Generic NQS World Wide Web Site.
Introduction
No Nonsense Batch Processing
Generic NQS is a batch processing system which you compile, install,
configure, and then simply use. There is little, if any, day-to-day
administration required; the only administration comes when you
decide to change the batch queues available to your users.
There can be no denying that some commercial alternatives (and a few
freely available ones, such as NASA's PBS) do have more to offer.
Considering how many more people, and how much more money, these
alternatives have behind them compared to Generic NQS, you'd be
worried if they didn't.
But, at the end of the day, Generic NQS provides straightforward,
no-nonsense batch processing in combination with availability across
a large number of UNIX-like operating systems, and I think you'd be
amazed at just how many people find that this is really all that they
need.
Below, you will find short discussions of selected features which we
believe make Generic NQS that little bit different from your other
choices.
Symmetric Multi-Processing
Introduction
Most UNIX equipment purchased for central use these days has one
thing in common: it is equipped with two or more CPUs, and uses
techniques broadly described as Symmetric Multi-Processing (SMP) in
order to make use of all of this extra power without you having to
resort to true parallel processing (and the programming headaches
which come with it).
These operating systems provide various mechanisms for you, the
system administrator, to control the way each processor is used. Yet
batch processing systems in general are blissfully unaware of the
differences between normal UNIX and UNIX running on SMP hardware.
As you will see, Generic NQS is different.
Processor Sets
Both IRIX 5.x/6.x and Digital UNIX allow system administrators to
create what are known as `processor sets'. In the simplest sense, a
processor set is just a set of CPUs. You can then take any process
and bind it to a processor set, which means that the process, and its
children, will only run on the CPUs within that processor set. It's a
simple concept, but very effective.
Generic NQS was, to the best of my knowledge, the first batch
processing system to provide support for processor sets. Support
became available for IRIX 5 and 6 on 22nd November 1994, and for
Digital UNIX on 4th June 1996. And this support is so simple, you'll
wonder why no-one else has bothered yet.
With Generic NQS, you simply create a batch queue which has the same
name as the processor set you want to bind running jobs to. (For
Digital UNIX, you name the queue `_xxx', where xxx is the name of the
processor set.) Then, whenever Generic NQS starts a new job from that
queue, it simply puts the job in the processor set which matches the
name of the queue.
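The naming rule above can be sketched in C as follows. This is a minimal illustration, not Generic NQS source: the function pset_name_for_queue and the enum platform type are hypothetical, and the only behaviour taken from the text is the mapping itself (same name on IRIX, `_xxx' to `xxx' on Digital UNIX). Treating a Digital UNIX queue without the leading underscore as unbound is an assumption, and the actual binding call is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical platform tag; not part of Generic NQS. */
enum platform { PLATFORM_IRIX, PLATFORM_DIGITAL_UNIX };

/*
 * Return the processor set a job started from `queue` should be bound to,
 * or NULL if the queue does not map to one. The returned pointer points
 * into `queue`; no copy is made.
 *   IRIX:         processor set name == queue name
 *   Digital UNIX: queue "_xxx" maps to processor set "xxx"
 * Assumption: a Digital UNIX queue without the leading '_' is not bound.
 * Whether the named set actually exists is left to the operating system.
 */
const char *pset_name_for_queue(enum platform plat, const char *queue)
{
    if (queue == NULL || queue[0] == '\0')
        return NULL;

    switch (plat) {
    case PLATFORM_IRIX:
        return queue;
    case PLATFORM_DIGITAL_UNIX:
        if (queue[0] == '_' && queue[1] != '\0')
            return queue + 1;           /* skip the leading underscore */
        return NULL;
    }
    return NULL;                        /* unknown platform */
}
```

For example, on Digital UNIX a queue called `_batch4` would map to processor set `batch4`, while on IRIX a queue called `batch4` maps to the set of the same name.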
Restricted Processors
Another useful trick of SMP equipment is that you can `rent' out
individual CPUs to particular departments or end-users, so that they
are guaranteed a certain level of throughput and you get money to pay
for the extra CPU (or two). Normally, this arrangement has simply
left the `rented out' CPU lying idle when not in use, but Generic NQS
provides you with a better approach.
Generic NQS provides prologue and epilogue script support. This
means that, immediately before and immediately after a job runs in a
batch queue, Generic NQS runs a script which you provide. This script
runs as the superuser, and therefore can do practically anything it
wants.
The idea is this: the prologue script uses your operating system's
SMP tools (such as mpadmin) to restrict a given CPU to running just
the job which is being started. The epilogue script then unrestricts
the CPU. This allows you to `rent' the CPU (via Generic NQS) to
guarantee performance, while in between times everyone gets to use
the extra CPU, making a difference to their work too.
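What makes this scheme safe is the ordering: the CPU is reserved before the job starts and released after it finishes, whatever the outcome. The C sketch below shows that ordering only. It is a hypothetical illustration, not Generic NQS code: struct job_ctx, run_with_hooks and the stand-in steps are invented here, the real scripts would call the operating system's SMP tools rather than flip a flag, and the policies "do not start the job if the prologue fails" and "always run the epilogue once the job has run" are assumptions.

```c
/* Hypothetical per-job context; not a Generic NQS structure. */
struct job_ctx {
    int cpu;             /* CPU "rented" to the job's owner */
    int cpu_restricted;  /* 1 while the CPU is reserved for this job */
    int job_exit;        /* exit status the simulated job will return */
};

/* Each step returns 0 on success, non-zero on failure. */
typedef int (*job_step)(struct job_ctx *ctx);

/*
 * Run prologue, job and epilogue in that order.
 * Assumptions: if the prologue fails, the job is not started; once the job
 * has run, the epilogue always runs, so the CPU is released even when the
 * job fails. Returns the job's status, or the prologue's on failure.
 */
int run_with_hooks(job_step prologue, job_step job, job_step epilogue,
                   struct job_ctx *ctx)
{
    int status = prologue(ctx);
    if (status != 0)
        return status;          /* prologue failed: do not start the job */

    status = job(ctx);
    (void)epilogue(ctx);        /* always release the CPU afterwards */
    return status;
}

/* Stand-ins for the real scripts, which would use the OS's SMP tools
 * (such as mpadmin); here they only flip a flag. */
static int restrict_cpu(struct job_ctx *ctx)
{
    ctx->cpu_restricted = 1;
    return 0;
}

static int unrestrict_cpu(struct job_ctx *ctx)
{
    ctx->cpu_restricted = 0;
    return 0;
}

/* Simulated job: returns -1 if the CPU was not reserved when it ran. */
static int simulated_job(struct job_ctx *ctx)
{
    return ctx->cpu_restricted ? ctx->job_exit : -1;
}
```

Calling run_with_hooks(restrict_cpu, simulated_job, unrestrict_cpu, &ctx) returns the job's own exit status and leaves ctx.cpu_restricted at 0, so the CPU goes back to general use even when the job fails.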
Scheduling
Cluster-Wide Scheduling
One of the main uses of a batch processing system is to schedule
jobs to run overnight on otherwise unused workstations, increasing
the amount of work each machine does and so further justifying its
cost. No batch processing system is complete without this ability,
and Generic NQS provides it.
With Generic NQS, you decide which machine will perform scheduling on
behalf of your clustered workstations. You would normally pick the
server for your cluster of workstations: if the server were to fail,
chances are your workstations would be unusable anyway, so running
the scheduler there adds no new point of failure.
Then all you do is create a set of pipe queues on your server, some
of which are load-balanced on incoming jobs and some on outgoing
jobs, and then create pipe queues on your workstations to send jobs
to, and receive jobs from, these load-balanced queues.
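The paper does not describe the criteria Generic NQS uses when a load-balanced pipe queue chooses where to send a job, so the C sketch below only illustrates the general idea under a simple assumed rule: among the destinations currently accepting work, pick the one reporting the lowest load. The struct destination type, its fields, and pick_destination are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one destination queue on a workstation. */
struct destination {
    const char *queue;   /* name of the destination queue */
    double load;         /* current load reported by that machine */
    int accepting;       /* 0 if the machine is not taking batch work */
};

/*
 * Assumed rule: pick the accepting destination with the lowest load;
 * ties go to the earlier entry. Returns NULL if nothing is accepting,
 * in which case the job would stay queued on the server.
 */
const struct destination *pick_destination(const struct destination *d,
                                           size_t n)
{
    const struct destination *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!d[i].accepting)
            continue;
        if (best == NULL || d[i].load < best->load)
            best = &d[i];
    }
    return best;
}
```

A real scheduler would also need to decide how often loads are refreshed and how a workstation signals that it is busy during the day; the accepting flag simply stands in for both.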
And that's it.
Dynamic Scheduling Of Local Queues
One of the biggest complaints about many versions of NQS, and indeed
about other batch processing systems, is that users' jobs are
scheduled in a strictly first-in-first-out (FIFO) manner, taking
little (if any) account of factors such as priorities and how much
work each user already has queued.
We found with Generic NQS that both the cause and the solution were
surprisingly straightforward. These systems only order a queue when a
job is first queued, using an insertion sort to decide where to place
the new job, and never revisit that order afterwards. Thanks to Dave
Safford (saff@saff.tamu.edu), Generic NQS instead re-orders its
queues completely every time the contents of a queue change. In
addition, as part of this re-ordering, Generic NQS looks at how many
jobs each user has, and users who have submitted more jobs than can
be run simultaneously will find that their excess jobs sink towards
the bottom of the queue.
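The difference between a one-off insertion and a full re-sort, plus the "excess jobs sink" rule, can be sketched in C. This is not Dave Safford's code: struct job, resort_queue and the exact comparison order (excess flag, then priority, then submission order) are assumptions chosen to illustrate the behaviour described above, with user_limit standing for the number of jobs a user may run simultaneously.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical job record; the real Generic NQS structures differ. */
struct job {
    int uid;        /* owner */
    int priority;   /* higher runs first */
    long seq;       /* submission order, lower = earlier */
    int excess;     /* set by resort_queue(): beyond the user's limit */
};

/* Does a come before b when only priority and submission order count? */
static int precedes(const struct job *a, const struct job *b)
{
    if (a->priority != b->priority)
        return a->priority > b->priority;
    return a->seq < b->seq;
}

static int compare_jobs(const void *pa, const void *pb)
{
    const struct job *a = pa, *b = pb;

    if (a->excess != b->excess)
        return a->excess - b->excess;   /* excess jobs sink to the bottom */
    if (precedes(a, b))
        return -1;
    if (precedes(b, a))
        return 1;
    return 0;
}

/*
 * Re-sort the whole queue; intended to be called every time a job is
 * added or removed, not just when a job is first queued. A job is
 * "excess" if its owner already has at least `user_limit` jobs ahead
 * of it (by priority, then submission order).
 */
void resort_queue(struct job *q, size_t n, int user_limit)
{
    for (size_t i = 0; i < n; i++) {
        int ahead = 0;
        for (size_t j = 0; j < n; j++)
            if (j != i && q[j].uid == q[i].uid && precedes(&q[j], &q[i]))
                ahead++;
        q[i].excess = ahead >= user_limit;
    }
    qsort(q, n, sizeof q[0], compare_jobs);
}
```

With a limit of two, a user who submits three jobs ahead of another user's single job sees the third job drop below the other user's, because every change recomputes the whole order instead of leaving earlier placements fixed. Since submission numbers are unique, the comparison is a strict total order and the instability of qsort does not matter.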
Because the Generic NQS source code is freely available, the
scheduling can be easily modified to meet more specific needs.
Cluster-Wide Dynamic Scheduling
We did just that, extending the re-sorting of queues to pipe queues
through the use of queue complexes. As a result, if you put all of
the load-balanced queues on your cluster scheduler machine into one
or more complexes, you get cluster-wide dynamic scheduling, ensuring
that jobs are scheduled fairly and that users who abuse your
resources are stopped.
Non-Degradable Priorities
IRIX 5 and 6 users can rest assured that Generic NQS provides full
support for IRIX's non-degradable priorities.
Resources
Kernel Based Resource Limits
Your UNIX kernel supports a number of useful limits, limits which
you may already use to prevent individual users from wreaking havoc
on your system. These are typically limits on the amount of CPU time
each process can accumulate, the amount of virtual memory each
process can allocate, and so forth.
These limits differ from operating system to operating system, and a
common complaint about one particular commercial NQS product is that
its idea of cross-platform support is simply to support those limits
which are common to all of its platforms.
Not so Generic NQS.
Generic NQS features something we call the `System Abstraction Layer',
a component which we are continually improving and which, when you
compile Generic NQS, makes sure that your operating system is
supported to the full. The first part of the `System Abstraction
Layer', introduced in Generic NQS 3.50, works out what limits your
operating system supports and makes sure that Generic NQS knows about
them.
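The paper does not describe how the System Abstraction Layer is implemented, but the underlying idea, discovering at compile time which limits a platform offers instead of settling for a common subset, can be illustrated with conditional compilation against the standard getrlimit interface. In this sketch the known_limits table, report_limits, and the particular RLIMIT_* names checked are assumptions; the names are illustrative, not exhaustive.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/resource.h>

/* One entry per kernel limit that this platform's headers define. */
struct limit_info {
    const char *name;
    int resource;
};

/*
 * Built at compile time from whichever RLIMIT_* macros exist, so each
 * platform gets all of its own limits rather than a common subset.
 */
static const struct limit_info known_limits[] = {
#ifdef RLIMIT_CPU
    { "RLIMIT_CPU", RLIMIT_CPU },       /* CPU time, in seconds */
#endif
#ifdef RLIMIT_FSIZE
    { "RLIMIT_FSIZE", RLIMIT_FSIZE },   /* largest file the process creates */
#endif
#ifdef RLIMIT_DATA
    { "RLIMIT_DATA", RLIMIT_DATA },     /* data segment size */
#endif
#ifdef RLIMIT_STACK
    { "RLIMIT_STACK", RLIMIT_STACK },
#endif
#ifdef RLIMIT_CORE
    { "RLIMIT_CORE", RLIMIT_CORE },
#endif
#ifdef RLIMIT_NOFILE
    { "RLIMIT_NOFILE", RLIMIT_NOFILE },
#endif
#ifdef RLIMIT_AS
    { "RLIMIT_AS", RLIMIT_AS },         /* virtual address space */
#endif
#ifdef RLIMIT_RSS
    { "RLIMIT_RSS", RLIMIT_RSS },
#endif
#ifdef RLIMIT_VMEM
    { "RLIMIT_VMEM", RLIMIT_VMEM },     /* only defined on some systems */
#endif
};

static void print_value(FILE *out, rlim_t v)
{
    if (v == RLIM_INFINITY)
        fputs("unlimited", out);
    else
        fprintf(out, "%llu", (unsigned long long)v);
}

/* Print soft and hard values for every limit compiled in; return how
 * many limits the table contains. */
size_t report_limits(FILE *out)
{
    size_t n = sizeof known_limits / sizeof known_limits[0];

    for (size_t i = 0; i < n; i++) {
        struct rlimit rl;
        if (getrlimit(known_limits[i].resource, &rl) != 0)
            continue;           /* defined, but rejected at run time */
        fprintf(out, "%-14s soft=", known_limits[i].name);
        print_value(out, rl.rlim_cur);
        fputs(" hard=", out);
        print_value(out, rl.rlim_max);
        fputc('\n', out);
    }
    return n;
}
```

On a platform whose headers define an extra limit such as RLIMIT_VMEM, the table simply gains an entry when you compile; nothing else in the code has to change.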