Project Introduction
Academic Computing Services, University of Sheffield
Stuart Herbert (S.Herbert@sheffield.ac.uk)
Document copyright ©. All rights reserved.
Abstract
This is the WWW Support Site for the NQS-Announce, NQS-Developers and
NQS-Support mailing lists available via Mailbase
(NQS-Announce@mailbase.ac.uk, NQS-Developers@mailbase.ac.uk,
NQS-Support@mailbase.ac.uk). All the files available from the mailing
lists can also be found here. Feel free to email me with any queries
that aren't answered here.
Contents
Introduction To The Project
Introduction
The introduction is taken from the information included with the
advertisement of this post, and was prepared by Chris Cartledge,
Deputy Director, Academic Computing Services, University of
Sheffield.
Background
Higher Education establishments typically have clusters of UNIX
workstations that are switched on for 24 hours per day, but are
almost unused overnight due to the lack of appropriate systems
software to distribute and manage batch work. Software is available
both commercially and in the public domain to perform the task (NQS
and DQS, among others). This project will evaluate products and will
supply support and education to enable the better utilisation of
equipment already installed.
Clearly, if there is no work to be done, it does not much matter if
the equipment is left unused. However, many users of open access
workstation clusters, particularly research students in science and
engineering, have computations that are larger than can conveniently
be performed while sitting at a workstation. Even so, they are
forced to work this way because UNIX does not provide any mechanism
for scheduling their work to be run later, at off-peak times. This
is inefficient for the student, who spends a lot of time staring at a
blank screen, and inefficient in the use of expensive equipment.
The minimal background mechanism offered by UNIX runs a job
immediately, which impairs the interactive performance of the
workstation.
NQS and DQS allow jobs to be scheduled to be run across a network of
workstations when they are lightly loaded, typically overnight. The
queues can be distributed and the load can be balanced to maximise
throughput. Clearly the commercial products, with their wealth of
support and documentation, are attractive. However, they are also
very expensive (being costed on a per-processor basis), and it is not
obvious that they are much better than public domain codes. It is
also difficult to estimate the demand for a service like this before
it is implemented. We have found the take-up to be high at
Sheffield, even though we did not have a lot of users demanding a
batch service before we supplied it.
The public domain codes are expensive, but in a different way. They
are often poorly documented, particularly from the user's point of
view, and are difficult to find, compare, evaluate, install and
configure.
Expertise Available
University of Sheffield Academic Computing Services already runs one
commercial version (IRIS) of NQS, but is about to switch to another
commercial version (Sterling) for support reasons. However, the
licensing of this product on a per-processor basis makes the use of
the public domain codes attractive. NQS was written for NASA, so the
original version is in the public domain, as is at least one later
version (Monsanto). DQS is an alternative that originates from the
Supercomputer Computations Research Institute at Florida State
University. These are being evaluated at Sheffield for local use.
The Department is familiar with the provision of high quality
support and documentation, much of which is in use at other sites,
though it has no recent national funding for these activities.
Aims And Objectives
The main objective is to help sites match their users' demand for
computer power to the available equipment through the use of
distributed batch systems. The systems themselves are currently
available, but there is little or no independent information about
their suitability. Getting to know the products well enough to
understand them without training is a very time-consuming, and hence
expensive, task. Specific aims for the year of funding would be as
follows:
- to implement and evaluate commercial and public domain
distributed batch systems and in particular NQS and DQS;
- to provide a report comparing the systems' utility;
- to provide support on implementation, configuration and use
through the setting up and monitoring of a list on Mailbase;
- to provide a training course on selected systems at which the
systems will be described and information on implementation and
configuration will be given;
- to provide packaged releases for popular systems and in
particular Sun Solaris 1 and Sun Solaris 2;
- to provide simple end user documentation on selected systems (to
augment the inevitably terse manual pages).