Installing Generic NQS Version 3.4x.x
Academic Computing Services, University of Sheffield
Stuart Herbert (S.Herbert@sheffield.ac.uk)
Document copyright ©. All rights reserved.
Abstract
Under grant NTI/48.2 from the New Technologies Sub Committee (NTSC)
of JISC, the University of Sheffield is maintaining a freely-available
version of the Network Queueing System (NQS), the de facto standard
batch processing system for the UNIX operating system.
This document explains how to install, and configure, Generic NQS
v3.40 or later.
Introduction
Welcome To Generic NQS
Thank you for your interest in Generic NQS.
Generic NQS is the continuing development of Monsanto-NQS, itself
descended directly from the original COSMIC NQS, written under
contract to NASA by Sterling Software, Inc.
Since October, 1994, Monsanto-NQS (and then Generic NQS) has been
maintained by The University of Sheffield. We are funded to produce
a freely-available, robust, and well-documented version of NQS for
UK Academia.
For more information, please see the `README' file included with the
source code distribution, or the online copy at :
> http://www.shef.ac.uk/uni/projects/nqs/Product/GNQS/v3.4x/README/
About This Document
Purpose
This document provides all the information you need in order to
compile, install, and set up NQS at your site for daily use.
Instructions are included (in the following order) for :
- New Generic NQS users - Quick Start Guide
One of the more frequent complaints about Generic NQS is that it
appears to be very difficult to install - much more so than, say,
CERN NQS.
The Quick Start Guide is for you if you just want to get NQS up
and running in a hurry.
- New Generic NQS users - compiling, and installing.
These are detailed instructions for users who are installing GNQS
for the very first time, or who are re-installing from scratch.
- Existing Monsanto-NQS/Generic NQS users - upgrading.
These are detailed instructions for users who are upgrading from
an older version of Generic NQS, or from Monsanto-NQS.
- How To Configure GNQS For Batch On A Single Machine
These are typical configurations for a single compute server.
- How To Configure GNQS For Batch On A Cluster Of Workstations
These are typical configurations for a set of workstations which
are clustered together.
Using This Document
I recommend that you print this document out (or otherwise have it
available) so that you can refer to it while following the
instructions below.
An HTML version of this document is available from the following URL :
> http://www.shef.ac.uk/uni/projects/nqs/Product/GNQS/v3.4x/Install/
Conventions
During each step of the installation process, you will see a
paragraph (or more) of instructions, followed by sample commands
which demonstrate what to do. Sample commands are represented by
> this is a sample command
Contacting The Author
If there is anything about compiling, installing, and setting up
Generic NQS which this document does NOT cover, please mail the
author.
> mailto:S.Herbert@sheffield.ac.uk
I normally reply within one working day.
Quick Start Guide
Introduction
In this chapter, we look at how to compile, install, and configure
Generic NQS with the minimum of fuss, effort, and detail. After
reading this chapter, please read the two chapters which explain
typical configurations for using Generic NQS on a number of
machines.
Installing Generic NQS
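As the user `root' on your machine, download the Generic NQS source
code and unpack it. (If your version of tar does not understand the
`z' option, use `gzip -dc Generic-NQS.tar.gz | tar xf -' instead.)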
> ftp ftp.shef.ac.uk
> cd /pub/uni/projects/nqs/latest
> binary
> get Generic-NQS.tar.gz
> quit
> tar zxf Generic-NQS.tar.gz
> cd Generic-NQS-3.40.0/proto
Edit the file `Makefile'. Scroll through the Makefile, and uncomment
the line which includes the support for your version of UNIX.
> vi Makefile
> (scroll down to where it says STEP THREE)
> (remove the # from the front of the correct line for your OS)
> :wq (save and quit)
Compile and install the software. Joining the steps with `&&' stops
the sequence at the first step that fails.
> make && make directories && make install && make install.man
Add a service entry for Generic NQS by adding the following line to
/etc/services. If your machine uses NIS, NIS+, or a similar name
service, you may need to take additional steps; refer to your system
administrator's manual.
> vi /etc/services
> nqs 607/tcp # Generic NQS
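Create an NQS machine id for your host. (The `rehash' command is
only needed if you use csh or tcsh; it makes your shell find the newly
installed programs.)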
> rehash
> nmapmgr
> add host <your machine's hostname>
> (eg add host myrddraal)
> add alias <your machine's full DNS name> <your machine's hostname>
> (eg add alias myrddraal.shef.ac.uk myrddraal)
> list
> exit
Finally, start Generic NQS for the very first time.
> qmgr start nqs > /usr/adm/nqslog
If Generic NQS fails to start, then any error messages will be
placed in the file /usr/adm/nqslog.
You may want to put this into your machine's startup scripts.
Configuring Generic NQS
First things first - add your normal, non-root user as an NQS
manager, so that you do not have to be logged in as root whenever
you want to change the configuration of Generic NQS.
> qmgr add manager <your username>:m
Also add any other NQS managers as required.
Next, specify where you want the NQS logfile to go.
> qmgr set log_file /usr/adm/nqslog
Now, create your first NQS queue.
> qmgr create batch_queue test1
> qmgr enable queue test1
> qmgr start queue test1
Submit your first NQS request: type the commands the request should
run (here, just `date'), then press Control-D.
> qsub -q test1
> date
> ^D
Finally, view the output files created by your first NQS request.
> ls STDIN.*
> more STDIN.o0
> more STDIN.e0
Compiling And Installing Generic NQS
Introduction
In this chapter, we look at how to compile and install the GNQS
source code. These instructions were written for Generic NQS
v3.40.0, and were last updated on Friday, 1st September 1995.
Generic NQS v3.40.0 is available from
> ftp://www.shef.ac.uk/uni/projects/nqs/v3.40/
Getting Started
Before you start, you need to know the following :
- Which version of UNIX do you have?
Generic NQS has been tested on the following versions of UNIX at
some point in its history :
AIX 3.2.5, 4.1; Fujitsu; HP-UX 8,9; Irix 4,5,6; Linux; NCR;
OSF/1; Solaris 2; SunOS 4; ULTRIX; UNICOS 8.
If you are successful in getting Generic NQS to work, please email
me, or use the following form on the World-Wide Web :
> http://www.shef.ac.uk/uni/projects/nqs/Product/GNQS/v3.4x/Success.html
Edit The Makefile
Change to the `proto' directory, and edit the file `Makefile'.
> cd Generic-NQS-3.40.0/proto
> vi Makefile
Where NQS Is Installed
The GNQS software itself can be installed onto a central server, and
shared between all of the workstations which mount their software
(typically via NFS) from that server.
The GNQS software is installed into directories under NQS_ROOTDIR.
NQS_ROOTDIR can be a directory which is shared via NFS between
several machines. For example, if NQS_ROOTDIR was `/usr/local', then
GNQS would be installed into `/usr/local/bin', `/usr/local/sbin',
`/usr/local/man' and so on.
If you want to install GNQS into somewhere other than `/usr/local',
then change the value of NQS_ROOTDIR.
> NQS_ROOTDIR = /usr/local
The GNQS software also requires temporary file space to work. This
temporary space MUST be unique to each machine running GNQS, although
it can be on an NFS partition. By default, GNQS uses the standard
UNIX spool area, `/usr/spool'.
If you want GNQS to store its working files elsewhere, change the
value of NQS_ROOTPRIV.
> NQS_ROOTPRIV = /usr/spool
After this, the Makefile has a set of entries (from NQS_MAN to
NQS_NMAP) which specify where to install the various components of
Generic NQS. We recommend that you use the default settings.
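If you prefer not to edit the Makefile by hand (for example, when building on several machines), you can change these assignments with sed. The following is a minimal sketch. `set_make_var' is a hypothetical helper, not part of Generic NQS. It assumes that each variable is assigned on a line of its own beginning `NAME =', as NQS_ROOTDIR and NQS_ROOTPRIV are above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Generic NQS): set a top-level
# "NAME = value" assignment in a Makefile.  Assumes the variable is
# assigned on a line of its own that starts with "NAME =", and that
# the value contains no "|" or "&" characters (they are special to sed).
set_make_var() {
    makefile=$1 name=$2 value=$3
    # Write to a temporary file first, so a failing sed leaves the
    # original Makefile untouched.
    sed "s|^$name[[:blank:]]*=.*|$name = $value|" "$makefile" \
        > "$makefile.new" && mv "$makefile.new" "$makefile"
}
```

For example, `set_make_var Makefile NQS_ROOTPRIV /var/spool' (an example path only). The same helper should also work for FEATURES, NQS_OWNER and NQS_GROUP.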
Which Optional Features Do You Want?
Generic NQS now includes a number of features which you can choose
to add, or leave out, at compile time. The Makefile includes a
brief list, and you can find more details in the file
`doc/Features', or from the URL
> http://www.shef.ac.uk/uni/projects/nqs/Product/GNQS/v3.4x/Features/
By default, the Makefile enables the features most GNQS installations
will want. If you want to change which features are available, then
change the value of FEATURES.
> FEATURES = -DTAMU
Which Version of UNIX You Are Using
The GNQS software can be compiled on a number of platforms. For
each supported platform, there is an appropriate Makefile, which
contains all the information specific to each version of UNIX.
Please uncomment the line which selects the Makefile for your
UNIX machine.
> #include Makefile.linux
becomes
> include Makefile.linux
for example.
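You can also make the same change non-interactively. This sketch assumes that the commented-out line reads exactly `#include Makefile.<platform>', as in the example above. `select_platform' is a hypothetical helper. It refuses to touch the Makefile if no such line exists, rather than leaving every platform commented out.

```shell
#!/bin/sh
# Hypothetical helper: uncomment "#include Makefile.<platform>" in the
# proto Makefile.  <platform> must match one of the Makefile.* files
# shipped in the proto directory (e.g. "linux").
select_platform() {
    makefile=$1 platform=$2
    # Stop with an error if the requested include line is missing.
    if ! grep -q "^#include Makefile\.$platform\$" "$makefile"; then
        echo "no '#include Makefile.$platform' line in $makefile" >&2
        return 1
    fi
    sed "s|^#include Makefile\.$platform\$|include Makefile.$platform|" \
        "$makefile" > "$makefile.new" && mv "$makefile.new" "$makefile"
}
```

For example, `select_platform Makefile linux'.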
File Ownership
The GNQS software (and directories it uses) will be owned (by default)
by the user `root', and the group `bin'. If you wish to change either
of these, edit the values of `NQS_OWNER' and `NQS_GROUP' respectively.
> NQS_OWNER = root
> NQS_GROUP = bin
Save Your Changes
Once you have done the above, please save your changed Makefile to
disk ready for compiling.
Compiling The Software
We are now ready to compile Generic NQS. To do so, please change
to the `proto' directory (if you're not already there), and type
`make'.
> cd Generic-NQS-3.40.0/proto
> make
This will compile Generic NQS v3.4x, ready for installing. Generic
NQS is a large program, and can take over half an hour to compile
(especially if your machine is heavily loaded).
While Generic NQS is compiling, you will probably see your C
compiler producing warnings. The Generic NQS code is very old (much
of it dates back to 1985) and while a number of people have spent
time removing those warnings, we have not yet managed to remove them
all.
If Generic NQS fails to compile, please contact the author with the
following details :
- The version of UNIX you are compiling on.
You can normally get this information by using `uname -a'.
- Which C compiler you are using, where there is a choice.
You can normally get this information by using `which cc'. If
you are using the Free Software Foundation's GCC, please indicate
which version of GCC you are using.
- A log of the compilation.
You can make one by typing `make >& logfile' (if you use csh/tcsh)
or `make > logfile 2>&1' (if you use sh/ksh/bash). This captures all
of the compiler's output, including the errors from the file that
fails to compile.
- Anything else you think is important.
Please email all of the above to NQS-Support@mailbase.ac.uk.
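To save typing, a short script can collect the details above into one file. This is a sketch only. `collect_build_report' is a hypothetical helper. It uses the POSIX `command -v' in place of `which', and it lets you override the make program through MAKE (for example, MAKE=gmake).

```shell
#!/bin/sh
# Hypothetical helper: gather the details requested for a compilation
# failure into one file.  Run it from the proto directory.
collect_build_report() {
    report=$1
    {
        echo "=== uname -a"
        uname -a
        echo "=== C compiler (command -v cc)"
        command -v cc || echo "cc not found on PATH"
        # Only report a GCC version if gcc is actually installed.
        if command -v gcc >/dev/null 2>&1; then
            echo "=== gcc -v"
            gcc -v 2>&1
        fi
        echo "=== build log"
        # 2>&1 sends compiler errors (stderr) into the report too.
        ${MAKE:-make} 2>&1
    } > "$report"
}
```

For example, `collect_build_report /tmp/nqs-build-report' (an example path). The function returns the exit status of make.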
Installing
Creating The GNQS Working Files
Once GNQS has compiled, the next step is to build the working files
GNQS uses. To do this, go to the `proto' directory, and type `make
directories'.
> cd Generic-NQS-3.40.0/proto
> make directories
Installing The GNQS Software
Now you can install the Generic NQS software itself, by using the
command `make install'.
> cd Generic-NQS-3.40.0/proto
> make install
Installing The GNQS Manual Pages
The Generic NQS manual pages can be installed by using the command
`make install.man'.
> cd Generic-NQS-3.40.0/proto
> make install.man
You will then need to rebuild your manual page database. For some
versions of UNIX, this is done with the command /usr/lib/whatis; for
others, this is done using catman. Please refer to your vendor's
documentation.
Adding The Service Entry
You next need to edit /etc/services (or modify your NIS/NIS+ database)
to add an entry for GNQS :
> nqs 607/tcp # Network Queueing System
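A script can add the entry after first checking that it is not already there. This is a sketch that assumes a plain /etc/services file; NIS and NIS+ sites must update their maps instead. `add_nqs_service' is a hypothetical helper. It also refuses to continue if port 607/tcp is already assigned to another service.

```shell
#!/bin/sh
# Hypothetical helper: append the NQS service entry unless one exists.
add_nqs_service() {
    services=${1:-/etc/services}
    if grep -q '^nqs[[:blank:]]' "$services"; then
        echo "nqs entry already present in $services"
    elif grep -q '[[:blank:]]607/tcp' "$services"; then
        # Another service already owns the port; leave it to the admin.
        echo "607/tcp is already used by another service in $services" >&2
        return 1
    else
        printf 'nqs\t\t607/tcp\t\t# Network Queueing System\n' >> "$services"
    fi
}
```

Run `add_nqs_service /etc/services' as root. Running it a second time changes nothing.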
Creating A Machine ID
Your next step is to run the nmapmgr program, provided with Generic
NQS, to allocate a machine id to your computer. The following
commands will use the IP address of your computer to form a machine
id :
> nmapmgr
> NMAPMGR: add host <hostname>
> NMAPMGR: add alias <hostname>.<domainname> <hostname>
> NMAPMGR: list
> NMAPMGR: exit
Each machine running NQS requires a unique machine id. The machine id
is used to track NQS requests as they move between different machines
running NQS.
If you wish to use Generic NQS in conjunction with other versions of
NQS, then you may have to assign machine ids explicitly, because some
versions of NQS only allow machine ids which are small in value.
See the nmapmgr(1) manual page for information on how to manually
assign machine ids.
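When you have many machines to register (every workstation in a cluster, for example), you can generate the nmapmgr commands above rather than typing them. This sketch assumes that nmapmgr reads commands from standard input when it is not attached to a terminal. This guide does not confirm that. `nmap_commands' is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the nmapmgr commands for one host.
# $1 = short hostname, $2 = DNS domain name.
nmap_commands() {
    host=$1 domain=$2
    echo "add host $host"
    echo "add alias $host.$domain $host"
    echo "list"
    echo "exit"
}
```

For example, `nmap_commands myrddraal shef.ac.uk | nmapmgr'. If your nmapmgr insists on a terminal, type the commands by hand instead.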
Start Up GNQS
To start GNQS for the very first time, use the command
> qmgr start nqs
You should add this command to your startup scripts to restart the
NQS daemon whenever the machine is restarted.
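A boot-time fragment might look like the sketch below. The right startup file varies between versions of UNIX (/etc/rc.local is one common choice). The qmgr path assumes the default NQS_ROOTDIR of /usr/local with qmgr installed in its `bin' directory. `start_nqs', QMGR and NQS_LOG are hypothetical names.

```shell
#!/bin/sh
# Hypothetical boot-time helper: start the NQS daemon and log output.
start_nqs() {
    qmgr=${QMGR:-/usr/local/bin/qmgr}
    log=${NQS_LOG:-/usr/adm/nqslog}
    if [ -x "$qmgr" ]; then
        # Append (>>) so earlier start-up messages survive, and capture
        # error output (2>&1) as well as normal output.
        "$qmgr" start nqs >> "$log" 2>&1
    else
        echo "start_nqs: $qmgr not found or not executable" >&2
        return 1
    fi
}
```

Define the function in your startup script, then call `start_nqs' after the network has been configured.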
Installation Complete
At this point, Generic NQS is now installed on your computer. Your
next step is to configure Generic NQS to suit your setup. Chapter 4,
below, has details on how to go about this, and also includes several
example setups which may suit your needs.
Upgrading From An Older Version
Introduction
In this chapter, we look at how to upgrade your existing version of
Monsanto-NQS/Generic NQS to the latest Generic NQS v3.4x.x.
Getting Started
You must first ask yourself the following questions :
- Am I upgrading from Monsanto-NQS v3.35, Monsanto-NQS v3.36.x, or
Generic NQS v3.4x.x?
- Do I intend to install this new version of Generic NQS into
exactly the same directories as my current version?
If you answered `no' to either of these, then please follow the
instructions in `A Non-Staged Upgrade', below.
If you answered `yes' to both questions, then please follow the
instructions in `A Staged Upgrade', below.
If you are upgrading from someone else's version of NQS (eg, CERN
NQS), then you should consider this a new installation, and follow
the instructions given in Chapter 3.
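The decision above can be summarised as a small shell function. This is an illustrative sketch only. `upgrade_method' is hypothetical, and it understands only Monsanto-NQS and Generic NQS version numbers.

```shell
#!/bin/sh
# Hypothetical helper: $1 = version you are upgrading from (e.g. 3.36.2),
# $2 = "yes" if the new release goes into exactly the same directories.
upgrade_method() {
    from=$1 same_dirs=$2
    # Monsanto-NQS v3.35, v3.36.x and Generic NQS v3.4x.x qualify.
    case $from in
        3.35|3.35.*|3.36|3.36.*|3.4*) eligible=yes ;;
        *) eligible=no ;;
    esac
    if [ "$eligible" = yes ] && [ "$same_dirs" = yes ]; then
        echo staged
    else
        echo non-staged
    fi
}
```

For example, `upgrade_method 3.36.2 yes' prints `staged'.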
A Non-Staged Upgrade
Introduction
These are instructions on how to upgrade your NQS installation to use
the software from this latest release. These instructions should be
followed if any of the following are true :
- You are upgrading from a version of Monsanto-NQS prior to v3.35.
- You do not wish to install this latest release in the same place
as your current NQS.
Step One : Compile The Software
Follow the instructions in Chapter 3, above, to compile this release
of Generic NQS. Stop once you reach section 3.5, ``Installing''.
Step Two : Take A Copy Of Your Current NQS Configuration
If you want to keep your current NQS configuration, use the qmgr(1)
command to create a `snap-file' :
> qmgr
> #Mgr: snap file=<filename>
> #Mgr: exit
This is a precaution, in case things go horribly wrong, and you need
to rebuild your NQS configuration.
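You can also take the snap-file from a script, with a date stamp so that repeated upgrades do not overwrite each other. This sketch assumes that qmgr accepts `snap file=...' on its command line, just as it accepts `start nqs' elsewhere in this document. If it does not, use the interactive session above. `snap_nqs_config' and its /usr/adm default are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write a dated qmgr snap-file and print its name.
snap_nqs_config() {
    dir=${1:-/usr/adm}
    snap=$dir/nqs-config.$(date +%Y%m%d%H%M%S).snap
    # Assumes qmgr runs a single command given as arguments.
    ${QMGR:-qmgr} snap file="$snap" || return 1
    echo "$snap"
}
```
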
Step Three : Stop NQS
Make sure that there are no running NQS jobs, and shut down the
running NQS daemon :
> qmgr
> #Mgr: shutdown
> #Mgr: exit
Step Four : Install The New Version Of Generic NQS
Install the new version of Generic NQS by using the following
commands :
> cd Generic-NQS-3.40.0/proto
> make install
> make install.man
Step Five : Move The NQS Spool Area
This step only applies if you have compiled the new version of NQS
to store its working files in a different directory (that is, you
changed NQS_ROOTPRIV). In that case, move the NQS spool area from its
current place (typically, /usr/spool/nqs) to its new place.
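One way to do the move is sketched below. It copies with a tar pipe, which preserves file permissions and works even when the old and new places are on different filesystems. It then renames the old directory rather than deleting it. Run it as root with NQS shut down. `move_nqs_spool' is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: copy the spool area, then set the old one aside.
move_nqs_spool() {
    old=$1 new=$2
    [ -d "$old" ] || { echo "no spool area at $old" >&2; return 1; }
    mkdir -p "$new" || return 1
    # tar xpf preserves permissions; the pipeline's status is that of
    # the extracting side.
    (cd "$old" && tar cf - .) | (cd "$new" && tar xpf -) || return 1
    mv "$old" "$old.orig"
}
```

For example, `move_nqs_spool /usr/spool/nqs /var/spool/nqs' (the new path is only an example). Once the new daemon is running happily, you can remove /usr/spool/nqs.orig.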
Step Six : Restart NQS
Finally, start up the new NQS daemon, by using qmgr :
> qmgr start nqs
Your Non-Staged Upgrade should now be complete.
A Staged Upgrade
Introduction
These are instructions for how to upgrade your NQS installation to
use the latest Generic NQS release. These instructions should be
followed if all of the following are true :
- You are upgrading from Monsanto-NQS v3.35 or later, or from
Generic NQS v3.40.0 or later.
- You intend to install this release of Generic NQS into the
SAME directories as your current version of NQS.
Step One : Compile The Software
Follow the instructions in Chapter 3, above, to compile this release
of Generic NQS. Stop when you get to section 3.5, ``Installing''.
Make sure that you edit the makefiles to ensure that the new NQS
software will be installed into the same directories as your existing
NQS software.
Step Two : Stage In The Software
To stage in the new NQS software, use the following commands :
> cd Generic-NQS-3.40.0/proto
> make stage
Your upgrade is now complete. Your existing NQS installation will
automatically replace itself with the new software when it can.
Configuring Generic NQS
Introduction
This chapter explains how to configure Generic NQS once it has been
installed. To do this, we will work through a collection of sample
configurations which have been contributed by various Generic NQS
users. Feel free to use one of these configurations for your own
computer.
Comments, and contributed configurations, are always welcome.
I've broken these configurations up into two groups, which represent
the two types of computer system NQS is typically used in.
Compute Servers
Introduction
One of the most popular uses of NQS is to impose some kind of order
on the users of central compute servers. These are typically
powerful UNIX machines (eg, SGI Challenge XL), possibly acting as
servers for a number of clustered workstations. They have many
large or CPU-intensive processes running concurrently.
NQS installations on this type of machine are typically stand-alone,
and do not dispatch jobs out to lesser machines, such as
workstations. Sometimes, workstations may forward jobs to the
compute server.
The purpose of an NQS installation on such a machine is to prevent
the over-allocation of system resources, so that a healthy
throughput is maintained. The main system resources which are
always in short supply are CPU time, and memory.
Sample Configuration - Controlling CPU Usage
This configuration is based upon the one used here at the University
of Sheffield on our SGI Challenge XL computer. This configuration
will probably be sufficient for most environments, because, if local
experience is anything to go by, most users soon get a feel for how
long their work will take to run, but they really haven't a clue as
to how much of other resources (such as memory) it will make use of
during that time.
Create four batch queues :
> qmgr create batch_queue short
> qmgr create batch_queue medium
> qmgr create batch_queue long
> qmgr create batch_queue extra_long
Next, for each queue, specify a maximum CPU time, the limit getting
progressively larger for each queue. The parentheses are quoted so
that your shell passes them to qmgr instead of trying to interpret
them itself.
> qmgr set per_process cpu_limit = "( 2:0:0 )" short
> qmgr set per_process cpu_limit = "( 8:0:0 )" medium
> qmgr set per_process cpu_limit = "( 24:0:0 )" long
> qmgr set per_process cpu_limit = "( 72:0:0 )" extra_long
Here, we have limits of 2 hours, 8 hours, 24 hours and 72 hours
respectively for the four queues.
We now need to specify priorities and runlimits for each of these
queues, to ensure a good working balance between the four queues.
The runlimits depend entirely on what your machine can handle; those
given here are for a Challenge XL with 12 CPUs and 512 MB of real
RAM. I recommend that you experiment with the runlimits in order to
ensure that the running NQS requests don't put a strain on your
memory resources.
> qmgr set priority = 40 short
> qmgr set priority = 30 medium
> qmgr set priority = 20 long
> qmgr set priority = 10 extra_long
> qmgr set run_limit = 5 short
> qmgr set run_limit = 4 medium
> qmgr set run_limit = 2 long
> qmgr set run_limit = 1 extra_long
Next, you need to decide, for each queue, how many requests each
user is allowed to have actually running at the same time. If you
compiled NQS with dynamic scheduling (enabled by default), then
users who submit more jobs than they are allowed to run
simultaneously will find that their jobs will have a lower priority,
and therefore will be lower down in the queue.
> qmgr set user_limit = 2 short
> qmgr set user_limit = 1 medium
> qmgr set user_limit = 1 long
> qmgr set user_limit = 1 extra_long
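Keeping these queue definitions in one script makes the configuration easy to recreate. This sketch assumes that qmgr joins its command-line arguments into a single command, as the examples in this chapter suggest. `setup_cpu_queues' is a hypothetical helper. Setting QMGR=echo prints the commands instead of running them. The script does not enable or start the queues; do that as shown in the Quick Start Guide, leaving extra_long to cron.

```shell
#!/bin/sh
# Hypothetical helper: create the four compute-server queues with the
# limits given in this section.  QMGR defaults to "qmgr".
setup_cpu_queues() {
    q=${QMGR:-qmgr}
    # Fields: queue  cpu_limit  priority  run_limit  user_limit
    while read -r name cpu prio run user; do
        $q create batch_queue "$name"
        # Quoted so the shell passes the parentheses through to qmgr.
        $q set per_process cpu_limit = "( $cpu )" "$name"
        $q set priority = "$prio" "$name"
        $q set run_limit = "$run" "$name"
        $q set user_limit = "$user" "$name"
    done <<EOF
short      2:0:0   40 5 2
medium     8:0:0   30 4 1
long       24:0:0  20 2 1
extra_long 72:0:0  10 1 1
EOF
}
```
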
Finally, you need to decide when these queues may run, and then use
root's crontab to start and stop the NQS queues as appropriate. In
this configuration, the only queue which would not run all the time
would be the extra_long queue; this queue would be started at 5pm on
Fridays, and stopped sometime before 9am Monday morning.
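Root's crontab entries for the extra_long queue might look like the lines printed by the sketch below. 8am Monday is only an example time, since the text says only "before 9am". `stop queue' is assumed to be the qmgr counterpart of `start queue'. The qmgr path assumes the default NQS_ROOTDIR. cron runs commands with a minimal PATH, which is why the sketch uses the full path. `weekend_cron_entries' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print crontab lines that start extra_long at
# 17:00 on Friday (day 5) and stop it at 08:00 on Monday (day 1).
weekend_cron_entries() {
    qmgr=${QMGR:-/usr/local/bin/qmgr}
    echo "0 17 * * 5 $qmgr start queue extra_long"
    echo "0 8 * * 1 $qmgr stop queue extra_long"
}
```

Check the output, then add the two lines to root's crontab with `crontab -e'.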
I'm sure that there are ways in which this configuration could be
improved; feel free to discuss this on the NQS-Support mailing list.
Clustered Workstations
Introduction
In recent times, there has been much interest in finding scheduling
software which can make use of UNIX workstations sat on people's
desks. These workstations are typically idle for long periods of
time overnight, which represents a significant amount of wasted CPU
time.
We will concern ourselves only with `clustered' workstations. These
are workstations which typically mount software and/or user
filestore via NFS (or equivalent) from a local server. This has the
effect of ensuring that all the workstations in a cluster are the
same architecture, run the same operating system, and have identical
filestore layouts. When the local server fails, each workstation is
unusable, because of the loss of services involved.
Sample Configuration - Clustered Workstations
This configuration demonstrates how to use a combination of pipe and
batch queues to set up a load-balancing NQS queue for a cluster of
workstations. You can then create more load-balancing queues, using
the same principles, and vary the limits per load-balancing queue in
order to provide a balanced service.
On each workstation which will run NQS requests, do the following :
> qmgr create batch misc-dest pipeonly run_limit = 1 user_limit = 1 nice_level = 10
This creates a queue, misc-dest, which will run one NQS request at
a time, and which runs all requests at a nice level of `10', just
in case a user is sat at the console trying to work while the job
is running.
Then, on each workstation which will run NQS requests, do :
> qmgr create pipe misc-in pipeonly run_limit = 5 user_limit = 1 destination = misc-dest
> qmgr set lb_in misc-in
This creates a pipe queue, misc-in, which will store up to five
requests at a time. It will forward those requests to the queue
misc-dest, and will only accept requests if there are fewer than five
requests in the queue.
Next, on each workstation which will run NQS requests, do :
> qmgr set scheduler server-name
where `server-name' is the DNS name of the local server which the
workstations mount filestore from. NOTE that you must set all the
workstations in a cluster to point to the SAME server.
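You can run the three steps above on each execution workstation with one function. This is a sketch only. `setup_exec_workstation' is hypothetical, and it assumes that qmgr accepts these commands on its command line, as shown above. Setting QMGR=echo previews the commands without running them.

```shell
#!/bin/sh
# Hypothetical helper: configure one execution workstation.
# $1 = DNS name of the cluster's local server.
setup_exec_workstation() {
    server=$1
    q=${QMGR:-qmgr}
    $q create batch misc-dest pipeonly run_limit = 1 user_limit = 1 \
        nice_level = 10 || return 1
    $q create pipe misc-in pipeonly run_limit = 5 user_limit = 1 \
        destination = misc-dest || return 1
    $q set lb_in misc-in || return 1
    $q set scheduler "$server"
}
```

For example, `setup_exec_workstation fileserver.example.org' (a placeholder server name).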
Now, on the local server, do :
> qmgr create pipe misc-sched run_limit = 40 user_limit = 5 destination = misc-in@workstation1, misc-in@workstation2 ...
> qmgr set lb_out misc-sched
where `workstation1', `workstation2' and so on are all of your
workstations which will run NQS requests. NQS will only send new
requests to your workstations when they have room for them in their
`misc-in' queues, and based on the load information from each
machine (a machine with a low load is favoured over a machine with a
high load).
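With a long list of workstations, the destination list is easy to mistype. The sketch below builds it from a list of host names. `misc_destinations' and the names ws1, ws2, ws3 are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: turn "ws1 ws2 ..." into
# "misc-in@ws1, misc-in@ws2, ..." for the misc-sched destination.
misc_destinations() {
    list=
    for ws in "$@"; do
        # Add ", " before every entry except the first.
        list=${list:+$list, }misc-in@$ws
    done
    echo "$list"
}
```

For example, on the local server :
> qmgr create pipe misc-sched run_limit = 40 user_limit = 5 destination = "$(misc_destinations ws1 ws2 ws3)"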
Finally, on each of your workstations on which users can submit NQS
requests, do :
> qmgr create pipe misc destination = misc-sched@server
> qmgr set lb_in misc
So, when a user submits a request locally to the queue `misc', it is
sent to the queue `misc-sched' on the local server, which then sends
it to the least loaded workstation in the cluster.