Example Generic NQS Configurations Version 3.5x.x
Academic Computing Services, University of Sheffield
Stuart Herbert (S.Herbert@sheffield.ac.uk). Document copyright ©. All rights reserved.
Abstract
This document lists a number of sample configurations to help new
installations of Generic NQS.
Configuring Generic NQS
Introduction
This chapter explains how to configure Generic NQS once it has been
installed. To do this, we will work through a collection of sample
configurations which have been contributed by various Generic NQS
users. Feel free to use one of these configurations for your own
computer.
Comments, and contributed configurations, are always welcome.
I've broken these configurations up into two groups, representing
the two types of computer system on which NQS is typically used.
Compute Servers
Introduction
One of the most popular uses of NQS is to impose some kind of order
on the users of central compute servers. These are typically
powerful UNIX machines (e.g. an SGI Challenge XL), possibly acting as
servers for a number of clustered workstations. They have many
large or CPU-intensive processes running concurrently.
NQS installations on this type of machine are typically stand-alone,
and do not dispatch jobs out to lesser machines, such as
workstations. Sometimes, workstations may forward jobs to the
compute server.
The purpose of an NQS installation on such a machine is to prevent
the over-allocation of system resources, so that a healthy
throughput is maintained. The main system resources which are
always in short supply are CPU time and memory.
Sample Configuration - Controlling CPU Usage
This configuration is based upon the one used here at the University
of Sheffield on our SGI Challenge XL computer. This configuration
will probably be sufficient for most environments because, if local
experience is anything to go by, most users soon get a feel for how
long their work will take to run, but they really have no idea how
much of other resources (such as memory) it will use during that
time.
Create four batch queues:
> qmgr create batch_queue short
> qmgr create batch_queue medium
> qmgr create batch_queue long
> qmgr create batch_queue extra_long
Next, for each queue, specify a maximum CPU time, the limit getting
progressively larger for each queue.
> qmgr set per_process cpu_limit = \( 2:0:0 \) short
> qmgr set per_process cpu_limit = \( 8:0:0 \) medium
> qmgr set per_process cpu_limit = \( 24:0:0 \) long
> qmgr set per_process cpu_limit = \( 72:0:0 \) extra_long
Here, we have limits of 2 hours, 8 hours, 24 hours and 72 hours
respectively for the four queues.
We now need to specify priorities and runlimits for each of these
queues, to ensure a good working balance between the four queues.
The runlimits depend entirely on what your machine can handle -
those given here are for a Challenge XL with 12 CPUs and 512MB of
real RAM. I recommend that you experiment with the runlimits in
order to ensure that the running NQS requests don't put a strain
on your memory resources.
> qmgr set priority = 40 short
> qmgr set priority = 30 medium
> qmgr set priority = 20 long
> qmgr set priority = 10 extra_long
> qmgr set run_limit = 5 short
> qmgr set run_limit = 4 medium
> qmgr set run_limit = 2 long
> qmgr set run_limit = 1 extra_long
Next, you need to decide, for each queue, how many requests each
user is allowed to have actually running at the same time. If you
compiled NQS with dynamic scheduling (enabled by default), then
users who submit more jobs than they are allowed to run
simultaneously will find that their jobs are given a lower priority,
and therefore sit lower down in the queue.
> qmgr set user_limit = 2 short
> qmgr set user_limit = 1 medium
> qmgr set user_limit = 1 long
> qmgr set user_limit = 1 extra_long
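Users submit their work to whichever of these queues suits its
expected run time, using the standard NQS qsub command; for example,
to submit a job script to the short queue (the script name here is
only an illustration):
> qsub -q short myjob.sh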
Finally, you need to decide when these queues may run, and then use
root's crontab to start and stop the NQS queues as appropriate; a
sketch of the crontab entries involved is shown below. In this
configuration, the only queue which would not run all the time would
be the extra_long queue; this queue would be started at 5pm on
Fridays, and stopped sometime before 9am on Monday morning.
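The entries below are a sketch only: they assume that qmgr will read
commands from its standard input and that it is installed as
/usr/local/bin/qmgr, so adjust the path and the times to suit your
own system.
# root's crontab - start extra_long at 5pm on Fridays, stop it at 7am on Mondays
0 17 * * 5   echo "start queue extra_long" | /usr/local/bin/qmgr
0 7  * * 1   echo "stop queue extra_long" | /usr/local/bin/qmgr
Note that stopping a queue normally only prevents further requests
from being started; any requests which are already running are left
to finish.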
I'm sure that there are ways in which this configuration could be
improved; feel free to discuss this on the NQS-Support mailing list.
Clustered Workstations
Introduction
In recent times, there has been much interest in finding scheduling
software which can make use of UNIX workstations sitting on people's
desks. These workstations are typically idle for long periods of
time overnight, which represents a significant amount of wasted CPU
time.
We will concern ourselves only with `clustered' workstations. These
are workstations which typically mount software and/or user
filestore via NFS (or equivalent) from a local server. This has the
effect of ensuring that all the workstations in a cluster are the
same architecture, run the same operating system, and have identical
filestore layouts. If the local server fails, every workstation in
the cluster becomes unusable, because the services it depends on are
lost.
Sample Configuration - Clustered Workstations
This configuration demonstrates how to use a combination of pipe and
batch queues to set up a load-balancing NQS queue for a cluster of
workstations. You can then create more load-balancing queues, using
the same principles, and vary the limits per load-balancing queue in
order to provide a balanced service.
On each workstation which will run NQS requests, do the following:
> qmgr create batch misc-dest pipeonly run_limit = 1 user_limit = 1 nice_level = 10
This creates a queue, misc-dest, which will run one NQS request at
a time, and which runs all requests at a nice level of `10', just
in case a user is sitting at the console trying to work while the
job is running.
Then, on each workstation which will run NQS requests, do:
> qmgr create pipe misc-in pipeonly run_limit = 5 destination = misc-dest
> qmgr set lb_in misc-in
This creates a pipe queue, misc-in, which will hold up to five
requests at a time. It will forward those requests to the queue
misc-dest, and will only accept new requests while there are fewer
than five requests in the queue.
Finally, on each workstation which will run NQS requests, do:
> qmgr set scheduler server-name
where `server-name' is the DNS name of the local server which the
workstations mount filestore from. NOTE that you must set all the
workstations in a cluster to point to the SAME server.
Now, on the local server, do:
> qmgr create pipe misc-sched run_limit = 40 destination = misc-in@workstation1, misc-in@workstation2 ...
> qmgr set lb_out misc-sched
where `workstation1', `workstation2' and so on are all of your
workstations which will run NQS requests. NQS will only send new
requests to a workstation when there is room for them in its
`misc-in' queue, and it chooses between workstations based on the
load information from each machine (a machine with a low load is
favoured over a machine with a high load).
Finally, on each of your workstations on which users can submit NQS
requests, do:
> qmgr create pipe misc destination = misc-sched@server
So, when a user submits a request locally to the queue `misc', it is
sent to the queue `misc-sched' on the local server, which then sends
it to the least loaded workstation in the cluster.
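For example, a user on one of these workstations might submit a job
script (again, the script name is only an illustration) with:
> qsub -q misc myjob.sh
The request passes from misc to misc-sched on the server, and from
there to the misc-dest queue of whichever workstation currently has
the lightest load.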