Example Generic NQS Configurations Version 3.5x.x
Academic Computing Services, University of Sheffield
Stuart Herbert (S.Herbert@sheffield.ac.uk). Document copyright ©. All rights reserved.
Abstract
This document lists a number of sample configurations to help new
installations of Generic NQS.
Configuring Generic NQS
Introduction
This chapter explains how to configure Generic NQS once it has been
installed. To do this, we will work through a collection of sample
configurations which have been contributed by various Generic NQS
users. Feel free to use one of these configurations for your own
computer.
Comments, and contributed configurations, are always welcome.
I've broken these configurations up into two groups, representing
the two types of computer system on which NQS is typically used.
Compute Servers
Introduction
One of the most popular uses of NQS is to impose some kind of order
on the users of central compute servers. These are typically
powerful UNIX machines (e.g. an SGI Challenge XL), possibly acting as
servers for a number of clustered workstations. They have many
large or CPU-intensive processes running concurrently.
NQS installations on this type of machine are typically stand-alone,
and do not dispatch jobs out to lesser machines, such as
workstations. Sometimes, workstations may forward jobs to the
compute server.
The purpose of an NQS installation on such a machine is to prevent
the over-allocation of system resources, so that a healthy
throughput is maintained. The main system resources which are
always in short supply are CPU time and memory.
Sample Configuration - Controlling CPU Usage
This configuration is based upon the one used here at the University
of Sheffield on our SGI Challenge XL computer. This configuration
will probably be sufficient for most environments because, if local
experience is anything to go by, most users soon get a feel for how
long their work will take to run, but they really have no idea how
much of other resources (such as memory) it will use during that
time.
Create four batch queues:
> qmgr create batch_queue short
> qmgr create batch_queue medium
> qmgr create batch_queue long
> qmgr create batch_queue extra_long
Next, for each queue, specify a maximum CPU time, the limit getting
progressively larger for each queue.
> qmgr set per_process cpu_limit = \( 2:0:0 \) short
> qmgr set per_process cpu_limit = \( 8:0:0 \) medium
> qmgr set per_process cpu_limit = \( 24:0:0 \) long
> qmgr set per_process cpu_limit = \( 72:0:0 \) extra_long
Here, we have limits of 2 hours, 8 hours, 24 hours and 72 hours
respectively for the four queues.
We now need to specify priorities and runlimits for each of these
queues, to ensure a good working balance between the four queues.
The runlimits depend entirely on what your machine can handle -
those given here are for a Challenge XL with 12 CPUs and 512MB of
real RAM. I recommend that you experiment with the runlimits in
order to ensure that the running NQS requests don't put a strain
on your memory resources.
> qmgr set priority = 40 short
> qmgr set priority = 30 medium
> qmgr set priority = 20 long
> qmgr set priority = 10 extra_long
> qmgr set run_limit = 5 short
> qmgr set run_limit = 4 medium
> qmgr set run_limit = 2 long
> qmgr set run_limit = 1 extra_long
Next, you need to decide, for each queue, how many requests each
user is allowed to have actually running at the same time. If you
compiled NQS with dynamic scheduling (enabled by default), then
users who submit more jobs than they are allowed to run
simultaneously will find that their jobs are given a lower priority,
and therefore sit lower down in the queue.
> qmgr set user_limit = 2 short
> qmgr set user_limit = 1 medium
> qmgr set user_limit = 1 long
> qmgr set user_limit = 1 extra_long
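Users submit their work to whichever of these queues suits its
expected run time, using the standard NQS qsub command; for example,
to submit a job script to the short queue (the script name here is
only an illustration):
> qsub -q short myjob.sh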
Finally, you need to decide when these queues may run, and then use
root's crontab to start and stop the NQS queues as appropriate; a
sketch of the crontab entries involved is shown below. In this
configuration, the only queue which would not run all the time would
be the extra_long queue; this queue would be started at 5pm on
Fridays, and stopped sometime before 9am on Monday morning.
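The entries below are a sketch only: they assume that qmgr will read
commands from its standard input and that it is installed as
/usr/local/bin/qmgr, so adjust the path and the times to suit your
own system.
# root's crontab - start extra_long at 5pm on Fridays, stop it at 7am on Mondays
0 17 * * 5   echo "start queue extra_long" | /usr/local/bin/qmgr
0 7  * * 1   echo "stop queue extra_long" | /usr/local/bin/qmgr
Note that stopping a queue normally only prevents further requests
from being started; any requests which are already running are left
to finish.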
I'm sure that there are ways in which this configuration could be
improved; feel free to discuss this on the NQS-Support mailing list.
Clustered Workstations
Introduction
In recent times, there has been much interest in finding scheduling
software which can make use of UNIX workstations sitting on people's
desks. These workstations are typically idle for long periods of
time overnight, which represents a significant amount of wasted CPU
time.
We will concern ourselves only with `clustered' workstations. These
are workstations which typically mount software and/or user
filestore via NFS (or equivalent) from a local server. This has the
effect of ensuring that all the workstations in a cluster are the
same architecture, run the same operating system, and have identical
filestore layouts. If the local server fails, every workstation in
the cluster becomes unusable, because the services it depends on are
lost.
Sample Configuration - Clustered Workstations
This configuration demonstrates how to use a combination of pipe and
batch queues to set up a load-balancing NQS queue for a cluster of
workstations. You can then create more load-balancing queues, using
the same principles, and vary the limits per load-balancing queue in
order to provide a balanced service.
On each workstation which will run NQS requests, do the following:
> qmgr create batch misc-dest pipeonly run_limit = 1 user_limit = 1 nice_level = 10
This creates a queue, misc-dest, which will run one NQS request at
a time, and which runs all requests at a nice level of `10', just
in case a user is sitting at the console trying to work while the
job is running.
Then, on each workstation which will run NQS requests, do:
> qmgr create pipe misc-in pipeonly run_limit = 5 destination = misc-dest
> qmgr set lb_in misc-in
This creates a pipe queue, misc-in, which will hold up to five
requests at a time. It will forward those requests to the queue
misc-dest, and will only accept new requests while there are fewer
than five requests in the queue.
Finally, on each workstation which will run NQS requests, do:
> qmgr set scheduler server-name
where `server-name' is the DNS name of the local server which the
workstations mount filestore from. NOTE that you must set all the
workstations in a cluster to point to the SAME server.
Now, on the local server, do:
> qmgr create pipe misc-sched run_limit = 40 destination = misc-in@workstation1, misc-in@workstation2 ...
> qmgr set lb_out misc-sched
where `workstation1', `workstation2' and so on are all of your
workstations which will run NQS requests. NQS will only send new
requests to a workstation when there is room for them in its
`misc-in' queue, and it chooses between workstations based on the
load information from each machine (a machine with a low load is
favoured over a machine with a high load).
Finally, on each of your workstations on which users can submit NQS
requests, do:
> qmgr create pipe misc destination = misc-sched@server
So, when a user submits a request locally to the queue `misc', it is
sent to the queue `misc-sched' on the local server, which then sends
it to the least loaded workstation in the cluster.
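For example, a user on one of these workstations might submit a job
script (again, the script name is only an illustration) with:
> qsub -q misc myjob.sh
The request passes from misc to misc-sched on the server, and from
there to the misc-dest queue of whichever workstation currently has
the lightest load.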