The Network Queueing System (NQS)
Preliminary Draft, 4/29/92
Brent A. Kingsbury
Sterling Software
1121 San Antonio Road, Palo Alto 94303
Document copyright ©. All rights reserved.
Abstract
This paper describes the implementation of a networked, UNIX-based
queueing system developed under a government contract for the National
Aeronautics and Space Administration (NASA). The system discussed
supports both batch and device requests, and provides the facilities
of remote queueing, request routing, remote status, queue access
controls, batch request resource quota limits, and remote output
return.
Origins
The invention of the Network Queueing System (NQS) was driven by the
need for a good UNIX batch and device queueing facility capable of
supporting such requests in a networked environment of UNIX
machines. More specifically, NQS was developed as part of an effort
aimed at tying together a diverse assortment of UNIX based machines
into a useful computational complex for the National Aeronautics and
Space Administration (NASA).
Today, this computational complex is officially known as the
Numerical Aerodynamic Simulator Processing System Network, otherwise
known as the NPSN. The assorted machines in this network are of
varying manufacture, and (as of the time of this writing) include
Digital Equipment Corporation VAXes, Silicon Graphics Irises, large
Amdahl 5840 mainframes, and a Cray Research Incorporated CRAY-2.
Each of the machines in the network runs its own vendor-supplied
version of the UNIX operating system, with appropriate kernel and
user-space extensions as necessary.
The presence of UNIX on all of these machines has made possible the
creation of a common user interface, so that despite the obvious
hardware differences, users can freely move among the different
machines of the NPSN without being confronted with entirely
different software environments. As part of this common user
interface, NQS has been implemented as a collection of user-space
programs providing the required batch and device queueing
capabilities for each machine in the network.
Design Goals
NQS was architected and written with the following design goals in
mind:
- Provide for the full support of both batch and device requests. A
batch request is defined as a shell script containing commands
  not requiring the direct services of some physical device (other
  than the CPU resource), and that can be executed independently of
  any user intervention by the invocation of an appropriate command
  interpreter (e.g. /bin/csh, /bin/sh). In contrast, a device
request is defined as a set of independent instructions requiring
the direct services of a specific device for execution (e.g. a
line printer request).
- Support all of the resource quotas enforceable by the underlying
  UNIX kernel implementation that are relevant to any particular
  batch request and its corresponding batch queue.
- Support the remote queueing and routing of batch and device
requests throughout the network of machines running NQS. This
means that some mechanism must exist to reliably transport batch
and device requests between distinct machines, even if one or
both of the machines involved crash repeatedly during the
transaction.
- Modularize all of the request scheduling algorithms so that the
  NQS request schedulers can be easily modified on an
  installation-by-installation basis, if necessary.
- Support queue access restrictions whereby the right to submit a
batch or device request to a particular queue can be controlled,
in the form of a user and group access list for any queue.
- Support networked output return, whereby the stdout and stderr
files of any batch request can be returned to a possibly remote
machine.
- Allow for the mapping of accounts across machine boundaries.
  Thus, the account winston on the machine called Amelia might be
  mapped to the account chandra on the machine called Orville.
- Provide a friendly mechanism whereby the NQS configuration on any
particular machine can be modified without having to resort to
the editing of obscure configuration files.
- Support status operations across the network so that a user on one
machine can obtain information relevant to NQS on another
machine, without requiring the user to log in on the target
remote machine.
- Provide a design for the future implementation of file staging,
whereby several files or file hierarchies can be staged in or out
of the machine that eventually executes a particular batch
request. For files being staged-in, this implies that a copy of
the file must be constructed on the execution machine, prior to
the execution of the batch request. Such files must then be
deleted upon the completion of the batch request. For files being
staged-out, this implies the actual movement of the file from the
execution machine, to the eventual destination machine.
Implementation Strategies
Before dashing off to implement NQS completely from scratch, a long
look was taken at an already existing UNIX queueing system known as
the Multiple Device Queueing System (MDQS), as developed at the U.S.
Army Ballistic Research Laboratory [1].
At one point, it was even decided that NQS could be implemented as
an enhanced version of MDQS, borrowing heavily from the original
MDQS source code. Theoretically at least, this strategy was
supposed to reduce the work and risk involved in building a
networked queueing system that would satisfy NASA's needs. This
thinking lasted long enough for an early design document to be
written detailing the modifications to be made under such a plan.
The plan, however, was later abandoned when it was recognized that
the new code required for the proposed extensions exceeded the size
of the already existing MDQS code. Rather than heap unwieldy
extensions upon a frame never designed for such weight, NQS was
built completely from scratch. This new strategy allowed for the
construction of a new framework from which to hang new ideas, along
with many of the concepts included in MDQS. NQS is therefore
something old, and something new.
The NQS Landscape
Introduction
This section of the paper describes the general design and concepts
of NQS. It must be understood that NQS continues to be developed.
This paper discusses only the current state of affairs, with
occasional pointers referencing future areas of improvement.
The Queue And Request Model
In order to provide support for the two request types of batch and
device, NQS implements two distinctly different queue types, with
the respective type names of batch and device. Only batch queues
are allowed to accept and execute batch requests. Similarly, device
queues are only allowed to accept and execute device requests.
In addition to the first two queue types, a third queue type known
as a pipe queue exists to transport requests to other batch, device,
or pipe queues at possibly remote machine destinations. Readers
familiar with MDQS will note that the implementation of three
distinctly different queue types differs substantially from the MDQS
philosophy of having only one queue type.
Batch Queues
The first queue type implemented in NQS is called a batch queue. As
stated earlier, NQS batch queues are specifically implemented to run
only batch requests.
Batch Queue Quota Limits
It is useful to be able to place limits on the amounts of different
resources that a batch request can consume during execution.
Towards that end, NQS batch queues have an associated set of
resource quota limits that all other NQS queue types lack.
For a batch request to be queued in a particular batch queue, any
resource quota limits defined by the request must be less than or
equal to the corresponding limit as defined for the target batch
queue. If a batch request fails to specify a particular resource
limit value for which a limit is enforceable by the underlying UNIX
implementation, then the queued batch request inherits the
corresponding limit as defined for the target batch queue.
If a resource limit associated with a batch queue is later lowered
by a system administrator, then all requests residing in the queue
with a quota limit greater than the new corresponding quota limit
are given a grandfather clause (and the adjusting system
administrator is notified accordingly). This behavior illustrates
an important principle enforced in NQS: the set of limits under
which a batch request is to run is determined and frozen at the
time that the batch request is first queued in its destination
batch queue.
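To make the rule concrete, the following C sketch (illustrative
only; the names and structures are not from the NQS source) shows
one way the frozen limit set could be derived when a request enters
a batch queue:

    #define LIM_UNSPECIFIED (-1L)

    struct limits {
        long cpu_seconds;          /* per-process CPU time limit */
        long file_size;            /* per-process file size limit */
        /* ...one field per enforceable limit... */
    };

    /* Returns 0 and fills 'frozen' if the request may enter the
     * queue; returns -1 if a request limit exceeds a queue limit. */
    int resolve_limits(const struct limits *req,
                       const struct limits *queue,
                       struct limits *frozen)
    {
        long r;

        r = req->cpu_seconds;
        if (r == LIM_UNSPECIFIED)
            r = queue->cpu_seconds;    /* inherit from the queue */
        else if (r > queue->cpu_seconds)
            return -1;                 /* refuse the request */
        frozen->cpu_seconds = r;

        r = req->file_size;
        if (r == LIM_UNSPECIFIED)
            r = queue->file_size;
        else if (r > queue->file_size)
            return -1;
        frozen->file_size = r;

        return 0;
    }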
Spawning a Batch Request
The actual execution of a batch request is a somewhat complicated
affair. First, a batch request may require that the stdout and
stderr output files be spooled to a possibly remote machine
destination. In order to do this safely, temporary versions of the
output files are created in a protected location known to NQS.
Second, any additional environment variables optionally exported
with the request from the originating (and possibly remote) host
are placed in the environment set for the shell that is about to be
execed.
Third, based on any request shell specifications and the shell
strategy policy at the local host, the proper shell (e.g. /bin/csh,
/bin/ksh, /bin/sh, etc.) is chosen (see the Batch Request Shell
Strategies section below). The chosen shell will be spawned as a
login shell, virtually indistinguishable from the shell that the
request owner would have gotten had they logged directly into the
execution machine.
Fourth, all of the resource limits as supported by the underlying
UNIX operating system implementation are applied to the new shell
process, as determined for the request at the time it was first
queued in the batch queue.
After the resource limits have been applied, the proper shell is
execed, and the shell script that defines the batch request is
actually executed. Upon completion, the spooled output files of
stderr and stdout are returned to their possibly remote machine
destinations.
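The sequence above can be condensed into C. The sketch below is
illustrative only (request_environ, spawn_batch_request, and the
argv[0] convention shown are assumptions, not the actual NQS code):

    #include <sys/types.h>
    #include <sys/resource.h>
    #include <unistd.h>
    #include <stdio.h>

    extern char **request_environ;  /* environment built in step two */

    void spawn_batch_request(const char *shell, const char *script,
                             long cpu_seconds)
    {
        pid_t pid = fork();

        if (pid == 0) {             /* child: becomes the login shell */
            struct rlimit rl;

            /* Step four: apply the limits frozen at queueing time. */
            rl.rlim_cur = rl.rlim_max = cpu_seconds;
            setrlimit(RLIMIT_CPU, &rl);

            /* Exec the shell chosen in step three as a login shell;
             * by convention a leading '-' in argv[0] causes the shell
             * to behave as if the owner had logged in directly. */
            execle(shell, "-sh", script, (char *)0, request_environ);
            perror("execle");       /* reached only if the exec fails */
            _exit(1);
        }
        /* Parent: the NQS daemon records the pid and returns. */
    }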
Batch Queue Run Limits
To prevent the local host from being swamped with running batch
requests, some mechanism must exist to prevent too many batch
requests from running at any single given time. Currently, this
mechanism is quite simple, and is implemented by the presence of two
batch request run limits.
The first batch request run limit is global in nature, and places a
ceiling on the maximum number of batch requests allowed to execute
simultaneously on the local host.
The second batch request run limit is applied at the queue level,
and places a ceiling on the maximum number of batch requests allowed
to execute simultaneously in the containing batch queue.
When a batch request completes execution, the entire set of batch
queues is traversed in order of decreasing batch queue priority.
For each batch queue in the order traversed, any eligible batch
requests are spawned until either the queue run limit or the
global batch request run limit is reached. If no more requests can
be spawned for the batch queue under scrutiny and the total number
of running batch requests is still less than the global batch
request run limit, then the next lower-priority batch queue is
examined by the same algorithm, until all of the batch queues have
been examined.
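A minimal sketch of this scheduling pass follows (the structures
and the spawn_next_eligible helper are assumptions made for
illustration):

    struct bqueue {
        int run_count;           /* requests now running in this queue */
        int run_limit;           /* the per-queue run limit */
        struct bqueue *next;     /* next queue, in decreasing priority */
    };

    extern int global_run_count; /* running batch requests, all queues */
    extern int global_run_limit; /* the global run limit for the host */

    extern int spawn_next_eligible(struct bqueue *q); /* 1 if spawned */

    void schedule_batch(struct bqueue *queues)
    {
        struct bqueue *q;

        for (q = queues; q != NULL; q = q->next) {
            if (global_run_count >= global_run_limit)
                break;           /* host-wide ceiling reached */
            while (q->run_count < q->run_limit &&
                   global_run_count < global_run_limit &&
                   spawn_next_eligible(q)) {
                q->run_count++;
                global_run_count++;
            }
        }
    }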
So far, this simple run limit scheme has sufficed as the only tool
to control the running batch request execution load. Since batch
requests can vary widely in their consumption of resources,
additional more sophisticated control mechanisms limiting the number
of simultaneously executing batch requests may be required in the
future.
Device Queues
Device queues represent the second queue type implemented in NQS.
Unlike their sibling batch queues, device queues do not have a set
of associated resource quota limits. Device queues do however have
a set of associated devices, which batch queues do not have.
Devices
For each device queue, there exists a set of one or more devices to
which requests entering the device queue can be sent for execution.
Each such device in turn has an associated server, which constitutes
the program that is always spawned to handle a request that is given
to the device for execution.
Any imaginable queue-to-device mapping can be configured. In
general, N device queues can be configured to feed M devices. The
only restriction placed on the values of N and M is the obvious one
that their respective values be greater than or equal to zero (note
that it is possible for a device queue to exist without any devices
in its device set, though such a queue is useless). It is even
possible to have multiple device queues feeding the same device.
Spawning a Device Request
When an NQS device completes the task of handling a device request
or is found to be idle after a device request has been recently
queued, all of the device queues that feed the device are scanned
to determine if they have a queued request that can be handled by
the device. As in MDQS, an NQS device request can specify a
particular forms type to be used when executing the request. For a
queued device request to be deemed eligible for execution by a
particular device, any forms specified by the request must match the
forms defined for the device. If the request does not specify a
forms type, then it is assumed that the request can be satisfied by
any device in the mapping set of the queue containing the request.
If two or more queues are found to contain a request that can be
executed by the newly idled device, then the first available request
from the device queue with the numerically higher queue priority is
chosen. If two or more such queues have the same queue priority,
then the queues are serviced in the classic round-robin fashion.
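The forms test described above reduces to a simple comparison. The
following sketch (illustrative structures, not the NQS source)
shows the eligibility check for one request and one device:

    #include <string.h>

    struct device {
        char forms[32];      /* forms type configured for the device */
    };

    struct devreq {
        char forms[32];      /* forms named by the request; "" if none */
    };

    /* A request naming no forms can go to any device in the mapping
     * set of its queue; otherwise the forms must match exactly. */
    int request_fits_device(const struct devreq *r,
                            const struct device *d)
    {
        if (r->forms[0] == '\0')
            return 1;
        return strcmp(r->forms, d->forms) == 0;
    }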
Device Queue Run Limits
Like a batch queue, some mechanism must exist to keep the number of
simultaneously running device requests from swamping the local host.
Unlike a batch queue however, device queues do not have an
associated run limit. Device queues are instead throttled by their
associated devices, which can be disabled as necessary by a system
administrator.
Pipe Queues
Pipe queues represent the third queue type implemented in NQS, and
are responsible for routing and delivering requests to other
(possibly remote) queue destinations. Pipe queues derive their name
from their conceptual similarity to a pipeline, transporting
requests to other queue destinations.
Pipe Queues and Request Transport
Differing from both batch and device queues, pipe queues do not have
any associated quota limits or devices. Pipe queues do however have
a set of associated destinations to which they route and deliver
requests. Pipe queues also differ from their sibling batch and
device queues, in that they can accept both batch and device
requests.
With each pipe queue, there is an associated server that is spawned
to handle each request released from the queue for routing and
delivery. Ironically, the spawned instance of a pipe queue server
is called a pipe client, because the word server is reserved for
the server side of a client/server network connection.
Thus, when a pipe queue request requires routing and delivery to
some destination of the pipe queue, the associated pipe queue server
is spawned as a pipe client, which must then route and deliver the
request to a destination. For each attempted remote destination,
this requires the creation of a network server process on the remote
host acting as an agent on behalf of the pipe queue request. The
choice of the term pipe client allows us to use the standard
client/server vocabulary when discussing the queueing and delivery
of a pipe queue request to a remote host.
Spawning a Pipe Request
When a pipe client is spawned to route and deliver a request, it is
given complete freedom to choose any destinations from the
destination set configured for the pipe queue, as possible
destinations for the request. If a selected destination does not
accept the request, then the pipe client is free to try another
destination for the request.
It is quite possible for a request to be rejected by all but one of
the possible destinations defined for a pipe queue. It is not
necessary to find many destinations willing to accept the request.
Only one accepting destination need exist for the pipe queue request
to be handled successfully.
It is also possible for every single destination of a pipe queue to
reject the request for reasons which are deemed permanent in nature
(e.g. all of the destination queues reside on remote machines where
the request owner does not have access to an account). In such
situations, the request is deleted, and mail is sent to the request
owner informing him or her of the demise of their request.
Requests can be rejected by a destination for a plethora of reasons,
including remote host failures, queue type disagreements with the
request type, lack of request owner account authorization at the
remote queue destination, insufficient queue space, or any one of a
hundred other reasons including the simple problem of the
destination queue being disabled (unable to accept any new
requests).
Some of the reasons for a destination rejection denote retriable
events (the effort to queue the request at the destination may
succeed if tried later). Examples of this kind of failure include
the destination queue being disabled (the system administrators at
the destination may enable it some time), and machine failures (the
destination machine is crashed, but might be rebooted in the
future).
Other destination rejection reasons are more permanent such as the
lack of proper account authorization at the remote destination, or
request and destination type disagreement (the request is a device
request, and the destination is a batch queue for instance).
Due to the tremendous number of ways in which a request can be
rejected by a queue destination, there is an equally tremendous
amount of logic incorporated into NQS that attempts to deal with the
situation. Some failures require that queue destinations be
disabled for some finite amount of time after which the destination
is considered retriable. All failures of the retriable variety
require that the pipe queue request be requeued and delayed for some
amount of time, after which an attempt is made to reroute the
request.
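The retriable versus permanent distinction can be pictured as a
classification over rejection codes; the sketch below uses invented
codes for illustration (the real NQS set is far larger):

    enum reject {
        REJ_HOST_DOWN,        /* remote machine crashed/unreachable */
        REJ_QUEUE_DISABLED,   /* destination queue disabled */
        REJ_NO_SPACE,         /* insufficient queue space */
        REJ_NO_ACCOUNT,       /* no account at the destination */
        REJ_WRONG_QUEUE_TYPE  /* e.g. device request, batch queue */
    };

    /* Retriable failures requeue and delay the request for a later
     * routing attempt; permanent failures delete it and mail the
     * request owner. */
    int is_retriable(enum reject r)
    {
        switch (r) {
        case REJ_HOST_DOWN:
        case REJ_QUEUE_DISABLED:
        case REJ_NO_SPACE:
            return 1;
        default:
            return 0;
        }
    }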
Even the successful case of a request being tentatively accepted by
a queue destination is fraught with complexity, since one or both
machines involved in the transaction may crash at any time.
In summary, pipe queues are both powerful and complex. Since the
pipe client configured with each pipe queue is allowed to choose
which destinations to try from the destination set, it is possible
to implement a crude but effective request class mechanism. The
pipe client can examine the request, and then choose the
destination queue most appropriate for it. Thus, large batch
requests queued in a pipe queue can be delivered to batch queues
that may run only at night, while small batch requests can be
delivered to fast batch queues that run with a UNIX nice execution
value granting high compute priority, while keeping a small upper
limit on the CPU time and maximum file size for the request.
When a pipe queue is used as a request class mechanism, it is wise to
define the target destination queues with the attribute of pipeonly,
which prevents any requests from being queued in such queues unless
the requests are queued from another pipe queue. In this way, the
request class policies implemented by the pipe queue and associated
server (pipe client) can be strictly enforced.
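A pipe client policy of this kind can be very small. The sketch
below (queue names and the threshold are invented for illustration)
routes by the request's CPU time limit:

    /* Route large requests to a night-only batch queue and small
     * ones to a fast, high-priority batch queue; both destinations
     * would be defined as pipeonly so the policy cannot be bypassed. */
    const char *choose_destination(long cpu_limit_seconds)
    {
        return (cpu_limit_seconds > 3600) ? "night_batch" : "fast_batch";
    }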
Pipe queues also help to ameliorate the unreliability of the
surrounding network and machines. Even if the proper destination
machine is down or unreachable, the pipe queue mechanism can requeue
the request and deliver it later, when the destination machine and
connecting network are restored to operation.
Pipe Queue Run Limits
To prevent pipe queues from flooding the host system with an overly
large number of simultaneously running pipe client processes, a
mechanism identical to that implemented for batch queues is
employed.
Request States
In the previous sections, we have described the general request and
queue type concepts implemented in NQS. This section descends the
staircase of detail, focusing on the different states that a request
can go through all the way from its initial creation, to its
ultimate execution.
A request residing within an NQS queue can be in one of several
states. First of all, the request may actually be running. This
request state exists for requests residing in batch and device
queues, and implies that the request is presently being executed.
The analogous request state for requests residing within a pipe
queue is termed routing, since the request is not actually running,
but is rather being routed and delivered to another queue
destination.
The second (and most common) request state is what is termed the
queued state. A request in the queued state is completely ready to
enter the running or routing states.
The third request state describes the condition in which a request
is waiting for some finite time interval to pass, after which it
will enter one of the states of queued, running, or routing. This
request state is known as the waiting state.
The fourth request state is known as the arriving state. All
requests in the arriving state are in the process of being queued
from another (possibly remote) pipe queue. When completely received,
they will enter one of the other states of waiting, queued, running,
or routing.
There are also three additional request states that are not
implemented in the current version of NQS. The first such state is
known as the holding state, and describes the condition in which an
operator, a user, or both have applied a hold to the given request.
Such a request is frozen, and cannot exit the holding state until
all holds applied by an operator or user have been released.
The second and third unimplemented request states concern the batch
request states of staging-in and staging-out. These states will not
be implemented unless the demand for the facility of file staging
increases, since it is already possible to use the remote file copy
commands in the shell script that constitutes a batch request to
copy the requisite files to and from the execution machine. The
advantage of implementing file staging is that NQS could use a
transaction mechanism to prevent the execution of a batch request
until all of the input files have been staged-in to the local host.
In this way, crashes of remote machines could not cause a batch
request to fail. Output files could be similarly staged.
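Collecting the states described in this section, an illustrative C
enumeration might read (the actual NQS definitions may differ):

    enum reqstate {
        RS_RUNNING,      /* batch/device request presently executing */
        RS_ROUTING,      /* pipe queue analogue of running */
        RS_QUEUED,       /* ready to enter running or routing */
        RS_WAITING,      /* waiting for a time interval to pass */
        RS_ARRIVING,     /* being received from another pipe queue */
        /* Defined but not implemented in the current version: */
        RS_HOLDING,      /* frozen by operator and/or user holds */
        RS_STAGING_IN,   /* input files being staged to the host */
        RS_STAGING_OUT   /* output files being staged off the host */
    };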
More Landscaping
The previous major section described the queue and request model
implemented in NQS. This section of the paper describes the
implementation of queue access controls, batch request quota limits,
batch request shell strategies, request transaction states, the
networking implementation, account mapping across machine
boundaries, NQS configuration control, status operations, and the
possible future implementation of file staging.
Queue Access Controls
In any reasonable queueing system, it is necessary to provide for
the configuration of queue access restrictions. Without such
restrictions, there would be no way to prevent every user of the
machine from submitting their requests to the fastest queue with the
highest priority and resource limits on the machine. Thus, NQS
supports queue access restrictions.
For each queue, access may be either unrestricted or restricted. If
access is unrestricted, any request may enter the queue. If access
is restricted, then a request can only enter the queue if the
requester's login user-id or login group-id is defined in the access
set for the target queue.
All such access permissions are always defined relative to user and
group definitions present on the local host. The restriction that
all user and group references be relative to the local host is not a
problem, since request ownership mapping is performed whenever a
request is transported across a machine boundary (see the Account
Mapping section below).
Lastly, an additional queue access parameter known as pipeonly can
be defined for any queue. The presence of this queue access
attribute prevents requests from being directly placed within the
queue by one of the user commands used to submit an NQS request.
Queues with the pipeonly attribute can only accept requests queued
via another pipe queue. As outlined in the summary of the Spawning
a Pipe Request section, this attribute makes it possible to
implement a simple request execution class facility.
Batch Request Quota Limits
As mentioned previously, NQS supports an extensive set of batch
request resource quota limits. However, NQS cannot enforce a batch
request resource quota limit unless the underlying UNIX
implementation also supports the enforcement of the same limit.
Thus, the resource limit enforcement functions of NQS have been
implemented using an appropriate set of #ifdefs, allowing the system
maintainers to configure the resource limit functions as
appropriate.
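The #ifdef approach might look like the following sketch, using the
4.xBSD resource-limit names as an assumed example (other kernels
spell these differently, or lack them entirely):

    #include <sys/time.h>
    #include <sys/resource.h>

    static void apply_limit(int resource, long value)
    {
        struct rlimit rl;

        rl.rlim_cur = rl.rlim_max = value;
        setrlimit(resource, &rl);
    }

    void apply_request_limits(long cpu, long fsize, long data)
    {
    #ifdef RLIMIT_CPU
        apply_limit(RLIMIT_CPU, cpu);      /* per-process CPU time */
    #endif
    #ifdef RLIMIT_FSIZE
        apply_limit(RLIMIT_FSIZE, fsize);  /* per-process file size */
    #endif
    #ifdef RLIMIT_DATA
        apply_limit(RLIMIT_DATA, data);    /* data segment size */
    #endif
        /* ...and so on, one guarded block per enforceable limit. */
    }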
It must be understood that NQS does not define the interface through
which errant batch requests will be informed of their attempts to
consume more of a given resource than is allocated to them. Upon
exceeding some limit types, some UNIX implementations send a signal
to the offending process. Other implementations may simply cause the
errant system call to fail, with errno being set as appropriate.
If a batch request specifies the enforcement of a quota limit that
is not enforceable at the execution machine, then the limit is
simply ignored, and the request is run anyway. It is also possible
to specify that no limit be given to the usage of a particular
resource for both a batch request and batch queue.
Lastly, the NQS implementation of batch request resource limits
allows each batch request to specify a warning limit value for UNIX
kernels that allow processes to be warned when they are getting
close to exceeding some hard quota limit. Once again as for hard
quota limits, the actual enforcement mechanism of warning limits is
up to the supporting UNIX kernel.
The full set of batch request resource quota limits recognized by
NQS falls into two principal categories. The first category
concerns only those limits applicable to each process of the process
family comprising the running request. This category of limits is
known collectively as the per-process limit set.
The second category concerns only those limits applicable to the
entire request. That is, the consumption of the limited resource as
consumed by all processes comprising the running batch request must
never exceed the given per-request limit.
The complete set of batch request quota limits supported by NQS is
listed below. Each limit is shown with its corresponding Qsub(1)
command syntax (Qsub(1) is the command used to submit an NQS batch
request). The (P) or (R) notation in each limit definition
indicates the per-process or per-request nature of the limit:
> -lc limit - (P) corefile size limit.
> -ld limit [ , warn ] - (P) data segment size limit.
> -lf limit [ , warn ] - (P) file size limit.
> -lF limit [ , warn ] - (R) file space limit.
> -lm limit [ , warn ] - (P) memory size limit.
> -lM limit [ , warn ] - (R) memory space limit.
> -ln limit - (P) nice execution priority limit.
> -ls limit - (P) stack segment size limit.
> -lt limit [ , warn ] - (P) CPU time limit.
> -lT limit [ , warn ] - (R) CPU time limit.
> -lv limit [ , warn ] - (P) temporary file size limit.
> -lV limit [ , warn ] - (R) temporary file space limit.
> -lw limit - (P) working set limit.
The present implementation also includes provisions for the
additional limits of:
> -l6 limit - (R) tape drive device limit.
> -lP limit - (R) number of processors limit.
> -lq limit [ , warn ] - (P) Quick device file size limit.
> -lQ limit [ , warn ] - (R) Quick device file space limit.
These last limits are not presently supported, but are instead
reserved for future use. The last two future limits of -lq, and -lQ
are reserved for defining limits on the amount of fast (quick) file
storage to be allocated to a process of the running request, and to
the entire running request. An example of a fast file storage
resource can be found in the solid state disk (SSD) product that
Cray Research Incorporated supports with their CRAY X-MP series of
computers.
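To illustrate the syntax, a batch request might be submitted with a
per-process CPU time limit plus warning value and no corefile, as
in the following invocation (the units and the script name here are
assumptions; the paper does not define them):

> qsub -lt 600,540 -lc 0 myscript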
Batch Request Shell Strategy
The execution of a batch request requires the creation of a shell
process to interpret the shell script which defines the batch
request. On many UNIX systems, there is more than one shell
available (e.g. /bin/csh, /bin/ksh, /bin/sh). To deal with this
problem, NQS allows a shell pathname to be specified when a batch
request is first submitted.
If no particular shell is specified for the execution of the
request, then NQS must have some other means of deciding which shell
to use when spawning the request. The solution to this dilemma has
been to equip NQS with a batch request shell strategy, which can be
configured as necessary by the local system administrators.
The batch request shell strategy as configured on a particular
system determines the shell to be used when executing a batch
request on the local host that fails to identify any specific shell
for its execution. Three such shell strategies can be configured
for NQS, and they are known by the names of fixed, free, and login.
A shell strategy of fixed causes the request to be run by the fixed
shell, the pathname of which is configured by the system
administrator. Thus, a particular NQS installation may be
configured with a fixed shell strategy where the default shell used
to execute all batch requests is defined as the Bourne shell.
A shell strategy of free simply causes the user's login shell (as
defined in the password file) to be execed. This shell is in turn
given a pathname to the batch request shell script, and it is the
user's login shell that actually decides which shell should be used
to interpret the script. The free shell strategy therefore runs the
batch request script exactly as would an interactive invocation of
the script, and is the default NQS shell strategy.
The third shell strategy of login simply causes the user's login
shell (as defined in the password file), to be the default shell
used to interpret the batch request shell script.
The strategies of fixed and login exist for host systems that are
short on available free processes. In these two strategies, a
single shell is execed, and that same shell is the shell that
executes all of the commands in the batch request script (barring
shell exec operations in any user startup files: .profile, .login,
.cshrc).
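Selection among the three strategies can be sketched as follows
(illustrative names; under the free strategy the login shell is
execed and is itself left to pick the interpreter for the script,
so the choice of which program to exec is the same as for login):

    enum strategy { FIXED, FREE, LOGIN };

    extern char fixed_shell[];    /* pathname set by the administrator */

    /* Return the shell to exec for a request that names no shell. */
    const char *choose_shell(enum strategy s, const char *login_shell)
    {
        switch (s) {
        case FIXED:
            return fixed_shell;   /* e.g. /bin/sh at this installation */
        case FREE:
        case LOGIN:
            return login_shell;   /* from the password file entry */
        }
        return login_shell;
    }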
In every case, however, the shell that is chosen to execute the batch
request is always spawned as a login shell, with all of the
environment variables and settings that the request owner would have
gotten, had they logged directly into the machine.
The shell strategy as configured for any particular host can always
be determined by the NQS qlimit command.
Transactions
The accurate recording of request state information is a sometimes
complicated affair within NQS. The need to support some reliable
mechanism for the recording of request state is particularly
critical when an NQS request is in the process of being routed and
delivered to a remote queue destination. It is also necessary to
support some reliable mechanism for detecting interrupted executions
of batch and device requests upon system restart, so that they can
be restarted or aborted depending upon the user's wishes.
To do this, NQS uses the UNIX file system to record request state
information. On the surface, this use of the UNIX file system to
store request state information seems trivial. It's not.
The UNIX file system buffer cache implementation of lazy write I/O
makes the situation almost intolerable, since the update of request
state information must occur synchronously for many of the request
state transitions. That is, there are several instances where the
state of a particular request must be accurately recorded on the
physical disk medium prior to continuing further with the
transaction, otherwise reliable transaction recovery is impossible.
The need for synchronous state updates becomes absolutely critical
when an NQS pipe client process is routing and delivering a request
to a remote queue destination on another machine. The algorithm
used to remotely queue an NQS request must allow for both machines
involved in the transaction to crash, without leaving things in an
unrecoverable state.
The algorithm to do this is implemented using a well known technique
called the two-phase commit protocol. While the algorithm is quite
interesting, space restrictions prohibit a full explanation of the
technique here, and the reader is referred to the text Nested
Transactions: An Approach to Reliable Distributed Computing by
Moss [2].
What will be described here, however, is the unusual mechanism
implemented in the present version of NQS to get around the UNIX
file system buffer cache.
While AT&T System V Release 2 UNIX supposedly supported an
undocumented flag in the open(2) system call forcing synchronous
write operations for the opened file descriptor, not all UNIX
implementations running on the various machines of the NPSN
supported this feature. However, an examination of the UNIX source
code as supplied by all of the different vendors showed that the
link(2) system call was synchronous, to the extent that the target
file inode had either been written to disk, or was scheduled to be
written to disk upon return from the system call.
Therefore, since the amount of transaction state information for
each request is quite small, NQS does something unbelievably
strange. It uses the modification time field of protected and
preallocated files to store transaction state information for each
request.
The update of transaction state information in this manner is
performed by setting the modification time of the appropriately
preallocated file (never created or deleted once NQS is installed),
making a link to the updated inode to force its writing to disk,
followed by an unlink to remove the temporary link used to force the
I/O operation. While the desired synchronous transaction state
update is accomplished using a mechanism that is not very fast or
efficient, it does have at least the virtue of being relatively
portable.
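A sketch of the trick follows (names and the encoding are
illustrative; the NQS source differs in detail):

    #include <sys/types.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <utime.h>

    /* Record a request's transaction state synchronously by encoding
     * it in the modification time of a preallocated state file. */
    int record_state(const char *statefile, time_t encoded_state)
    {
        struct utimbuf ut;
        char tmplink[256];

        ut.actime = ut.modtime = encoded_state;
        if (utime(statefile, &ut) < 0)
            return -1;

        /* link(2) writes (or schedules the writing of) the target
         * inode to disk; the temporary link is removed at once. */
        sprintf(tmplink, "%s.sync", statefile);
        if (link(statefile, tmplink) < 0)
            return -1;
        (void) unlink(tmplink);
        return 0;
    }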
All of the code involved in setting and reading transaction state
for a request is isolated in a very small number of NQS source
modules. When a synchronous I/O mechanism becomes supported as a
general UNIX standard, then the implementation of NQS will be
changed to take advantage of it, discarding the atavistic technique
described here.
Networking Implementation
At present, all NQS network conversations are performed using the
Berkeley socket mechanism, as ported into the respective vendor
kernels or emulated by other means. The only connection type used
by NQS is that of a stream connection, in which NQS assumes that the
requisite bytes will be reliably transmitted to and from the server
in the order in which they were written, by the underlying network
software of the respective host systems. Any conversion to the use
of the streams mechanism as developed by AT&T should be extremely
straightforward.
In general, all NQS database information is always stored in the
form most appropriate for the local host. If it becomes necessary
to communicate information to another remote NQS host, then the
information is converted into a network format understood by all NQS
machines.
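For instance, a single counter sent over the stream connection
would be converted with the Berkeley byte-order primitives (assumed
here as the conversion mechanism; the paper does not specify the
actual wire format):

    #include <sys/types.h>
    #include <stdint.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    /* Send a 32-bit quantity in a network format that every NQS
     * machine understands, regardless of local byte order. */
    ssize_t send_word(int sock, uint32_t hostval)
    {
        uint32_t wire = htonl(hostval);  /* host to network order */

        return write(sock, &wire, sizeof(wire));
    }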
All network conversations performed by NQS are always done using the
classic client/server model, in which a client process creates a
connection to the remote machine where a server process is created
to act on behalf of the client process.
When this initial connection is created, some introductory
information is exchanged between the two processes. Regardless of
the transaction to be conducted, the format of the introduction is
always the same, in which certain key personality information is
transmitted by the client process to the remote server. Included as
part of this introductory dialogue are the client's identity in
the form of its real user-id and corresponding user name at the
client host, and the timezone in effect at the client's machine.
The parameters of real user-id and user name are both passed to the
server process, so that the server can map the identity of the
client to the appropriate account at the remote server machine.
Although one of these two parameters is sufficient, both are passed
so that the client mapping at the server machine can be performed by
either user-id or user name, depending upon the implementation at
the remote host.
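An illustrative layout of the introductory information follows
(field names and sizes are assumptions, not the actual NQS
protocol):

    struct introduction {
        long real_uid;        /* client's real user-id at its own host */
        char username[32];    /* corresponding user name at that host */
        char timezone[32];    /* timezone in effect at the client */
    };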
The timezone for the client is also passed across so that future
implementations of NQS, when performing remote status operations,
will properly display event times using the timezone of the client.
Lastly, the initial dialogue is the obvious place in which attempts
can be made by malevolent users to try to gain unauthorized entry to
a remote machine. At present, the only mechanisms to prevent this
are the difficulty of faking the NQS protocols, and the requirement
that all networking connections be made from privileged ports that
can only be obtained by privileged root processes.
Account Mapping
When a network connection is established between an NQS client
process and a remote NQS server process, an account mapping must be
performed so that the network server at the remote machine can take
on the proper identity attributes. This mapping is performed for all
network conversations. In particular, the transport of a batch or
device request requires that the ownership of the request be
adjusted as appropriate, since the user-id of the request owner is
not necessarily the same on all machines.
This mapping can be performed either by mapping the client's host
and user-id, or client's host and user name to the proper account.
In both cases though, the mapping must be done by the remote server
machine if there is to be any semblance of security.
The choice of whether to map user-id or user name values was the
subject of intense debate. In the beginning, the mapping was to
have been made by mapping user-ids. Near the very end of the
project, it was mandated that the mapping be performed by user name,
and not user-id.
The present implementation of NQS has therefore adopted the
defensive position that the server machine should make the decision
as to which algorithm to use when performing an account mapping.
Since both the user-id and user name of the client process are
available to the server process (see the Networking Implementation
section), the server can use either one when performing the account
mapping.
Beyond the problem of user-id versus user name mapping, an
additional problem is posed by the need to determine the identity of
the client's host, irrespective of the network interface upon which
a connection is made. In the environment of the NPSN, there are
often at least two different principal paths by which a machine can
be reached. The example paths typically include the interfaces of
ethernet and hyperchannel, and lead to the existence of entries in
the UNIX /etc/hosts file where the names of amelia-hy and amelia-ec
denote the two different paths of hyperchannel and ethernet to the
same machine known locally as amelia.
NQS however requires that it be able to tell without ambiguity that
connections coming from amelia-hy and amelia-ec denote connections
coming from the same machine, even though the entries in the
/etc/hosts file are separate.
To do this, it was necessary to create the notion of a machine-id, a
number that uniquely identifies a client machine, irrespective of
the path used to conduct the network conversation. Thus, an
additional mapping mechanism was created to map different client
host addresses to a single unique machine-id.
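The mapping itself is a simple lookup; the sketch below uses
invented structures (nmapmgr maintains the real database):

    struct addrmap {
        unsigned long inet_addr;  /* one network address of the host */
        long mid;                 /* the host's unique machine-id */
    };

    /* Both the amelia-hy and amelia-ec addresses map to the one
     * machine-id assigned to amelia. */
    long lookup_mid(const struct addrmap *map, int n,
                    unsigned long addr)
    {
        int i;

        for (i = 0; i < n; i++)
            if (map[i].inet_addr == addr)
                return map[i].mid;
        return -1;                /* unmapped host */
    }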
Like the user-id versus user name mapping question, this decision
was also caught in a maelstrom of controversy. When the dust
finally settled, the machine-id concept was still present in the
NQS implementation. Unfortunately, the storm of controversy swept
away the tools that were going to be used to administer the
machine-id mappings. Thus, the present implementation provides
only a rudimentary program called nmapmgr, which can be used to
painfully create the requisite machine-id mappings.
Someone receiving NQS source code for the first time would do well
to either implement their own machine-id mapping mechanism, or
polish the present mechanism.
Configuration Control
All of the setup and configuration of NQS is accomplished through
the use of a single configuration program known as the qmgr utility.
This program establishes a connection to the local NQS daemon, and
transmits message packets to perform the various configuration
commands implemented in NQS. This program is quite user-friendly,
and provides an on-line help facility.
The use of an intelligent configuration program to set up and modify
NQS on the local machine provides many benefits, one of which is the
benefit of consistency. One cannot, for example, add a
queue-to-device mapping for a non-existent device or queue.
When given a particular command such as adding a device to the
queue-to-device mapping set for some queue, the qmgr utility builds
a message update packet which is then sent to the local NQS daemon
for processing. The local NQS daemon then either performs the
update or returns an error code, which the qmgr program diagnoses.
In either case, the final outcome of the command is always
displayed to the administrator.
Status Operations
All of the obvious status operations are supported by NQS, including
device, request, queue, and limit queries. The latter status
operation is used to determine the set of batch request resource
limits supported by NQS on the local machine.
These status functions are supported by the respective NQS commands:
qdev, qstat, and qlimit, with qstat providing information about
previously queued requests and their containing queues.
Due to time constraints, the only status function which has been
networked is the qstat command. As time becomes available, this
situation will hopefully be corrected.
File Staging
Although file staging is not presently implemented by NQS, future
versions of NQS may implement such a facility. A thorough
examination of the NQS source code will reveal that provisions have
been made for this eventuality in both the request transaction state
mechanism, and the batch request data structures.
Conclusion
NQS is only another effort aimed at providing a more complete
queueing system for a collection of UNIX machines operating in a
networked environment.
As mentioned in the Implementation Strategies section, NQS was
designed and written after a careful examination of a previous UNIX
queueing system known as MDQS. It is hoped that others will now
build on NQS, as NQS has been built from ideas in MDQS.
References
- 1. Kingston, Douglas P. III, A Tour Through the Multi-Device
  Queueing System, revised for MDQS 2.0, Ballistic Research
  Laboratory, Army Armament Research and Development Command
  (AARADCOM), September 12, 1983.
- 2. Moss, J. Elliot B., Nested Transactions: An Approach to
Reliable Distributed Computing, Cambridge, Massachusetts: The MIT
Press, 1985.