This is www.gnqs.org, The Home Of Batch Processing



Future Plans

Introduction

This is my current roadmap for future Generic NQS development.  As with all such things, all plans here will probably change over the coming months, and you've no guarantees that any of this will actually happen at all.

Where We Are Now

The existing Generic NQS source tree is 15 years old.  I guess you could call it a legacy application.  You can certainly call it a mess.  With every release of Generic NQS, we run the risk of adding more bugs than we fix.

Generic NQS uses file formats, and network protocols, that are difficult (if not impossible) to expand.  The internal code does not isolate logic from infrastructure.  Nor does it completely isolate non-portable code.

Where We Are Going

My focus, for the coming year or so, is to re-engineer the Generic NQS source tree.  Backwards compatibility must be preserved at the command-line level, and at the networking level.  Internal network protocols and structures will be scrapped, and replaced with more flexible designs.  Existing file formats will either be preserved (unlikely in most cases) or replaced.  Tools will be written to ensure a totally painless migration.

Why?  First and foremost, I want Generic NQS to be more robust than it currently is.  I need Generic NQS to be easier to maintain.  Secondly, we all want to see new functionality for Generic NQS.  Parallel processing is one such area.

How We Are Going To Get There

I wanted to effectively scrap the existing Generic NQS source tree, and completely re-engineer it from scratch.  After a debate on NQS-Developers, I lost that argument.

So, the idea is to re-engineer Generic NQS in stages.

This is going to require parallel development.  We have tried (and failed) to do parallel development twice in the past.  Here's hoping it is a case of third time lucky.

Maintaining Production Releases

During the re-engineering work, I need to ensure that there is always a production-quality version of Generic NQS which new users can safely start using.  The new policy I outline here is designed to achieve that above all else.

Starting with Generic NQS 3.50.6, I will only accept patches which fix bugs or portability issues against the current production release.  If you find a bug which is in both the production and pre-production source trees, it will save me a lot of time if you send me a patch against the production source tree.  I will make sure the fixes also go into the pre-production source tree.

I will only accept patches for new functionality against the pre-production source tree.  The new functionality will not be patched into the existing production source tree.  This limits the overhead of parallel development, at the expense of delaying new functionality becoming available.

We will continue to make new production releases whenever the quality of the pre-production source tree has reached a satisfactory state.  The list of milestones below is designed to make sure that we make more production releases than we ever have before.  This will (hopefully) ensure that we never again find ourselves scrapping a pre-production source tree because end-users need its new features but the remaining changes would take too long to complete.

Milestones

This is my outline of the order in which things need to be done.  I expect this document to become much more detailed as we get stuck in.

  • Generic NQS v3.52 Production

Strip out and replace the existing SETUP software.  
New SETUP software must offer an autoconf-like configure script, must produce usable Makefiles for experienced users & developers, and must be able to create a local binary package (e.g. RPM, .pkg file) for the host operating system.

My intention is to make Generic NQS as easy as possible for all my users to install, as soon as possible.  Hopefully this will let more users download and play with the pre-releases, and so help make the testing of Generic NQS better than it otherwise would be.

All compile-time features to become runtime features.
Ensure all choices which have to be made at compile-time can instead be made through editing a new /etc/nqs.conf configuration file at runtime.

NOTE: This is a temporary solution, and will require replacing at a later stage.
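The roadmap does not define the syntax of /etc/nqs.conf.  Purely as a minimal sketch, assuming a simple `key = value` format with `#` comments (the format, the size limits, and the names `nqs_conf_parse` and `nqs_conf_get` are all hypothetical), reading a former compile-time choice at runtime could look like this:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits; Generic NQS defines no such constants. */
#define NQS_CONF_MAX_ENTRIES 64
#define NQS_CONF_MAX_LEN     128

struct nqs_conf_entry {
    char key[NQS_CONF_MAX_LEN];
    char value[NQS_CONF_MAX_LEN];
};

struct nqs_conf {
    struct nqs_conf_entry entries[NQS_CONF_MAX_ENTRIES];
    size_t count;
};

/* Copy [start, end) into dst, trimming whitespace and truncating if too long. */
static void copy_trimmed(char *dst, const char *start, const char *end)
{
    while (start < end && isspace((unsigned char)*start)) start++;
    while (end > start && isspace((unsigned char)end[-1])) end--;
    size_t len = (size_t)(end - start);
    if (len >= NQS_CONF_MAX_LEN) len = NQS_CONF_MAX_LEN - 1;
    memcpy(dst, start, len);
    dst[len] = '\0';
}

/* Parse "key = value" lines; '#' starts a comment, blank lines are ignored.
 * Returns 0 on success, or the 1-based number of the first bad line. */
int nqs_conf_parse(struct nqs_conf *conf, const char *text)
{
    assert(conf != NULL && text != NULL);
    int lineno = 0;
    conf->count = 0;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        if (eol == NULL) eol = text + strlen(text);
        lineno++;
        const char *hash = memchr(text, '#', (size_t)(eol - text));
        const char *end = (hash != NULL) ? hash : eol;
        const char *p = text;
        while (p < end && isspace((unsigned char)*p)) p++;
        if (p < end) { /* non-blank line: must be key = value */
            const char *eq = memchr(text, '=', (size_t)(end - text));
            if (eq == NULL || conf->count == NQS_CONF_MAX_ENTRIES) return lineno;
            struct nqs_conf_entry *e = &conf->entries[conf->count];
            copy_trimmed(e->key, text, eq);
            if (e->key[0] == '\0') return lineno;
            copy_trimmed(e->value, eq + 1, end);
            conf->count++;
        }
        text = (*eol == '\n') ? eol + 1 : eol;
    }
    return 0;
}

/* Look up a key; later entries override earlier ones. */
const char *nqs_conf_get(const struct nqs_conf *conf, const char *key,
                         const char *fallback)
{
    for (size_t i = conf->count; i-- > 0;)
        if (strcmp(conf->entries[i].key, key) == 0)
            return conf->entries[i].value;
    return fallback;
}
```

A daemon would read the file once at startup and call, say, `nqs_conf_get(&conf, "max_jobs", "1")`, passing the value that used to be fixed at compile time as the fallback, so that a missing key keeps the old behaviour.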

  • Generic NQS v3.53 Production

Isolate all file formats behind a new file-managing API.
Before existing file formats can be replaced, they have to be hidden behind a suitable API.  The new API will be designed so that eventually Generic NQS can store and retrieve data from other services, such as LDAP servers or SQL database servers.
NOTE: This is a large piece of work, and will require changes to the very heart of Generic NQS.
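One common way to hide storage behind an API in C is a table of function pointers, with one table per backend.  The sketch below assumes that approach; `struct nqs_store_ops`, the wrapper functions, and the toy in-memory backend are hypothetical illustrations, not the API Generic NQS will actually adopt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage interface: every backend supplies one of these tables. */
struct nqs_store_ops {
    const char *name;
    int (*put)(void *state, const char *key, const char *value);
    int (*get)(void *state, const char *key, char *buf, size_t buflen);
    int (*del)(void *state, const char *key);
};

struct nqs_store {
    const struct nqs_store_ops *ops; /* which backend */
    void *state;                     /* backend-private data */
};

/* The rest of NQS would call only these wrappers, never a file format. */
int nqs_store_put(struct nqs_store *s, const char *key, const char *value)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->put(s->state, key, value);
}

int nqs_store_get(struct nqs_store *s, const char *key, char *buf, size_t buflen)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->get(s->state, key, buf, buflen);
}

int nqs_store_del(struct nqs_store *s, const char *key)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->del(s->state, key);
}

/* Toy in-memory backend standing in for a flat-file, LDAP or SQL one. */
#define MEM_SLOTS 16
#define MEM_LEN   64

struct mem_state {
    char keys[MEM_SLOTS][MEM_LEN];
    char vals[MEM_SLOTS][MEM_LEN];
    int used[MEM_SLOTS];
};

void mem_init(struct mem_state *m)
{
    for (int i = 0; i < MEM_SLOTS; i++) m->used[i] = 0;
}

static int mem_find(const struct mem_state *m, const char *key)
{
    for (int i = 0; i < MEM_SLOTS; i++)
        if (m->used[i] && strcmp(m->keys[i], key) == 0) return i;
    return -1;
}

static int mem_put(void *state, const char *key, const char *value)
{
    struct mem_state *m = state;
    if (strlen(key) >= MEM_LEN || strlen(value) >= MEM_LEN) return -1;
    int i = mem_find(m, key);
    if (i < 0) {
        for (i = 0; i < MEM_SLOTS && m->used[i]; i++) {
            /* scan for the first free slot */
        }
        if (i == MEM_SLOTS) return -1; /* store full */
    }
    strcpy(m->keys[i], key);
    strcpy(m->vals[i], value);
    m->used[i] = 1;
    return 0;
}

static int mem_get(void *state, const char *key, char *buf, size_t buflen)
{
    const struct mem_state *m = state;
    int i = mem_find(m, key);
    if (i < 0 || strlen(m->vals[i]) >= buflen) return -1;
    strcpy(buf, m->vals[i]);
    return 0;
}

static int mem_del(void *state, const char *key)
{
    struct mem_state *m = state;
    int i = mem_find(m, key);
    if (i < 0) return -1;
    m->used[i] = 0;
    return 0;
}

const struct nqs_store_ops mem_store_ops = { "memory", mem_put, mem_get, mem_del };
```

Under this design an LDAP or SQL backend would simply be another `struct nqs_store_ops` instance, and none of the callers would need to change.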

  • Generic NQS v3.54 Production

Isolate all internal communications behind a new IPC API.
Before the existing NQS daemons can be replaced with a new structure, all communications between the existing NQS daemons have to be placed behind a new API.
As part of this work, a new NQS daemon (the dispatcher) will be added.  Its role is to manage IPC between all other NQS daemons.
NOTE: This is a large piece of work, and will require changes to the very heart of Generic NQS.
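To illustrate the dispatcher's role, here is an in-process sketch of message routing.  It assumes that every message names its destination daemon and that each daemon registers a single handler; the daemon ids, the message layout and all function names are hypothetical, and a real dispatcher would carry these messages between processes (for example over pipes or sockets) rather than through function calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical daemon identifiers. */
enum nqs_daemon_id { NQS_D_SCHEDULER, NQS_D_FILES, NQS_D_NETWORK, NQS_D_COUNT };

struct nqs_msg {
    enum nqs_daemon_id from;
    enum nqs_daemon_id to;
    int type;      /* hypothetical request code */
    char body[64]; /* payload, e.g. a queue name */
};

typedef int (*nqs_handler)(const struct nqs_msg *msg);

struct nqs_dispatcher {
    nqs_handler routes[NQS_D_COUNT]; /* one handler per daemon */
    unsigned long delivered;
    unsigned long dropped;
};

void nqs_dispatcher_init(struct nqs_dispatcher *d)
{
    for (int i = 0; i < NQS_D_COUNT; i++) d->routes[i] = NULL;
    d->delivered = 0;
    d->dropped = 0;
}

/* A daemon announces itself to the dispatcher; only one handler per id. */
int nqs_dispatcher_register(struct nqs_dispatcher *d, enum nqs_daemon_id id,
                            nqs_handler h)
{
    assert(d != NULL && h != NULL);
    if ((unsigned)id >= NQS_D_COUNT || d->routes[id] != NULL) return -1;
    d->routes[id] = h;
    return 0;
}

/* Daemons never talk to each other directly: every message passes through here. */
int nqs_dispatch(struct nqs_dispatcher *d, const struct nqs_msg *msg)
{
    if ((unsigned)msg->to >= NQS_D_COUNT || d->routes[msg->to] == NULL) {
        d->dropped++;
        return -1;
    }
    d->delivered++;
    return d->routes[msg->to](msg);
}

/* Stand-in for a file-management daemon's request handler. */
static int files_last_type = -1;
static int files_handler(const struct nqs_msg *msg)
{
    files_last_type = msg->type;
    return 0;
}
```

Because senders only ever address the dispatcher, a daemon can later be split out, restarted or replaced without its peers noticing, which is what the later milestones rely on.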

  • Generic NQS v3.55 Production

Move all file-management code out into a separate daemon.
With the NQS dispatcher in place, it will now be possible to start moving functionality out of the NQSdaemon and into separate processes.  Eventually we will be able to switch off the NQSdaemon.

The first stage of this is to move all file-management code out of all of the existing daemons into a separate daemon.  The daemon can then be safely modified later on (without affecting any other part of NQS) to support LDAP, SQL databases, and any other data source.

  • Generic NQS v3.56 Production

Implement a new 'queue-manager' daemon.
At the moment, the Generic NQS scheduler is tightly coupled with the basic facilities available to set up and maintain queues.  A new daemon will be introduced which exists purely to manage the existence of the queues.  The existing code inside the NQSdaemon will then become more of a pure scheduler.

This means that we can then clean up all the administration utilities, so that items in queues can be manipulated (and finally deleted) without any hassle whatsoever.

  • Generic NQS v3.57 Production

Implement a new 'NQS-Protocol v1' network daemon.
NQS-Protocol v1 is the network protocol currently used by NQS to talk to other NQS systems, and to any commercial system which inter-operates with NQS.  At the moment, NQSdaemon is structured around serving this protocol.

All support for servicing this protocol will be moved out into a separate daemon.  (Actually, I plan on totally re-implementing this protocol, because of historical bugs in our existing implementation which I've failed to track down and remove over the years).

At the same time, changes will need to be made to what's left of NQSdaemon to support the information flow the new network daemon will require.

I think it goes without saying that this is, most definitely, the single riskiest piece of work on the plan.

However, once this is done, we can add daemons to support other network protocols (DQS, PBS, NQS-Protocol v2) without running the risk of breaking the existing support.

  • Generic NQS v3.58 Production

Separate out the batch spawning from the scheduling code.
The creation of a batch process should be triggered by the scheduler, but actually managed by a separate daemon.

Once the code to manage batch processes has been separated out, the daemon can then be modified to monitor running processes and stomp on them if they abuse their resource limits.
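The plan above has the spawning daemon monitor running jobs.  A simpler starting point, swapped in here as an assumption rather than the planned design, is to let the kernel enforce the limits: the child calls POSIX `setrlimit()` between `fork()` and `execv()`, and the kernel sends `SIGXCPU` once the CPU limit is exceeded.  `struct job_limits` and `spawn_batch_job()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-job limits, as a queue definition might supply them. */
struct job_limits {
    rlim_t cpu_seconds;   /* RLIMIT_CPU: kernel sends SIGXCPU when exceeded */
    rlim_t max_file_size; /* RLIMIT_FSIZE in bytes: writing past it raises SIGXFSZ */
};

/* Fork, apply the limits in the child only, exec the job, and wait for it.
 * Returns the waitpid() status, or -1 if fork() or waitpid() failed. */
int spawn_batch_job(char *const argv[], const struct job_limits *lim)
{
    assert(argv != NULL && argv[0] != NULL && lim != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit cpu = { .rlim_cur = lim->cpu_seconds, .rlim_max = lim->cpu_seconds };
        struct rlimit fsz = { .rlim_cur = lim->max_file_size, .rlim_max = lim->max_file_size };
        if (setrlimit(RLIMIT_CPU, &cpu) != 0 || setrlimit(RLIMIT_FSIZE, &fsz) != 0)
            _exit(126); /* could not apply limits: refuse to run the job */
        execv(argv[0], argv);
        _exit(127);     /* exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return status;
}
```

Because the limits are set after `fork()`, they apply only to the job and never to the spawning daemon itself; a monitoring daemon would still be needed for any policy that per-process kernel limits cannot express.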

We will also introduce a new type of queue, which we are calling the 'ghost queue'.  The basic idea is that each scheduler will have a ghost queue for every batch or pipe queue on other machines in the cluster.  Each ghost queue will contain all the information in the real queue.  This means that, for the very first time, the scheduler will know what is going on cluster-wide, and will be able to provide the fine-grained control currently supported by commercial systems.
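As a cut-down sketch, a ghost queue could keep just the real queue's run limit and load, refreshed whenever the real queue reports a change, which is enough for a scheduler to choose a destination locally.  The fields, the deliberately naive placement rule (most free run slots, counting queued jobs as already occupying slots), and the names `struct ghost_queue`, `ghost_update()` and `ghost_pick()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a batch queue on another machine in the cluster. */
struct ghost_queue {
    char host[32];
    char queue[32];
    int run_limit; /* max jobs the real queue will run at once */
    int running;   /* jobs running there at the last update */
    int queued;    /* jobs waiting there at the last update */
};

/* Refresh one ghost from a status report; returns -1 if no ghost matches. */
int ghost_update(struct ghost_queue *g, size_t n, const char *host,
                 const char *queue, int running, int queued)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(g[i].host, host) == 0 && strcmp(g[i].queue, queue) == 0) {
            g[i].running = running;
            g[i].queued = queued;
            return 0;
        }
    }
    return -1;
}

/* Naive placement: the queue with the most free slots, counting queued jobs
 * as already occupying slots; ties go to the earliest entry.
 * Returns an index, or -1 if every queue is full. */
int ghost_pick(const struct ghost_queue *g, size_t n)
{
    assert(g != NULL || n == 0);
    int best = -1;
    int best_free = 0;
    for (size_t i = 0; i < n; i++) {
        int free_slots = g[i].run_limit - g[i].running - g[i].queued;
        if (free_slots > best_free) {
            best = (int)i;
            best_free = free_slots;
        }
    }
    return best;
}
```

A real scheduler would apply a much richer policy; the point of the sketch is only that the decision needs no round trip to the other machines once the ghosts are up to date.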

  • Generic NQS v3.59 Production

Implement a 'managed objects' environment.
I want to be able to literally 'plug' new features in, to be able to write a new 'object', register it with Generic NQS, and then away we go.

The emphasis of v3.59 will be to take our newly re-architected code, and add support for plugging in new features throughout the source base.
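As a minimal sketch of a 'managed object', assume each object is a named descriptor attached to a hook (an event such as a job starting) and registered once at startup.  The descriptor layout, the hook names, and both registry functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plug-in descriptor; not an existing Generic NQS structure. */
struct nqs_object {
    const char *name;      /* unique name, used to reject duplicates */
    const char *hook;      /* event the object plugs into, e.g. "job-start" */
    int (*run)(void *ctx); /* called when the hook fires; 0 means success */
};

#define NQS_MAX_OBJECTS 32
static const struct nqs_object *registry[NQS_MAX_OBJECTS];
static size_t registry_len;

/* Register an object; returns -1 if the registry is full or the name is taken. */
int nqs_object_register(const struct nqs_object *obj)
{
    assert(obj != NULL && obj->name != NULL && obj->hook != NULL && obj->run != NULL);
    if (registry_len == NQS_MAX_OBJECTS) return -1;
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, obj->name) == 0) return -1;
    registry[registry_len++] = obj;
    return 0;
}

/* Fire every object registered on `hook`; returns how many succeeded. */
int nqs_fire_hook(const char *hook, void *ctx)
{
    int ok = 0;
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->hook, hook) == 0 && registry[i]->run(ctx) == 0)
            ok++;
    return ok;
}

/* Example object: counts job starts in an int passed as the context. */
static int count_job(void *ctx)
{
    (*(int *)ctx)++;
    return 0;
}
static const struct nqs_object job_counter = { "job-counter", "job-start", count_job };
```

In a real build, objects might be loaded from shared libraries with `dlopen()`, but the registration step would look much the same: the core code only ever fires hooks, and never needs to know which objects are listening.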

  • Generic NQS v3.60 Production

This will be v3.59 with bug fixes.  I doubt I'll be able to enforce a complete feature-freeze, but the priority for v3.60 is to ensure that, after the restructuring, Generic NQS v3.60 is robust and stable.

During this work, platform-specific code will be moved out into a new libsal2 library as and when it is uncovered.

Beyond GNQS v3.60

Generic NQS v3.61 and onwards will seek to add support for new features not possible under GNQS v3.50.  More details will be agreed later, but these features could include:

  • Management of parallel jobs (PVM, MPI)
  • Port to Win32 platform
  • Inter-operability with PBS
  • Inter-operability with DQS
  • Scripting language to allow easy prototyping of new features
 

This site (www.gnqs.org) is copyrighted.