This is www.gnqs.org, The Home Of Batch Processing




Batch Processing Systems In The UK HE Community

Academic Computing Services , University Of Sheffield

Stuart Herbert (S.Herbert@Sheffield.ac.uk)

Document copyright ©. All rights reserved.


Abstract

JISC, as part of its New Technologies Initiative, is funding a one-year post to evaluate, supply and support batch processing systems for the UK HE community. As part of the evaluation, a survey of UK HE sites was conducted, and this report is the result of that survey.




Introduction


Surveying UK HE Sites

Before we can supply and support a batch processing system for UK HE sites, we first need to learn about their current situation and their future plans in this area.

The questionnaire [1] sent out to UK HE sites consisted of three sections.

  • The first section asked about what they currently use, and what they might consider using in the future.

  • The second section asked about their requirements of a batch processing system.

  • The final section asked for any other comments, and whether they wish to be kept in touch or not.

From the replies, coupled with the ongoing evaluation of freeware and public domain systems, we can select a single system which we will supply to the UK HE community and support.


Results


Replies

The table below lists the sites which have replied, and indicates whether they have any future interest in this work. Those who have expressed an interest will be automatically added to the `announce' mailing list on Mailbase once it is established.


Table 1 : Replies Received

> ------------------------------------------------------------------
> Site		| Contact				| Interest?
> ------------------------------------------------------------------
> 		|					|
> Birmingham	| J.P.Newbury@bham.ac.uk		| Yes
> Bristol	| Bob.Walker@bristol.ac.uk		| Yes
> Brunel	| Alan.Broadbent@brunel.ac.uk		| Yes
> Cardiff	| Oborne@taff.cardiff.ac.uk		| Yes
> CIHE		| DIHarrison@cardiff-institute.ac.uk	| No
> Cranfield	| r.goodfellow@cranfield.ac.uk		| Yes
>	        | P.Lister@cranfield.ac.uk		| Yes
> De-Montfort	| lymerb@de-montfort.ac.uk		| No
> Durham	| K.G.Middleton@durham.ac.uk		| Yes
> Exeter	| P.A.Chambers@exeter.ac.uk		| Yes
> Glasgow	| billm@aero.gla.ac.uk			| Yes
> Hull		| R.A.Reese@computer-centre.hull.ac.uk	| Yes
> Keele		| cca01@keele.ac.uk			| No
> Lancaster	| J.Boreham@lancaster.ac.uk		| Yes
> Leeds		| ecl6jasb@cif.leeds.ac.uk		| Yes
> Leicester(1)	| dgs1@leicester.ac.uk			| Yes
> Leicester(2)	| nmw@ion.le.ac.uk			| Yes
> Liverpool	| P.D.Mallinson@liverpool.ac.uk		| Yes
> MCC		| zzassdh@cs6400.mcc.ac.uk		| Yes
>		| LeBlanc@mcc.ac.uk			| Yes
> Nene		| alan.broadaway@nene.ac.uk		| No
> Oxford-Brookes| ccoghill@brookes.ac.uk		| Yes
> Reading	| A.C.R.Thornton@reading.ac.uk		| Yes
> Rutherford	| igbf@osf01.cc.rutherford.ac.uk	| Yes
> Salford	| D.Lomas@ais.salford.ac.uk		| Yes
> Sheffield	| C.Cartledge@sheffield.ac.uk		| Yes
> Soton		| idh@soton.ac.uk			| Yes
> SouthamptonI  | miller_t@southampton-institute.ac.uk	| No
> St Andrews	| phrrngtn@cs.st-andrews.ac.uk		| Yes
> Staffordshire	| cstdeo@bs41.staffs.ac.uk		| No
> Strathclyde	| j.gentles@strath.ac.uk		| Yes
> Surrey	| cus1vh@surrey.ac.uk			| Yes
> UEA		| K.woods@east-anglia.ac.uk		| Yes
> UCLAN		| E.H.Smith@prime1.central-lancashire.  | No
>		|   ac.uk				|
> UKC		| D.A.Clear@ukc.ac.uk			| Yes
> Ulster	| R.Wilson@ulst.ac.uk			| Yes
> ULCC		| R.Pockney@ulcc.ac.uk			| Yes
> Westminster	| trevor@westminster.ac.uk		| No
> ------------------------------------------------------------------
> Total : 36 replies, 28 to be kept informed, 8 not interested
> ------------------------------------------------------------------


Comments

  • Birmingham's reply is on behalf of their Academic Computing Services.

  • I have contacted Cardiff, asking them to reconsider their position, as they use Monsanto NQS, which is no longer developed.

  • De-Montfort's reply comes from the Management and Admin Systems department.

  • Glasgow's reply is on behalf of a JISC New Technologies Initiative project under Professor Richards.

  • Leeds' reply is on behalf of their University Computing Service.

  • Leicester(1) is a reply on behalf of the site, while Leicester(2) is on behalf of a research group.

  • St Andrews' reply is on behalf of the individual concerned.

  • ULCC's reply is on behalf of the Systems Group there.

  • All other replies are on behalf of the site.


Analysis

  • 36 replies is approximately 18% of all UK HE sites.

  • We estimate that no more than 50% of all UK HE sites (80 out of 190) are even remotely likely to have the research commitments which would require batch processing software to be installed locally.
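The proportions quoted above can be checked with a few lines of arithmetic. The sketch below is in Python, which the original report does not use, and it assumes that the 190-site figure from the second bullet is also the denominator for the first bullet. The variable and function names are illustrative only.

```python
# Figures taken from the Analysis bullets above.
replies = 36        # replies received (Table 1)
all_sites = 190     # assumed total number of UK HE sites
likely_sites = 80   # sites estimated to need local batch processing


def percentage(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


response_rate_all = percentage(replies, all_sites)
response_rate_likely = percentage(replies, likely_sites)
likely_share = percentage(likely_sites, all_sites)

print(f"Replies as a share of all sites:    {response_rate_all}%")
print(f"Replies as a share of likely sites: {response_rate_likely}%")
print(f"Likely sites as a share of all:     {likely_share}%")
```

On these figures, the 36 replies amount to a little under half of the sites thought likely to need a local batch processing system.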


Conclusions

  • The work to be carried out at the University of Sheffield is of interest to 27 other sites, and is therefore worth pursuing.


Currently In Use

The table below gives an indication of which batch processing software is in use at which sites.


Table 2 : Software Used

> ------------------------------------------------------------------
> Site		| Mon-NQS | Ste-NQS | DQS | 4D-NQS  | Other | None
> ------------------------------------------------------------------
> 		|	  |	    |	  |	    |	    |
> Birmingham	|         |         |     |         |       |   x
> Bristol	|         |    x    |     |    x    |       |
> Brunel	|         |         |  x  |         |       |
> Cardiff      *|    x    |         |     |         |       |
> CIHE	       *|	  |	    |     |	    |       |   x
> Cranfield     |         |         |  x  |         |   x   |
> De-Montfort  *|         |         |     |         |       |   x
> Durham	|	  |         |     |         |   x   |
> Exeter	|         |    x    |     |    x    |       |
> Glasgow	|         |         |  x  |         |       |
> Hull		|         |         |     |         |       |   x
> Keele	       *|         |         |     |         |       |   x
> Lancaster	|         |         |  x  |         |       |
> Leeds		|         |         |     |    x    |       |
> Leicester(1)	|	  |	    |     |    x    |       |
> Leicester(2)	|         |         |     |    x    |       |
> Liverpool	|	  |	    |  x  |         |   x   |
> MCC		|         |         |     |         |   x   |
> Nene	       *|         |         |     |         |       |   x
> Oxford-Brookes|         |         |     |         |       |   x
> Reading	|	  |	    |     |         |       |   x
> Rutherford	|         |         |  x  |         |   x   |
> Salford	|         |         |     |         |       |   x
> Sheffield	|	  |         |     |    x    |       |
> Soton		|         |         |     |         |   x   |
> SouthamptonI *|	  |         |     |         |   x   |
> St-Andrews	|         |         |     |         |       |   x
> Staffordshire*|	  |	    |	  |	    |   x   |
> Strathclyde	|         |         |     |         |       |   x
> Surrey	|         |         |     |         |   x   |
> UCLAN	       *|	  |         |     |         |       |   x
> UEA		|	  |         |  x  |         |       |
> UKC		|         |         |     |         |   x   |
> ULCC		|         |         |     |         |   x   |
> Ulster	|         |         |     |         |   x   |
> Westminster  *|         |         |     |         |       |   x
> ------------------------------------------------------------------
> Total      34 |       1 |       2 |   7 |       6 |    12 |    13
> ------------------------------------------------------------------


Comments

(*) indicates sites with no further interest in this project.

  • Cranfield use VMS as well as DQS.

  • Durham use DNQS, the predecessor to DQS.

  • Exeter are running Sterling NQS to perform fault diagnosis on behalf of SGI.

  • Glasgow have only just started using DQS.

  • Liverpool are writing an in-house replacement for DQS.

  • MCC use CERN-NQS.

  • Reading plan to introduce MCC NQS, but other tasks have taken a higher priority to date.

  • Rutherford use CERN-NQS, NQE, and Cray-NQS, as well as DQS.

  • Salford plan to introduce a batch processing system.

  • Soton are currently purchasing GENIAS Codine, and possibly IBM Loadleveler in the near future.

  • Southampton Institute run a VMS-based batch system.

  • Staffordshire use VMS.

  • Strathclyde have recently purchased suitable hardware and are interested in installing a batch system.

  • Surrey use batchd, an apparent NQS for Concentrix-2800 machines.

  • UEA are looking for a replacement for DQS.

  • UKC use batchd by Ken Lalonde, Computer Science, Uni. of Toronto, with in-house improvements.

  • ULCC use Convex/CXBatch, a derivative of Sterling-NQS.

  • Ulster run a VMS-based batch system.


Analysis

  • 11 sites use some form of NQS software.

  • 7 sites use some form of DQS software; one site is a new installation, two sites are actively replacing DQS, and one site runs DQS's immediate ancestor.

  • Of the remaining 5 systems used, three are running on VMS, one is using batchd, while the other is under development at Liverpool.

  • Of the 12 sites not running any service at present, 6 have no further interest in this work. Two of the VMS sites also have no further interest.

  • Of the 28 sites who have further interest, 11 run NQS, 7 use DQS, one runs VMS, one runs batchd, one runs Codine, and the other six do not yet run a batch processing system.

  • The sites running 4D/NQS are doing so without any support from SGI. This means that a future version of IRIX is likely to break the binaries. These sites, therefore, are possibly in urgent need of a replacement.


Conclusions

  • NQS is favoured over DQS amongst the population sample, by 11 sites to 7 (roughly 3:2).


Platforms

The table below is a list of platforms which are used by sites, or which will be used in the future, for batch processing.


Table 3 : Platforms To Be Supported

> ------------------------------------------------------------------
> Site		| SunOS | Sol | IRIX | OSF | HPUX | Other
> ------------------------------------------------------------------
>               |       |     |      |     |      |
> Bristol	|       |     |  x   |     |  x   |
> Brunel	|   x   |  x  |      |     |      |
> Cardiff	|       |     |      |  x  |      | Ultrix
> Cranfield	|       |     |      |  x  |      | Ultrix
> Durham	|       |  x  |      |     |  x   |
> Exeter	|	|     |  x   |     |      |
> Glasgow	|       |     |  x   |     |      |
> Lancaster	|       |     |      |     |  x   |
> Leeds		|       |     |  x   |     |      |
> Leicester(1)	|	|     |  x   |     |      |
> Leicester(2)	|       |     |  x   |     |      |
> Liverpool	|   x   |  x  |  x   |     |      |
> MCC		|       |  x  |      |     |  x   | AIX, Fujitsu
> Reading	|   x   |  x  |      |     |      |
> Rutherford	|       |     |      |  x  |  x   | AIX, Cray
> Sheffield	|       |     |  x   |     |      |
> Soton		|   x   |  x  |  x   |     |      | AIX
> Surrey	|       |     |      |     |      | Alliant
> UEA		| 	|     |      |  x  |	  |
> UKC		|   x   |  x  |      |     |      |
> ULCC		|       |     |      |     |      | Convex
> ------------------------------------------------------------------
> Total	     21	|     5 |   7 |    9 |   4 |    5 |               9
> ------------------------------------------------------------------


Comments

  • As this project is restricted to UNIX, VMS systems aren't included here at all.


Analysis

  • A total of eleven platforms are considered by the population sample to be suitable for batch processing.

  • Solaris 2 and IRIX (SYSV-compatible platforms) make up over 1/3 of the platforms.
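The "over 1/3" figure can be reproduced from the totals row of Table 3. The following Python sketch is an illustration only; it takes the column totals as printed (not recounted from the rows) and treats each "x" or named platform as one entry, which is an assumption about how the totals were formed.

```python
# Column totals as printed in the last row of Table 3.
platform_totals = {
    "SunOS": 5,
    "Solaris": 7,
    "IRIX": 9,
    "OSF": 4,
    "HPUX": 5,
    "Other": 9,  # entries spread over Ultrix, AIX, Fujitsu, Cray, Alliant, Convex
}

total_entries = sum(platform_totals.values())
sysv_entries = platform_totals["Solaris"] + platform_totals["IRIX"]
sysv_share = sysv_entries / total_entries

print(f"Solaris 2 + IRIX: {sysv_entries} of {total_entries} entries "
      f"({sysv_share:.0%})")
```

With these totals, Solaris 2 and IRIX account for 16 of the 39 entries, which is consistent with the statement above.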


Conclusions

  • The product of this project must be highly portable.

  • Where possible, the software should rely on standard APIs (such as SYSV or POSIX) to increase its portability.


Attitude To Features

The table below is an indication of the importance attached to different features at each site. This data is NOT an indication of the attitude of a site towards each feature, merely how important that feature is (or isn't) to that particular site and their facilities.


Table 4 : Feature Considerations

> ----------------------------------------------
> Site		| A | B | C | D | E | F | G | H
> ----------------------------------------------
>		|   |   |   |   |   |   |   |  
> Birmingham	| 1 | 2 | 2 | 3 | 3 |   | 3 | 3
> Bristol	| 2 | 3 | x | x | 1 | 3 | 3 | 3
> Brunel	| 2 | 3 | 2 | 2 | 3 | 3 | x | 3
> Cardiff	| 3 | 3 | 2 | 2 | 1 | 2 | 1 | 1
> Cranfield	| 2 | 3 | 2 | 3 | x | 2 | x | x
> De-Montfort	| 3 | 3 | 3 | 3 |   |   | 2 | 2
> Durham	|   | 3 | 2 | 3 | x | 3 | 1 | 1
> Exeter	| x | x | x | x | 3 | x | 1 | 1
> Glasgow	| x | 2 | 3 | 2 | 3 | 4 | 2 | 2
> Hull		| 3 | 3 | 2 | 3 | 2 | 1 | 1 | 1
> Lancaster	| 2 | 2 | x | x | 1 | 3 | 1 | 1
> Leeds		| 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2
> Leicester(1)	| x | 3 | 3 | 3 | 2 | 2 | 1 | 1
> Leicester(2)	| 1 | 1 | 2 | 2 | 1 | 2 | 1 | 2
> Liverpool	| 2 | 3 | 2 | 2 | 3 | 3 | 2 | 2
> MCC		| 3 | 3 | 2 | 3 | 3 | 3 | 1 | 2
> Reading	| 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3
> Rutherford	| 2 | 3 | 2 | 3 | 2 | 3 | 1 | 1
> Salford	| 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1
> Sheffield	| 2 | 2 | 2 | 3 | 1 | x | 1 | 3
> Soton         | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
> SouthamptonI	| 1 | 2 | 1 | 1 | 1 | 4 | 1 | 1
> St. Andrews	|   | 3 |   |   | 2 | 3 |   | 2
> Strathclyde	| 2 | 2 | 2 | 3 | 2 | 2 | 1 | 3
> Surrey	| 2 | 3 | 3 | 3 | 3 | 2 | 2 | 2
> UEA		| 2 | 3 | 2 | 3 | 3 | 3 | 2 | 2
> UKC		| 1 | 1 | 2 | 3 | 3 | 1 | 0 | 1
> ULCC		| 2 | 3 | 2 | 3 | 4 | 4 | 4 | 1
> Ulster	| 2 | 2 | 1 | 3 | 3 | 3 | 1 | 2
> ----------------------------------------------
> Total		| 46| 68| 52| 64| 56| 60| 33| 50
> ----------------------------------------------


Comments

> ------------------------------------------------------------------
> Feature	| Wording In The Questionnaire
> ------------------------------------------------------------------
>		|
>      A	| Ease of installation
>      B	| Ease of configuration
>      C	| Admin training material
>      D	| User training material
>      E	| Multi-processor support
>      F	| Support for more than one machine to run jobs on
>      G	| GUI-based tools for admin
>      H	| GUI-based tools for users
> ------------------------------------------------------------------
> ------------------------------------------------------------------
> Rating	| Meaning
> ------------------------------------------------------------------
>		|
>   (blank)	| Reply did not state an opinion
>      0        | You are joking, aren't you?
>      1	| Not important
>      2	| Important
>      3        | Very Important
>      4	| Essential (*)
>      x	| Okay as things stand
> ------------------------------------------------------------------
> (*) This rating was not included in the wording of the questionnaire
>     when it was distributed, but was occasionally used in replies.
> ------------------------------------------------------------------

  • Sites which did not express any opinions at all are not included in the table.


Analysis

  • Ease of configuration, support for distributed processing, and user documentation are by far the most important areas to consider.

  • Good documentation is considered to be far more important than improving the user-friendliness of the software. This attitude may be due to UNIX's traditional reliance on command-line interfaces.
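The ranking behind this analysis can be reproduced by sorting the totals row of Table 4. The Python sketch below is illustrative only: it uses the totals exactly as printed in the table rather than recomputing them from the individual rows, and the dictionary names are hypothetical.

```python
# Feature key, from the Comments table above.
feature_names = {
    "A": "Ease of installation",
    "B": "Ease of configuration",
    "C": "Admin training material",
    "D": "User training material",
    "E": "Multi-processor support",
    "F": "Support for more than one machine to run jobs on",
    "G": "GUI-based tools for admin",
    "H": "GUI-based tools for users",
}

# Totals as printed in the last row of Table 4.
totals = {"A": 46, "B": 68, "C": 52, "D": 64, "E": 56, "F": 60, "G": 33, "H": 50}

# Highest total first.
ranked = sorted(totals, key=totals.get, reverse=True)

for letter in ranked:
    print(f"{totals[letter]:3d}  {letter}  {feature_names[letter]}")
```

Sorted this way, features B, D and F head the list, and the two GUI-related features (G and H) sit in the lower half.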


Conclusions

  • Sites are concerned mainly with the core functionality, and flexibility. So-called `value added features' are considered important, but they are definitely considered to be a lower priority.


Comments

The questionnaire included a section asking respondents if they had any other comments. In addition, many replies were peppered with comments throughout - a welcome event.


Why Don't You Use Any Batch Processing Systems?

  • No (user) demand/cost/cron suffices for system administration.

  • Other items taking a higher priority, I'd class it as medium priority.

  • While it is certainly our intention to implement this, to date other activities have had to take higher priority.

  • Low priority at time of introducing service. Little experience then of Unix as a user service.

  • LOW DEMAND.

  • We have no use for batch processing systems.

  • Four sites stated no demand.


How Happy Are You With Your Existing Batch Processing System?

  • Happy. Would prefer to be able to pipe batch queues without having to use .rhosts files for submission/output return.

  • Unhappy - N thousand lines of uncommented code !

  • Happy. Easy to set up and it works OK. It fulfils a small demand for batch processing which would be uneconomic to fill with a system which had cost money.

  • Reasonably happy. It works OK but is difficult to maintain - directory structure a bit complex, and documentation not very good.

  • Fairly happy.

  • Happy - it provides adequate functionality for our needs.

  • No. NQS offers very primitive scheduling. On the Cray we have had to produce software to shuffle NQS queues to offer better scheduling.

  • Relatively happy. It certainly gives us more than the vendors would have supplied. Availability of source has also enabled us to interface the system with our scheduling and control system.

  • Happy - does the job.

  • Reasonably happy.

  • Happy - OpenVMS - OK, DQS - satisfactory

  • Unhappy - no longer supported by the supplier - will likely stop working with some future OS version release ...

  • Happy. It is flexible enough to be used on our distributed network. Ability to schedule job to least loaded machine. Jobs can therefore be sent to a group and be guaranteed to run on least loaded machine.

  • Unhappy - it is not very flexible and doesn't give the users much control.

  • We are happy with SGI 4D-NQS, but they no longer support it. SGI recommend Sterling NQS, but this does not support ``non-degrading priority'' under IRIX and we find this a main requirement.

  • The problem that we have is that the batch system runs jobs on a first-come-first-served basis. Thus, if a user submits a large number of processes (each of which might run for up to 72 hours on up to 8 processors on our Power Challenger), then subsequent submissions have to wait until all the original jobs are complete.

  • We are looking to replace DQS v2.1.5 with v3.1 to tackle some of these issues, without UEA fixes. We're also looking at Codine and LSF. Athena and Kerberos support are currently required, but this is under review. We need support for PVM working.
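The first-come-first-served complaint in the replies above can be illustrated with a small sketch. The Python code below is not taken from NQS, DQS or any other system named in this report, and it is not the Larmouth algorithm mentioned by some respondents; it simply contrasts strict submission order with a hypothetical per-user round-robin order. The user and job names are invented for the example.

```python
from collections import OrderedDict, deque


def fifo_order(jobs):
    """Run jobs strictly in the order they were submitted."""
    return list(jobs)


def round_robin_order(jobs):
    """Take one job from each user in turn.

    Users are visited in the order of their first submission, and each
    user's own jobs keep their original relative order.
    """
    per_user = OrderedDict()
    for user, name in jobs:
        per_user.setdefault(user, deque()).append((user, name))

    order = []
    while per_user:
        # Iterate over a copy of the keys so users can be removed safely.
        for user in list(per_user):
            queue = per_user[user]
            order.append(queue.popleft())
            if not queue:
                del per_user[user]
    return order


# One user submits three jobs, then a second user submits one.
jobs = [("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]

fifo = fifo_order(jobs)
fair = round_robin_order(jobs)

print("FIFO:       ", [name for _, name in fifo])
print("Round-robin:", [name for _, name in fair])
```

Under strict FIFO the second user's job waits behind all three of the first user's jobs; under the round-robin sketch it runs second. A production scheduler would also need to weigh job size, priorities and accounting periods, none of which is modelled here.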


How Much Would You Be Prepared To Pay In Total For A Commercial System?

  • If we had to pay, we would investigate public domain or do without.

  • Nothing - we're writing our own.

  • Any reasonable amount.

  • Can't really answer this. We are a National Supercomputing Centre. We obtain the batch processing system as part of a system procurement.

  • I anticipate that the `free'-to-us NQS system via MCC should be good enough. At present, I don't believe that we would feel able to spend much money on a batch system.

  • This would depend on the level of service we were aiming at. In our current position of offering a dedicated batch machine we would consider a figure around 5000 pounds.

  • Such packages are normally utilised on research projects with pretty tight budgets, and the cost of a commercial package is quite prohibitive.

  • Can not say.

  • Where we wish to standardise on something campus wide, public-domain/freeware codes are very cost effective vis a vis a vendor site license.

  • Few hundred pounds maybe.

  • 1,000 if it was really good.

  • < 1,000 for a multi-platform site-licence

  • 5,000 per year, depending on suitability and maintenance.


Comments On Requirements

  • Ease of installation is not important because it is a rare task. Ease of configuration is important, because configurations change to reflect changes in policy (although this rarely happens). Although admin training material is important, it may cause confusion because it isn't tailored to site policy and experience. User training material is very important because it is needed by far more people, who are far less experienced. GUI-based tools for users are expected nowadays.

  • Autoconfiguration may be desirable. Important that the users understand how jobs are being handled, but they shouldn't need to worry about the actual implementation and how, or if, jobs are migrated across hosts of processors. Anything that makes users' lives simpler makes admin and support a lot simpler.

  • GUI tools may help shift users from interactive to batch work.


Any Other Comments

  • The main considerations are the amount of control over when queues run and what resource limits each queue places on the jobs it runs. This does not need to be easy to configure since it is generally only done once per queue.

  • Should not introduce any security weakness which needs to be continually monitored.

  • NQS is +very much+ better than MDQS which was the batch system we used previously. While MDQS was better than nothing, it was too close to call ;-)

  • I was interested to learn of this JISC funded project. ULCC has been providing batch services to London and the UK for 26 years using Control Data software, MVS, Cray software and latterly, Unix based NQS. Functionality has steadily decreased over the years. We now provide a far less flexible batch service than we used to. One of our major gripes against NQS is its FIFO scheduling. This is not really acceptable when you have 1000's of users competing for batch resources. In the past we adopted the scheduling algorithms developed by John Larmouth but have not been able to do this with the Convex NQS derivative.

  • GUI based tools themselves are not important, but ease of use and security are important. I've seen some awful GUIs - although I've not looked at batch processing tools.

  • `Multi-processor' support should refer to one computer with several processors and also to several separate computers/workstations.

  • Here are some things I feel are important or useful in a batch system and are lacking in current batch systems : Load balancing between machines, set user/group limits for an accounting period, allow scheduling (FIFO is not good enough), restrict number/size of jobs on a particular machine, allow users to have different jobs accounted to different groups.

  • Would like access to the source code.

  • Few users do use Unix at and cron commands to run disconnected jobs.

  • We are running the SGI 4D/NQS batch system on a SGI Challenge/XL. The system fulfills its role. However, we would welcome a few extensions to this batch processing system.

    The present version of the 4D/NQS batch system (2.0) doesn't include any job scheduler or a facility allowing easy prioritisation of requests according to a specified algorithm (eg Larmouth algorithm).

    Another missing feature from the 4D/NQS system is a conditional execution of starting a new job. It would be desirable to have a facility which allows a user to start a new job only if certain conditions are fulfilled.

    It would also be useful to have a summary of statistics for jobs processed by a batch processing system.

    We are not prepared to pay the full commercial price for a better product, although we could consider paying a small amount for the above useful features.

  • We are probably about to get a Cray, in which case we will probably also use Cray's particular flavour of NQS. Even with that, however, it would be nice if jobs could be submitted and managed via DQS to provide a coherent view of the whole system to users.

    I also experimented with Condor, now IBM Loadleveler, but this was not going to fly in our environment. Condor has a nice concept of checkpointing and process migration, but the overhead of this makes it infeasible - the checkpointing code must be linked into the user's program which makes things much more difficult for the user, and limits the facility to home-brewed code.

    The best way to make life easier for a sysadmin is not pretty icons, but the facility to be able to drive any part of the system with shell or PERL scripts. The best way to make life easy for the user is to provide a system which needs minimal intervention - often the way to do this is for a sysadmin to provide scripts which do what the user needs automatically. Anything which can be scripted can also use TCL/TK, so it should be possible to build GUIs which are precisely tailored to the system's requirements easily in this way.

    I have a definite demand from users at Cranfield for PVM for parallelised tasks. DQS as it currently stands doesn't handle this elegantly, but I have a fair idea on how it can be improved to do so. I also have in mind support for ``semi-interactive'' jobs, which would use essentially the same mechanism. This is quite a common class of job at Cranfield. The typical ``batch'' job performs no user output which actually needs to be read for the job to terminate successfully, but ``semi-interactive'' covers code which prints out a line of text every few sections; the user looks at this from time to time, and decides if the results look hopeful - if not, the job is terminated by hand. This practice was fine in the days of dumb terminals, but not with networked public workstations and PCs. These jobs could be handled by e.g. expect scripts, but most of the people using these programs are not generally inclined to use some new-fangled script language just to keep a sysadmin happy. My idea is that a queued job, once started, can generate standard output which the submitting process can feed back directly to the user much like rsh, but with the difference that a user can log out and then reconnect to the job's stdout/stdin/stderr at a later date from a different client.

    I also definitely want more interaction between interactive, semi-interactive and batch processing - such things as suspending (or reprioritising) a batch queue when a user logs in, or just giving the user about to log in to a workstation information about what jobs are being processed there, giving her the opportunity to use the workstation next door if it is free. Also, a good batch queueing system permits the sysadmin to have much more control over runaway processes, and e.g. when a user logs out, kills processes belonging to her unless they have been ``legally'' batched.

    Although Condor proved impractical for us (and later became commercialised anyway) - I am still interested in the idea of checkpointing, if not process migration, to cope with automatic restarts from the last checkpoint if a system reboots.

  • All of the batch systems that I've looked at for Unix systems are compromised by the poor control of resources that is provided by Unix. Limits (CPU time, core size etc) are implemented on a per process basis, whilst in reality Unix encourages/forces many real user jobs to consist of multiple processes. The batch systems are unable to take full account of spawned processes. For example, if a job script file spawns a number of background processes and then exits, it regards the job as having completed even though it may have started one or more still active processes (DQS attempts to get round this by making a job script a process-group leader when it exits, however a user can easily get around this by having a spawned process change process-group). As a result using a batch system as a means of controlling resource allocation is compromised.

  • On a machine which was built as a high performance multi-user machine I would expect a good batch system as part of the operating system (some hope from most suppliers I know).

  • It appears that Sterling NQS varies from one platform to another. Some features do not work on all platforms.

  • We (Strathclyde) would be prepared to act as a beta site for testing a batch system for Suns running Solaris 2.3 or 2.4 on our SS1000 and Sun clusters of Sparc1s.
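One of the comments above notes that tracking a job by its process group can be defeated when a spawned process changes process group. The Python sketch below illustrates that general POSIX behaviour only; it is not code from DQS or any other batch system, it assumes a POSIX platform and Python 3.7 or later, and the function name is hypothetical.

```python
import os
import subprocess
import sys

# Child program: move into a brand-new process group, then report its
# process group ID on standard output.
CHILD_CODE = "import os; os.setpgid(0, 0); print(os.getpgrp())"


def spawn_escaping_child():
    """Start a child that leaves our process group.

    Returns a tuple (parent_pgid, child_pgid).
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return os.getpgrp(), int(result.stdout.strip())


parent_pgid, child_pgid = spawn_escaping_child()

print(f"Parent process group: {parent_pgid}")
print(f"Child process group:  {child_pgid}")
print("Child escaped the parent's group:", parent_pgid != child_pgid)
```

A supervisor that signals or accounts for a job by its original process group ID would no longer reach such a child, which is the weakness the respondent describes.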


Analysis

  • Batch processing systems are seen as a low priority issue, with sites unwilling or unable to allocate significant resources in terms of finance or manhours.

  • Everyone has different requirements from batch processing systems, and no one, single system currently satisfies any of these needs.

  • Generally, people want more control over jobs, a simplification of the way jobs appear to migrate, and mechanisms for supporting third-party products.


Conclusions

  • Whichever system we choose to adopt, we must be prepared to perform major improvements before it will begin to satisfy the UK HE community.


Conclusions


Conclusion

The data suggests that we support some form of NQS derivative software, targeting no fewer than six platforms, concentrating on ease of configuration, core functionality, and user documentation.

As regards added functionality, we need to ensure NQS has a good scheduling system, and multi-processor support. Support for third-party products, such as PVM and AFS, should also be considered as a high priority.


References




