c - crash  b - bug fix  e - enhancement  f - new feature
2.3.10
  b - Fixed a bug in run_pelog (src/resmom/prolog.c) where epilogue.user was given the argument list for prologue scripts and not epilogue scripts. Ticket 6296.
  b - Fixed pbs_mom's default restart behavior. On a restart the MOM is supposed to terminate jobs and report them to the batch server, but it should not try to kill any running processes. With this fix the MOM no longer kills running processes; it only terminates the jobs.
2.3.9
  b - Made a fix to svr_get_privilege(). On some architectures a non-root user name would be set to null after the line "host_no_port[num_host_chars] = 0;" because num_host_chars was 1024, the size of host_no_port, so the terminator was written one byte past the end of the buffer. The null termination needed to happen at index 1023. There were other problems with this function, so code was added to validate the incoming variables before they are used. The symptom of this bug was that non-root managers and operators could not perform operations they should have had rights to.
2.3.8
  b - fix return value of cpuset_delete() for Linux (Chris Samuel - VPAC)
  c - keep pbs_server from trying to free an empty attrlist after receiving a bad request (Michael Meier, University of Erlangen-Nurnberg)
  e - Set PBS_MAXUSER to 32 from 16 in order to accommodate systems that use 32-character user names. (Ken Nielson, Cluster Resources)
  c - modified acct_job in server/accounting.c to dynamically allocate memory to accommodate strings larger than PBS_ACCT_MAX_RCD. (Ken Nielson, Cluster Resources)
  e - moving jobs can now trigger a scheduling iteration
  b - fix how qsub sets PBS_O_HOST and PBS_SERVER (Eirikur Hjartarson, deCODE genetics)
  e - add qpool.gz to contrib directory
  b - allow the user to turn off credential lifetimes so they don't have to lose iterations while credentials are renewed.
  b - fix so after* dependencies are handled correctly for exiting / completed jobs
  b - fix tracejob so it handles multiple server and mom logs for the same day
  b - close empty ACL files
  e - Removed the use of the global buffer dis_buffer. All of the DIS function calls now use their own scratch buffer, which is allocated on the stack at run time.
  b - The tm calls use TCP for their transport. However, the tm calls were still using the RPP-based DIS functions instead of the TCP-based ones. Now all tm functions use the TCP-based DIS functions.
  b - pbsD_authenticate() correctly splits PATH on : instead of ; if pbs_iff is not in the expected location
  b - pbs_mom now sets resource limits for tasks started with tm_spawn (Chris Samuel, VPAC)
  c - fix assumption about size of unsocname.sun_path in Libnet/net_server.c
  f - added a diagnostic script (contrib/diag/tdiag.sh). This script grabs the log files for the server and the mom, records the output of qmgr -c 'p s' and the nodefile, and creates a tarfile containing these.
  b - Changed momctl -s to use exit(EXIT_FAILURE) instead of return(-1) if a mom is not running.
  c - Fix for Bugzilla bug 36, "qsub crashes with long dependency list".
  b - Fix for Bugzilla bug 41, "tracejob creates a file in the local directory".
2.3.7
  e - pbs_mom sisters can now tolerate an explicit group ID instead of only a valid group name. This helps TORQUE be more robust to group lookup failures.
  b - fixed a bug where UNIX domain socket communication was failing when "--disable-privports" was used.
  e - add job exit status as 10th argument to the epilogue script
  c - check filename for NULL to prevent crash
  e - merged in more logging and NOSIGCHLDMOM capability from Yahoo branch
  e - merged in new log_ext() function to allow more fine-grained syslog events; the severity level can now be specified.
      Also added more logging statements.
  e - added code to allow CLONE_BATCH_SIZE to be overridden at configure time (allows for finer-grained control over how arrays are created) (ported from Yahoo R2461)
  e - added code which prefixes the severity tag on all log_ext() and log_err() messages (ported from Yahoo R2358)
  e - added qmgr option accounting_keep_days, which specifies how long to keep accounting files.
  e - changed mom config varattr so the invoked script returns the varattr name and value(s)
  e - improved the performance of pbs_server when submitting large numbers of jobs with dependencies defined
  e - added new parameter "log_keep_days" to both pbs_server and pbs_mom; it specifies how long to keep log files before they are automatically removed
  e - added qmgr server attribute lock_file, which specifies where the server lock file is located
  e - modified to allow retention of completed jobs across server shutdown
  e - added job_must_report qmgr configuration, which says the job must be reported to the scheduler. Added job attribute "reported". Added PURGECOMP functionality, which allows the scheduler to confirm jobs are reported. Also added a -c option to qdel, used to clean up unreported jobs.
  b - fix so interactive jobs run when using $job_output_file_umask userdefault
  b - changes to improve the qstat -x XML output and documentation
  b - fix truncated output in qmgr (peter h IPSec+jan n NANCO)
  b - fix so find_resc_entry still works after setting server extra_resc
  b - change so set_jobexid() gets called if JOB_ATR_egroup is not set
  b - fixed memory issue (underallocated array for a string)
  e - added a diagnostic script (contrib/diag/tdiag.sh). This script grabs the log files for the server and mom daemons, records the output of qmgr -c 'p s' and the nodefile, and then creates a tarfile containing the output.
2.3.6
  e - in Linux, a pbs_mom will now "kill" a job's task, even if that task can no longer be found in the OS process table.
      This prevents jobs from getting "stuck" when the PID vanishes in some rare cases.
  e - forward-ported change from 2.1-fixes (r2581) (b - reissue job obit even if no processes are found)
  b - change back to not sending status updates until we get the cluster addr message from the server; also only try to send hello when the server stream is down.
  b - change pbs_server so the behavior of a zero log_file_max_size matches the documentation
  e - added periodic logging of version and loglevel to help with support
  e - added pbs_mom config option ignvmem to ignore vmem/pvmem limit enforcement
  b - change to correct strtoks that accidentally got changed in astyle formatting
2.3.5
  e - added new init.d scripts for Debian/Ubuntu systems
  b - fixed regression in 2.3.4 release which incorrectly changed soname for libtorque
  b - fixed a bug where TORQUE's exponential backoff for sending messages to the MOM could overflow
2.3.4
  b - fixed a bug with RPM spec files due to new pbs_track executable
  b - fixed a bug with "max_report" where jobs not in the Q state were not always being reported to the scheduler
  b - fixed bug with new UNIX socket communication when more than one TORQUE instance is running on the same host
  c - fixed a few memory errors due to a spurious comma and some uninitialized memory being allocated
  b - fixed a bug preventing multiple TORQUE servers and TORQUE MOMs from operating properly all from the same host
  f - enabled 'qsub -T' to specify "job type."
      Currently this will allow a per-job prolog/epilog.
  f - added a new '-E' option to qstat which allows command-line users to pass "extend" strings via the API
  f - added new max_report queue attribute which will limit the number of Idle jobs, per queue, that TORQUE reports to the scheduler
  e - enhanced logging when a hostname cannot be looked up in DNS
  e - PBS_NET_MAX_CONNECTIONS can now be defined at compile time (via CFLAGS)
  e - modified source code so that all .c and .h files now conform more closely to the new CRI format style
  c - fixed segfault when loading job files of an older/incompatible version
  b - fixed a bug where, if an attempt to send a job to a pbs_mom failed due to a timeout, the job would remain in the 'R' state indefinitely
  b - fixed a bug where CPU time was not being added up properly in all cases (fix for Linux only)
  e - pbs_track now allows passing of - and -- options to the a.out argument
  b - qsub now properly interprets -W umask=0XXX as an octal umask
  e - allow $HOME to be specified for path
  e - added --disable-qsub-keep-override to allow the qsub -k flag to not override -o -e.
  e - updated with security patches for setuid, setgid, and setgroups
  b - fixed correct_ct() in svr_jobfunc.c so we don't crash if we hit a COMPLETED job
  b - fixed problem where momctl -d 0 showed ConfigVersion twice
  e - if a .JB file gets upgraded, pbs_server will back up the original
  b - removed qhold / qrls -h n option since there is no code to support it
  b - set job state and substate correctly when job has a hold attribute and is being rerun
  e - fixed several compiler errors and warnings for AIX 5.2 systems
2.3.3
  b - fixed bug where pbs_mom would sometimes not connect properly with pbs_server after network failures
  b - changed run_pelog so it opens the correct stdout/stderr when join is used
  b - corrected pbs_server man page for SIGUSR1 and SIGUSR2
  f - added new pbs_track command, which may be used to launch an external process; pbs_mom will then track the resource usage of that process and attach it to a specified job (experimental) (special thanks to David Singleton and David Houlder from APAC)
  e - added alternate method for sending cluster addresses to MOM (ALT_CLSTR_ADDR)
2.3.2
  e - added --disable-posixmemlock to force mom not to use POSIX MEMLOCK.
  b - fix potential buffer overrun in qsub
  b - keep pbs_mom, pbs_server, and pbs_sched from closing sockets opened by nss_ldap (SGI)
  e - added PBS_VERSION environment variable
  e - added --enable-acct-x to allow adding of x attributes to accounting log
  b - fix net_server.h build error
  b - fixed code that was causing jobs to fail due to "neednodes" errors when Moab/Maui was the scheduler
2.3.1
  b - fixed a bug where TORQUE would fail to start if there was no LF in the nodes file
  b - fixed a bug where TORQUE would ignore the "pbs_asyrunjob" API extension string when starting jobs in asynchronous mode
  b - fixed memory leak in free_br for PBS_BATCH_MvJobFile case
  e - TORQUE can now compile on Linux and OS X with NDEBUG defined
  f - when using qsub it is now possible to specify both -k and -o/-e (before, -o/-e did not behave as expected if -k was also used)
  e - added a "-l" option to pbs_server, which specifies a host/port that event messages will be sent to. Event messages are the same as what the scheduler currently receives.
  e - added --enable-autorun to allow qsub jobs to automatically try to run if there are any nodes available.
  e - added --enable-quickcommit to allow qsub to combine the ready-to-commit and commit phases into one network transmission.
  e - added --enable-nochildsignal to allow pbs_server to use inline checking for SIGCHLD instead of using the signal handler.
  e - change qsub so '-v var=' will look in the environment for the value. If no value is found, it is set to "".
  b - fixed mom_server code's HELLO initiation retry control to reduce occurrences of pbs_server incorrectly marking a node as unknown/down
  b - fix qdel of entire job arrays for non-operators/managers
  b - fix so we continue to process exiting jobs for other servers
  e - added source_login_batch and source_login_interactive to mom config. This allows us to bypass the sourcing of /etc/profile-type files.
  b - fixed pbs_server segmentation fault when job_array submissions are rejected before ji_arraystruct was initialized
  e - add some casts to fix some compiler warnings with gcc-4.1 on i386 when -D_FILE_OFFSET_BITS=64 is set
  e - added --enable-maxnotdefault to allow not using resources_max as defaults.
  b - fixed file descriptor leak with Linux cpusets (VPAC)
  b - added new values to TJobAttr so we don't have a mismatch with job.h values. Also added some comments.
  b - reset ji_momhandle so we cannot have more than one pjob for obit_reply to find.
  e - change qdel to accept 'ALL' as well as 'all'
  b - changed order of searching so we find the most recent jobs first. This prevents finding an old leftover job when PIDs roll over. Also some CACHEOBITFAILURES updates.
  b - handle the case where mom replies with an unknown job error to a stat request from the server
  b - allow qalter to modify HELD jobs if BLCR is not enabled
  b - change to update errpath/outpath attributes when -e/-o are used with qsub
  e - added string output for errnos, etc.
2.3.0
  e - redesign how torque.spec is built
  e - added -a to qrun to allow asynchronous job start
  e - allow qrerun on completed jobs
  e - allow qdel to delete all jobs
  e - make qdel -m functionality match the documentation
  b - prevent runaway hellos being sent to server when mom's node is removed from the server's node list
  e - local client connections use a UNIX domain socket, bypassing inet and pbs_iff
  f - Linux 2.6 cpuset support (in development)
  e - new job array submission syntax
  b - fixed SIGUSR1 / SIGUSR2 to correctly change the log level
  f - health check script can now be run at job start and end
  e - tm tasks are now stored in a single .TK file rather than consuming lots of inodes
  f - new "extra_resc" server attribute
  b - "pbs_version" attr is now correctly read-only
  e - increase max size of .JB and .SC file names
  e - new "sched_version" server attribute
  f - new printserverdb tool
  e - pbs_server/pbs_mom hostname arg is now -H; -h is help
  e - added $umask to pbs_mom config, used for generated output files.
  e - minor pbsnodes overhaul
  b - fixed memory leak in pbs_server
2.2.2
  b - correctly parse /proc/pid/stat that contains parens (Meier)
  b - prevent runaway hellos being sent to server when mom's node is removed from the server's node list
  b - fix qdel of entire job arrays for non-operators/managers
  b - fix problem where job array .AR files are not saved to disk
  b - fixed problem with tracking job memory usage on OS X
  b - fix memory leak in server and mom with MoveJobFile requests (backported from 2.3.1)
  b - pbs_server doesn't try to "upgrade" .JB files if they have a newer version of the job_qs struct
2.2.1
  b - fix a bug where dependent jobs get put on hold when the previous job has completed but its state is still available for the life of keep_completed
  b - fixed a bug where pbs_server never deleted files from the "jobs" directory
  b - fixed a bug where compute nodes were being put in an indefinite "down" state
  e - added job_array_size attribute to pbs_submit documentation
2.2.0
  e - improve RPP logging for corruption issues
  f - dynamic resources
  e - use mlockall() in pbs_mom if _POSIX_MEMLOCK
  f - consumable resource "tokens" support (Harte-Hanks)
  e - build process sets default submit filter path to ${libexecdir}/qsub_filter; we fall back to /usr/local/sbin/torque_submitfilter to maintain compatibility
  e - allow long job names when not using -N
  f - new MOM $varattr config
  e - daemons are no longer installed 700
  e - tighten directory path checks
  f - new mom configs: $auto_ideal_load and $auto_max_load
  e - pbs_mom on Darwin (OS X) no longer depends on libkvm (now works on all versions without the need to re-enable /dev/kmem on newer PPC or all x86 versions)
  e - added PBS_SERVER env variable for job scripts
  e - add --about support to daemons and client commands
  f - added qsub -t (primitive job array)
  e - add PBS_RESOURCE_GRES to prolog/epilog environment
  e - add -h hostname to pbs_mom (NCIFCRF)
  e - filesec enhancements (StockholmU)
  e - added ERS and IDS documentation
  e - allow
      export of specific variables into the prolog/epilog environment
  b - change fclose to pclose to close submit filter pipe (ABCC)
  e - add support for Cray XT size and larger qstat task reporting (ORNL)
  b - pbs_demux is now built with pbs_mom instead of with clients
  e - epilogue will only run if job is still valid on exec node
  e - add qnodes, qnoded, qserverd, and qschedd symlinks
  e - enable DEFAULTCKPT torque.cfg parameter
  e - allow compute host and submit host suffix with nodefile_suffix
  f - add --with-modulefiles=[DIR] support
  b - be more careful about broken tclx installs
2.1.11
  b - nqs2pbs is now a generated script
  b - correct handling of priv job attr
  b - change font selectors in manpages to bold
  b - on pbs_server startup, don't skip job-exclusive nodes on initial MOM scan
  b - pbs_server should not connect to "down" MOMs for any job operation
  b - use alarm() around writing to job's stdio in case it happens to be a stopped tty
2.1.10
  b - fix buffer overflow in rm_request; fix two printf calls that should be sprintf (Umea University)
  b - correct updating trusted client list (Yahoo)
  b - Catch newlines in log messages, split message text (Eygene Ryabinkin)
  e - pbs_mom remote reconfig is now disabled by default; use $remote_reconfig to enable it
  b - fix pam configure (Adrian Knoth)
  b - handle /dev/null correctly when a job is rerun
2.1.9
  f - new queue attribute disallowed_types; currently recognized types: interactive, batch, rerunable, and nonrerunable
  e - refine "node note" feature with pbsnodes -N
  e - bypass pbs_server's uid 0 check on cygwin
  e - update suse initscripts
  b - fix mom memory locking
  b - fix some buffer length checks in pbs_mom
  b - fix memory leak in fifo scheduler
  b - fix nonstandard usage of 'tail' in tpackage
  b - fix aliasing error with brp_txtlen
  f - allow manager to set "next job number" via hidden qmgr attribute next_job_number
2.1.8
  b - stop possible memory corruption with an invalid request type (StockholmU)
  b - add node name to pbsnodes XML output (NCIFCRF)
  b - correct Resource_list in qstat XML output (NCIFCRF)
  b - pam_authuser fixes from uam.es
  e - allow 'pbsnodes -l' to work with a node spec
  b - clear exec_host and session_id on job requeue
  b - fix mom child segfault when a user env var has a '%'
  b - correct buggy logging in chk_job_request() (StockholmU)
  e - pbs_mom shouldn't require the server_name file unless it is actually going to be read (StockholmU)
  f - "node notes" with pbsnodes -n (sandia)
2.1.7
  b - fix bison syntax error in Parser.y
  b - fix 2.1.4 regression with spool file group owner on freebsd
  b - don't exit if mlockall sets errno ENOSYS
  f - qalter -v variable_list
  f - MOMSLEEPTIME env delays pbs_mom initialization
  e - minor log message fixups
  e - enable node-reuse in qsub eval if server resources_available.nodect is set
  e - pbs_mom and pbs_server can now use PBS_MOM_SERVER_PORT, PBS_BATCH_SERVICE_PORT, and PBS_MANAGER_SERVICE_PORT env vars.
  e - pbs_server can also use PBS_SCHEDULER_SERVICE_PORT env var.
  e - add "other" resource to pelog's 5th argument
2.1.6
  b - freebsd5 build fix
  b - fix 2.1.4 regression with TM on single-node jobs
  b - fix 2.1.4 regression with rerunning jobs
  b - additional spool handling security fixes
2.1.5
  b - fix 2.1.4 regression with -o/dev/null
2.1.4
  b - fix cput job status
  b - Fix "Spool Job Race condition"
2.1.3
  b - correct run-time symbol in pam module on RHEL4
  b - some minor hpux11 build fixes (PACCAR)
  b - fix bug with log roll and automatic log filenames
  b - fix compile error with size_fs() on digitalunix
  e - pbs_server will now print build details with --about
  e - new freebsd5 mom arch for FreeBSD 5.x and 6.x (trasz)
  f - backported new queue attribute "max_user_queuable"
  e - optimize acl_group_sloppy
  e - fix "list_head" symbol clash on Solaris 10
  e - allow pam_pbssimpleauth to be built on OSX and Solaris
  b - networking fixes for HPUX, fixes pbs_iff (PACCAR)
  e - allow long job names when not using -N
  c - using depend=syncwith crashed pbs_server
  c - races with down nodes and purging
      jobs crashed pbs_server
  b - staged out files will retain proper permission bits
  f - may now specify umask to use while creating stderr and stdout spools, e.g. qsub -W umask=22
  b - correct some fast startup behaviour
  e - queue attribute max_queuable accounts for C jobs
2.1.2
  b - fix momctl queries with multiple hosts
  b - don't fail make install if --without-sched
  b - correct MOM compile error with atol()
  f - qsub will now retry connecting to pbs_server (see manpage)
  f - X11 forwarding for single-node, interactive jobs with qsub -X
  f - new pam_pbssimpleauth PAM module, requires --with-pam=DIR
  e - add logging for node state adjustment
  f - correctly track node state and allocation for suspended jobs
  e - entries can always be deleted from manager ACL, even if ACL contains host(s) that no longer exist
  e - more informative error message when modifying manager ACL
  f - all queue create, set, and unset operations now set a queue mtime
  f - added support for log rolling to libtorque
  f - pbs_server and pbs_mom have two new attributes, log_file_max_size and log_file_roll_depth
  e - support installing client libs and cmds on unsupported OSes (like cygwin)
  b - fix subnode allocation with pbs_sched
  b - fix node allocation with suspend-resume
  b - fix stale job-exclusive state when restarting pbs_server
  b - don't fall over when duplicate subnodes are assigned after suspend-resume
  b - handle suspended jobs correctly when restarting pbs_server
  b - allow long host lists in runjob request
  b - fix truncated XML output in qstat and pbsnodes
  b - fix typo that broke compile on irix6array and unicos8
  e - momctl now skips down nodes when selecting by property
  f - added submit_args job attribute
2.1.1
  c - fix mom_sync_job code that crashes pbs_server (USC)
  b - checking disk space in $PBS_SERVER_HOME was mistakenly disabled (USC)
  e - node's np is now accessible in qmgr (USC)
  f - add ":ALL" as a special node selection when stat'ing nodes (USC)
  f - momctl can now use :property node selection (USC)
  f - send
      cluster addrs to all nodes when a node is created in qmgr (USC)
      - new nodes are marked offline
      - all nodes get new cluster ipaddr list
      - new nodes are cleared of offline bit
  f - set a node's np from the status' ncpus (only if ncpus > np) (USC)
      - controlled by new server attribute "auto_node_np"
  c - fix possible pbs_server crash when nodes are deleted in qmgr (USC)
  e - avoid dup streams with nodes for quicker pbs_server startup (USC)
  b - configure program prefix/suffix will now work correctly (USC)
  b - handle shared libs in tpackages (USC)
  f - qstat's -1 option can now be used with -f for easier parsing (USC)
  b - fix broken TM on OSX (USC)
  f - add "version" and "configversion" RM requests (USC)
  b - in pbs-config --libs, don't print rpath if libdir is in the sys dlsearch path (USC)
  e - don't reject job submits if nodes are temporarily down (USC)
  e - if MOM can't resolve $pbsserver at startup, try again later (USC)
      - $pbsclient still suffers this problem
  c - fix nd_addrs usage in bad_node_warning() after deleting nodes (MSIC)
  b - enable build of xpbsmom on darwin systems (JAX)
  e - run-time config of MOM's rcp cmd (see pbs_mom(8)) (USC)
  e - momctl can now accept query strings with spaces and multiple -q opts (USC)
  b - fix linking order for single-pass linkers like IRIX (ncifcrf)
  b - fix mom compile on solaris with statfs (USC)
  b - fix memory corruption on job exit causing cpu0 to be allocated more than once (USC)
  e - add increased verbosity to tracejob and added '-q' command-line option
  e - support larger values in qstat output (might break scripts!)
      (USC)
  e - make 'qterm -t quick' shut down pbs_server faster (USC)
2.1.0p0
  fixed job tracking with SMP job suspend/resume (MSIC)
  modify pbs_mom to enforce memory limits for serial jobs (GaTech)
    - linux only
  enable 'never' qmgr maildomain value to disable user mail
  enable qsub reporting of job rejection reason
  add suspend/resume diagnostics and logging
  prevent stale job handler from destroying suspended jobs
  prevent rapid hellos from MOM from causing a DoS on pbs_server
  add diagnostics for why a node is not considered available
  add caching of local serverhost addr lookup
  enable job-centric vs queue-centric queue limit parameter
  brand new autoconf+automake+libtool build system (USC)
  automatic MOM restarts for easier upgrades (USC)
  new server attributes: acl_group_sloppy, acl_logic_or, keep_completed, kill_delay
  new server attributes: server_name, allow_node_submit, submit_hosts
  torque.cfg no longer used by pbs_server
  pbsdsh and TM enhancements (USC)
    - tm_spawn() returns an error if execution fails
    - capture TM stdout with -o
    - run on unique nodes with -u
    - run on a given hostname with -h
  largefile support in staging code and when removing $TMPDIR (USC)
  use bindresvport() instead of looping over calls to bind() (USC)
  fix qsub "out of memory" for large resource requests (SANDIA)
  pbsnodes default arg is now '-a' (USC)
  new ":property" node selection when node stat and manager set (pbsnodes) (USC)
  fix race with new jobs reporting wrong walltime (USC)
  sister moms weren't setting job state to "running" (USC)
  don't reject jobs if requested nodes is too large node_pack=T (USC)
  add epilogue.parallel and epilogue.user.parallel (SARA)
  add $PBS_NODENUM, $PBS_MSHOST, and $PBS_NODEFILE to pelogs (USC)
  add more flexible --with-rcp='scp|rcp|mom_rcp' instead of --with-scp (USC)
  build/install a single libtorque.so (USC)
  nodes are no longer checked against server host acl list (USC)
  Tcl's buildindex now supports a 3rd arg for "destdir" to aid fakeroot installs (USC)
  fixed dynamic node destroy qmgr
    option
  install rm.h (USC)
  printjob now prints saved TM info (USC)
  make MOM restarts with running jobs more reliable (USC)
  fix return check in pbs_rescquery, fixing a segfault in pbs_sched (USC)
  add README.pbstools to contrib directory
  work around buggy recvfrom() in Tru64 (USC)
  attempt to handle socklen_t portably (USC)
  fix infinite loop in is_stat_get() triggered by network congestion (USC)
  job suspend/resume enhancements (see qsig manpage) (USC)
  support higher file descriptors in TM by using poll() instead of select() (USC)
  immediate job delete feedback to interactive queued jobs (USC)
  move qmgr manpage from section 8 to section 1
  add SuSE initscripts to contrib/init.d/
  fix ctrl-c race while starting interactive jobs (USC)
  fix memory corruption when tm_spawn() is interrupted (USC)
2.0.0p8
  really fix torque.cfg parsing (USC)
  fix possible overlapping memcpy in ACL parsing (USC)
  fix rare self-inflicted sigkill in MOM (USC)
2.0.0p7
  fixed pbs_mom SEGV in req_stat_job()
  fixed torque.cfg parameter handling
  fixed qmgr memory leak
2.0.0p6
  fix segfault in new "acl_group_sloppy" code if a group doesn't exist (USC)
  configure defaults changed to enable syslog, enable docs, and disable filesync (USC)
  pelog now correctly restores previous alarm handler (Sandia)
  misc fixes with syscall returns, sign-mismatches, and mem corruption (USC)
  prevent MOM from killing herself on new job race condition (USC)
    - so far, only linux is fixed
  remove job delete nanny earlier to not interrupt long stageouts (USC)
  display C state later when using keep_completed (USC)
  add 'printtracking' command in src/tools (USC)
  stop overriding the user with name resolution on qsub's -o/-e args (USC)
  xpbsmon now works with Tcl 8.4 (BCGSC)
  don't bother spooling/keeping job output intended for /dev/null (USC)
  correct missing hpux11 manpage (USC)
  fix compile for freebsd - missing symbols (yahoo)
  fix momctl exit code (yahoo)
  new "exit_status" job attribute (USC)
  new "mail_domain" server attribute (overrides
    --maildomain) (USC)
  configure fixes for linux x86_64 and tcl install weirdness (USC)
  extended mom parameter buffer space
  change pbs_mkdirs to use standard var names so that chroot installs work better (USC)
  torque.spec now has tcl/gui and wordexp enabled by default
  enable multiple dynamic+static generic resources per node (GATech)
  make sure attrs on job launch are sent to server (fixes session_id) (USC)
  add resmom job modify logging
  torque.cfg parsing fixes
2.0.0p5
  reorganize ji_newt structure to eliminate 64-bit data packing issues
  enable '--disable-spool' configure directive
  enable stdout/stderr stageout to search through $HOME and $HOME/.pbs_spool
  fixes to qsub's env handling for newlines and commas (UMU)
  fixes to at_arst encoding and decoding for newlines and commas (USC)
  use -p with rcp/scp (USC)
  several fixes around .pbs_spool usage (USC)
  don't create "kept" stdout/err files ugo+rw (avoid insane umask) (USC)
  qsub -V shouldn't clobber qsub's environ (USC)
  don't prevent connects to "down" nodes that are still talking (USC)
  allow file globs to work correctly under --enable-wordexp (USC)
  enable secondary group checking when evaluating queue acl_group attribute
    - enable the new queue parameter "acl_group_sloppy"
  sol10 build system fixes (USC)
  fixed node manager buffer overflow (UMU)
  fix "pbs_version" server attribute (USC)
  torque.spec updates (USC)
  remove the leading space on the node session attribute on darwin (USC)
  prevent SEGV if config file is missing/corrupt
  "keep_completed" execution queue attribute
  several misc code fixes (UMU)
2.0.0p4
  fix up socklen_t issues
  fixed epilog to report total job resource utilization
  improved RPM spec (USC)
  modified qterm to drop hung connections to bad nodes
  enhance HPUX operation
2.0.0p3
  fixed dynamic gres loading in pbs_mom (CRI)
  added torque.spec (rpmbuild -tb should work) (USC)
  new 'packages' make target (see INSTALL) (USC)
  added '-1' qstat option to display node info (UMICH)
  various fixes in file staging and copying (USC)
    - re-enable stageout of directories
    - fix confusing email messages on failed stageout
    - child processes can't use MOM's logging and must use syslog
  fix overflow in RM netload (USC)
  don't check walltime on sister nodes, only on MS (ANU)
  kill_task wasn't being declared properly for all mach types (USC)
  don't unnecessarily link with libelf and libdl (USC)
  fix compile warnings with qsort/bsearch on bsd/darwin (USC)
  fix --disable-filesync to actually work (USC)
  added prolog diagnostics to 'momctl -d' output (CRI)
  added logging for job file management (CRI)
  added mom parameter $ignwalltime (CRI)
  added $PBS_VNODENUM to job/TM env (USC)
  fix self-referencing job deps (USC)
  Use --enable-wordexp to enable variables in data staging (USC)
  $PBS_HOME/server_name is now used by MOM _iff $pbsserver isn't used_ (USC)
  Fix TRU64 compile issues (NCIFCRF)
  Expand job limits up to ULONG_MAX (NCIFCRF)
  user-supplied TMPDIR no longer treated specially (USC)
  remtree() now deals with symlinks correctly (USC)
  enable configurable mail domain (Sandia)
  configure now handles darwin8 (USC)
  configure now handles --with-scp=path and --without-scp correctly (USC)
2.0.0p2
  fix check_pwd() memory leak (USC)
2.0.0p1
  fix mpiexec stdout regression from 2.0.0p0 (USC)
  add 'qdel -m' support to enable annotating job cancellation (CRI)
  add mom diagnostics for prolog failures and timeouts (CRI)
  interactive jobs cannot be rerunable (USC)
  be sure nodefile is removed when job is purged (USC)
  don't run epilogue multiple times when multiple jobs exit at once (USC)
  fix clearjob MOM request (momctl -c) (USC)
  fix detection of local output files with localhost or /dev/null (USC)
  new qstat/qselect -e option to only select jobs in exec queues (USC)
  $clienthost and $headnode removed, $pbsclient and $pbsserver added (USC)
  $PBS_HOME/server_name is now added to MOM's server list (USC)
  resmom transient TMPDIR (USC)
  add joblist to MOM's status & add experimental server "mom_job_sync" (USC)
  export PBS_SCHED_HINT to pelogues if set in
    the job (USC)
  don't build or install pbs_rcp if --enable-scp (USC)
  set user hold on submitted jobs with invalid deps (USC)
  add initial multi-server support for HA (CRI)
  Altix cpuset enhancements (CSIRO)
  enhanced momctl to diagnose and report on connectivity issues (CRI)
  added hostname resolution diagnostics and logging (CRI)
  fixed 'first node down' rpp failure (USC)
  improved qsub response time
2.0.0p0
  torque patches for RCP and resmom (UCHSC)
  enhanced DIS logging
  improved start-up to support quick startup with down nodes
  fixed corrupt job/node/queue API reporting
  fixed tracejob for large jobs (Sandia)
  changed qdel to only send one SIGTERM at mom level
  fixed doc build by adding AIX 5 resources docs
  added prerun timeout change (RENTEC)
  added code to handle select() EBADF - 9
  disabled MOM quota feature by default, enabled with -DTENABLEQUOTA
  clean up MOM child error messages (USC)
  fix makedepend-sh for gcc-3.4 and higher (DTU)
  don't fall back to mom_rcp if configured to use scp (USC)
1.2.0p6
  enabled opsys mom config (USC)
  enabled arch mom config (CRI)
  fixed qrun-based default scheduling to ignore down nodes (USC)
  disable unsetting of key/integer server parameters (USC)
  allow FC4 support - quota struct fix (USC)
  add fix for out of memory failure (USC)
  add file recovery failure messages (USC)
  add direct support for external scheduler extensions
  add passwd file corruption check
  add job cancel nanny patch (USC)
  recursively remove job dependencies if children can never be satisfied (USC)
  make poll_jobs the default behavior with a restat time of 45 seconds
  added 'shell-use-arg' patch (OSC)
  improved API timeout disconnect feature
  added improved rapid start up
  reworked mom-server state management (USC)
    - removed 'unknown' state
    - improved pbsnodes 'offline' management
    - fixed 'momctl -C' which actually _prevented_ an update
    - fixed incorrect math on 'tmpTime'
    - added 'polltime' to the math on 'tmpTime'
    - consolidated node state changes to new 'update_node_state()'
    - tightened
up the "node state machine" - changed mom's state to follow the documented state guidelines - correctly handle "down" from mom - moved server stream handling out of 'is_update_stat()' to new 'init_server_stream()' - refactored the top of the main loop to tighten up state changes - fixed interval counting on the health check script - forced health check script if update state is forced - don't spam the server with updates on startup - required new addr list after connections are dropped - removed duplicate state updates because of broken multi-server support - send "down" if internal_state is down (aix's query_adp() can do this) - removed ferror() check on fread() because fread() randomly fails on initial mom startup. - send "down" if health check returns "ERROR" - send "down" if disk space check fails. 1.2.0p5 make '-t quick' default behavior for qterm added '-p' flag to qdel to enable forced job purge (USC) fixed server resources_available n-1 issue added further Altix CPUSet support (NCSA) added local checkpoint script support for linux fixed 'premature end of message warning' clarify job deleted mail message (SDSC) fixed AIX 5.3 support in configure (WestGrid) fixed crash when qrun issued on job with incomplete requeue added support for >= 4GB memory usage (GMX) log job execution limits failures added more detailed error messages for missing user shell on mom fixed qsub env overflow issue 1.2.0p4 extended job prolog to include jobname, resource, queue, and account info (MAINE) added support for Darwin 8/OS X 10.4 (MAINE) fixed suspend/resume for MPI jobs (NORWAY) added support for epilog.precancel to enable local job cancellation handling fixed build for case insensitive filesystems fixed relative path based Makefiles for xpbsmom added support for gcc 4.0 added PBSDEBUG support to client commands to allow more verbose diagnostics of client failures added ALLOWCOMPUTEHOSTSUBMIT option to torque.cfg fixed dynamic pbs_server loglevel support added mom-server rpp 
socket diagnostics added support for multi-homed hosts w/SERVERHOST parameter in torque.cfg added support for static linking w/PBSBINDIR added availmem/totmem support to Darwin systems (MAINE) added netload support to Darwin systems (MAINE) 1.2.0p3 enable multiple server to mom communication fixed node reject message overwrite issue enable pre-start node health check (BOEING) fixed pid scanning for RHEL3 (VPAC) added improved vmem/mem limit enforcement and reporting (UMU) added submit filter return code processing to qsub 1.2.0p2 enhance network failure messages fixed tracejob tool to only match correct jobs (WESTGRID) modified reporting of linux availmem and totmem to allow larger file sizes fixed pbs_demux for OSF/TRU64 systems to stop orphaned demux processes added dynamic pbs_server loglevel specification added intelligent mom job stat sync'ing for improved scalability (USC/CRI) added mom state sync patch for dup join (USC) added spool dir space check (MAINE) 1.2.0p1 add default DEFAULTMAILDOMAIN configure option improve configure options to use pbs environment (USC) use openpty() based tty management by default enable default resource manager extensions make mom config parameters case insensitive added jobstartblocktime mom parameter added bulk read in pbs_disconnect() (USC) added support for solaris 5 added support for program args in pbsdsh (USC) added improved task recovery (USC) 1.2.0p0 fixed MOM state update behavior (USC/Poland) fixed set_globid() crash added support for > 2GB file size job requirements updated config.guess to 2003 release general patch to initialize all function variables (USC) added patch for serial job TJE leakage (USC) add "hw.memsize" based physmem MOM query for darwin (Maine) add configure option (--disable-filesync) to speed up job submission set PBS mail precedence to bulk to avoid vactaion responses (VPAC) added multiple changes to address gcc warnings (USC) enabled auto-sizing of 'qstat -Q' columns purge DOS EOL characters from 
submit scripts 1.1.0p6 added failure logging for various MOM job launch failures (USC) allow qsub '-d' relative path qsub specification enabled $restricted parameter w/in FIFO to allow used of non-privileged ports (SAIC) checked job launch status code for retry decisions added nodect resource_available checking to FIFO disabled client port binding by default for darwin systems (use --enable-darwinbind to re-enable) - workaround for darwin bind and pclose OS bugs fixed interactive job terminal control for MAC (NCIFCRF) added support for MAC MOM-level cpu usage tracking (Maine) fixed __P warning (USC) added support for server level resources_avail override of job nodect limits (VPAC) modify MOM copy files and delete file requests to handle NFS root issues (USC/CRI) enhance port retry code to support mac socket behavior clean up file/socket descriptors before execing prolog/epilog enable dynamic cpu set management (ORNL) enable array services support for memory management (ORNL) add server command logging to diagnostics fix linux setrlimit persistance on failures 1.1.0p5 added loglevel as MOM config parameter distributed job start sequence into multiple routines force node state/subnode state offline stat synchronization (NCSA) fixed N-1 cpu allocation issue (no sanity checking in set_nodes) enhance job start failure logging added continued port checking if connect fails (rentec) added case insensitive host authentication checks added support for submitfilter command line args added support for relocatable submitfilter via torque.cfg fixed offline status cleared when server restarted (USC) updated PBSTop to 4.05 (USC) fixed PServiceType array to correctly report service messages fixed pbs_server crash from job dependencies prevent mom from truncating lock file when mom is already running tcp timeout added as config option 1.1.0p4 added 15004 error logging added use of openpty() call for locating pseudo terminals (SNL) add diagnostic reporting of config and executable 
version info add support for config push add support for MOM config version parameters log node offline/online and up/down state changes in pbs_server logs add mom fork logging and home directory check add timeout checking in rpp socket handling added buffer overflow prevention routines added lockfile logging supported protected env variables with qstat 1.1.0p3 added support for node specification w/pbsnodes -a added hstfile support to momctl added chroot (-D) support (SRCE) added mom chdir pjob check (SRCE) fixed MOM HELLO initialization procedure added momctl diagnostic/admin command (shutdown, reconfig, query, diagnose) added mom job abort bailout to prevent infinite loops added network reinitialization when socket failure detected added mom-to-scheduler reporting when existing job detected added mom state machine failure logging 1.1.0p2 add support for disk size reporting via pbs_mom fixed netload initialization fixed orphans on mom fork failure updated to pbstop v 3.9 (USC) fixed buffer overflow issue in net_server.c added pestat package to contrib (ANU) added parameter checking to cpy_stage() (NCSA) added -x (xml output) support for 'qstat -f' and 'pbsnodes -a' added SSS xml library (SSS) updated user-project mapping enforcement (ANL) fix bogus 'cannot find submitfilter' message for interactive jobs fix incorrect job allocation issue for interactive jobs (NCSA) prevent failure with invalid 'servername' specification (NCSA) provide more meaningful 'post processing error' messages (NCSA) check for corrupt jobs in server database and remove them immediately enable SIGUSR1/SIGUSR2 pbs_mom dynamic loglevel adjustment profiling enhancements use local directory variable in scan_non_child_tasks() to prevent race condition (VPAC) added AIX 5 odm support for realmem reporting (VPAC) 1.1.0p1 added pbstop to contrib (USC) added OSC mpiexec patch (OSC) confirmed OSC mom-restart patch (OSC) fix pbsd_init purge job tracking allow tracking of completed jobs 
(w/TORQUEKEEPCOMPLETED env) added support for MAC OS 10 added qsub wrapper support added '-d' qsub command line flag for specifying working directory fixed numerous spelling issues in pbs docs enable logical or'ing of user and group ACL's allow large memory sizes for physmem under solaris (USC) fixed qsub SEGV on bad '-o' specification add null checking on ap->value fixed physmem() routine for tru64 systems to load compute node physical memory added netload tracking 1.1.0p0 fixed linux swap space checking fixed AIX5 resmom ODM memory leak handle split var/etc directories for default server check (CHPC) add pbs_check utility added TERAGRID nospool log bounds checking add code to force host domains to lower case verified integration of OSC prologue-environment.patch (export Resource_List.nodes in an environment variable for prologue) verified integration of OSC no-munge-server-name.patch (do not install over existing server_name) verified integration of OSC docfix.patch (fix minor manpage type) 1.0.1p6 add messaging to report remote data staging failures to pbs_server added tcp_timeout server parameter add routine to mark hung nodes as down add torque.setup initialization script track okclient status fixed INDIANA ji_grpcache MOM crash fixed pbs_mom PBSLOGLEVEL/PBSDEBUG support fixed pbs_mom usage added rentec patch to mom 'sessions' output fixed pbs_server --help option added OSC patch to allow jobs to survive mom shutdown added patch to support server level node comments added support for reporting of node static resources via sss interface added support for tracking available physical memory for IRIX/Linux systems added support for per node probes to dynamically report local state of arbitrary value fixed qsub -c (checkpoint) usage 1.0.1p5 add SUSE 9.0 support add Linux 2.4 meminfo support add support for inline comments in mom_priv/conf allow support for upto 100 million unique jobs add pbs_resources_all documentation fix kill_task references add 
contrib/pam_authuser 1.0.1p4 fixed multi-line readline buffer overflow extended TORQUE documentation fixed node health check management 1.0.1p3 added support for pbs_server health check and routing to scheduler added support for specification of more than one clienthost parameter added PW unused-tcp-interrupt patch added PW mom-file-descriptor-leak patch added PW prologue-bounce patch added PW mlockall patch (release mlock for mom children) added support for job names up to 256 chars in length added PW errno-fix patch 1.0.1p2 added support for macintosh (darwin) fixed qsub 'usage' message to correctly represent '-j', '-k', '-m', and '-q' support add support for 'PBSAPITIMEOUT' env variable fixed mom dec/hp/linux physmem probes to support 64 bit fixed mom dec/hp/linux availmem probes to support 64 bit fixed mom dec/hp/linux totmem probes to support 64 bit fixed mom dec/hp/linux disk_fs probes to support 64 bit removed pbs server request to bogus probe added support for node 'message' attribute to report internal failures to server/scheduler corrected potential buffer overflow situations improved logging replacing 'unknown' error with real error message enlarged internal tcp message buffer to support 2000 proc systems fixed enc_attr return code checking Patches incorporated prior to patch 2: HPUX superdome support add proper tracking of HP resources - Oct 2003 (NOR) is_status memory leak patches - Oct 2003 (CRI) corrects various memory leaks Bash test - Sep 2003 (FHCRC) allows support for linked shells at configure time AIXv5 support -Sep 2003 (CRI) allows support for AIX 5.x systems OSC Meminfo -- Dec 2001 (P. Wycoff) corrects how pbs_mom figures out how much physical memory each node has under Linux Sandia CPlant Fault Tolerance I (w/OSC enhancements) -- Dec 2001 (L. Fisk/P. Wycoff) handles server-MOM hangs OSC Timeout I -- Dec 2001 (P. Wycoff) enables longer inter daemon timeouts OSC Prologue Env I -- Jan 2002 (P. 
Wycoff) add support for env variable PBS_RESOURCE_NODES in job prolog OSC Doc/Install I -- Dec 2001 (P. Wycoff) fix to the pbsnodes man page Configuration information for Linux on the IA64 architecture fix the build process to make it clean out the documentation directories during a "make distclean" fix the installation process to keep it from overwriting ${PBS_HOME}/server_name if it already exists correct code creating compile time warnings allow PBS to compile on Linux systems which do not have the Linux kernel source installed Maui RM Extension -- Dec 2002 (CRI) enable Maui resource manager extensions including QOS, reservations, etc NCSA Scaling I -- Mar 2001 (G. Arnold) increase number of nodes supported by PBS to 512 NCSA No Spool -- Apr 2001 (G. Arnold) support $HOME/.pbs_spool for large jobs NCSA MOM Pin pin PBS MOM into memory to keep it from getting swapped ANL RPP Tuning -- Sep 2000 (J Navarro) tuning RPP for large systems WGR Server Node Allocation -- Jul 2000 (B Webb) addresses issue where PBS server incorrectly claims insufficient nodes WGR MOM Soft Kill -- May 2002 (B Webb) processes are killed with SIGTERM followed by SIGKILL PNNL SSS Patch -- Jun 2002 (Skousen) improves server-mom communication and server-scheduler CRI Job Init Patch -- Jul 2003 (CRI) correctly initializes new jobs eliminating unpredictable behavior and crashes VPAC Crash Trap -- Jul 2003 (VPAC) supports PBSCOREDUMP env variable CRI Node Init Patch -- Aug 2003 (CRI) correctly initializes new nodes eliminating unpredictable behavior and crashes SDSC Log Buffer Patch -- Aug 2003 (SDSC) addresses log message overruns