CHANGES

= Changes from previous Cobalt Versions =

== Changes to 1.0.33 ==
 * Fix for a bug where upgrading from significantly older versions of Cobalt
   with existing reservations set will cause those reservations to be
   unresponsive.

== Changes to 1.0.32 ==
 * Added a utility qsub2pbs which takes Cobalt qsub command line arguments
   and provides equivalents for PBSPro/OpenPBS.

== Changes to 1.0.31 ==
 * Fixed a race condition where a K-accounting record could be missed if a
   reservation was manually removed as it was ending.
 * Reservation accounting records have had numerous fields added.
 * cluster_systems now respect the --attrs location attribute with a
   ','-delimited list of nodes.
 * Job running location has been removed from Cobalt email subject lines.  Large
   jobs would cause the subject line to exceed the maximum permitted length and
   could cause the email to be dropped.
 * Added system-related PBS-style accounting records for node up and node down
   in cluster systems.

== Changes to 1.0.30 ==
 * Fixed a race condition for Cray systems where a job draining may be
   disrupted briefly on job termination, allowing other jobs to run
   instead.

== Changes to 1.0.29 ==
 * A timeout for automatic functions that was forcing a once-per-ten-seconds
   cycle time has been relaxed.  The default is now much faster and should
   allow for improved job startup rates if schedule interval is set to a
   time.
 * A lock has been relaxed that should allow for most Cobalt commands to
   be responsive while ALPS updates are being fetched for Cray systems.

== Changes to 1.0.28 ==
 * Fix an issue where deferring a cyclic reservation could result in the
   reservation start time being sent two cycle periods forward rather
   than one.

== Changes to 1.0.27 ==
 * Security fix for Cray systems.

== Changes to 1.0.26 ==
 * Interactive mode jobs on cluster systems no longer get the Cray-style
   double shell execution.
 * Fixed an error in the logging timestamps of Cray KNL memory reboot scripts
 * Added extended accounting log records for Jobs.  Holds are now tracked and
   most records have additional information on job specification and resource.
 * Added in accounting records for Cobalt reservations.
 * Fixed double reporting of all-holds-clear on a dependency hold release.
 * Fixed a logging bug in the event that a job fails to create a
   .cobaltlog file.
 * $COBALT_JOBID now properly expanded for .cobaltlog files
 * Users may see a start time estimate for their jobs in qstat by adding
   the start_time_est header to their header option.  This is a conservative
   naive start time estimate based on the running job times and the number of
   machine-hours of jobs ahead of a given job in the queue.  The minimum of this
   estimate may be configured by sites to allow for potentially long resource
   cleanups.
 * A utility, find-location, has been added.  This can be used to find a
   location on cluster systems or Cray systems that can be used for setting
   reservations or as a location attribute.  This may be set to allow or avoid
   currently down nodes, restricted to queues, avoid reservations, and on Cray
   systems may be restricted to specific node attributes.
 * An update to the Cray Cobalt initialization script has been made to tag the
   component output redirect with a year and month for easier management of this
   log.


== Changes to 1.0.25 ==
 * Fixed not writing a TE record for jobs that had started running but
   were administratively force-killed.
 * Fixed an issue where an error during run initialization was not
   entering the Run_Retry state correctly and would cause a job to fail.
 * Fixed a condition that could cause a node to be allocated to two jobs
   simultaneously on cluster systems if a job is selected to run in both
   the primary and backfill scheduling passes.
 * Fixed a race condition where a job that was starting up and deleted by a
   user could set up a situation where a node would be freed twice, allowing
   for a double-allocation of a node.
 * Logging for component communication, especially with location lookup has
   been greatly enhanced.

== Changes to 1.0.24 ==
 * Cweb now reports jobs in the starting state.
 * Fixed an issue where a process exit status could be fetched twice, and
   overwrite the actual exit status of a job with an erroneous value.
 * Heckle and Gram support has been removed.

== Changes to 1.0.23 ==
 * Support for cgroups via cgclassify has been added.  This is off by default.
   Currently, this is only supported for Cray systems, and is intended for systems
   that do not have kernel support for using proc connector.
 * Fixed an unhandled UnboundLocalError that could arise when querying child status
   when interacting with apbasil.

== Changes to 1.0.22 ==
 * Fixed error on Cray systems that prevented cqadm --run from working on those
   systems.
 * Forkers that are de-registered while Cobalt is running, such as a login node
   crashing will now be properly removed from the pool for running jobs, until
   they are brought back up and re-registered on Cray systems.  Jobs that were
   running on that forker will be cleaned up (or will continue running
   normally) once the forker is re-registered with the rest of Cobalt.
 * The default signal sent to processes at the end of a job in Cray systems is
   now SIGTERM not SIGINT.
 * Cobalt will now detect SSDs on Cray XC40 systems through the CAPMC
   interface.  This allows users to request to be placed on nodes with optional
   SSDs and allows users to specify a minimum free size for their job.  By
   default no SSDs are requested.

== Changes to 1.0.21 ==
 * Fixed numerous locking issues in job startup that could result in a small chance
   of failed job initialization.
 * Fixed an issue on Cray systems where brief network disruptions on service nodes
   can cause a loss of contact with a forker resulting in a loss of the running jobs.
 * Added additional logging for ProxyErrors.  The original trace/error message should
   now be presented to the caller in the event of a ProtocolError.
 * Fixed an issue in the cdbwriter where an error in writing job attributes could
   cause a recurring error loop preventing database updates.

== Changes to 1.0.20 ==
 * Fix for certain IO error messages that were ambiguous.
 * Fix for an issue where attrs on a job are changed as a part of a filter script while
   using qmove that can cause attrs to become a string instead of a dictionary and
   disrupt scheduling on a queue.

== Changes to 1.0.19 ==
 * Hotfix for issue preventing rereservations for interactive jobs from being detected.

== Changes to 1.0.18 ==
 * Fix for a hang that can occur with interactive jobs if a Cray ALPS reservation
   needs to be recreated at job execution time, usually in a rebooting context.
 * Fix for a display issue when using the cluster_system component when the cluster
   has nodes with a non-uniform prefix, or no prefix at all.  This changes some
   existing numbering condensation as well, and allows the location string for
   jobs and reservations to be used with tools like pdsh

== Changes to 1.0.17 ==
 * Added queue and project information to environment in Cray jobs.

== Changes to 1.0.15 ==
 * Fix for an issue on Cray systems where Cobalt was not properly passing
   a job's node list into the ALPS re-reservation.  This could cause a mismatch
   in the nodes that Cobalt was actually running a job on and what it thought the job
   was running on.

== Changes to 1.0.14 ==
 * Fix for cluster systems where nodes that weren't numbered could cause
   showres/qstat to fail to error out when displaying locations.
 * TS has location added, TE records have location, and seconds from epoch UTC
   start and end for tasks
 * Memory management scripts have been updated so that they can handle larger
   Cray XC40 systems.
 * Memory management boot records have been enhanced with start/end seconds from
   epoch UTC times and blocked locations on the boot-start record.
 * Fix for a potential memory mode script switch crash when rebooting from certain
   memory mode mixes.
 * Group membership checks for queues are now being handled more efficiently.
 * Boot start and boot end records from the memory mode management scripts now have
   a corrected date format in the message timestamp that is consistient with other
   accounting records.

== Changes to 1.0.13 ==
 * Added a web-service interface "cweb" client.  This presents job, reservation
   and node data over a REST API.  This is based on the gronkd daemon from
   the ALCF.
 * Added TaskStart and TaskEnd (TS and TE) PBS accounting records.  These are
   specific to Cobalt.  They correspond to when the user task is started/ended
   by the queue-manager.
 * Enhanced the Cray CAPMC boot scripts and filters.  PBS style logs are now
   provided wtih Boot Start and Boot End records.
 * Enhanced the system component node allocator to try and prefer nodes that
   match a job's requested memory modes to avoid unnecesssary memory mode shift
   reboots.
 * alpssystem now updates node attributes from ALPS.  Notably this updates the
   MCDRAM(HBM) and NUMA information on the nodes.  This happens at the standard
   hardware update interval.

== Changes to 1.0.12 ==
 * The BlueGene/Q now has a more optimistic backfilling algorithm that also
   properly accounts for exactly which jobs are currently draining.

== Changes to 1.0.11 ==
 * Fix for a race condition on Cray systems where a correctly timed qdel can
   cause the nodes for a job to get stuck in cleanup.

== Changes to 1.0.10 ==
 * Sites can now specify the qsub comamnd for mom nodes on Cray systems in case
   it is in a non-default location.  This also fixes a permission issue for
   eLogin interactive jobs.
 * Preemptive fix for a possible process leak from AlpsBridge.
 * Preemptive fix for a possible loss of some output when using string_stdout
   in the forkers.
 * Adding in DDL for a summary view for reservations for CDB.
 * Fix for a failure in loading very old statefiles into system_script_forker
   if there are prior runs left to clean up on restart.

== Changes to 1.0.9 ==
 * Cobalt database writer now working properly with Cray Systems.
 * Updates to the Cobalt Database DDL build scripts
 * Interactive jobs enhanded to work in Cray's eLogin environment.  Users will
   now be sent to a node that can run a real aprun prior to their shell being
   launched.
 * The Cray init script will now tag all redirected console output files with
   the originating node.
 * Components now log what host they are launched on at startup

== Changes to 1.0.8 ==
 * Fixed a bug where a node going down during a running job could allow that
   node to be selected for draining.
 * Expanded a buffer used by the forkers that should significantly improve the
   speed of reading large ALPS responses and no longer interfere with node
   reacquisition
 * Cleaned up stray print statements in node information
 * Fixed an error where score could be set to a string rather than a float via
   the filter scripts
 * Fixed an error where the queue-manager statefile could be lost if Cobalt was
   brought down with a job in the "Terminal" state.

== Changes to 1.0.7 ==
 * Updated a new Cray-system specific init script.

== Changes to 1.0.6 ==
 * Backfill now supported on Cray systems.
 * Major performance improvement to nodelist and nodeadm -l.
 * Fix for an error that can cause loss of job tracking during job startup
   resulting in a reported exit status of "1234567".
 * Fix for an error in system_script_forker that can cause an error loop
   if using stdout to string functionality.
 * Fix for an error that caused jobs with a "terminating" status to cause
   qstat to fail.

== Changes to 1.0.5 ==
 * Fix for an issue that would cause jobs with their location set to not
   run if they were submitted to a reservation.
 * Performance of setres improved significantly.
 * Fixed an issue where reservations and soft-down status on nodes may not
   be respected by job placement.
 * An enhanced_apkill script is now available to improve job cleanup and
   ensure that jobs that ignore signals are otherwise properly terminated.

== Changes to 1.0.4 ==
 * Fix for apid not being properly checked when apkill was being invoked.

== Changes to 1.0.3 ==
 * Fix for an issue for nodes coming out of disabled state and suddenly
   appearing in ALPS.
 * Interactive job support
 * Support for CAPMC memory management scripts added.  Cobalt now gracefully
   handles long startup times if these are being used.
 * Improved job cleanup using apkill.

== Changes to 1.0.2 ==
 * Singleton job filter (maxtotaljobs) now supported
 * Multiple alps_script_forkers running simultaneously is now supported. Forkers
   are chosen for execution based on which forker has the lowest number of jobs
   currently assigned to it.

== Changes to 1.0.0 ==
 * Initial support of Cray systems

== Changes to 0.99.42 ==
 * Update to database writer to handle attrs fields that are
   improperly formatted.

== Changes to 0.99.41 ==
 * Minor patch to clean up an issue where the user was not being properly
   cleared from BlueGene blocks at the end of a run.

== Changes to 0.99.40 ==
 * A nokillpg attribute may be added to a job to cause kill instead of killpg
   to be used for job termination so that a signal handler in a script may
   handle signal propigation itself.
 * Fixed a bug where a job requesting a location could cause Cobalt to drain
   for a job that would require the use of offline hardware
 * Added a check to the job validator where a job is rejected if the job size
   and block geometry have a mismatch.

== Changes to 0.99.39 ==
 * Session information added to job submission information.  This also includes
   submitting host information (Cobalt Ticket #4315)
 * A fix for a bug that can cause "phantom drains" to occur on lightly loaded
   cluster systems. (Cobalt Ticket #4319)

== Changes to 0.99.38 ==
 * Added numerous dependency changes.  Notably, jobs in dep_hold states will no
   longer accrue eligible wait time or score.  Previously, dep_hold jobs would
   accrue eligible wait times unlike admin_hold or user_hold states.
   (Cobalt Ticket #4308)
 * Two display fixes for cluster systems that allow multiple node-name prefixes
   to be handled correctly by the nodelist merging code  and to correctly
   compact names if leading 0's are used in location numbers like
   "foo001.machine". (Cobalt Ticket #4313 and #4277)
 * Corrected two race conditions in the BlueGene/Q system components where a
   location with offline hardware could be selected for draining on a
   scheduling pass.  Though Cobalt would never try to actually run on the
   offline block, It could end up draining resources needlessly, and impact
   machine usage.  The first could happen at any time, the second only occurred
   at the end of booting a block. (Cobalt Ticket #4311)

== Changes to 0.99.37 ==
 * Fixed issue where a reservation could hit it's activation check while it is being
   deactivated, causing those events should be reversed.  Released reservations can't
   activate. (Cobalt Ticket #4289)
 * COBALT_JOBSIZE and COBALT_PARTSIZE have been added to interactive jobs.
   (Cobalt Ticket #4294)
 * Job dependencies require the same project to gain score from eachother.
   (Cobalt Ticket #4293)
 * prologue_helper.py now handles "=" in arguments correctly. (Cobalt Ticket #4292)

== Changes to 0.99.36 ==
 * Fix for issues in get-bootable-blocks and with block booting.  Block names are now
   properly sorted.  A race condition between boot-block and update_block_state has
   been fixed.  The return statuses for boot-block have also been corrected for failed
   boots.  (Cobalt Ticket #4282 and #4280)

== Changes to 0.99.35 ==
 * Fixed an issue where a block being removed in the BGQ control system between block
   detection and the block being added for tracking could result in blocks being
   erroniously marked as idle. (Cobalt Ticket #4268)

== Changes to 0.99.34 ==
 * Fixed an issue where Cobalt could have a memory leak if boot-block.py was
   terminated early the boot tracking object may never be cleaned up.
 * Numerous enhancements and fixes to handle multiple overlaping queues in the
   system with respect to draining and backfilling. (Cobalt Ticket #663)
 * Fixed some typos in the documentation for qsub.
 * COBALT_RESID is now appropriately set in interactive mode jobs (Cobalt Ticket #703)

== Changes to 0.99.33 ==
 * Fixed issue where adding blocks while the system component is running can
   cause the system component's block update loop to fail. (Cobalt Ticket #695)
 * Preliminary fix for a racecondition that can cause subblock jobs to terminate
   early (Cobalt Ticket #690)
 * Added T-Minus to showres (Cobalt Ticket #692)
 * Fixed an error in qalter where attemtpts to change the users on a running job
   caused qalter to fail. (Cobalt Ticket #694)

== Changes to 0.99.32 ==
 * Cluster systems now drain for larger jobs, and will backfill into the
   draining hardware. (Cobalt Ticket #663)
 * Fixed a bug in the prologue_helper where a silent failure on sending the
   hostfile to nodes could occur silently (Cobalt Ticket #686)

== Changes to 0.99.31 == 
 * Interactive mode jobs (qsub -I) are now supported on the BlueGene/Q.
   (Cobalt Ticket #107)
 * Timestamps are now included in cobaltlog lines (Cobalt Ticket #344 and #669)
 * Multiple mode arguments will result in a qsub rejection (Cobalt Ticket #664)
 * Fixed an issue that was causing COBALT_STARTTIME and COBALT_ENDTIME to not be
    set for compute node jobs in cX jobs. (Cobalt Ticket #673)

== Changes to 0.99.30 ==
 * This is a fix release that corrects a bug that can be encountered when upgrading
   to 0.99.27 from prior versions where ion_kerneloptions in cqm is improperly set.
   (Cobalt Ticket #665)

== Changes to 0.99.29 ==
 * Interactive jobs are experimentally available for cluster systems.

== Changes to 0.99.28 ==
 * Cobalt now supports qsub options in script mode jobs similarly to other
   job scheduler scripts.  The form is #Cobalt <qsub option> <option argument>.
   Most qsub options are supported, except for mode.  Jobs using this feature must
   be script mode jobs.  This is similar in style to PBS-directives. (Cobalt Ticket 354)
 * Fixed a bug in cluster systems where a failure duing job startup could result
   in all nodes for a job being stuck in the allocated state (Cobalt Ticket 662)
 * $COBALT_STARTTIME and $COBALT_ENDTIME added to job's environments, these are
   the Unix timestamp for when the job started, and when Cobalt will terminate
   the job due to exceeding walltime. (Cobat Ticket 593)
 * A requested walltime of 0 will set the job to run for the time remaining in
   a reservation. (Cobalt Ticket 474)
 * The --env flag to qsub now aggregates multiple instances of the flag rather
   than overriding the prior instance (Cobalt Ticket 657)

== Changes to 0.99.27 ==
 * Alternate kernel support for BG/Q compute nodes and IO nodes is available.
  Numerous configuration settings have been added to support this.
  (Cobalt ticket 621)
 * If $jobid is included in a job name specification or an environment variable,
  at submission time, Cobalt will substitute the generated jobid for that job.
  See man qsub for more details. (Cobalt ticket 617)
 * The Cobalt jobid is now explicitly added in the cobaltlog file. (Cobalt ticket 390)
 * The name of a job can now be specified seperately from naming the output
  files. (Cobalt ticket 625)
 * Multiple locations can be passed to reservations through the -p option as a
  ':'-delimited list. (Cobalt ticket 405)
 * cqadm can now be used to release user hold, as well as put jobs into a user hold
  as well as an admin hold (Cobalt ticket 382)
 * The users field can be removed from a reservation using setres -m -u '*'. (Cobalt
   ticket 381)
 * Releaseres should now report the correct partiton count on reservation release
  (Cobalt ticket 470)
 * Reservations now show time remaining in showres once the reservation is active.
  (Cobalt ticket 585)
 * Schedctl should handle flipping score and jobid arguments more gracefully and 
  no longer silently discards invalid options. (Cobalt ticket 502)
 * ':' and '=' may now be escaped as '\:' and '\=' in the --envs argument (Cobalt
  ticket 589)
 * Reservations may now take 'now' for the start time of a reservation (Cobalt
  ticket 546)
 * "GEOMETRY", "ION_KERNEL" and "ION_KERNELOPTIONS have been added to the JOB_DATA
  table of the Cobalt database 


== Changes to 0.99.26 ==
 * Corrected a bug that was causing messages in the clients that used to go to
  stdout to go to stderr instead

== Changes to 0.99.25 ==
 * Fixes for clients that were causing erronious default values for proccount
  have been fixed.
 * The setgid wrapper now calls a site-selectable python interperter with the
  '-E' flag by default.
 * A bug that could cause TimeRemaining to show negative values has been fixed
 * Queues may be added or removed from blocks without overwriting the entire
  list with new options in partadm using --rmq or --appq with the --queue option.


== Changes to 0.99.24 ==
 * Reverting client changes.  Found some issues when deploying to test systems
 * --defer flag has been reverted with these as a casualty.

== Changes to 0.99.23 ==
 * Client code has undergone a refactor.  Numerous bugs have been addressed
  with these that were found while improving test coverage of this code
 * IO Block booting and automatic control has been added.  Please consult the
  partadm manpage for details
 * This release requires pybgsched-0.1.3
 * A user may voluntarily set their job's score to 0 and let it re-accrue using
  qalter --defer (Ticket #590)

== Changes to 0.99.22 ==
 * Fix for a deadlock that could occur due to a race condition in the 
  boot-reaping code.
 * Fix for a pid collision that could occur after a forced delete of a job in
  the system-script-forker component leading to a job getting stuck in the
  Job_Epilogue state.
 * python 2.7 and it's change to the XMLRPC libraries are now supported.

== Changes to 0.99.21 ==
 * Minor fix release to correct a failure mode where we can lose a process group
  if a job's boot is delayed.

== Changes to 0.99.20 ==
 * Small changeset to fix a race condition that was hanging jobs that came in
   on 0.99.19
 * Better error logging for situations where the boots and the process group
  information haven't lined up yet.  This usually happens during high-load and
  can result in block boot completion never being detected.

== Changes to 0.99.19 ==
 * Manual boot control has been added
 * major refactor of compute block rebooting

== Changes to 0.99.18 ==
 * Fix for other systems that were seeing erronious geometry data
 * SoftwareFailure now accurately reported

== Changes to 0.99.17 ==

 * Support for --geometry has been added.  This will constrain a job to run 
blocks having a specific geometry in nodes, partadm, partlist and qstat have
been updated to show this.
 * Support for passthrough blocking reservations has been added.
 * Fixed a bug where a job run outside of Cobalt would not properly block resources.
 * Removed autoreboot in the event of a block entering a control system 
  SoftwareFalure state on BG/Q.  Normal cleanup should handle this now.

== Changes to 0.99.15 ==
 * Fix for an issue that was causing partadm to malfunction when reservations
were set on the system.

== Changes to 0.99.12 ==
 * As a note, this is a release of fixes and backported features from BG/Q
 * Fixed issues that could cause a bad hardware state to be overridden as 
"blocked" allowing the block to be drained
 * Fix for a situation where a partition that is inelligible to run due to a
pending reservation is selected for draining.

== Changes to 0.99.11 ==
 * Fix for an additional circumstance that can cause a small reservation to
collapse a backfill window to zero.

== Changes to 0.99.10 ==
 * Fixed an issue that was introduced in 0.99.6 where children were not being
properly added to a block.  This caused strange behavior in reservations,
including failure to run jobs smaller than the reservation or a small job
running on the largest block in a reservation
 * Refactored the update_block_state code to operate more similarly to the way
BG/P is currently doing it.  This should correct an issue where a block can
have an error state overridden.
 * Cleaned up a circumstance where wires between passthrough midplanes can be
added to a block's wire list.  This can cause erronious wiring conflicts to be
detected.
 * Fixed an issue where duplicate wires were being added

== Changes to 0.99.9 ==

 * Fixed an issue where a job could choose a block for draining that is blocked
by a pending reservation.  This would cause the backfill time to collapse to zero.
 * Added code to take blocks offline if passthrough nodeboards reported their state
as in error.  It does not offline them if it is only a compute node down, it must
be the nodeboard itself in error as would be set by a BOARD_IN_ERROR control action.

== Changes to 0.99.8 ==

 * Fixed an issue where dimensions without cables (like the A,B, and C
  dimensions on a 2 midplane system) would cause the system component to fail
  on boot.

== Changes to 0.99.7 ==

 * Cable errors are now properly detected
 * IO Node errors in available links are detected
 * Due to changes, libpybgsched v 0.1.1 is required.  This also
  means the V1R1M1 driver changes can also be enabled.

== Changes to 0.99.6 ==

 * Performance improvements over 0.99.5 in update_block_state and wiring 
  conflict detection.  This makes the partadm commands far more responsive.
 * Corrected an error in a block's children that affected cold (non-statefile)
  startup.

== Changes to 0.99.0pre36 ==

 * Fixed bug in resource reservation that would leave a reserved partition in
  the 'idle' state
 * Updated the code that sets the kernel profile for a partition to set the
  profile even if no profile is currently set


== Changes to 0.99.0pre35 ==

 * Fixed bug where pickling IncrID could fail when deleting db related
  attributes from the object's dictionary


== Changes to 0.99.0pre34 ==

 * Time spent in the hold state is now exported to the utility function
 * bgsystem now tracks forked tasks using a (forker, id) tuple instead of just
  id in order to prevent id collisions in the active jobs list
 * Updated cleanup and resource reservation in bgsystem updated to prevent
  multiple partition cleanups for a single job
 * Forker classes updated so that logging messages properly identify the
  component from which they came
 * Added LOGNAME and SHELL to the environment of script jobs
 * Resolved a bug in the cdbwriter-api where a job object which had no data
  records could endless recurse in _jr_obj.__getattr__ looking for valid fields
 * Implemented support for common job ID pool through database ID generation
 * PYTHONPATH may now be explicitly set when building the wrapper program
 * Fix for cobalt-mpirun where the -env flag was being improperly handled
 * Updated Cobalt's database documentation diagrams


== Changes to 0.99.0pre31 ==

 * Cleaned up reservation logging and database logging
 * Reservations can now be tagged by adding a -A flag to setres
 * A view of reservations with resid, cycleid and project can be obtained via
    the -x flag to showres.


== Changes to 0.99.0pre30 ==

 * Improved logging of reservations to the cobalt database.
    * "ending" event changed to "terminated"  indicates that the reservaiton 
        has been ended and cleaned up.
    * added "deferred" event for when reservations are deferred (setres -D)
    * added "deactivating" event for when reservation come to a natural end
        (i.e. run out of time)
    * added "releasing" event for a user-requested release of a reservation
    * deferrals of reservations now no longer cause new reservation_data 
        table entries to be generated.  The subsequent cycle does that.
 * cluster_system has had a variety of enhancements.  Notably:
    * setting simulation_mode true in the cobalt config file will allow the
        cluster_system component to enter a test mode, where it can be run
        on a development machine without a supporting cluster.
    * nodes are now recognized as being allocated when a location is selected 
        and sent back to the scheduler component as a runnable location.
        A timeout is set for the partition, and by default this is 300 seconds
        (5 minutes).  This can be changed reset at execution time from the 
        cobalt config file.  Should resources be allocated, but the job be 
        terminated prior to a run (prescript failure due to a dead node, or 
        a hasty user kill, for instance), this will ensure that the nodes are
        released.
 * Jobs submitted to cluster_system will now have arguments passed to the job
    recognized, as they are on BlueGene type systems.


== Changes to 0.99.0pre29 ==

 * Arguments that get passed to scripts (not script jobs but pre and 
    postscripts) are now being properly escaped.
 * Cobaltlogs are now being written to by cqm in a separate thread, and
    should allow scheduling to continue in the event of a filesystem hang.
 * qalter has had a bug fixed where it was passing back nodecounts and proccounts
    as a string rather than an int.
 * jobs in cqm that do not have a timer are appropriately destroyed rather
    than entering into an infinite loop of attempting to run the job.
 * Fix for the situation where a resid can be duplicated in the case of a
    cycling reservation


== Changes to 0.99.0pre28 ==

 * Modification to the XMLRPC base proxy class that allows you to turn off 
    automatic retries from the cobalt side on the proxy by requesting no
    retries.  This should be used for non-idempotent tasks like job-launching.
 * Modified cqm's retry behavior to include a runid when running a job, so and 
    retrying so that bgforker doesn't try to start the same job twice.
 * Timezone display corrected.  If available, cobalt will try to use the pytz
    library (Olsen Timezone Database).


== Changes to 0.99.0pre27 ==

 * Restored COBALT_JOBID and COBALT_RESID in back-end jobs. This was erroniously
    removed in 0.99.0pre26.


== Changes to 0.99.0pre26 ==

 * Fixed bug where environmental variables passed into a job from the --env flag
    of qsub and qalter would leak into the environment of bgforker and from 
    there into subsequent mpirun process environments.  These did not leak 
    into the backend-job environments.  Cluster-system component jobs did not
    see issues with this due to an apparent bug in envrionment handling.
 * A job's exit status is now added to the Cobalt Database as exit_status
 * COBALT_RESID is set in mpirun jobs


== Changes to 0.99.0pre25 ==

 * Various timezone formatting changes including:
    * setres takes the reservation time depending on what TZ is set to
    * qstat and showres show times in a unified format, that includes
        offset and three-letter timezone designation
    * showres has a flag --oldts where it shows timestamps in the original
        format for compatibility with old scripts
 * Forker was not handling spaces in arguments to scripts correctly
 * schedctl no longer gives an erronious error message about requiring a job id 
    for --start, --stop 
 * Statemachine state-transition function pointers are now regenerated at 
    restart as well as job initialization, to allow for transparent state-
    machine adjustments


== Changes to 0.99.0pre24 ==

 * Fixes for arguments to pre and post scripts for jobs
 * showres reverted to not include id's
 * Fixes for script-mode jobs


== Changes to 0.99.0pre23 ==

 * Script jobs run with their own shells from forker to prevent 32/64 bit execution environment issues

== Changes to 0.99.0pre22 ==

 * Bgforker modified so that helper scripts can be run
 * KNOWN ISSUE: job preemption is not working correctly in this version.  
 * Resid now added to jobs at runtime, if they are run in a reservation.
 * Fix for partially overlapping partitions being scheduled at the same time
    and properly detecting reservations.

== Changes to 0.99.0pre21 ==

 * Fix to reduce chance of cluster system nodes from being hung-up on job exit.
 * Adding in backfill prediction functionality.  This is by default disabled.
 * Various fixes and updates to the mk_* cobalt backup scripts.  These can be 
    used to restore state should a change that is destructive to statefiles is
    made (like the 0.99.0pre16 to 0.99.0pre17 change).
 * resid and cycleid can be set via setres
 * time.sleep() wrapped to prevent kernel level IOError from creeping out on 
    ppc64 linux platforms.  


== Changes to 0.99.0pre20 ==

 * stderr log messages now contain timestamps
 * ensemble jobs that use -nofree without a -wait free are now cleaned up properly
 * environment variables in cert, key and ca options in config files are expanded