PySLURM
The module sets the dlopen flags to RTLD_GLOBAL on import; otherwise the slurm library symbols cannot be resolved correctly.
Get information about slurm controllers.
| Returns: | Name of primary controller, Name of backup controller |
|---|---|
| Return type: | tuple |
Return slurm controller status for host.
| Parameters: | Host (string) – Name of host to check |
|---|---|
| Returns: | None, primary, backup |
| Return type: | string |
Return the slurm API version number.
| Returns: | version_major, version_minor, version_micro |
|---|---|
| Return type: | tuple |
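Because the version comes back as a three-element tuple, it can be compared element-wise against a minimum requirement. The following is a minimal sketch, assuming the tuple has already been obtained from the call described above. `format_version` and `at_least` are hypothetical helper names, and the sample tuple is illustrative rather than a real result.

```python
def format_version(version):
    """Render a (major, minor, micro) tuple as 'major.minor.micro'."""
    major, minor, micro = version
    return f"{major}.{minor}.{micro}"


def at_least(version, minimum):
    """Return True if version is greater than or equal to minimum."""
    return tuple(version) >= tuple(minimum)


# Illustrative value only; in practice this comes from the API version call.
sample = (2, 6, 5)
print(format_version(sample))
print(at_least(sample, (2, 5, 0)))
```

Tuple comparison in Python is lexicographic, so the major number is compared first, then minor, then micro.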
Issue RPC to check if slurmctld is responsive.
| Parameters: | Controller (int) – 1 for primary (Default=1), 2 for backup |
|---|---|
| Returns: | 0 for success or slurm error code |
| Return type: | int |
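A common use of this call is to try the primary controller first and fall back to the backup. The sketch below assumes only what the table above states: the call takes 1 or 2 and returns 0 on success. The ping function is passed in as a callable because its name is not given in this section; `first_responsive` and `fake_ping` are hypothetical names.

```python
def first_responsive(ping):
    """Return 'primary', 'backup', or None using a ping(controller) callable.

    ping is assumed to take 1 (primary) or 2 (backup) and to return
    0 on success or a slurm error code otherwise.
    """
    for controller, label in ((1, "primary"), (2, "backup")):
        if ping(controller) == 0:
            return label
    return None


# Stand-in for the real call: the primary is down, the backup answers.
def fake_ping(controller):
    return 0 if controller == 2 else 1


print(first_responsive(fake_ping))
```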
Issue RPC to have slurmctld reload its configuration file.
| Returns: | 0 for success or a slurm error code |
|---|---|
| Return type: | int |
Issue RPC to have slurmctld cease operations. Both the primary and backup controllers are shut down.
| Parameters: | Options (int) – 0: all slurm daemons (default); 1: slurmctld generates a core file; 2: slurmctld is shut down (no core file) |
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Issue an RPC to have the slurmctld backup controller take over from the primary controller.
| Returns: | 0 for success or a slurm error code |
|---|---|
| Return type: | int |
Set the slurm controller debug level.
| Parameters: | DebugLevel (int) – 0 (default) to 6 |
|---|---|
| Returns: | 0 for success, -1 for error and set slurm error number |
| Return type: | int |
Set the slurm scheduler debug level.
| Parameters: | Enable (int) – True = 0, False = 1 |
|---|---|
| Returns: | 0 for success, -1 for error and set slurm error number |
| Return type: | int |
Get the end time in seconds for a slurm job step.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | End time in seconds or -1 on error |
| Return type: | int |
Get the remaining time in seconds for a slurm job step.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | Remaining time in seconds or -1 on error |
| Return type: | int |
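The remaining time is a plain count of seconds, with -1 reserved for errors, so callers usually want to check the sentinel before formatting. A minimal sketch, assuming the integer has already been returned by the call above; `format_remaining` is a hypothetical helper name.

```python
def format_remaining(seconds):
    """Format a remaining-time value as H:MM:SS, or 'unknown' for -1."""
    if seconds < 0:
        # -1 is the documented error value.
        return "unknown"
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_remaining(3725))
print(format_remaining(-1))
```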
| Parameters: |
|
|---|---|
| Returns: | Data whose key is the job and step ID |
| Return type: | dict |
Get the slurm job step layout from a given job and step id.
| Parameters: |
|
|---|---|
| Returns: | List of job step layout. |
| Return type: | list |
Return whether a node could run a slurm job now if it were dispatched.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | Node Ready code |
| Return type: | int |
Send a message to a running slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 on error |
| Return type: | int |
Get the slurm job id from a process id.
| Parameters: | JobPID (int) – Job process id |
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
| Returns: | Job Identifier |
| Return type: | int |
Terminate a running slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for error and set slurm errno |
| Return type: | int |
Terminate a running slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |
Send a signal to a slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for error and set slurm errno |
| Return type: | int |
Send a signal to a slurm job step.
| Parameters: |
|
|---|---|
| Returns: | Error code - 0 for success or -1 for error and set slurm errno |
| Return type: | int |
Complete a running slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for error and set slurm errno |
| Return type: | int |
Terminate a running slurm job step.
| Parameters: | JobID (int) – Job identifier (default=0) |
|---|---|
| Returns: | 0 for success or -1 for error and set slurm errno |
| Return type: | int |
Terminate a running slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |
Suspend a running slurm job.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Resume a running slurm job step.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Requeue a running slurm job step.
| Parameters: | JobID (int) – Job identifier |
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Report if checkpoint operations can presently be issued for the specified slurm job step.
If yes, returns SLURM_SUCCESS and sets start_time if checkpoint operation is presently active. Returns ESLURM_DISABLED if checkpoint operation is disabled.
| Parameters: |
|
|---|---|
| Returns: | 0 if the job step can be checkpointed, or a slurm error code |
| Return type: | int |
Disable checkpoint requests for a given slurm job step.
This can be issued as needed to prevent checkpointing while a job step is in a critical section or for other reasons.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Enable checkpoint requests for a given slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Request a checkpoint for the identified slurm job step and continue its execution upon completion of the checkpoint.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Request a checkpoint for the identified slurm Job Step. Terminate its execution upon completion of the checkpoint.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Request that a previously checkpointed slurm job resume execution.
It may continue execution on different nodes than were originally used. Execution may be delayed if resources are not immediately available.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Note that a requested checkpoint has been completed.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Note that a requested checkpoint has been completed.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Get error information about the last checkpoint operation for a given slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | tuple |
| Returns: | Slurm error message |
| Return type: | string |
Send checkpoint request to tasks of specified slurm job step.
| Parameters: |
|
|---|---|
| Returns: | 0 for success, nonzero on failure with errno set |
| Return type: | tuple |
| Returns: | Error message |
| Return type: | string |
Return the slurm error as set by a slurm API call.
| Returns: | slurm error number |
|---|---|
| Return type: | int |
Set the slurm error number.
| Parameters: | Errno (int) – slurm error number |
|---|---|
Print to standard error the supplied header followed by a colon followed by a text description of the last Slurm error code generated.
| Parameters: | Msg (string) – slurm program error String |
|---|---|
Return the slurm error message for a given slurm error number.
| Parameters: | Errno (int) – slurm error number. |
|---|---|
| Returns: | slurm error string |
| Return type: | string |
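Many calls in this module return 0 on success and otherwise leave a slurm error number to be looked up. The sketch below shows one way to turn that convention into an exception. It assumes the errno getter and the error-string lookup described above are passed in as callables, since their names are not given in this section; `SlurmCallError` and `check_rc` are hypothetical names.

```python
class SlurmCallError(RuntimeError):
    """Hypothetical exception type for a failed slurm call."""


def check_rc(rc, get_errno, strerror):
    """Return rc if it is 0; otherwise raise SlurmCallError.

    get_errno() is assumed to return the current slurm error number and
    strerror(errno) the matching message string.
    """
    if rc != 0:
        errno = get_errno()
        raise SlurmCallError(f"slurm error {errno}: {strerror(errno)}")
    return rc


# Stand-ins for the real calls, for illustration only.
try:
    check_rc(-1, lambda: 2, lambda n: "example failure")
except SlurmCallError as exc:
    print(exc)
```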
Class to access/update slurm block Information.
Search for a property and associated value in the retrieved block data.
| Parameters: |
|
|---|---|
| Returns: | List of IDs that match |
| Return type: | list |
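The search described here can be pictured as a filter over a dictionary of records keyed by ID. This is an illustrative sketch of that behaviour, not the library's implementation; `find_ids`, the sample IDs, and the `state` field are all hypothetical, and the real parameter names are not listed in this section.

```python
def find_ids(data, name, value):
    """Return the IDs whose record has record[name] equal to value."""
    return [key for key, record in data.items() if record.get(name) == value]


# Made-up records in the "dict keyed by ID" shape used by the classes here.
sample = {
    "blockA": {"state": "FREE"},
    "blockB": {"state": "ERROR"},
    "blockC": {"state": "FREE"},
}
print(find_ids(sample, "state", "FREE"))
```

The same pattern applies to the job, node, partition, and reservation classes, which document an equivalent search method.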
Retrieve block ID data.
| Parameters: | str (int) – Block key string to search |
|---|---|
| Returns: | Dictionary of values for given block key |
| Return type: | dict |
Get slurm block information.
| Returns: | Dictionary whose key is the Block ID |
|---|---|
| Return type: | dict |
Return the block IDs from retrieved data.
| Returns: | Dictionary of block IDs |
|---|---|
| Return type: | dict |
Get the time (epoch seconds) the retrieved data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
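The update time is returned as epoch seconds, which is convenient for comparisons but not for display. A minimal sketch of converting it and checking staleness, assuming the integer has already been retrieved; `to_utc` and `is_stale` are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def is_stale(last_update, now, max_age):
    """Return True if more than max_age seconds passed since last_update."""
    return now - last_update > max_age


print(to_utc(0).isoformat())
print(is_stale(last_update=1000, now=1100, max_age=60))
```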
Load slurm block information.
Output information about all Bluegene blocks.
This is based upon data returned by the slurm_load_block.
| Parameters: | oneLiner (int) – Print information on one line - False (Default), True |
|---|---|
Update slurm block to a given state.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or -1 for failure, and the slurm error code is set appropriately. |
| Return type: | int |
Set slurm block to ERROR state.
| Parameters: | blockID (string) – The ID string of the block |
|---|---|
Set slurm block to FREE state.
| Parameters: | blockID (string) – The ID string of the block |
|---|---|
Set slurm block to RECREATE state.
| Parameters: | blockID (string) – The ID string of the block |
|---|---|
Set slurm block to REMOVE state.
| Parameters: | blockID (string) – The ID string of the block |
|---|---|
Set slurm block to RESUME state.
| Parameters: | blockID (string) – The ID string of the block |
|---|---|
Class to access slurm config Information.
Prints the contents of the data structure loaded by the slurm_load_ctl_conf function.
Retrieve config ID data.
| Parameters: | str (int) – Config key string to search |
|---|---|
| Returns: | Dictionary of values for given config key |
| Return type: | dict |
Return the slurm control configuration information.
| Returns: | Configuration data |
|---|---|
| Return type: | dict |
Return the config IDs from retrieved data.
| Returns: | Dictionary of config key IDs |
|---|---|
| Return type: | dict |
Return a dict of the slurm control data as key pairs.
| Returns: | Dictionary of slurm key-pair values |
|---|---|
| Return type: | dict |
Get the time (epoch seconds) the retrieved data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
Wrapper around slurm hostlist functions.
Class to access/modify Slurm Job Information.
Search for a property and associated value in the retrieved job data.
| Parameters: |
|
|---|---|
| Returns: | List of IDs that match |
| Return type: | list |
Retrieve job ID data.
| Parameters: | str (int) – Job id key string to search |
|---|---|
| Returns: | Dictionary of values for given job id |
| Return type: | dict |
Return the job IDs from retrieved data.
| Returns: | Dictionary of job IDs |
|---|---|
| Return type: | dict |
Get the time (epoch seconds) the job data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
Load slurm job information.
Prints the contents of the data structure describing all job step records loaded by the slurm_get_job_steps function.
| Parameters: | Flag (int) – Default=0 |
|---|---|
Class to access/modify/update Slurm Node Information.
Search for a property and associated value in the retrieved node data.
| Parameters: |
|
|---|---|
| Returns: | List of IDs that match |
| Return type: | list |
Retrieve node ID data.
| Parameters: | str (int) – Node key string to search |
|---|---|
| Returns: | Dictionary of values for given node |
| Return type: | dict |
Get slurm node information.
| Returns: | Data whose key is the node name. |
|---|---|
| Return type: | dict |
Return the node IDs from retrieved data.
| Returns: | Dictionary of node IDs |
|---|---|
| Return type: | dict |
Return the last time (epoch seconds) the node data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
Load slurm node data.
Output information about all slurm nodes.
| Parameters: | oneLiner (int) – Print on one line - False (Default) or True |
|---|---|
Update slurm node information.
| Parameters: | node_dict (dict) – A populated node dictionary; an empty one is created by create_node_dict |
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |
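The update call expects a dictionary that starts from an empty template and has only the fields to change filled in. The sketch below illustrates that pattern without calling the library. The template contents are a made-up stand-in for whatever create_node_dict returns, and `fill_template` is a hypothetical helper that rejects keys the template does not define.

```python
def fill_template(template, **changes):
    """Return a copy of template with the given fields overridden.

    Raises KeyError for any field that the template does not contain,
    which catches misspelled keys before the update is submitted.
    """
    unknown = set(changes) - set(template)
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    filled = dict(template)
    filled.update(changes)
    return filled


# Stand-in template; the real keys come from create_node_dict.
template = {"node_names": None, "reason": None}
node_dict = fill_template(template, node_names="node01", reason="maintenance")
print(node_dict)
```

Copying the template rather than mutating it keeps the empty dictionary reusable for later updates.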
Class to access/modify Slurm Partition Information.
Create a slurm partition.
| Parameters: | partition_dict (dict) – A populated partition dictionary; an empty one can be created by create_partition_dict |
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |
Delete a given slurm partition.
| Parameters: | PartID (string) – Name of slurm partition |
|---|---|
| Returns: | 0 for success; otherwise the slurm error code is set appropriately. |
| Return type: | int |
Search for a property and associated value in the retrieved partition data.
| Parameters: |
|
|---|---|
| Returns: | List of IDs that match |
| Return type: | list |
Retrieve partition ID data.
| Parameters: | str (int) – Partition key to search |
|---|---|
| Returns: | Dictionary of values for given partition key |
| Return type: | dict |
Get the slurm partition data from a previous load partition method.
| Returns: | Partition data, key is the partition ID |
|---|---|
| Return type: | dict |
Return the partition IDs from retrieved data.
| Returns: | Dictionary of partition IDs |
|---|---|
| Return type: | dict |
Get the time (epoch seconds) the retrieved data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
Load slurm partition information.
Display the partition information from previous load partition method.
| Parameters: | Flag (int) – Display on one line (default=0) |
|---|---|
Update a slurm partition.
| Parameters: | partition_dict (dict) – A populated partition dictionary; an empty one is created by create_partition_dict |
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |
Class to access/update slurm reservation Information.
Create slurm reservation.
Delete slurm reservation.
Search for a property and associated value in the retrieved reservation data.
| Parameters: |
|
|---|---|
| Returns: | List of IDs that match |
| Return type: | list |
Retrieve reservation ID data.
| Parameters: | resID (str) – Reservation key string to search |
|---|---|
| Returns: | Dictionary of values for given reservation key |
| Return type: | dict |
Get slurm reservation information.
| Returns: | Data whose key is the Reservation ID |
|---|---|
| Return type: | dict |
Return a list of reservation IDs from retrieved data.
| Returns: | Dictionary of reservation IDs |
|---|---|
| Return type: | dict |
Get the time (epoch seconds) the reservation data was updated.
| Returns: | epoch seconds |
|---|---|
| Return type: | integer |
Output information about all slurm reservations.
| Parameters: | Flags (int) – Print on one line - False (Default) or True |
|---|---|
Update a slurm reservation's attributes.
Clear or remove a slurm trigger.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Get the information on slurm triggers.
| Returns: | Where key is the trigger ID |
|---|---|
| Return type: | dict |
Pull a slurm trigger.
| Parameters: |
|
|---|---|
| Returns: | 0 for success or a slurm error code |
| Return type: | int |
Set or create a slurm trigger.
| Parameters: | trigger_dict (dict) – A populated dictionary of trigger information |
|---|---|
| Returns: | 0 for success or -1 for error, and the slurm error code is set appropriately. |
| Return type: | int |