pyslurm

Welcome to PySLURM’s documentation!

Contents:

PySLURM

Here we modify the dlopen settings to RTLD_GLOBAL; otherwise we cannot see the symbols correctly.
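
As a rough illustration of the note above, the following sketch shows one way RTLD_GLOBAL can be added to the interpreter's dlopen flags before an extension module is imported. It assumes a Unix-like platform, where sys.getdlopenflags and sys.setdlopenflags are available. The helper name add_rtld_global is hypothetical, and this is not necessarily how PySLURM itself does it.

```python
import os
import sys


def add_rtld_global(flags):
    """Return the given dlopen flags with RTLD_GLOBAL switched on."""
    return flags | os.RTLD_GLOBAL


# Remember the current flags so they can be restored afterwards.
previous_flags = sys.getdlopenflags()
sys.setdlopenflags(add_rtld_global(previous_flags))
try:
    # An extension module imported at this point would be loaded with
    # RTLD_GLOBAL, for example: import pyslurm
    active_flags = sys.getdlopenflags()
finally:
    sys.setdlopenflags(previous_flags)
```

Restoring the previous flags keeps the change from leaking into unrelated imports later in the program.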

pyslurm.get_controllers()

Get information about slurm controllers.

Returns:Name of primary controller, Name of backup controller
Return type:tuple
pyslurm.is_controller(Host='')

Return slurm controller status for host.

Parameters:Host (string) – Name of host to check
Returns:None, primary, backup
Return type:string
pyslurm.slurm_api_version()

Return the slurm API version number.

Returns:version_major, version_minor, version_micro
Return type:tuple
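
Since the version is documented as a three-part tuple, a small helper can turn it into a display string. This is only a sketch: format_api_version is a hypothetical name, and the tuple used here is a made-up stand-in for the value pyslurm.slurm_api_version() would return on a real system.

```python
def format_api_version(version):
    """Join a (major, minor, micro) tuple into a dotted version string."""
    major, minor, micro = version
    return "{0}.{1}.{2}".format(major, minor, micro)


# Hypothetical value standing in for pyslurm.slurm_api_version().
example_version = (2, 3, 1)
version_string = format_api_version(example_version)
```
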
pyslurm.slurm_ping(int Controller=1) → int

Issue RPC to check if slurmctld is responsive.

Parameters:Controller (int) – 1 for primary (Default=1), 2 for backup
Returns:0 for success or slurm error code
Return type:int
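
A minimal sketch of how the documented return convention (0 for success) might be used to find a responsive controller. The function first_responsive_controller and the stand-in fake_ping are hypothetical; in real use the callable passed in would presumably be pyslurm.slurm_ping.

```python
def first_responsive_controller(ping):
    """Return 1 or 2 for the first controller that answers, else None.

    `ping` is any callable that takes the controller number and returns 0
    on success, matching the contract documented for slurm_ping.
    """
    for controller in (1, 2):  # 1 = primary, 2 = backup
        if ping(controller) == 0:
            return controller
    return None


def fake_ping(controller):
    """Stand-in for slurm_ping: the primary is down, the backup is up."""
    return 0 if controller == 2 else 1


responsive = first_responsive_controller(fake_ping)
```

Passing the ping function as an argument keeps the helper testable on machines without a slurm installation.
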
pyslurm.slurm_reconfigure() → int

Issue RPC to have slurmctld reload its configuration file.

Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_shutdown(uint16_t Options=0) → int

Issue RPC to have slurmctld cease operations; both the primary and backup controllers are shut down.

Parameters:Options (int) – 0: all slurm daemons (default); 1: slurmctld generates a core file; 2: slurmctld is shut down (no core file)
Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_takeover() → int

Issue an RPC to have the slurmctld backup controller take over from the primary controller.

Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_set_debug_level(uint32_t DebugLevel=0) → int

Set the slurm controller debug level.

Parameters:DebugLevel (int) – 0 (default) to 6
Returns:0 for success, -1 for error and set slurm error number
Return type:int
pyslurm.slurm_set_schedlog_level(uint32_t Enable=0) → int

Set the slurm scheduler debug level.

Parameters:Enable (int) – 1 to enable scheduler logging, 0 to disable (default)
Returns:0 for success, -1 for error and set slurm error number
Return type:int
pyslurm.slurm_get_end_time(uint32_t JobID=0) → int

Get the end time in seconds for a slurm job.

Parameters:JobID (int) – Job identifier
Returns:End time in seconds or -1 on error
Return type:int
pyslurm.slurm_get_rem_time(uint32_t JobID=0) → int

Get the remaining time in seconds for a slurm job.

Parameters:JobID (int) – Job identifier
Returns:Remaining time in seconds or -1 on error
Return type:int
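
The remaining time is documented as a count of seconds, with -1 signalling an error. The sketch below formats such a value for display; format_remaining is a hypothetical helper and the inputs are sample numbers, not values obtained from a scheduler.

```python
def format_remaining(seconds):
    """Format a remaining-time value as H:MM:SS, or 'unknown' for -1."""
    if seconds < 0:
        # -1 is the documented error value.
        return "unknown"
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "{0}:{1:02d}:{2:02d}".format(hours, minutes, secs)


sample_display = format_remaining(3725)
```
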
pyslurm.slurm_get_job_steps(uint32_t JobID=0, uint32_t StepID=0, uint16_t ShowFlags=0)
Load details about job steps that satisfy the JobID and/or StepID specifications provided.
Parameters:
  • JobID (int) – Job Identifier
  • StepID (int) – Jobstep Identifier
  • ShowFlags (int) – Display flags (Default=0)
Returns:

Data whose key is the job and step ID

Return type:

dict

pyslurm.slurm_job_step_layout_get(uint32_t JobID=0, uint32_t StepID=0)

Get the slurm job step layout from a given job and step id.

Parameters:
  • JobID (int) – slurm job id (Default=0)
  • StepID (int) – slurm step id (Default=0)
Returns:

List of job step layout.

Return type:

list

pyslurm.slurm_job_node_ready(uint32_t JobID=0) → int

Report whether a node could run a slurm job now if dispatched.

Parameters:JobID (int) – Job identifier
Returns:Node Ready code
Return type:int
pyslurm.slurm_notify_job(uint32_t JobID=0, char *Msg='') → int

Send a message to a running slurm job.

Parameters:
  • JobID (int) – Job identifier (default=0)
  • Msg (string) – Message to send to job
Returns:

0 for success or -1 on error

Return type:

int

pyslurm.slurm_pid2jobid(uint32_t JobPID=0)

Get the slurm job id from a process id.

Parameters:JobPID (int) – Job process id
Returns:0 for success or a slurm error code, Job identifier
Return type:int, int
pyslurm.slurm_kill_job(uint32_t JobID=0, uint16_t Signal=0, uint16_t BatchFlag=0) → int

Terminate a running slurm job.

Parameters:
  • JobID (int) – Job identifier
  • Signal (int) – Signal to send
  • BatchFlag (int) – Job batch flag (default=0)
Returns:

0 for success or -1 for error and set slurm errno

Return type:

int

pyslurm.slurm_kill_job_step(uint32_t JobID=0, uint32_t JobStep=0, uint16_t Signal=0) → int

Terminate a running slurm job step.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • Signal (int) – Signal to send (default=0)
Returns:

0 for success or -1 for error, and the slurm error code is set appropriately.

Return type:

int

pyslurm.slurm_signal_job(uint32_t JobID=0, uint16_t Signal=0) → int

Send a signal to a slurm job.

Parameters:
  • JobID (int) – Job identifier
  • Signal (int) – Signal to send (default=0)
Returns:

0 for success or -1 for error and set slurm errno

Return type:

int
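
The Signal parameter is an integer. One way to avoid hard-coding signal numbers is to look them up in Python's standard signal module, as in this sketch; signal_number is a hypothetical helper, and the commented call only shows where the value would be passed.

```python
import signal


def signal_number(name):
    """Translate a signal name such as 'SIGTERM' into its integer value.

    Raises KeyError if the name is not a signal known on this platform.
    """
    return signal.Signals[name].value


term = signal_number("SIGTERM")
# The integer could then be passed as the Signal argument, for example:
# pyslurm.slurm_signal_job(job_id, term)
```
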

pyslurm.slurm_signal_job_step(uint32_t JobID=0, uint32_t JobStep=0, uint16_t Signal=0) → int

Send a signal to a slurm job step.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • Signal (int) – Signal to send (default=0)
Returns:

Error code - 0 for success or -1 for error and set slurm errno

Return type:

int

pyslurm.slurm_complete_job(uint32_t JobID=0, uint32_t JobCode=0) → int

Complete a running slurm job.

Parameters:
  • JobID (int) – Job identifier
  • JobCode (int) – Return code (default=0)
Returns:

0 for success or -1 for error and set slurm errno

Return type:

int

pyslurm.slurm_terminate_job(uint32_t JobID=0) → int

Terminate a running slurm job.

Parameters:JobID (int) – Job identifier (default=0)
Returns:0 for success or -1 for error and set slurm errno
Return type:int
pyslurm.slurm_terminate_job_step(uint32_t JobID=0, uint32_t JobStep=0) → int

Terminate a running slurm job step.

Parameters:
  • JobID (int) – Job identifier (default=0)
  • JobStep (int) – Job step identifier (default=0)
Returns:

0 for success or -1 for error, and the slurm error code is set appropriately.

Return type:

int

pyslurm.slurm_suspend(uint32_t JobID=0) → int

Suspend a running slurm job.

Parameters:JobID (int) – Job identifier
Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_resume(uint32_t JobID=0) → int

Resume a suspended slurm job.

Parameters:JobID (int) – Job identifier
Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_requeue(uint32_t JobID=0) → int

Requeue a running slurm job.

Parameters:JobID (int) – Job identifier
Returns:0 for success or a slurm error code
Return type:int
pyslurm.slurm_checkpoint_able(uint32_t JobID=0, uint32_t JobStep=0, time_t StartTime=0)

Report if checkpoint operations can presently be issued for the specified slurm job step.

If yes, returns SLURM_SUCCESS and sets start_time if checkpoint operation is presently active. Returns ESLURM_DISABLED if checkpoint operation is disabled.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • StartTime (int) – Checkpoint start time
Returns:

0 if the job step can be checkpointed, or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_disable(uint32_t JobID=0, uint32_t JobStep=0) → int

Disable checkpoint requests for a given slurm job step.

This can be issued as needed to prevent checkpointing while a job step is in a critical section or for other reasons.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_enable(uint32_t JobID=0, uint32_t JobStep=0) → int

Enable checkpoint requests for a given slurm job step.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_create(uint32_t JobID=0, uint32_t JobStep=0, uint16_t MaxWait=60, char *ImageDir='') → int

Request a checkpoint for the identified slurm job step and continue its execution upon completion of the checkpoint.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • MaxWait (int) – Maximum time to wait
  • ImageDir (string) – Directory to write checkpoint files
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_vacate(uint32_t JobID=0, uint32_t JobStep=0, uint16_t MaxWait=60, char *ImageDir='') → int

Request a checkpoint for the identified slurm Job Step. Terminate its execution upon completion of the checkpoint.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • MaxWait (int) – Maximum time to wait
  • ImageDir (string) – Directory to store checkpoint files
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_restart(uint32_t JobID=0, uint32_t JobStep=0, uint16_t Stick=0, char *ImageDir='') → int

Request that a previously checkpointed slurm job resume execution.

It may continue execution on different nodes than were originally used. Execution may be delayed if resources are not immediately available.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • Stick (int) – Stick to nodes previously running on
  • ImageDir (string) – Directory to find checkpoint image files
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_complete(uint32_t JobID=0, uint32_t JobStep=0, time_t BeginTime=0, uint32_t ErrorCode=0, char *ErrMsg='') → int

Note that a requested checkpoint has been completed.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • BeginTime (int) – Begin time of checkpoint
  • ErrorCode (int) – Error code, highest value for all complete calls is preserved
  • ErrMsg (string) – Error message, preserved for highest error code
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_task_complete(uint32_t JobID=0, uint32_t JobStep=0, uint32_t TaskID=0, time_t BeginTime=0, uint32_t ErrorCode=0, char *ErrMsg='') → int

Note that a requested checkpoint of a task has been completed.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • TaskID (int) – Task identifier
  • BeginTime (int) – Begin time of checkpoint
  • ErrorCode (int) – Error code, highest value for all complete calls is preserved
  • ErrMsg (string) – Error message, preserved for highest error code
Returns:

0 for success or a slurm error code

Return type:

int

pyslurm.slurm_checkpoint_error(uint32_t JobID=0, uint32_t JobStep=0)

Get error information about the last checkpoint operation for a given slurm job step.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
Returns:

0 for success or a slurm error code, Slurm error message

Return type:

tuple

pyslurm.slurm_checkpoint_tasks(uint32_t JobID=0, uint16_t JobStep=0, uint16_t MaxWait=60, char *NodeList='') → int

Send checkpoint request to tasks of specified slurm job step.

Parameters:
  • JobID (int) – Job identifier
  • JobStep (int) – Job step identifier
  • MaxWait (int) – Seconds to wait for the operation to complete
  • NodeList (string) – String of nodelist
Returns:

0 for success, non zero on failure and with errno set

Return type:

tuple

Returns:

Error message

Return type:

string

pyslurm.slurm_get_errno()

Return the slurm error as set by a slurm API call.

Returns:slurm error number
Return type:int
pyslurm.slurm_seterrno(int Errno=0)

Set the slurm error number.

Parameters:Errno (int) – slurm error number
pyslurm.slurm_perror(char *Msg='')

Print to standard error the supplied header followed by a colon followed by a text description of the last Slurm error code generated.

Parameters:Msg (string) – slurm program error String
pyslurm.slurm_strerror(int Errno=0)

Return the slurm error message represented by a slurm error number.

Parameters:Errno (int) – slurm error number.
Returns:slurm error string
Return type:string
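
Several functions in this module are documented as returning -1 on error and setting the slurm error number. The sketch below shows one possible way to turn that convention into a Python exception. SlurmCallError and check_result are hypothetical names, and the two stand-in callables take the place of pyslurm.slurm_get_errno and pyslurm.slurm_strerror so that the example runs without slurm.

```python
class SlurmCallError(RuntimeError):
    """Hypothetical exception carrying a slurm error number and message."""

    def __init__(self, errno, message):
        super().__init__("slurm error {0}: {1}".format(errno, message))
        self.errno = errno


def check_result(result, get_errno, strerror):
    """Raise SlurmCallError when `result` is -1; otherwise return it."""
    if result == -1:
        errno = get_errno()
        raise SlurmCallError(errno, strerror(errno))
    return result


def fake_get_errno():
    """Stand-in for slurm_get_errno; 42 is an arbitrary example value."""
    return 42


def fake_strerror(errno):
    """Stand-in for slurm_strerror, returning a made-up message."""
    return "example message for error {0}".format(errno)


ok_value = check_result(0, fake_get_errno, fake_strerror)
```

Functions documented as returning a slurm error code directly, rather than -1, would need a different check.
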
class pyslurm.block

Class to access/update slurm block Information.

find(self, char *name='', char *val='')

Search for a property and associated value in the retrieved block data.

Parameters:
  • name (str) – key string to search
  • val (str) – value string to match
Returns:

List of IDs that match

Return type:

list
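
To make the search semantics concrete, here is a minimal pure-Python sketch of the kind of lookup find describes, run over a dictionary keyed by ID as get is documented to return. The function find_ids, the sample records, their field names, and the matching rule (string equality) are all assumptions for illustration and not the library's implementation.

```python
def find_ids(data, name, val):
    """Return the IDs whose record has property `name` equal to `val`."""
    return [
        record_id
        for record_id, record in data.items()
        if name in record and str(record[name]) == val
    ]


# Hypothetical records; real field names depend on the slurm data loaded.
sample = {
    "block0": {"state": "FREE", "owner": "alice"},
    "block1": {"state": "ERROR", "owner": "bob"},
    "block2": {"state": "FREE", "owner": "carol"},
}

free_blocks = find_ids(sample, "state", "FREE")
```
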

find_id(self, char *blockID='')

Retrieve block ID data.

Parameters:blockID (str) – Block key string to search
Returns:Dictionary of values for given block key
Return type:dict
get(self)

Get slurm block information.

Returns:Dictionary whose key is the Block ID
Return type:dict
ids(self)

Return the block IDs from retrieved data.

Returns:Dictionary of block IDs
Return type:dict
lastUpdate(self)

Get the time (epoch seconds) the retrieved data was updated.

Returns:epoch seconds
Return type:integer
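
The value is documented as epoch seconds, so it can be converted to a readable timestamp with the standard library. In this sketch epoch_to_utc is a hypothetical helper and the input is a sample number rather than a value returned by lastUpdate.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert epoch seconds into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# 86400 seconds is exactly one day after the Unix epoch.
stamp = epoch_to_utc(86400)
```
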
load(self)

Load slurm block information.

print_info_msg(self, int oneLiner=False)

Output information about all Bluegene blocks.

This is based upon data returned by slurm_load_block.

Parameters:oneLiner (int) – Print information on one line - False (Default), True
update(self, char *blockID='', int blockOP=0)

Update slurm block to a given state.

Parameters:
  • blockID (string) – The ID string of the block
  • blockOP (int) –

    The block operation to perform (default=0):

    FREE 0
    RECREATE 1
    READY 2, BUSY 3 (when HAVE_BGL is defined)
    REBOOTING 2, READY 3 (otherwise)
    RESUME 4
    ERROR 5
    REMOVE 6

Returns:

0 for success or -1 for failure, and the slurm error code is set appropriately.

Return type:

int
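
The numeric operation codes listed for blockOP can be given names on the caller's side. The sketch below does this with an IntEnum. BlockOp is a hypothetical class that is not part of pyslurm, and it only covers the codes that are the same regardless of HAVE_BGL; codes 2 and 3 are left out because their meaning depends on the build.

```python
from enum import IntEnum


class BlockOp(IntEnum):
    """Names for the build-independent blockOP codes listed above."""

    FREE = 0
    RECREATE = 1
    RESUME = 4
    ERROR = 5
    REMOVE = 6


# An IntEnum member behaves like a plain int, so it could be passed
# wherever the integer code is expected, for example as blockOP.
resume_code = int(BlockOp.RESUME)
```
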

update_error(self, char *blockID='')

Set slurm block to ERROR state.

Parameters:blockID (string) – The ID string of the block
update_free(self, char *blockID='')

Set slurm block to FREE state.

Parameters:blockID (string) – The ID string of the block
update_recreate(self, char *blockID='')

Set slurm block to RECREATE state.

Parameters:blockID (string) – The ID string of the block
update_remove(self, char *blockID='')

Set slurm block to REMOVE state.

Parameters:blockID (string) – The ID string of the block
update_resume(self, char *blockID='')

Set slurm block to RESUME state.

Parameters:blockID (string) – The ID string of the block
class pyslurm.config

Class to access slurm config Information.

display_all(self)

Prints the contents of the data structure loaded by the slurm_load_ctl_conf function.

find_id(self, char *keyID='')

Retrieve config ID data.

Parameters:keyID (str) – Config key string to search
Returns:Dictionary of values for given config key
Return type:dict
get(self)

Return the slurm control configuration information.

Returns:Configuration data
Return type:dict
ids(self)

Return the config IDs from retrieved data.

Returns:Dictionary of config key IDs
Return type:dict
key_pairs(self)

Return a dict of the slurm control data as key pairs.

Returns:Dictionary of slurm key-pair values
Return type:dict
lastUpdate(self)

Get the time (epoch seconds) the retrieved data was updated.

Returns:epoch seconds
Return type:integer
class pyslurm.hostlist

Wrapper around slurm hostlist functions.

count(self) → int
create(self, char *HostList='') → int
destroy(self)
find(self, char *Host='') → int
get(self)
pop(self)
push(self, char *Hosts) → int
uniq(self)
class pyslurm.job

Class to access/modify Slurm Job Information.

find(self, char *name='', char *val='')

Search for a property and associated value in the retrieved job data.

Parameters:
  • name (str) – key string to search
  • val (str) – value string to match
Returns:

List of IDs that match

Return type:

list

find_id(self, char *jobID='')

Retrieve job ID data.

Parameters:jobID (str) – Job id key string to search
Returns:Dictionary of values for given job id
Return type:dict
get(self)
ids(self)

Return the job IDs from retrieved data.

Returns:Dictionary of job IDs
Return type:dict
lastUpdate(self)

Get the time (epoch seconds) the job data was updated.

Returns:epoch seconds
Return type:integer
load(self)

Load slurm job information.

print_job_info_msg(self, int oneLiner=0)

Prints the contents of the data structure describing all job step records loaded by the slurm_get_job_steps function.

Parameters:oneLiner (int) – Print on one line (Default=0)
class pyslurm.node

Class to access/modify/update Slurm Node Information.

find(self, char *name='', char *val='')

Search for a property and associated value in the retrieved node data.

Parameters:
  • name (str) – key string to search
  • val (str) – value string to match
Returns:

List of IDs that match

Return type:

list

find_id(self, char *nodeID='')

Retrieve node ID data.

Parameters:nodeID (str) – Node key string to search
Returns:Dictionary of values for given node
Return type:dict
get(self)

Get slurm node information.

Returns:Data whose key is the node name.
Return type:dict
ids(self)

Return the node IDs from retrieved data.

Returns:Dictionary of node IDs
Return type:dict
lastUpdate(self)

Return the last time (epoch seconds) the node data was updated.

Returns:epoch seconds
Return type:integer
load(self)

Load slurm node data.

print_node_info_msg(self, int oneLiner=False)

Output information about all slurm nodes.

Parameters:oneLiner (int) – Print on one line - False (Default) or True
update(self, dict node_dict={})

Update slurm node information.

Parameters:node_dict (dict) – A populated node dictionary, an empty one is created by create_node_dict
Returns:0 for success or -1 for error, and the slurm error code is set appropriately.
Return type:int
class pyslurm.partition

Class to access/modify Slurm Partition Information.

create(self, dict Partition_dict={}) → int

Create a slurm partition.

Parameters:Partition_dict (dict) – A populated partition dictionary; an empty one can be created by create_partition_dict
Returns:0 for success or -1 for error, and the slurm error code is set appropriately.
Return type:int
delete(self, char *PartID='') → int

Delete a given slurm partition.

Parameters:PartID (string) – Name of slurm partition
Returns:0 for success; otherwise the slurm error code is set appropriately.
Return type:int
find(self, char *name='', char *val='')

Search for a property and associated value in the retrieved partition data.

Parameters:
  • name (str) – key string to search
  • val (str) – value string to match
Returns:

List of IDs that match

Return type:

list

find_id(self, char *partID='')

Retrieve partition ID data.

Parameters:partID (str) – Partition key to search
Returns:Dictionary of values for given partition key
Return type:dict
get(self)

Get the slurm partition data from a previous load partition method.

Returns:Partition data, key is the partition ID
Return type:dict
ids(self)

Return the partition IDs from retrieved data.

Returns:Dictionary of partition IDs
Return type:dict
lastUpdate(self)

Get the time (epoch seconds) the retrieved data was updated.

Returns:epoch seconds
Return type:integer
load(self)

Load slurm partition information.

print_info_msg(self, int oneLiner=False)

Display the partition information from previous load partition method.

Parameters:oneLiner (int) – Display on one line - False (Default) or True
update(self, dict Partition_dict={})

Update a slurm partition.

Parameters:Partition_dict (dict) – A populated partition dictionary; an empty one is created by create_partition_dict
Returns:0 for success, -1 for error, and the slurm error code is set appropriately.
Return type:int
class pyslurm.reservation

Class to access/update slurm reservation Information.

create(self, dict reservation_dict={})

Create slurm reservation.

delete(self, char *ResID='')

Delete slurm reservation.

find(self, char *name='', char *val='')

Search for a property and associated value in the retrieved reservation data.

Parameters:
  • name (str) – key string to search
  • val (str) – value string to match
Returns:

List of IDs that match

Return type:

list

find_id(self, char *resID='')

Retrieve reservation ID data.

Parameters:resID (str) – Reservation key string to search
Returns:Dictionary of values for given reservation key
Return type:dict
get(self)

Get slurm reservation information.

Returns:Data whose key is the Reservation ID
Return type:dict
ids(self)

Return a list of reservation IDs from retrieved data.

Returns:Dictionary of reservation IDs
Return type:dict
lastUpdate(self)

Get the time (epoch seconds) the reservation data was updated.

Returns:epoch seconds
Return type:integer
load(self)
print_reservation_info_msg(self, int oneLiner=False)

Output information about all slurm reservations.

Parameters:oneLiner (int) – Print on one line - False (Default) or True
update(self, dict reservation_dict={})

Update slurm reservation attributes.

class pyslurm.trigger
clear(self, uint32_t TriggerID=-1, uint32_t UserID=-1, char *ID='') → int

Clear or remove a slurm trigger.

Parameters:
  • TriggerID (int) – Trigger Identifier
  • UserID (int) – User Identifier
  • ID (string) – Job Identifier
Returns:

0 for success or a slurm error code

Return type:

int

get(self)

Get the information on slurm triggers.

Returns:Dictionary whose key is the trigger ID
Return type:dict
pull(self, uint32_t TriggerID=0, uint32_t UserID=0, char *ID='') → int

Pull a slurm trigger.

Parameters:
  • TriggerID (int) – Trigger Identifier
  • UserID (int) – User Identifier
  • ID (string) – Job Identifier
Returns:

0 for success or a slurm error code

Return type:

int

set(self, dict trigger_dict={}) → int

Set or create a slurm trigger.

Parameters:trigger_dict (dict) – A populated dictionary of trigger information
Returns:0 for success or -1 for error, and the slurm error code is set appropriately.
Return type:int