PyLoadL

Python Bindings for IBM LoadLeveler


 


Workload Management API



Synopsis


  (rc, errObj) = ll_cluster( cluster_list, CLUSTER_SET | CLUSTER_UNSET )

  (rc, errObj) = ll_cluster_auth()

  rc = ll_control( control_op, host_list, user_list, job_list, class_list, priority )

  rc = llctl( LL_CONTROL_RECYCLE | LL_CONTROL_RECONFIG |
              LL_CONTROL_START | LL_CONTROL_STOP |
              LL_CONTROL_DRAIN | LL_CONTROL_DRAIN_STARTD |
              LL_CONTROL_DRAIN_SCHEDD | LL_CONTROL_PURGE_SCHEDD |
              LL_CONTROL_FLUSH | LL_CONTROL_SUSPEND | LL_CONTROL_RESUME |
              LL_CONTROL_RESUME_STARTD | LL_CONTROL_RESUME_SCHEDD |
              LL_CONTROL_FAVOR_JOB | LL_CONTROL_UNFAVOR_JOB |
              LL_CONTROL_FAVOR_USER | LL_CONTROL_UNFAVOR_USER |
              LL_CONTROL_HOLD_USER | LL_CONTROL_HOLD_SYSTEM |
              LL_CONTROL_HOLD_RELEASE | LL_CONTROL_PRIO_ABS |
              LL_CONTROL_PRIO_ADJ | LL_CONTROL_START_DRAINED |
              LL_CONTROL_DUMP_LOGS,
              host_list, class_list )

  rc = llfavorjob( LL_CONTROL_FAVOR_JOB | LL_CONTROL_UNFAVOR_JOB, job_list )

  rc = llfavoruser( LL_CONTROL_FAVOR_USER | LL_CONTROL_UNFAVOR_USER, user_list )

  rc = llhold( LL_CONTROL_HOLD_USER | LL_CONTROL_HOLD_SYSTEM |
               LL_CONTROL_HOLD_RELEASE, host_list, user_list, job_list )

  (rc, errObj) = ll_modify( EXECUTION_FACTOR | CONSUMABLE_CPUS |
                            CONSUMABLE_MEMORY | WCLIMIT_ADD_MIN |
                            JOB_CLASS | ACCOUNT_NO | STEP_PREEMPTABLE |
                            SYSPRIO | BG_SIZE | BG_SHAPE | BG_CONNECTION |
                            BG_PARTITION | BG_ROTATE | BG_REQUIREMENTS |
                            RESOURCES | NODE_RESOURCES, value, job_step )

  (rc, errObj) = ll_move_job( job_id, cluster_name )

  rc = llprio( LL_CONTROL_PRIO_ABS | LL_CONTROL_PRIO_ADJ, job_list, priority )
  
  (rc, errObj) = ll_preempt( job_step_id, PREEMPT_STEP | RESUME_STEP | SYSTEM_PREEMPT_STEP )

  (rc, errObj) = ll_preempt_jobs( user_list, host_list, job_list, PREEMPT_STEP | RESUME_STEP,
                                  LL_PREEMPT_SUSPEND | LL_PREEMPT_VACATE | LL_PREEMPT_REMOVE |
                                  LL_PREEMPT_SYS_HOLD | LL_PREEMPT_USER_HOLD )

  (rc, errObj) = ll_run_scheduler()

  rc = ll_start_job_ext( cluster, proc, from_host, node_list )
  
  rc = ll_terminate_job( cluster, proc, from_host, msg )

API Functions


The LoadLeveler Workload Management API via PyLoadL has the following functions:


ll_cluster

Function to select the cluster on which subsequent function calls operate, or to unselect a previously selected cluster.

  (rc, errObj) = ll_cluster( cluster_list, cluster_op )

Parameters

  1. cluster_list

    List of clusters, currently restricted to a single cluster.

  2. cluster_op

    • CLUSTER_SET - select cluster
    • CLUSTER_UNSET - unselect cluster
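
Since a selection applies to the calls that follow it, pairing CLUSTER_SET with CLUSTER_UNSET in a context manager is one way to avoid leaving a cluster selected by accident. The following is a minimal sketch and not part of PyLoadL: selected_cluster is a hypothetical helper, the ll_cluster function and the two constants are passed in as arguments so the example runs without LoadLeveler, and a return code of 0 is assumed to mean success.

```python
from contextlib import contextmanager


@contextmanager
def selected_cluster(ll_cluster, cluster_name, set_op, unset_op):
    """Select cluster_name for the calls made inside the with-block.

    ll_cluster, set_op and unset_op are supplied by the caller
    (for example the real function and constants from the bindings).
    """
    # The list is restricted to one cluster, so wrap the single name.
    rc, err_obj = ll_cluster([cluster_name], set_op)
    if rc != 0:  # assumption: rc == 0 signals success
        raise RuntimeError("ll_cluster set failed, rc=%r" % (rc,))
    try:
        yield
    finally:
        # Unselect even if the body of the with-block raised.
        ll_cluster([cluster_name], unset_op)
```

With the real bindings this might be used as `with selected_cluster(pyloadl.ll_cluster, "clusterA", pyloadl.CLUSTER_SET, pyloadl.CLUSTER_UNSET): ...`, where the module name pyloadl and the cluster name are assumptions.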

ll_cluster_auth

Function to generate SSL keys, necessary for secure multicluster communications.

  (rc, errObj) = ll_cluster_auth()

ll_control

Function to perform control operations against hosts, jobs, users or job classes.

  rc = ll_control( control_op, host_list, user_list, job_list, class_list, priority )

Parameters

  1. control_op

    The control operation to perform, given as one of the LL_CONTROL_* constants.

  2. host_list

    List of host machines to perform control operation on.

  3. user_list

    List of users to perform control operation on.

  4. job_list

    List of job step IDs to perform control operation on.

  5. class_list

    List of job classes to perform control operation on.

  6. priority

    Priority value to be assigned for the control operation.

llfavoruser

Function to favour and unfavour given users. It is a thin wrapper around ll_control.

  rc = llfavoruser( LL_CONTROL_FAVOR_USER | LL_CONTROL_UNFAVOR_USER, user_list )

Parameters

  1. Operation

    • LL_CONTROL_FAVOR_USER : Favour the users in user_list.
    • LL_CONTROL_UNFAVOR_USER : Unfavour the users in user_list.

  2. user_list

    List of users to perform favour operation on.

llhold

Function to hold and release given job steps or users. It is a thin wrapper around ll_control.

   rc = llhold( LL_CONTROL_HOLD_USER | LL_CONTROL_HOLD_SYSTEM | LL_CONTROL_HOLD_RELEASE, host_list, user_list, job_list )

Parameters

  1. Hold Operation

    • LL_CONTROL_HOLD_USER

      Place on user hold.

    • LL_CONTROL_HOLD_SYSTEM

      Place on system hold. You need to be a LoadLeveler administrator to perform this operation.

    • LL_CONTROL_HOLD_RELEASE

      Release from hold. You need to be a LoadLeveler administrator to perform this against system-held jobs.

  2. host_list

    List of host machines.

  3. user_list

    List of users to perform hold operation on.

  4. job_list

    List of job step IDs to perform hold operation on.
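
As a rough illustration of the wrapper relationship, the sketch below forwards a hold request to ll_control. It is not the PyLoadL source: llhold_sketch is a hypothetical name, ll_control is passed in so the example runs without LoadLeveler, and the empty class_list and the priority of 0 are assumptions about arguments a hold operation does not use.

```python
def llhold_sketch(ll_control, hold_op, host_list, user_list, job_list):
    """Forward a hold/release request to ll_control (illustrative only)."""
    class_list = []  # assumption: job classes play no part in a hold
    priority = 0     # assumption: priority is ignored for hold operations
    return ll_control(hold_op, host_list, user_list, job_list,
                      class_list, priority)


# Stand-in for the real ll_control so the sketch can be run anywhere.
recorded = []


def fake_ll_control(control_op, host_list, user_list, job_list,
                    class_list, priority):
    recorded.append((control_op, host_list, user_list, job_list,
                     class_list, priority))
    return 0


# The string stands in for the real LL_CONTROL_HOLD_USER constant.
rc = llhold_sketch(fake_ll_control, "LL_CONTROL_HOLD_USER",
                   [], [], ["shivling.5.0"])
```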

llprio

Function to adjust the priorities of job steps. It is a thin wrapper around ll_control.

  rc = llprio( LL_CONTROL_PRIO_ABS | LL_CONTROL_PRIO_ADJ, job_list, priority )

Parameters

  1. Priority Operation

    • LL_CONTROL_PRIO_ABS : New absolute priority value.
    • LL_CONTROL_PRIO_ADJ : New adjusted priority value.

  2. job_list

    List of job step IDs.

  3. priority

    Priority value to assign to the list of job step IDs.
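
The difference between the two priority operations can be pictured with plain arithmetic. The function below is only a model of the expected behaviour, not a PyLoadL call: resulting_priority is a hypothetical name, the strings stand in for the real constants, and both the additive meaning of an adjustment and the clamping range of 0 to 100 are assumptions based on how LoadLeveler describes user priority.

```python
PRIO_MIN, PRIO_MAX = 0, 100  # assumption: LoadLeveler user priority range


def resulting_priority(current, prio_op, priority):
    """Model the priority a job step would end up with."""
    if prio_op == "LL_CONTROL_PRIO_ABS":
        new_value = priority            # replace the current priority
    elif prio_op == "LL_CONTROL_PRIO_ADJ":
        new_value = current + priority  # shift the current priority
    else:
        raise ValueError("unknown priority operation: %r" % (prio_op,))
    # Keep the result inside the assumed valid range.
    return max(PRIO_MIN, min(PRIO_MAX, new_value))
```

For a job step currently at priority 50, an absolute value of 80 gives 80, while an adjustment of -10 gives 40.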

ll_preempt

Function to preempt a running job step, or to resume a job step that has already been preempted through the LoadLeveler llpreempt command or via ll_preempt. ll_preempt cannot resume a job step that was preempted by the system through the PREEMPT_CLASS rules.

  (rc, errObj) = ll_preempt( job_step, preempt_op )

Parameters

  1. job_step - The Job Step ID.
  2. preempt_op - Preemption operation, which can be the following -

    • PREEMPT_STEP - Preempts the job step ID.
    • RESUME_STEP - Resumes the job step ID.

ll_preempt_jobs

Function to preempt a set of running job steps using the specified preempt method, or to resume job steps that have already been preempted with the preempt method of suspend through the llpreempt command or the ll_preempt_jobs routine. The ll_preempt_jobs routine cannot resume a job step that was preempted through the PREEMPT_CLASS rules, or a job step that was preempted with a preempt method other than suspend.

  (rc, errObj) = ll_preempt_jobs( user_list, host_list, job_list, preempt_op, preempt_method )

Parameters

  1. user_list

    List of users to be targeted.

  2. host_list

    List of hosts to be targeted.

  3. job_list

    List of job step IDs in the form host.job_id.step_id, for example shivling.5.0

  4. preempt_op - Preemption operation to perform

    • PREEMPT_STEP

      Preempts the job step.

    • RESUME_STEP

      Resumes the job step.

  5. preempt_method - Preemption method to perform

    • LL_PREEMPT_SUSPEND

      Suspends the job step.

    • LL_PREEMPT_VACATE

      Vacates the job step.

    • LL_PREEMPT_REMOVE

      Removes the job step.

    • LL_PREEMPT_SYS_HOLD

      Places the job step on system hold.

    • LL_PREEMPT_USER_HOLD

      Places the job step on user hold.
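
Because job_list entries must follow the host.job_id.step_id form, it can help to validate them before making the call. The helper below is a sketch and not part of PyLoadL: parse_job_step_id is a hypothetical name, and it assumes that the job and step parts are non-negative integers and that the host part may itself contain dots.

```python
def parse_job_step_id(step_id):
    """Split 'host.job_id.step_id' into (host, job_id, step_id)."""
    # Split from the right so that dotted host names stay intact.
    parts = step_id.rsplit(".", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError("expected host.job_id.step_id, got %r" % (step_id,))
    host, job, step = parts
    if not (job.isdigit() and step.isdigit()):
        raise ValueError("job and step must be integers in %r" % (step_id,))
    return host, int(job), int(step)


def validated_job_list(step_ids):
    """Return the list unchanged if every entry is well formed."""
    for step_id in step_ids:
        parse_job_step_id(step_id)  # raises ValueError on a bad entry
    return list(step_ids)
```

For the example ID above, parse_job_step_id("shivling.5.0") returns ("shivling", 5, 0).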

ll_modify

Function to modify the attributes of a submitted job step. This interface supports only one job step ID. The underlying API also accepts only one job step at present, but it is designed for expansion, so this interface may change in the future.

  (rc, errObj) = ll_modify( modify_op, value, job_step )

Parameters

  1. modify_op - The modify operation to perform.

    • EXECUTION_FACTOR : New execution factor, modify_data input is a numeric.
    • CONSUMABLE_CPUS : New consumable cpus value, modify_data input is a numeric.
    • CONSUMABLE_MEMORY : New consumable memory in megabytes, modify_data input is a numeric.
    • WCLIMIT_ADD_MIN : Additional minutes to add to hard wallclock limit, modify_data input is a numeric.
    • JOB_CLASS : New job class, modify_data input is a string.
    • ACCOUNT_NO : Changes the account number to the specified value for an idle-like job step.
    • STEP_PREEMPTABLE : Specifies whether a job is preemptable or nonpreemptable.
    • SYSPRIO : Changes the q_sysprio for a job step to the specified integer value. The new job step priority will be fixed. This is a LoadLeveler administrator only option.
    • BG_SIZE : Changes the size of an idle-like Blue Gene job. The subsequent value argument must be an integer in units of compute nodes. If this value is specified, any value previously specified for bg_shape or bg_partition will be ignored.
    • BG_SHAPE : Changes the shape of an idle-like Blue Gene job. The subsequent value argument must be of the form "XxYxZ", where X, Y, and Z are integers in units of the number of base partitions. If this value is specified, any value previously specified for bg_size or bg_partition will be ignored.
    • BG_CONNECTION : Changes the connection option of an idle-like Blue Gene job. The subsequent value argument must be a string that is either TORUS, MESH, or PREFER_TORUS.
    • BG_PARTITION : Changes the requested partition ID of an idle-like Blue Gene job. If this value is specified, any value specified for bg_connection, bg_shape, bg_size, or bg_rotate will be ignored.
    • BG_ROTATE : Changes the rotate option of an idle-like Blue Gene job. The subsequent value argument must be a string that is either True or False.
    • BG_REQUIREMENTS : Changes the memory requirement that a Blue Gene base partition in the LoadLeveler cluster must meet to run an idle-like Blue Gene job. The subsequent value option must be an expression. Memory is the only variable that is supported. bg_requirements cannot be modified if bg_partition is already specified.
    • RESOURCES : Replaces the task resource requirements specified in the job command file at submit time. The entire resource requirement must be specified. The rules for the syntax of the resources_string are the same as the rules for the corresponding job command file keywords. Only a job step in an idle-like state can be changed. Any resource requirement that was originally specified and is omitted from this string will be removed from the job step. The command will fail if you specify the same resources in both the resources and node_resources statements.
    • NODE_RESOURCES : Replaces the node resource requirements specified in the job command file at submit time. The entire resource requirement must be specified. The rules for the syntax of the resources_string are the same as the rules for the corresponding job command file keywords. Only a job step in an idle-like state can be changed. Any resource requirement that was originally specified and is omitted from this string will be removed from the job step. The command will fail if you specify the same resources in both the resources and node_resources statements.

  2. value

    The new data value for modify_op, referred to as modify_data in the list above.

  3. job_step

    String representing the job step ID.
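
Several of the Blue Gene operations constrain the form of the value passed to ll_modify. The checks below sketch how a caller might validate such values before the call. They are not part of PyLoadL: the function names are hypothetical, and only the formats stated above ("XxYxZ" for BG_SHAPE, and TORUS, MESH or PREFER_TORUS for BG_CONNECTION) are taken from this page.

```python
import re

# BG_SHAPE: three integers separated by a literal lowercase "x".
_BG_SHAPE = re.compile(r"(\d+)x(\d+)x(\d+)")

BG_CONNECTION_VALUES = ("TORUS", "MESH", "PREFER_TORUS")


def parse_bg_shape(value):
    """Return (X, Y, Z) in base partitions for a string such as '2x2x4'."""
    match = _BG_SHAPE.fullmatch(value)
    if match is None:
        raise ValueError("BG_SHAPE must look like 'XxYxZ', got %r" % (value,))
    return tuple(int(group) for group in match.groups())


def check_bg_connection(value):
    """Return value unchanged if it is an accepted connection option."""
    if value not in BG_CONNECTION_VALUES:
        raise ValueError("BG_CONNECTION must be one of %s, got %r"
                         % (", ".join(BG_CONNECTION_VALUES), value))
    return value
```

A caller could run parse_bg_shape("2x2x4") first and pass the original string on to ll_modify only if no ValueError is raised.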

ll_run_scheduler

This is used when the internal scheduling interval has been disabled so that an external program can control when the central manager attempts to schedule job steps. The ll_run_scheduler subroutine sends a request to the central manager to run the scheduling algorithm.

  (rc, errObj) = ll_run_scheduler()
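
Like several functions on this page, ll_run_scheduler returns an (rc, errObj) tuple. One way to avoid repeating the same check after every call is a small helper such as the one below. It is a sketch under assumptions: checked and LoadLevelerError are hypothetical names, a return code of 0 is assumed to indicate success, and errObj is treated as an opaque value that is simply carried along.

```python
class LoadLevelerError(Exception):
    """Raised when a wrapped call reports a non-zero return code."""

    def __init__(self, func_name, rc, err_obj=None):
        Exception.__init__(self, "%s failed with rc=%r" % (func_name, rc))
        self.rc = rc
        self.err_obj = err_obj  # kept for the caller to inspect


def checked(func, *args):
    """Call func(*args), expecting an (rc, errObj) tuple in return."""
    rc, err_obj = func(*args)
    if rc != 0:  # assumption: rc == 0 signals success
        name = getattr(func, "__name__", repr(func))
        raise LoadLevelerError(name, rc, err_obj)
    return rc
```

With the real bindings the call might read checked(pyloadl.ll_run_scheduler), where the module name pyloadl is an assumption.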

ll_start_job_ext

Function to instruct the LoadLeveler negotiator to start a job on the specified nodes and adapters. This is meant for use by people writing external schedulers.

  rc = ll_start_job_ext( step_id, node_list, adapter_list )

Parameters

  1. step_id

    String representing the job step ID.

  2. node_list

    List of node names where the job will be started. The first member of the list is the parallel master node.

  3. adapter_list

    List of lists containing adapter information for each node. The members of the list are :

    • dev_name

      Device name of adapter to be used such as css0

    • protocol

      Communication protocol this usage supports. Valid values are MPI, LAPI, and MPI_LAPI.

    • subsystem

      Communication subsystem this usage supports. Valid values are IP or US.

    • wid

      For US subsystem usages, this indicates which adapter window ID to use. For IP subsystem usages, this field is ignored.

    • mem

      For US subsystem usages, this is the amount of adapter memory to dedicate to the adapter usage. For IP subsystem usages, this field is ignored.

    Each element in the adapter_list represents one communication channel for a task. If the subsystem is US (User Space), a communication channel will require a switch adapter window. Adapter windows, and User Space usages, must be specified on actual switch adapters that are only accessible if AGGREGATE_ADAPTERS=False is specified in the configuration file.
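
The adapter_list structure can be assembled with a small helper that also rejects unsupported protocol and subsystem values. This is a sketch, not part of PyLoadL: adapter_usage is a hypothetical name, the order of members in each inner list is assumed to follow the order in which they are listed above, and the default of 0 for wid and mem is an assumption for IP usages, where those fields are ignored.

```python
VALID_PROTOCOLS = ("MPI", "LAPI", "MPI_LAPI")
VALID_SUBSYSTEMS = ("IP", "US")


def adapter_usage(dev_name, protocol, subsystem, wid=0, mem=0):
    """Build one adapter_list entry: a single communication channel."""
    if protocol not in VALID_PROTOCOLS:
        raise ValueError("unsupported protocol: %r" % (protocol,))
    if subsystem not in VALID_SUBSYSTEMS:
        raise ValueError("unsupported subsystem: %r" % (subsystem,))
    # Assumed member order: dev_name, protocol, subsystem, wid, mem.
    return [dev_name, protocol, subsystem, wid, mem]


# One User Space channel and one IP channel on the css0 adapter.
# The window ID and memory amount are made-up illustration values.
adapter_list = [
    adapter_usage("css0", "MPI", "US", wid=1, mem=1048576),
    adapter_usage("css0", "MPI", "IP"),
]
```

The resulting adapter_list would then be passed as the third argument of ll_start_job_ext, alongside the step ID and node list.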

ll_terminate_job

Function to instruct the LoadLeveler negotiator to cancel the specified job step.

  rc = ll_terminate_job( cluster, proc, from_host, msg )

Parameters

  • cluster

    String representing the job step ID.

  • proc

    String representing the job step to be cancelled.

  • from_host

    String representing the name of the schedd host.

  • msg

    Message string, available via ll_get_data, explaining why the job was cancelled.