
LSF Daemons

by yororing 2024. 4. 16.

00 LSF daemons

  • LSF interacts with several daemons, each responsible for different aspects of the entire LSF workflow.
  • After applying a patch, the cluster must be restarted:
# lsf_daemons restart
Stopping the LSF subsystem
Starting the LSF subsystem
  • Check the LSF version installed on the cluster, the cluster name, and the management host name:
# lsid
IBM Spectrum LSF Standard 10.1.0.13, Apr 15 2022
Copyright International Business Machines Corp. 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

My cluster name is cluster1
My master name is dcvbroker1dya
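
When the cluster name or management host is needed in a script, the two "My ... is" lines of the lsid output above can be picked out with awk. This is a minimal sketch that assumes the output keeps the wording shown above; parse_lsid is a hypothetical helper name, not an LSF command.

```shell
# Hypothetical helper: read `lsid` output on stdin and print
# "cluster=<name>" and "master=<host>", one per line.
parse_lsid() {
    awk '
        /^My cluster name is / { print "cluster=" $NF }
        /^My master name is /  { print "master="  $NF }
    '
}

# Usage on a host with LSF installed:
#   lsid | parse_lsid
```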

 

 

- mbatchd (mbd)

  • Management batch daemon running on the management host
  • Started by sbatchd
  • Responsible for the overall state of jobs in the system
  • Receives job submission and information query requests
  • Manages jobs that are held in queues. Dispatches jobs to hosts as determined by mbschd
  • Configuration:
    • Port number is defined in lsf.conf
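
To see which port a daemon is configured to use, the value can be read from lsf.conf, which stores settings as KEY=value lines. The sketch below is a hypothetical helper (lsf_conf_get is not part of LSF). It assumes the parameter names LSB_MBD_PORT, LSB_SBD_PORT, LSF_RES_PORT, and LSF_LIM_PORT for the mbatchd, sbatchd, res, and lim ports, and that lsf.conf sits under $LSF_ENVDIR; verify both against your own installation.

```shell
# Hypothetical helper: print the value of KEY from an lsf.conf-style file.
# Returns non-zero when the key is not set.
lsf_conf_get() {
    key=$1
    file=$2
    awk -F= -v key="$key" '
        /^[ \t]*#/          { next }   # skip comment lines
        index($0, "=") == 0 { next }   # skip lines without an assignment
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                gsub(/^"|"$/, "", v)   # drop surrounding double quotes
                val = v
                found = 1
            }
        }
        END { if (found) print val; else exit 1 }
    ' "$file"
}

# Usage (path is an assumption about the local setup):
#   lsf_conf_get LSB_MBD_PORT "$LSF_ENVDIR/lsf.conf"
```

If a key appears more than once, the last assignment wins, which is a design choice of this sketch rather than documented LSF behavior.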

- lsfproxyd

  • LSF rate limiter daemon
  • By default, all LSF batch commands contact the mbatchd daemon (or the mbatchd query child, if configured). Excessive requests, such as scripts that run bjobs in a tight loop, can overload mbatchd and degrade cluster performance. Starting in Fix Pack 14, you can protect mbatchd from heavy load by enabling the LSF rate limiter (controlled by the lsfproxyd daemon), which acts as a gatekeeper between the commands and mbatchd. The rate limiter is supported on Linux.
  • Configuration:
    • You can run multiple lsfproxyd daemons in a single cluster; use the LSF_PROXY_HOSTS parameter to list the hosts on which they should run. Multiple lsfproxyd daemons work together to balance the workload and provide high availability: a client command first picks one at random, and if that daemon is unavailable, the command locates another one to use.
    • LIM controls starting and restarting the lsfproxyd daemon on the LSF hosts specified in the LSF_PROXY_HOSTS parameter in the lsf.conf file. When the lsfproxyd daemon starts, it binds to the listening port specified by the LSF_PROXY_PORT parameter in the lsf.conf file. LIM restarts the lsfproxyd daemon if it dies.
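
The "pick one at random, then fall back to another" behavior described above can be sketched as follows. This only illustrates the selection pattern and is not how the LSF client is actually implemented. pick_proxy and proxy_is_up are hypothetical names, and the probe is a stand-in that treats hosts listed in the UP_HOSTS variable as reachable.

```shell
# Hypothetical probe: succeeds when the proxy on host $1 is reachable.
# For illustration, only hosts named in UP_HOSTS count as "up".
proxy_is_up() {
    case " ${UP_HOSTS:-} " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Start at a random position in the host list, then try each host in
# turn until one responds. Prints the chosen host, or fails if none is up.
pick_proxy() {
    n=$#
    [ "$n" -gt 0 ] || return 1
    start=$(awk -v n="$n" 'BEGIN { srand(); print int(rand() * n) }')
    i=0
    while [ "$i" -lt "$n" ]; do
        idx=$(( (start + i) % n + 1 ))
        eval "host=\${$idx}"           # fetch the idx-th argument
        if proxy_is_up "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage: UP_HOSTS="hostB" pick_proxy hostA hostB hostC
```

The random starting point spreads requests across the proxies, and walking the rest of the list gives the failover.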

- mbschd

  • Management batch scheduler daemon running on the management host.
  • Works with mbatchd.
  • Started by mbatchd.
  • Makes scheduling decisions based on job requirements, policies, and resource availability. Sends scheduling decisions to mbatchd.

- sbatchd

  • Server batch daemon running on each server host.
  • Receives the request to run the job from mbatchd and manages local execution of the job.
  • Responsible for enforcing local policies and maintaining the state of jobs on the host.
  • The sbatchd forks a child sbatchd for every job. The child sbatchd runs an instance of res to create the execution environment in which the job runs. The child sbatchd exits when the job is complete.
  • Configuration: 
    • Port number is defined in lsf.conf.

Commands

  • bctrld start sbd: Start sbatchd.
  • bctrld stop sbd: Shut down sbatchd.
  • bctrld restart sbd: Restart sbatchd.

- res

  • Remote execution server (res) running on each server host.
  • Accepts remote execution requests to provide transparent and secure remote execution of jobs and tasks.
  • Configuration:
    • Port number is defined in lsf.conf.

Commands

  • bctrld start res: Start res.
  • bctrld stop res: Shut down res.
  • bctrld restart res: Restart res.

- lim

  • Load information manager (LIM) running on each server host.
  • Collects host load and configuration info and forwards it to the LIM running on the management host (the parent LIM).
  • Reports the info that is displayed by the lsload and lshosts commands.
  • Configuration: 
    • Port number is defined in lsf.conf.
  • Static indices are reported when the LIM starts up or when the number of CPUs (ncpus) changes. Static indices are:
    • Number of CPUs (ncpus)
    • Number of disks (ndisks)
    • Total available memory (maxmem)
    • Total available swap (maxswp)
    • Total available temp (maxtmp)
  • Dynamic indices for host load collected at regular intervals are:
    • Host status (status)
    • 15-second, 1-minute, and 15-minute run queue lengths (r15s, r1m, and r15m)
    • CPU utilization (ut)
    • Paging rate (pg)
    • Number of login sessions (ls)
    • Interactive idle time (it)
    • Available swap space (swp)
    • Available memory (mem)
    • Available temp space (tmp)
    • Disk IO rate (io)

Commands

  • bctrld start lim: Start LIM.
  • bctrld stop lim: Shut down LIM.
  • bctrld restart lim: Restart LIM.
  • lsload: View dynamic load values.
  • lshosts: View static host load values.
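
The dynamic indices reported by lsload can be filtered in a script, for example to list lightly loaded hosts. This sketch assumes the default lsload column order (HOST_NAME, status, r15s, r1m, ...); idle_hosts is a hypothetical helper name and the 1.0 threshold is arbitrary.

```shell
# Hypothetical helper: read `lsload` output on stdin and print the hosts
# whose status is "ok" and whose 1-minute run queue length (r1m, column 4)
# is below the given threshold (default 1.0).
idle_hosts() {
    max_r1m=${1:-1.0}
    awk -v max="$max_r1m" '
        NR == 1 { next }                              # skip the header row
        $2 == "ok" && ($4 + 0) < (max + 0) { print $1 }
    '
}

# Usage on a host with LSF installed:
#   lsload | idle_hosts 0.5
```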

- Parent lim

  • The LIM running on the management host. Receives load information from the LIMs running on hosts in the cluster.
  • Forwards load information to mbatchd, which forwards this information to mbschd to support scheduling decisions. If the management host LIM becomes unavailable, a LIM on another host automatically takes over.

Commands

  • bctrld start lim: Start LIM.
  • bctrld stop lim: Shut down LIM.
  • bctrld restart lim: Restart LIM.
  • lsload: View dynamic load values.
  • lshosts: View static host load values.

- elim

  • External Load Information Manager (ELIM) is a site-definable executable that collects and tracks custom dynamic load indices.
  • An ELIM can be a shell script or a compiled binary program, which returns the values of the dynamic resources you define.
  • The ELIM executable must be named elim or elim.<name> (for example, elim.myrsc) and located in LSF_SERVERDIR.
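
A shell-script ELIM might look roughly like the sketch below. It assumes the usual ELIM reporting convention of writing one line to stdout per update, in the form "<number of indices> name1 value1 name2 value2 ...", and that the index has already been defined as a dynamic resource in the cluster configuration. The index name tmp_entries and both function names are hypothetical.

```shell
# Format one ELIM report line from name/value pairs:
#   elim_report a 1 b 2   ->   "2 a 1 b 2"
elim_report() {
    [ $(( $# % 2 )) -eq 0 ] || return 1   # arguments must come in pairs
    printf '%s' $(( $# / 2 ))
    while [ "$#" -gt 0 ]; do
        printf ' %s %s' "$1" "$2"
        shift 2
    done
    printf '\n'
}

# Hypothetical custom index: number of entries under /tmp.
tmp_entries() {
    ls -A /tmp 2>/dev/null | wc -l | tr -d ' '
}

# Main loop of the ELIM (commented out so the sketch terminates):
#   while :; do
#       elim_report tmp_entries "$(tmp_entries)"
#       sleep 30
#   done
```

The 30-second interval is an arbitrary choice for illustration.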

- pim

  • Process information manager (PIM) running on each server host.
  • Started by LIM, which periodically checks on pim and restarts it if it dies.
  • Collects info about job processes running on the host such as CPU and memory that is used by the job, and reports the information to sbatchd.
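
The "check periodically and restart if it dies" supervision that LIM performs for pim follows a common watchdog pattern, sketched below. This is a generic illustration and not LIM's actual implementation. ensure_running and the PID file are hypothetical.

```shell
# Hypothetical watchdog step: start CMD unless the PID recorded in
# PIDFILE still refers to a live process.
ensure_running() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0                      # still alive, nothing to do
    fi
    "$@" >/dev/null 2>&1 &            # (re)start in the background
    echo $! > "$pidfile"
}

# A supervisor would call this at a fixed interval, for example:
#   while :; do ensure_running /tmp/worker.pid sleep 300; sleep 10; done
```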

Commands

  • bjobs: View job info

 

References

  1. https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-daemons
