What Is LSF? (IBM Spectrum LSF)

by yororing 2024. 4. 3.

01 What Is LSF

1. Definition

  • Stands for 'Load Sharing Facility'
  • A software product from IBM; in short, a job scheduling program (workload management platform)
  • Distributes jobs across a variety of IT resources to create a shared, scalable, and fault-tolerant infrastructure that delivers fast, reliable workload performance while reducing cost
  • Balances load, allocates resources, and provides access to those resources
  • Provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress
    • Jobs always run according to host load and site policies
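
As a rough illustration of "finds the best resources to run the job", the sketch below picks the least-loaded host that satisfies a job's memory requirement. It is a toy model and not LSF's actual scheduling algorithm; the `Host` fields, the host names, and the `pick_host` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str          # hypothetical host name
    load: float        # current load, lower is better (toy metric)
    free_mem_mb: int   # memory currently available on the host


def pick_host(hosts: List[Host], min_mem_mb: int) -> Optional[Host]:
    """Return the least-loaded host with enough free memory, or None."""
    eligible = [h for h in hosts if h.free_mem_mb >= min_mem_mb]
    if not eligible:
        return None  # in this toy model the job would simply keep waiting
    return min(eligible, key=lambda h: h.load)


hosts = [
    Host("hostA", load=0.9, free_mem_mb=16000),
    Host("hostB", load=0.2, free_mem_mb=2000),
    Host("hostC", load=0.5, free_mem_mb=8000),
]
best = pick_host(hosts, min_mem_mb=4000)
print(best.name if best else "no eligible host")
```

Here hostB has the lowest load but too little memory, so the requirement filter runs before the load comparison.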

02 What Is an LSF Cluster

1. Definition of a Cluster

  • A group of computers (hosts) running LSF that work together as a single unit, combining computing power, workload, and resources. A cluster provides a single-system image for a network of computing resources.
  • Hosts can be grouped into a cluster in a number of ways. A cluster can contain:
    • All the hosts in a single administrative group
    • All the hosts on a subnetwork

2. Hosts

  • Hosts in your cluster perform different functions.

1) Management host

  • An LSF server host that acts as the overall coordinator for the cluster, doing all job scheduling and dispatch.

2) Server host

  • A host that submits and runs jobs.

3) Client host

  • A host that only submits jobs and tasks.

4) Execution host

  • A host that runs jobs and tasks.

5) Submission host

  • A host from which jobs and tasks are submitted.
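
The relationship between these host types can be summarized as a small capability table. The sketch below is only a way to visualize the definitions above; `Capability`, `HOST_TYPES`, `can_submit`, and `can_run` are hypothetical names and are not part of LSF. "Submission host" and "execution host" are treated as roles that a host takes on when it has the matching capability.

```python
from enum import Flag, auto


class Capability(Flag):
    SUBMIT = auto()    # can act as a submission host
    RUN = auto()       # can act as an execution host
    SCHEDULE = auto()  # coordinates job scheduling and dispatch


# Host types mapped to capabilities, following the definitions above.
# The management host is itself a server host, so it keeps SUBMIT and RUN.
HOST_TYPES = {
    "management": Capability.SUBMIT | Capability.RUN | Capability.SCHEDULE,
    "server": Capability.SUBMIT | Capability.RUN,
    "client": Capability.SUBMIT,
}


def can_submit(host_type: str) -> bool:
    return bool(HOST_TYPES[host_type] & Capability.SUBMIT)


def can_run(host_type: str) -> bool:
    return bool(HOST_TYPES[host_type] & Capability.RUN)


for host_type in HOST_TYPES:
    print(host_type, "submit:", can_submit(host_type), "run:", can_run(host_type))
```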

03 What Is a Job

  • A unit of work that is running in the LSF system.
  • A job is a command that is submitted to LSF for execution.
  • LSF schedules, controls, and tracks the job according to configured policies.
  • Jobs can be complex problems, simulation scenarios, extensive calculations, or anything that needs compute power.
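
Since a job is just a command handed to LSF, submission from a script usually comes down to building a `bsub` command line and reading back the job ID. The sketch below only builds the argument list and parses a sample reply, so it runs without an LSF installation. The helper names, the queue name `normal`, and the sample output line are assumptions for illustration; check `bsub` options against your own LSF version.

```python
import re
import shlex
from typing import List, Optional


def build_bsub_command(command: str, queue: Optional[str] = None,
                       slots: Optional[int] = None,
                       output_file: Optional[str] = None) -> List[str]:
    """Build an argument list for bsub without running it."""
    args = ["bsub"]
    if queue:
        args += ["-q", queue]        # target queue
    if slots:
        args += ["-n", str(slots)]   # number of job slots requested
    if output_file:
        args += ["-o", output_file]  # file for the job's standard output
    args += shlex.split(command)     # the command LSF should execute
    return args


def parse_job_id(bsub_output: str) -> Optional[int]:
    """Extract the job ID from a line such as 'Job <1234> is submitted ...'."""
    match = re.search(r"Job <(\d+)>", bsub_output)
    return int(match.group(1)) if match else None


cmd = build_bsub_command("sleep 60", queue="normal", slots=2,
                         output_file="out.%J")
print(" ".join(cmd))
print(parse_job_id("Job <1234> is submitted to queue <normal>."))
```

On a real cluster the list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the captured standard output fed to `parse_job_id`.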

04 What Is a Job Slot

  • A job slot is a bucket into which a single unit of work is assigned in the LSF system.
  • Hosts can be configured with multiple job slots and you can dispatch jobs from queues until all the job slots are filled.
  • You can correlate job slots with the total number of CPUs in the cluster.
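
The "dispatch until all the job slots are filled" behavior can be sketched with a small simulation. This is a simplified model under assumed host names and slot counts, not how LSF is implemented: each job takes exactly one slot, and hosts are filled in dictionary order.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def dispatch(jobs: Deque[str], slots: Dict[str, int]) -> List[Tuple[str, str]]:
    """Assign queued jobs to hosts until the queue is empty or no slot is free.

    `slots` maps a host name to its number of free job slots and is
    updated in place. Returns (job, host) pairs in dispatch order.
    """
    placements = []
    for host in slots:
        while slots[host] > 0 and jobs:
            placements.append((jobs.popleft(), host))
            slots[host] -= 1
    return placements


pending = deque(["job1", "job2", "job3", "job4"])
free_slots = {"hostA": 2, "hostB": 1}   # hypothetical hosts and slot counts
placed = dispatch(pending, free_slots)
print(placed)          # three jobs are placed
print(list(pending))   # job4 keeps waiting for a free slot
```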

05 What Is a Queue

  • A cluster-wide container for jobs. 
  • All jobs wait in queues until they are scheduled and dispatched to hosts. 
  • Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts. 
  • When you submit a job to a queue, you do not need to specify an execution host. 
  • LSF dispatches the job to the best available execution host in the cluster to run that job. 
  • Queues implement different job scheduling and control policies.
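
Two of the points above, that a queue may use all server hosts or only a configured subset, and that different queues carry different policies, can be modeled in a few lines. The sketch is a toy model: the `Queue` class, its fields, the queue and host names, and the "larger priority value is served first" rule are assumptions made for this example rather than a description of LSF internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Queue:
    name: str
    priority: int                       # assumed: larger value is served first
    hosts: Optional[List[str]] = None   # None means "all server hosts"
    jobs: List[str] = field(default_factory=list)


def candidate_hosts(queue: Queue, server_hosts: List[str]) -> List[str]:
    """Hosts a queue may use: the whole cluster or its configured subset."""
    if queue.hosts is None:
        return list(server_hosts)
    return [h for h in server_hosts if h in queue.hosts]


def next_job(queues: List[Queue]) -> Optional[Tuple[str, str]]:
    """Take the oldest job from the highest-priority non-empty queue."""
    for queue in sorted(queues, key=lambda q: q.priority, reverse=True):
        if queue.jobs:
            return queue.name, queue.jobs.pop(0)
    return None


servers = ["hostA", "hostB", "hostC"]
normal = Queue("normal", priority=30, jobs=["job1"])
urgent = Queue("urgent", priority=50, hosts=["hostC"], jobs=["job2"])

print(candidate_hosts(normal, servers))
print(candidate_hosts(urgent, servers))
print(next_job([normal, urgent]))
```

Note that the job itself never names an execution host; the host is chosen later from the queue's candidate list.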

06 What Are Resources

  • Resources are the objects in your cluster that are available to run work.
  • Examples include, but are not limited to, hosts, CPU slots, and licenses.

References

  1. https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=overview-lsf-introduction