Kueue

Kueue is a Kubernetes-native job queueing system for batch workloads. Rather than replacing the pod scheduler, it manages resource quotas and decides when jobs should wait and when they should start, with a focus on machine learning and data processing jobs.

https://kueue.sigs.k8s.io

Overview

Kueue extends Kubernetes with job queueing capabilities, allowing for:

  • Fair resource sharing across teams and users
  • Advanced job scheduling policies
  • Resource quota management for batch workloads
  • Integration with existing Kubernetes Job objects
Basic Kueue LocalQueue
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-training-queue
  namespace: ml-team # example namespace; LocalQueues are namespace-scoped
spec:
  clusterQueue: ml-cluster-queue # queueingStrategy is set on the ClusterQueue, not here

Use Cases in ML

Kueue is particularly useful for:

  • Training multiple ML models with fair resource allocation
  • Scheduling GPU-intensive workloads
  • Managing batch inference jobs
  • Prioritizing different types of ML workflows (research vs. production), as sketched below
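
For the research-vs-production case, Kueue provides WorkloadPriorityClass, which orders workloads in the queue without changing pod scheduling priority. A minimal sketch (the name, value, and description are illustrative):
WorkloadPriorityClass Example
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: production-priority
value: 1000 # higher value = considered ahead of lower-priority workloads
description: "Production training and inference jobs"

A Job opts in by setting the kueue.x-k8s.io/priority-class label to the class name.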

Integrations

Kueue integrates with several Kubernetes job frameworks:

  • Native Kubernetes Jobs (batch/v1)
  • Kubeflow training operators (PyTorchJob, TFJob, MPIJob), as sketched below
  • Ray on Kubernetes (RayJob, RayCluster)
  • JobSet
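
As an example of the Kubeflow integration, a PyTorchJob is queued the same way as a plain Job: label it with the LocalQueue name. A sketch, assuming the Kubeflow training operator is installed (image, command, and replica counts are placeholders):
PyTorchJob with Kueue
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: distributed-training
  labels:
    kueue.x-k8s.io/queue-name: ml-training-queue # same label as for batch/v1 Jobs
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime # placeholder
            command: ["python", "/train.py"]
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime # placeholder
            command: ["python", "/train.py"]

Kueue reserves quota for all replicas before admitting the workload, so the job starts (or waits) as a unit.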

Architecture

Kueue uses a multi-level queueing system:

  • ClusterQueue: Defines resource pools and admission control at cluster level
  • LocalQueue: Represents team or project-specific queues that submit to ClusterQueues
  • AdmissionCheck: Optional, pluggable gates that must pass, in addition to quota, before a workload is admitted (a sketch follows the ClusterQueue example below)
  • Quota Management: Enforces fair sharing of resources among queues
ClusterQueue Example
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: ml-cluster-queue
spec:
  namespaceSelector: {} # admit workloads from all namespaces
  cohort: ml-workloads
  queueingStrategy: StrictFIFO
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor # must reference an existing ResourceFlavor
      resources:
      - name: "cpu"
        nominalQuota: 24
      - name: "memory"
        nominalQuota: 96Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
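
The AdmissionCheck mentioned in the architecture list is a separate object; a ClusterQueue opts in by listing check names under spec.admissionChecks. A sketch, assuming the optional provisioning-request integration is deployed (the referenced config name is a placeholder):
AdmissionCheck Example
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: provisioning-check
spec:
  controllerName: kueue.x-k8s.io/provisioning-request # external controller that resolves the check
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: prov-config # placeholder config object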

Benefits over Direct Kubernetes Jobs

  • Fair Sharing: Prevents resource hogging by single teams or workloads
  • Quota Borrowing: Queues in the same cohort can borrow unused quota from each other, increasing utilization (see the cohort sketch after this list)
  • Preemption: Higher priority jobs can preempt lower priority ones
  • Resource Pools: Flexible resource allocation across different hardware types (GPU, CPU)
  • Quotas: Enforces resource usage limits across teams
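
A sketch of borrowing between two teams (names and quotas are illustrative): both ClusterQueues join the same cohort, and borrowingLimit caps how far team A can exceed its nominal quota using team B's idle GPUs:
Cohort Borrowing Example
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-queue
spec:
  namespaceSelector: {}
  cohort: ml-workloads # queues in a cohort can lend unused quota to each other
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 4
        borrowingLimit: 4 # may use up to 4 extra GPUs when team B is idle
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-queue
spec:
  namespaceSelector: {}
  cohort: ml-workloads
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 4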

Using with ML Workloads

For ML workloads, Kueue can manage access to specialized hardware like GPUs:

ML Job with Kueue
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-training
  labels:
    # current Kueue releases select the queue via this label
    # (very early releases used an annotation of the same name)
    kueue.x-k8s.io/queue-name: ml-training-queue
spec:
  suspend: true # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      containers:
      - name: pytorch
        image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
        command: ["python", "/train.py"]
        resources:
          requests:
            nvidia.com/gpu: 1
            memory: "8Gi"
            cpu: "4"
          limits:
            nvidia.com/gpu: 1
            memory: "8Gi"
            cpu: "4"
      restartPolicy: Never
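
To steer such jobs onto GPU nodes, the flavor referenced by a ClusterQueue can be backed by a ResourceFlavor with node labels. A sketch; the accelerator label key is an assumption (GKE-style) and should match the labels on your cluster's GPU nodes:
ResourceFlavor for GPU Nodes
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-a100
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100 # assumption: adjust to your cluster

To use it, the ClusterQueue's nvidia.com/gpu resource group would list gpu-a100 as its flavor instead of default-flavor.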

Comparison to Other Schedulers

Feature         | Kueue | Default K8s Scheduler | Volcano
Fairness        | ✅✅✅   | ❌                     | ✅✅
Queue-based     | ✅✅✅   | ❌                     | ✅✅
Gang Scheduling | ✅✅    | ❌                     | ✅✅✅
Preemption      | ✅✅    | ❌                     | ✅✅
K8s Native      | ✅✅✅   | ✅✅✅                   | ✅✅
ML Focus        | ✅✅✅   | ❌                     | ✅✅
Ease of Setup   | ✅     | ✅✅✅                   | ✅

Best for

ML training workloads, GPU scheduling, batch job management, and ensuring fair allocation of expensive compute resources across multiple teams in a Kubernetes environment.