Community Contributions

Edition 2026

Speaker
Talk
ENG

Pushing the Limits: Running Large-Scale HPC Workloads on Kubernetes

Kubernetes is rapidly emerging as the platform of choice for running large-scale, high-performance computing (HPC) workloads across industries—from financial services and analytics to scientific research and simulation. But what does it take to push Kubernetes to its limits, orchestrating thousands of jobs and scaling clusters efficiently for demanding, data- and compute-intensive workloads? In this session, we'll explore practical strategies, architectural patterns, and lessons learned from running HPC workloads on Kubernetes at scale. We'll cover how to design clusters for elasticity and throughput, leverage open source autoscaling tools like Cluster Autoscaler and Karpenter, and optimize resource usage for both cost and performance. You'll learn about real-world approaches to stress-testing Kubernetes, best practices for node group design and workload scheduling, and the trade-offs between different autoscaling solutions.
