This blog post was authored by Robert Northard, Principal Container Specialist SA; Eric Chapman, Senior Product Manager, EKS; and Elamaran Shanmugam, Senior Specialist Partner SA. Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes transform how you run generative AI inference workloads across cloud and on-premises environments. Extending your EKS cluster to on-premises infrastructure allows you […]| Amazon Web Services
In this post, we dive deep into cluster networking configurations for Amazon EKS Hybrid Nodes, exploring different Container Network Interface (CNI) options and load balancing solutions to meet various networking requirements. The post demonstrates how to implement BGP routing with Cilium CNI and static routing with Calico CNI, and how to set up both on-premises load balancing using MetalLB and external load balancing using the AWS Load Balancer Controller.
We’re excited to announce that Amazon Elastic Kubernetes Service (Amazon EKS) now supports up to 100,000 worker nodes in a single cluster, enabling customers to scale up to 1.6 million AWS Trainium accelerators or 800,000 NVIDIA GPUs to train and run the largest AI/ML models. This capability empowers customers to pursue their most ambitious AI […]
In this post, we demonstrate how to implement Fully Sharded Data Parallel (FSDP) fine-tuning of the dolly-v2-7b model using Amazon ECS. The solution uses a Ray cluster running on ECS with two services (head and worker) connected to Amazon S3, enabling efficient distributed training across multiple GPUs while abstracting away container orchestration complexities.
This blog post was jointly authored by Carlos Santana, Sr. Solution Architect, Containers; Sriram Ranganathan, Sr. Product Manager, Kubernetes; Sabari Sawant, Product Marketing Manager, Kubernetes; and Frank Carta, Sr. GTM specialist, Containers. As organizations grow their Kubernetes infrastructure across AWS Regions and accounts, they face increasing challenges in maintaining oversight of their Kubernetes clusters. Without […]