In this post, we explore how AWS and NVIDIA Run:ai are extending GPU fractionalization and orchestration capabilities beyond traditional cloud regions to edge environments, including AWS Local Zones, Outposts, and EKS Hybrid Nodes. The collaboration addresses the growing demand for distributed AI/ML workloads that require efficient GPU resource management across geographically separated locations while maintaining consistent performance, compliance, and cost optimization.| Containers
In this post, we explore advanced traffic routing patterns with the Kubernetes Gateway API through a practical Calendar web application example, demonstrating how it streamlines and standardizes application connectivity and service mesh integration in Kubernetes. The post covers three key use cases: exposing applications to external clients through hostname-based routing, implementing canary deployments between microservices using gRPC traffic splitting, and controlling egress traffic to exte...| Containers
In this post, we explore how KubeArmor, an open source container-aware security enforcement system, enhances the security posture of containerized workloads running on EKS Auto Mode clusters. Although EKS Auto Mode significantly streamlines cluster management by automating control plane and node operations, securing the workloads running within the cluster remains a critical user responsibility.| Containers
In this post, we explore how to use AWS Identity and Access Management (IAM) Roles Anywhere, supported by HashiCorp Vault PKI, to facilitate joining EKS Hybrid Nodes to an Amazon EKS Cluster. This solution enables businesses to flexibly make use of compute resources outside of AWS by extending an Amazon Elastic Kubernetes Service (Amazon EKS) data plane beyond the AWS Cloud boundary, addressing use cases focused on data sovereignty, low latency communication, and regulatory compliance.| Containers
In this post, we explore the latest Amazon Elastic Kubernetes Service (Amazon EKS) Auto Mode features that enhance security, network control, and performance for enterprise Kubernetes deployments. These new capabilities address critical operational challenges including capacity management, network segmentation, enterprise PKI integration, and comprehensive encryption while maintaining the automated cluster management that makes EKS Auto Mode transformative for development teams.| Containers
In this post, we explore how the integration of Amazon CloudWatch Logs Live Tail and Amazon ECS Exec with AWS CloudShell in the Amazon ECS console streamlines container troubleshooting by eliminating the need to switch between multiple interfaces or maintain separate CLI configurations. These new features centralize essential debugging capabilities, allowing DevOps engineers and developers to maintain reliable container-based applications while preserving necessary security and governance con...| Containers
In this post, we introduce the Slinky Project and explore how it enables organizations to run Slurm workload management within Amazon EKS, combining the deterministic scheduling capabilities of Slurm with Kubernetes' dynamic resource allocation for efficient hybrid workload management. This unified approach allows teams to maximize resource utilization across both batch processing jobs and cloud-native applications without maintaining separate infrastructure silos.| Containers
In this post, we explore how to manage EKS Pod Identity associations at scale using Argo CD and AWS Controllers for Kubernetes (ACK), addressing the critical challenge of the eventually consistent EKS Pod Identity API. The guide demonstrates automation techniques to ensure proper IAM role associations before application deployment, maintaining GitOps workflows while preventing permission-related failures.| Amazon Web Services
This post was co-authored by Shyam Jeedigunta, Principal Engineer, Amazon EKS; Apoorva Kulkarni, Sr. Specialist Solutions Architect, Containers and Raghav Tripathi, Sr. Software Dev Manager, Amazon EKS. Today, Amazon Elastic Kubernetes Service (Amazon EKS) announced support for clusters with up to 100,000 nodes. With Amazon EC2’s new generation accelerated computing instance types, this translates to […]| Amazon Web Services
The dockershim, an application programming interface (API) shim between the kubelet and the Docker Engine, was removed in Kubernetes 1.24 in favor of Container Runtime Interface (CRI) compatible runtimes. Amazon Elastic Kubernetes Service (Amazon EKS) also ended support for the dockershim starting with the Kubernetes version 1.24 release. The official EKS Amazon Machine Images (AMI) […]| Amazon Web Services
In this post, we explore patterns and practices for building and operating distributed Amazon Elastic Kubernetes Service (Amazon EKS)-based applications effectively. We examine three deployment models - SaaS Provider Hosted, Remote Application Plane, and Hybrid Nodes - each offering distinct advantages for specific use cases as companies scale their software as a service (SaaS) offerings.| Containers
In this post, we guide you through the process of migrating from AWS App Mesh to Amazon VPC Lattice, highlighting key considerations and benefits that this transition offers for your cloud infrastructure. We demonstrate how to migrate an IT Inventory Management System application from AWS App Mesh to VPC Lattice using Amazon ECS, with detailed steps for creating VPC Lattice resources, updating task definitions, and implementing blue/green deployment strategies.| Amazon Web Services
In this post, Amazon ECS announces support for IPv6-only workloads, allowing users to run containerized applications in IPv6-only environments without IPv4 dependencies while maintaining compatibility with existing applications and AWS services. The new capability helps organizations address IPv4 address exhaustion challenges, streamline network architecture, improve security posture, and meet compliance requirements for IPv6 adoption.| Amazon Web Services
In this post, we demonstrate how to use a Raspberry Pi 5 as an Amazon EKS hybrid node to process edge workloads while maintaining cloud connectivity. We show how to set up an EKS cluster that connects cloud and edge infrastructure, secure connectivity using WireGuard VPN, enable container networking with Cilium, and implement a real-world IoT application using an ultrasonic sensor that demonstrates edge-cloud integration.| Amazon Web Services
In this post, we explore the migration path from AWS CodeDeploy to Amazon ECS for blue/green deployments, discussing key architectural differences and implementation considerations. We examine three different migration approaches - in-place update, new service with existing load balancer, and new service with new load balancer - along with their respective trade-offs in terms of complexity, risk, downtime, and cost.| Amazon Web Services
In this post, we explore how Amazon ECS's native support for blue/green deployments can be extended using lifecycle hooks to integrate test suites, manual approvals, and metrics into deployment pipelines.| Amazon Web Services
In this post, we introduce an automated, GitOps-driven approach to resource optimization in Amazon EKS using AWS services such as Amazon Managed Service for Prometheus and Amazon Bedrock. The solution helps optimize Kubernetes resource allocation through metrics-driven analysis, pattern-aware optimization strategies, and automated pull request generation while maintaining GitOps principles of collaboration, version control, and auditability.| Amazon Web Services
In this post, we explore how to build highly available Kubernetes applications using Amazon EKS Auto Mode by implementing critical features like Pod Disruption Budgets, Pod Readiness Gates, and Topology Spread Constraints. Through various test scenarios including pod failures, node failures, AZ failures, and cluster upgrades, we demonstrate how these implementations maintain service continuity and maximize uptime in EKS Auto Mode environments.| Amazon Web Services
In this post, we demonstrate how to test network resilience of AWS Fargate workloads on Amazon ECS using AWS Fault Injection Service's new network fault injection capabilities, including network latency, blackhole, and packet loss experiments. Through a sample three-tier application architecture, we show how to conduct controlled chaos engineering experiments to validate application behavior during network disruptions and improve system resilience.| Amazon Web Services
One of the most common requests we hear from customers is, “help me decide which container service to use.” We recommend that most teams begin by selecting a container solution with the attributes most aligned to their application requirements or operational preferences. This post covers some of the critical decisions involved in choosing between AWS […]| Amazon Web Services
This blog post was authored by Robert Northard, Principal Container Specialist SA, Eric Chapman, Senior Product Manager EKS, and Elamaran Shanmugam, Senior Specialist Partner SA. Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes transform how you run generative AI inference workloads across cloud and on-premises environments. Extending your EKS cluster to on-premises infrastructure allows you […]| Amazon Web Services
In this post, we dive deep into cluster networking configurations for Amazon EKS Hybrid Nodes, exploring different Container Network Interface (CNI) options and load balancing solutions to meet various networking requirements. The post demonstrates how to implement BGP routing with Cilium CNI, static routing with Calico CNI, and set up both on-premises load balancing using MetalLB and external load balancing using AWS Load Balancer Controller.| Amazon Web Services
We’re excited to announce that Amazon Elastic Kubernetes Service (Amazon EKS) now supports up to 100,000 worker nodes in a single cluster, enabling customers to scale up to 1.6 million AWS Trainium accelerators or 800K NVIDIA GPUs to train and run the largest AI/ML models. This capability empowers customers to pursue their most ambitious AI […]| Amazon Web Services