In this post, we announce Amazon ECS support for IPv6-only workloads, allowing users to run containerized applications in IPv6-only environments without IPv4 dependencies while maintaining compatibility with existing applications and AWS services. The new capability helps organizations address IPv4 address exhaustion challenges, streamline network architecture, improve security posture, and meet compliance requirements for IPv6 adoption.
In this post, we demonstrate how to configure Amazon Route 53 to enable unique failover behavior for each application within multi-tenant Amazon EKS environments across AWS Regions. This solution allows organizations to maintain the cost benefits of shared infrastructure while meeting diverse availability requirements by implementing application-specific health checks that provide granular control over failover scenarios.
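A minimal sketch of what per-application failover might look like in Route 53. The record names, endpoints, and health check IDs below are hypothetical placeholders, not values from the post; the dict shape follows the Route 53 resource record set structure.

```python
# Sketch of a per-application Route 53 failover pair. Each tenant app gets
# its own PRIMARY/SECONDARY records and its own health check, so failover
# is decided per application rather than per cluster.

def failover_record(name, endpoint, role, health_check_id):
    """Build one side of a PRIMARY/SECONDARY failover record pair."""
    return {
        "Name": name,
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,                  # "PRIMARY" or "SECONDARY"
        "HealthCheckId": health_check_id,  # application-specific health check
        "ResourceRecords": [{"Value": endpoint}],
    }

# Hypothetical app "app-a" served from two Regions.
records = [
    failover_record("app-a.example.com", "alb-use1.example.com",
                    "PRIMARY", "hc-app-a-use1"),
    failover_record("app-a.example.com", "alb-usw2.example.com",
                    "SECONDARY", "hc-app-a-usw2"),
]
```

These dicts mirror the shape Route 53 expects in a `ChangeResourceRecordSets` change batch, though this sketch stops short of calling the API.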
In this post, we demonstrate how to use a Raspberry Pi 5 as an Amazon EKS hybrid node to process edge workloads while maintaining cloud connectivity. We show how to set up an EKS cluster that connects cloud and edge infrastructure, secure connectivity using WireGuard VPN, enable container networking with Cilium, and implement a real-world IoT application using an ultrasonic sensor that demonstrates edge-cloud integration.
In this post, we explore the migration path from AWS CodeDeploy to Amazon ECS for blue/green deployments, discussing key architectural differences and implementation considerations. We examine three migration approaches (in-place update, new service with an existing load balancer, and new service with a new load balancer) along with their respective trade-offs in complexity, risk, downtime, and cost.
In this post, we explore how Amazon ECS's native support for blue/green deployments can be extended using lifecycle hooks to integrate test suites, manual approvals, and metrics into deployment pipelines.
In this post, we introduce an automated, GitOps-driven approach to resource optimization in Amazon EKS using AWS services such as Amazon Managed Service for Prometheus and Amazon Bedrock. The solution helps optimize Kubernetes resource allocation through metrics-driven analysis, pattern-aware optimization strategies, and automated pull request generation while maintaining GitOps principles of collaboration, version control, and auditability.
In this post, we explore how to build highly available Kubernetes applications using Amazon EKS Auto Mode by implementing critical features like Pod Disruption Budgets, Pod Readiness Gates, and Topology Spread Constraints. Through various test scenarios, including pod failures, node failures, AZ failures, and cluster upgrades, we demonstrate how these implementations maintain service continuity and maximize uptime in EKS Auto Mode environments.
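Two of the features named above can be sketched as Kubernetes objects. The app labels and counts below are illustrative, not taken from the post; expressed here as Python dicts matching the standard Kubernetes API shapes.

```python
# A PodDisruptionBudget caps voluntary disruptions (e.g. node drains
# during cluster upgrades) so the app never drops below a floor.
pod_disruption_budget = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web-pdb"},
    "spec": {
        "minAvailable": 2,  # keep at least 2 pods through voluntary disruptions
        "selector": {"matchLabels": {"app": "web"}},
    },
}

# A topology spread constraint (placed in a pod spec) forces replicas
# across zones, so a single-AZ failure cannot take down every replica.
topology_spread = {
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "DoNotSchedule",  # hard requirement, not best-effort
    "labelSelector": {"matchLabels": {"app": "web"}},
}
```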
In this post, we show you how to swiftly deploy inference workloads on EKS Auto Mode and demonstrate key features that streamline GPU management. We walk through a practical example by deploying open-weight models from OpenAI using vLLM, while showing best practices for model deployment and maintaining operational efficiency.
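A hypothetical sketch of a vLLM Deployment on EKS Auto Mode, expressed as a Python dict in the standard Kubernetes `apps/v1` shape. The image tag, model name, and GPU count are placeholder assumptions, not values from the post.

```python
# Sketch: the nvidia.com/gpu resource request is what signals EKS Auto Mode
# to provision a GPU-backed node for this pod automatically.
vllm_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "vllm-server"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "vllm"}},
        "template": {
            "metadata": {"labels": {"app": "vllm"}},
            "spec": {
                "containers": [{
                    "name": "vllm",
                    "image": "vllm/vllm-openai:latest",   # placeholder tag
                    "args": ["--model", "example-org/example-model"],
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                }],
            },
        },
    },
}
```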
In this post, we demonstrate how to use the Kubecost Amazon EKS add-on to reduce infrastructure costs and enhance Kubernetes efficiency through Container Request Right Sizing, which helps identify and fix inefficient container resource configurations. We explore how to review Kubecost's right-sizing recommendations and implement them through either one-time updates or scheduled automated resizing within Amazon EKS environments for continuous resource optimization.
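A sketch of turning a right-sizing recommendation into a workload patch. The recommendation shape here is illustrative rather than Kubecost's exact API; the output is a strategic-merge patch in the standard Kubernetes Deployment shape.

```python
# Sketch: convert a recommended CPU/memory request into a patch body that
# could be applied to a Deployment (one-time, or on a schedule).

def requests_patch(container_name, recommendation):
    """Build a strategic-merge patch applying recommended resource requests."""
    return {
        "spec": {"template": {"spec": {"containers": [{
            "name": container_name,
            "resources": {"requests": {
                "cpu": recommendation["cpu"],
                "memory": recommendation["memory"],
            }},
        }]}}},
    }

# Hypothetical recommendation for an over-provisioned "api" container.
patch = requests_patch("api", {"cpu": "250m", "memory": "512Mi"})
```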
In this post, we explore how Amazon EC2 P6e-GB200 UltraServers are transforming distributed AI workloads through seamless Kubernetes integration, featuring the NVIDIA GB200 Grace Blackwell architecture that enables memory-coherent domains of up to 72 GPUs. The post demonstrates how Dynamic Resource Allocation (DRA) on Amazon EKS enables sophisticated GPU topology management and cross-node GPU communication through IMEX channels, making it possible to efficiently train and deploy trillion-parameter...
In this post, we demonstrate how to test network resilience of AWS Fargate workloads on Amazon ECS using AWS Fault Injection Service's new network fault injection capabilities, including network latency, blackhole, and packet loss experiments. Through a sample three-tier application architecture, we show how to conduct controlled chaos engineering experiments to validate application behavior during network disruptions and improve system resilience.
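A sketch of one action inside an FIS experiment template injecting latency into ECS tasks. The parameter names and target key below are illustrative assumptions; consult the FIS action reference for the exact schema before using this.

```python
# Sketch: a single FIS action adding network latency to targeted ECS tasks.
# Parameter names ("duration", "delayMilliseconds") and the target key are
# placeholders, not the verified FIS schema.
latency_action = {
    "actionId": "aws:ecs:task-network-latency",
    "description": "Add latency between the app and database tiers",
    "parameters": {
        "duration": "PT5M",          # run the fault for five minutes (ISO 8601)
        "delayMilliseconds": "200",  # illustrative injected delay
    },
    "targets": {"Tasks": "app-tier-tasks"},  # hypothetical target name
}
```

In a full template this would sit alongside a stop condition (e.g. a CloudWatch alarm) so the experiment halts automatically if the application degrades too far.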
One of the most common requests we hear from customers is, “help me decide which container service to use.” We recommend that most teams begin by selecting a container solution with the attributes most aligned to their application requirements or operational preferences. This post covers some of the critical decisions involved in choosing between AWS […]
This blog post was authored by Robert Northard, Principal Container Specialist SA, Eric Chapman, Senior Product Manager EKS, and Elamaran Shanmugam, Senior Specialist Partner SA. Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes transform how you run generative AI inference workloads across cloud and on-premises environments. Extending your EKS cluster to on-premises infrastructure allows you […]
In this post, we dive deep into cluster networking configurations for Amazon EKS Hybrid Nodes, exploring different Container Network Interface (CNI) options and load balancing solutions to meet various networking requirements. The post demonstrates how to implement BGP routing with the Cilium CNI and static routing with the Calico CNI, and how to set up both on-premises load balancing using MetalLB and external load balancing using the AWS Load Balancer Controller.
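The on-premises load balancing piece can be sketched as the two MetalLB objects it typically requires, shown here as Python dicts in MetalLB's `metallb.io/v1beta1` CRD shape. The address range is a placeholder for an on-prem network.

```python
# Sketch: MetalLB needs a pool of addresses it may hand out to Services of
# type LoadBalancer, plus an advertisement telling it how to announce them
# (L2 mode here; the post also covers BGP routing via Cilium).

ip_address_pool = {
    "apiVersion": "metallb.io/v1beta1",
    "kind": "IPAddressPool",
    "metadata": {"name": "hybrid-pool", "namespace": "metallb-system"},
    "spec": {"addresses": ["10.0.40.100-10.0.40.120"]},  # placeholder range
}

l2_advertisement = {
    "apiVersion": "metallb.io/v1beta1",
    "kind": "L2Advertisement",
    "metadata": {"name": "hybrid-l2", "namespace": "metallb-system"},
    "spec": {"ipAddressPools": ["hybrid-pool"]},
}
```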
We’re excited to announce that Amazon Elastic Kubernetes Service (Amazon EKS) now supports up to 100,000 worker nodes in a single cluster, enabling customers to scale up to 1.6 million AWS Trainium accelerators or 800,000 NVIDIA GPUs to train and run the largest AI/ML models. This capability empowers customers to pursue their most ambitious AI […]