I've been following Arm for a while, so I was glad that the C4A VM went GA earlier this year. These machines run on Google's Axion silicon. It's also great to see broad geographical availability, including regions like Tokyo. Feels like Arm is finally ready for prime time. C4D running 5th Gen AMD E| William Denniss
Kubernetes has transformed cloud infrastructure by enabling scalable, containerized applications. While it initially gained traction for managing web applications and microservices, its capabilities now extend to AI/ML workloads, making it the go-to platform for data scientists and machine learning engineers. Running AI/ML workloads on Kubernetes presents unique challenges, including: specialized hardware requirements (e.g., GPUs, TPUs); scalability for model training and inference; complex dat...| Pulumi Blog
I’m continuing my series on running the test suite for each Pull Request on Kubernetes. In the previous post, I laid the groundwork for our learning journey: I developed a basic JVM-based CRUD app, tested it locally using Testcontainers, and tested it in a GitHub workflow with a GitHub service container. This week, I will raise the ante and run the end-to-end test in the target Kubernetes environment. For this, I’ve identified gaps that I’ll fill in this blog post: Create| A Java geek
GKE users get access to an awesome new tool this week: the Kubernetes History Inspector. This product, released as open source, parses Kubernetes and GKE logs to generate a timeline with all events in the cluster. Kubernetes is a complicated system with multiple objects, and various automated pro| William Denniss
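This article briefly explores a feature that GKE officially announced a while ago, called Workload Identity Federation for GKE. Feature overview: Workload Identity Federation for GKE is an improved version of the original GKE Workload Identity feature. The core improvement is that it reduces the amount of configuration required and improves the user experience. How to use it: You can try the feature with the following steps: Create a GKE cluster with Workload Identity Federation for GKE enabled. The setting is found under: Create cluster - Security - Enable Wor...| mozillazg's Blog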
In this article, we will briefly explore a feature called "Workload Identity Federation for GKE" that was recently announced by GKE in their official blog. Features Overview Workload Identity Federation for GKE is an improved version of the original GKE Workload Identity feature. The main improvement is that it needs less configuration and offers better user experience. How to Use Follow these steps to try this feature: Create a GKE cluster with Workload Identity Federation for GKE enabled. Y...| mozillazg's Blog
Introduction Google Kubernetes Engine (GKE) is Google’s managed Kubernetes service that simplifies deploying, managing, and scaling containerized applications. It is highly integrated with Google Cloud services, leveraging features such as autoscaling, workload security, and efficient storage. In this tutorial, we will guide you through the process of integrating Kubernetes with Google Cloud Platform (GCP), focusing […]| Collabnix
Anthos Service Mesh for GKE can be installed in the following modes: in-cluster ASM using the asmcli utility; managed ASM using the asmcli utility; managed ASM using the ‘gcloud container fleet’ command; managed ASM using the Terraform asm submodule. If you need to determine the installation mode used on your GKE cluster, you can examine ... GCP: determining whether ASM is installed via asmcli or gcloud fleet| Fabian Lee : Software Engineer
If you need to determine at the CLI whether a GKE cluster is managed using Standard or Autopilot mode, this is available by using gcloud to describe the cluster. # identify cluster and location gcloud container clusters list cluster_name=<clusterName> location_flag="--region=<region>" # OR --zone=<zone> # returns 'True' if GKE AutoPilot cluster # returns empty if standard ... GCP: determining whether GKE cluster mode is Standard or Autopilot| Fabian Lee : Software Engineer
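A minimal sketch of how a script might interpret the result described above, assuming (as the excerpt states) that the describe call yields 'True' for an Autopilot cluster and an empty value for a Standard one. The function name `cluster_mode` is hypothetical, and the gcloud invocation itself is not reproduced here because the excerpt truncates it.

```python
def cluster_mode(describe_output: str) -> str:
    """Map captured gcloud output to a cluster mode name.

    Assumption from the excerpt: 'True' means Autopilot and an empty
    value means Standard. Anything else is reported as Unknown.
    """
    value = describe_output.strip()
    if value == "True":
        return "Autopilot"
    if value == "":
        return "Standard"
    return "Unknown"


if __name__ == "__main__":
    # Example inputs are illustrative strings, not real command output.
    print(cluster_mode("True\n"))
    print(cluster_mode(""))
```

Reporting an explicit "Unknown" for unexpected values is a design choice here, so that a changed output format does not silently read as Standard.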
As much as Terraform pushes to be the absolute system of record for resources it creates, often valid external processes are assisting in managing those same resources. Here are some examples of legitimate external changes: Other company-approved Terraform scripts applying labeling to resources in order to track ownership and costs Security teams modifying IAM roles ... GKE: terraform lifecycle ‘ignore_changes’ to manage external changes to GKE cluster| Fabian Lee : Software Engineer
At some point, there will be a system change significant enough that a maintenance window needs to be scheduled with customers. But that doesn’t mean the end-user traffic or client integrations will stop requesting the services. What we need to present to end-users is a maintenance page during this outage to indicate the overall solution ... GCP: Cloud Run/Function to handle requests to GKE cluster during maintenance| fabianlee.org
If you are managing GKE clusters using Anthos Config Management (ACM) and need to take advantage of newer features or enhancements in ConfigSync or PolicyController, upgrading these components can be done using the gcloud utility. # check current version of ACM on GKE clusters gcloud beta container fleet config-management version # select membership to upgrade ... GKE: upgrade Anthos Config Management for GKE cluster| fabianlee.org
I recently set out to run Stable Diffusion on GKE in Autopilot mode, building a container from scratch using AUTOMATIC1111's webui. This is likely not how you'd host a Stable Diffusion service for production (which would make for a good topic of another blog post), but it's a fun way to try out| William Denniss
When building your business using LLMs as a key component, you may wish to be a master of your own domain and run your own model. Running your own LLM protects you from changes like pricing increases or API availability with third-party services, guarantees the privacy of your data (no data needs to| William Denniss
Per the NVIDIA docs, CUDA 12 applications require driver 525.60.04+. This driver is available as part of GKE 1.28. To upgrade an existing cluster to the latest version of 1.28: VERSION="1.28" REGION="us-central1" CLUSTER_NAME="autopilot-cluster-1" gcloud container clusters upgrade $CLUSTER_NAME| William Denniss
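The driver requirement quoted above (525.60.04 or newer for CUDA 12 applications) can be checked programmatically. The following is only a sketch: the helper names `parse_version` and `supports_cuda12` are hypothetical, and the driver versions passed in the usage lines are illustrative inputs rather than values reported by any cluster.

```python
from typing import Tuple

# Minimum driver version for CUDA 12 applications, as quoted in the excerpt.
MIN_CUDA12_DRIVER = "525.60.04"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '525.60.04' into (525, 60, 4)."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_cuda12(driver_version: str) -> bool:
    """Return True if the driver is at or above the quoted minimum."""
    return parse_version(driver_version) >= parse_version(MIN_CUDA12_DRIVER)


if __name__ == "__main__":
    # Illustrative driver versions only.
    for candidate in ("470.82.01", "525.60.04", "535.104.05"):
        print(candidate, supports_cuda12(candidate))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "525.9" as newer than "525.60".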
Image streaming is a really great way to speed up workload scaling on GKE. Take for example the deep learning image from Google. In my testing, the container is created in just 20s, instead of 3m50s. While there is slightly higher latency on reads while the image streams, the 3m30s head start is goi| William Denniss
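The timings quoted above can be checked with a little arithmetic: 3m50s without image streaming versus 20s with it. This sketch uses hypothetical helper names (`parse_duration`, `format_duration`) and only the two figures from the excerpt.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration like '3m50s' or '20s' into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)


def format_duration(total_seconds: int) -> str:
    """Format whole seconds back into the 'XmYs' style used in the excerpt."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds}s" if minutes else f"{seconds}s"


if __name__ == "__main__":
    without_streaming = parse_duration("3m50s")  # figure from the excerpt
    with_streaming = parse_duration("20s")       # figure from the excerpt
    print("head start:", format_duration(without_streaming - with_streaming))
    print("speedup factor:", without_streaming / with_streaming)
```

The difference works out to the 3m30s head start mentioned in the excerpt.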
Update: this information is now available in the official docs. If you want to know what version of your GPU drivers are active on GKE, here's a one-liner: kubectl logs -l k8s-app=nvidia-gpu-device-plugin -c "nvidia-gpu-device-plugin" --tail=-1 -n kube-system | grep Driver What th| William Denniss
GKE operates on a flat VPC structure. That means that every Node and Pod has an identity within your VPC, and their IPs are not re-used. This is convenient, as Pods are addressable within the VPC, but unless you create multiple VPCs to isolate resources, you can end up using a lot of IPs very quickl| William Denniss
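To get a feel for how quickly Pod IPs are consumed, here is a rough sketch using Python's standard `ipaddress` module. It assumes each node reserves a fixed-size block of Pod addresses (a /24 per node is used as an illustrative assumption), and the CIDR ranges are made-up examples. The function name `max_nodes` is hypothetical.

```python
import ipaddress


def max_nodes(pod_range: str, per_node_prefix: int = 24) -> int:
    """Number of per-node blocks that fit inside a Pod IP range.

    Assumption: every node reserves one block of size /per_node_prefix
    out of the Pod range, regardless of how many Pods it actually runs.
    """
    network = ipaddress.ip_network(pod_range)
    if per_node_prefix < network.prefixlen:
        return 0  # the range is smaller than a single node's block
    return 2 ** (per_node_prefix - network.prefixlen)


if __name__ == "__main__":
    # Illustrative ranges only.
    for cidr in ("10.0.0.0/14", "10.0.0.0/20"):
        total = ipaddress.ip_network(cidr).num_addresses
        print(cidr, "addresses:", total, "nodes:", max_nodes(cidr))
```

Under these assumptions a /14 Pod range holds 262144 addresses yet supports only 1024 nodes, which illustrates why IP planning matters in a flat VPC.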
Did you know that you can now add Pod IP ranges to GKE clusters? Pods use a lot of IPs, which in the past forced you to compromise. Do you allocate a lot of IPs to the cluster, allowing for growth while reserving a big group of IPs, or do you allocate just a little to conserve IPs but risk the nee| William Denniss
Connect internal services from multiple clusters together in one logical namespace. Easily connect services running in Autopilot to Standard and vice versa, share services between teams running on their own services, and back an internal service by replicas in multiple clusters for cross-regional av| William Denniss
[Update (2023-12-20): You can now turn off workload logging in Autopilot. That is the recommended approach if you want to remove all workload logs. To disable workload logs for Autopilot (for example, if you use a third-party logging agent like DataDog), pass this value at cluster creation: gclo| William Denniss
Do you need to provision a whole bunch of ephemeral storage to your Autopilot Pods? For example, as part of a data processing pipeline? In the past with Kubernetes, you might have used emptyDir as a way to allocate a bunch of storage (taken from the node's boot disk) to your containers. This however| William Denniss
Let's say you want to migrate a service in GKE from one cluster to another (including between Standard and Autopilot clusters), and keep the same external IP while you do. DNS might be the ideal way to update your service address, but for whatever reason you need to keep the IP the same. Fortunately, it| William Denniss
Autopilot is a new mode of operation for Google Kubernetes Engine (GKE) where compute capacity is dynamically provisioned based on your pod's requirements. Among other innovations, it essentially functions as a fully automatic cluster autoscaler. Update: GKE now has an official guide for provisio| William Denniss