At my current employer, we use Kubernetes to run hundreds of thousands of bare metal servers, spread over hundreds of Kubernetes clusters. We use Kubernetes beyond officially supported/tested scale limits by running more than 5,000 nodes and over a...| ahmet.im
View more about this event at KubeCon + CloudNativeCon Europe 2025| kccnceu2025.sched.com
Ahmet Alp Balkan and Ronak Nathani are software engineers at LinkedIn compute infrastructure team running the Kubernetes platform for LinkedIn and they joined us today to talk about how they run Kubernetes at scale and what they learned along the way.| kubernetespodcast.com
Anyone who is running Kubernetes in a large-scale production setting cares about having a predictable Pod lifecycle. But there are so many ways Kubernetes terminates workloads, each one working in non-trivial (and not always predictable) ways. These...| ahmet.im
Kubernetes volumes provide a way for containers in a pod to access and share data via the filesystem. There are different kinds of volume that you can use for different purposes, such as: populating a configuration file based on a ConfigMap or a Secret providing some temporary scratch space for a pod sharing a filesystem between two different containers in the same pod sharing a filesystem between two different pods (even if those Pods run on different nodes) durably storing data so that it s...| Kubernetes
Any company using Kubernetes eventually starts looking into developing their custom controllers. After all, what’s not to like about being able to provision resources with declarative configuration: Control loops are fun, and Kubebuilder makes...| ahmet.im
Last week, OpenAI has suffered a several hours long outage and published a detailed postmortem about it. Highly recommend reading it. These technical reports are usually a gold mine for all large-scale Kubernetes users, as we all go through similar set of reliability issues running Kubernetes in production.| Ahmet Alp Balkan
This is the analysis of a low severity incident that took place in the Kubernetes clusters at the company I work at that taught me a lot about how to think about the off-the-shelf components we bring from the ecosystem into the critical path and...| ahmet.im
Introduction This post-mortem details an incident that occurred on December 11, 2024, where all OpenAI services experienced significant downtime. The issue stemmed from a new telemetry service deployment that unintentionally overwhelmed the Kubernetes control plane, causing cascading failures across critical systems. In this post, we break down the root cause, outline the steps taken for remediation, and share the measures we are implementing to prevent similar incidents in the future. Impa...| OpenAI Status
A quick code search query reveals at least 7,000 Kubernetes Custom Resource Definitions in the open source corpus,1 most of which are likely generated with controller-gen —a tool that turns Go structs with comments-based markers into Kubernetes CRD...| ahmet.im
I’ve recently done a Twitter poll and only 20% of the participants accurately predicted that it takes Kubernetes 60-90 seconds to propagate changes to Secrets and ConfigMaps on the mounted volumes. So I want to take you on a journey in the...| ahmet.im
Files on Kubernetes Secret and ConfigMap volumes work in peculiar and undocumented ways when it comes to watching changes to these files with the inotify(7) syscall. Your typical file watch that works outside Kubernetes might not work as you expect...| ahmet.im
man7.org > Linux > man-pages| man7.org
Custom resources are extensions of the Kubernetes API. This page discusses when to add a custom resource to your Kubernetes cluster and when to use a standalone service. It describes the two methods for adding custom resources and how to choose between them. Custom resources A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind; for example, the built-in pods resource contains a collection of Pod objects.| Kubernetes