Learn how to spot and resolve common ZFS issues like disk errors and degraded pools to maintain system health and data integrity.| Klara Systems
Have you ever seen two nearly identical SQL statements, differing only in date parameters or function variations, return similar results but with wildly different performance, sometimes by factors of 10 or 100? In real-world scenarios, we typically run the `EXPLAIN` statement to examine changes in the execution plan. But what if the execution plan doesn’t […] The post Exploring TiDB Observability: A Journey Through Real-World Case Studies appeared first on TiDB.| TiDB
AI exploded in 2023 – and so did the risks. Hallucinations, bias, toxic outputs… one wrong move can tank your brand. The secret? Read to find out.| AI Accelerator Institute
Just a few years ago, many enterprises were working toward consolidating log data into a single observability platform. By bringing The post Going for Silver: Making the Most of Tiered Observability appeared first on The New Stack.| The New Stack | DevOps, Open Source, and Cloud Native News
The eBPF-based observability provider groundcover today announced an observability solution specifically for monitoring LLMs and agents. It captures every interaction with LLM providers like OpenAI and Anthropic, including prompts, completions, latency, token usage, errors, and reasoning paths. According to groundcover, while LLMs can offer many benefits, they also introduce a lot of negatives: performance volatility, … continue reading The post Groundcover launches observability solution f...| ITOps Times
And why I don't want my database to choose the best encoding for me (yet)| Discover the Performance Engineer in you. | Polar Signals
The network isn't a simple conduit. It's the nervous system of your enterprise. Every application, every transaction, and every end-user experience hinges on its seamless operation. Network management has been a reactive discipline for far too long. It's a frantic scramble to extinguish fires reported by exasperated users. The true measure of network excellence, however, [...] The post The Silent Guardian: How Service Assurance Transforms the End-User Experience appeared first on Techstrong IT.| Techstrong IT
Grafana Labs announced a public preview of Grafana Assistant, an AI assistant that IT teams can use to interact with logs, metrics, and traces in a conversational manner. It is available in all parts of Grafana Cloud, and sees the context of what is on the page so that it can give specific, context-aware answers. … continue reading The post Grafana launches preview of AI assistant that works across all of Grafana Cloud appeared first on ITOps Times.| ITOps Times
Last week, I described several approaches to OpenTelemetry on the JVM, their requirements, and their different results. This week, I want to highlight several gotchas found across stacks in the zero-code instrumentation. The promise of OpenTelemetry Since its inception, OpenTelemetry has unified the 3 pillars of observability. In the distributed tracing space, it replaced proprietary protocols Zipkin and Jaeger. IMHO, it achieved such success for several reasons: First, a huge industry press| A Java geek
You may know I’m a big fan of OpenTelemetry. I recently finished developing a master class for the YOW! conference at the end of the year. During development, I noticed massive differences in configuration and results across programming languages. Even worse, differences exist across frameworks inside the same programming language. In this post, I want to compare the different zero-code OpenTelemetry approaches on the JVM, covering the most widespread: Spring Boot with Micrometer Tracing| A Java geek
Using eBPF to record your program's dying breaths| Discover the Performance Engineer in you. | Polar Signals
Traditional observability tools can’t keep up with modern complexity. Dashboard and alert-based approaches still rely heavily on manual processes, resulting in longer troubleshooting cycles, slower decisions, and higher MTTR. Engineering teams need something better. Today we’re launching Open 360 AI, the first observability platform designed for both humans and AI agents working together. Instead of […]| The Logz.io Blog — DevOps, Logging, Metrics, Tracing, and Security
Curious to see how AI actually performs in a real-world production scenario? Watch the webinar “AI-Driven Alert Triage and RCA” with Logz.io Customer Success Engineer, Seth King. Below, we also share the main highlights of the webinar. AI claims to make engineers more efficient and agile by shortening processes and surfacing insights that help drive […]| The Logz.io Blog — DevOps, Logging, Metrics, Tracing, and Security
In this guide, we’ll show technologies and examples of full stack observability for an application running on Kubernetes, OpenTelemetry and AWS.| Logz.io
A practical look at common log format standards, how JSON, XML, and key-value logs work, and when to use each in production systems.| Last9 Blog: Exploring the Realm of Monitoring, Observability, and Reliability...
The crew investigates a scientific outpost that has gone dark. They arrive to find the inhabitants have achieved "total observability," logging every single action, thought, and system metric. They are now so overwhelmed with data that they are paralyzed, unable to find the signal in the noise.| Seuros Blog - Navigation Logs from the Ruby Nebula
“Observability is the lens through which the invisible becomes visible, turning complex systems into understandable narratives.” So before taking a […]| Distributed Computing Musings
In the last post, we explored how we can leverage tools such as Prometheus & Grafana for monitoring our applications. […]| Distributed Computing Musings
In the last post, we touched upon the requirement for observability & understood the basic components that form the common […]| Distributed Computing Musings
Imagine you are working on an issue in an existing feature. You started on Monday by reproducing the issue, you […]| Distributed Computing Musings
I woke up one day to Karpenter on AWS EKS throwing this error - "Controller isn't authorized to call ec2:RunInstances", and it took most of my day to figure this one out.| Technical Scratchpad
Prometheus Node Exporter is a tried and tested method to make hardware and OS metrics available as a scrapable endpoint to Prometheus server (or other downstream services/TSDB that support the same format). With OpenTelemetry (OTel) gaining more traction/recognition, I learnt that the OTel Collector's Host Metrics Receiver can also be used to expose host-level metrics. However, I wondered if it would be able to reach parity in terms of the type/number of metrics it is able to expose to downstr...| Technical Scratchpad
Thanks to Typesense for sponsoring the creation of this cartoon! GT2 Pro members, download a high-res version of this image that you can use royalty-free anywhere:| Good Tech Things
Learn the differences between traditional observability approaches and user-centric solutions when it comes to troubleshooting a mobile app issue.| Embrace
I wrote a lot of blog posts over my time at Parse, but they all evaporated after Facebook killed the product. Most of them I didn’t care about (there were, ahem, a lot of “service relia…| charity.wtf
Honeycomb is proud to be named a Visionary in the 2025 Gartner® Magic Quadrant™ for Observability Platforms. We feel that our recognition by Gartner showcases our commitment to help engineering teams gain observability over complex environments—not just for today’s systems, but for whatever comes next. The post Honeycomb Named a Visionary in the 2025 Gartner® Magic Quadrant™ for Observability Platforms appeared first on Honeycomb.| Honeycomb
I’m pleased to announce the public beta of Honeycomb Hosted MCP, along with our first wave of one-click integrations for Cursor, Visual Studio Code, and Claude Desktop. We’re also very excited to announce that Hosted MCP is available on AWS AI Agents marketplace and for all Honeycomb plans (including our free plan!) at no charge. The post Honeycomb In Your IDE? Yes, With Hosted MCP Now Available in AWS Marketplace AI Agents and Tools Category appeared first on Honeycomb.| Honeycomb
SAN FRANCISCO – July 16, 2025 – Honeycomb, the creators of observability, today announced the availability of the Honeycomb Hosted Model Context Protocol (MCP) Server in the new AI Agents and Tools category of AWS Marketplace. Customers can now use AWS Marketplace to easily discover, buy, and deploy AI agent solutions, including Honeycomb’s MCP server, using their AWS accounts.| Honeycomb
Know how to use Elasticsearch with Python for indexing, searching, and analyzing data, complete with code, tips, and integration examples.| Last9 Blog: Exploring the Realm of Monitoring, Observability, and Reliability...
July 31st – Aug 3rd, Portland State University @ Smith Memorial Student Union building. This year Mark and I, with Richard Yen and Gabrielle Roth’s help, have been organizing the database…| PDXPUG
This is a very short Lesson Learned from migrating from Filebeat to Grafana Alloy, and how labels drove me nearly insane. tl;dr: static_labels before labels!| ConSol Blog
Over and over, we’ve seen that teams who invest in adding rich, relevant context to their telemetry end up debugging faster and collaborating more effectively during incidents. Getting meaningful context added can feel like a big cross-team project, but some of the highest-leverage improvements don’t require app code changes or coordination across services. The post The Fast Path to More Useful Telemetry appeared first on Honeycomb.| Honeycomb
In this post, we’ll describe what traces are, how they work, and the value traces provide in observability, helping stakeholders understand their systems and delivering reliable services at scale and performance. The post What Are Traces? A Developer’s Guide to Distributed Tracing appeared first on Honeycomb.| Honeycomb
Many engineers & leaders are under pressure to apply sampling for cost savings purposes, but are concerned with the impacts on data quality.| Honeycomb
AWS PrivateLink is now supported — send logs and metrics privately, reduce AWS costs, and improve observability security with Logz.io.| Logz.io
Bringing what we've learned to our next generation database.| Discover the Performance Engineer in you. | Polar Signals
How we used WASM and some Go runtime modifications to run deterministic simulation tests against FrostDB| Discover the Performance Engineer in you. | Polar Signals
How we designed our database for complete control over concurrency, time, randomness, and failure injection.| Discover the Performance Engineer in you. | Polar Signals
At LDX3 in London last week, two roundtables I hosted with engineering leaders confirmed what many of us are starting to feel: observability isn’t just important—it’s becoming essential to how modern teams navigate the pressure to move fast and stay resilient. The post Is Your Observability Strategy Boardroom-Ready? appeared first on Honeycomb.| Honeycomb
Claude Code added OpenTelemetry metric and log support in a recent release, which led Austin to ask, can Claude Code observe itself?| Honeycomb
Learn how observability is transforming network management – helping IT teams move from reactive firefighting to proactive control| Information Age
As generative AI technologies become more integrated into our software products and workflows, those products and workflows start to look more and more like the LLMs themselves. They become less reliable, less deterministic, and occasionally wrong. LLMs are fundamentally non-deterministic, which means you’ll get a different response for the same input. If you’re using reasoning models and AI agents, then those errors can compound when earlier mistakes are used in later steps.| stackoverflow.blog
Databricks, Ataccama, Anomalo, and IBM are unlocking unstructured data for AI with trusted, scalable pipelines across Snowflake, Unity Catalog, and more.| theCUBE Research
Explore how Kubernetes is evolving to support AI and ML at scale, covering multi-cluster orchestration, GPU optimization, and observability. KubeCon 2025| ITGix
Day 2 at KubeCon 2025 delved deep into the many facets of cloud-native security, illustrating how practitioners apply zero-trust principles, integrate policy-as-code, secure AI workloads, and harden Kubernetes clusters in real-world scenarios. Below is my technical summary of the notes I took during day 2 and lessons learned from a busy day dedicated to securing […]| ITGix
With this release, you can more easily build and reconfigure telemetry pipelines and sample safely with the ability to easily pull full-fidelity data from your own archive whenever you need it. The post Observability Without Tradeoffs: Introducing Powerful New Honeycomb Telemetry Pipeline Features appeared first on Honeycomb.| Honeycomb
Enhance and Pipeline Builder help teams to manage their data while controlling costs and accelerating OpenTelemetry adoption.| Honeycomb
Here are five practical tips to keep your OpenSearch cluster running smoothly and efficiently: 1. Balance Shards Across the Cluster. Shards are the backbone of OpenSearch’s data distribution. If shards aren’t spread evenly, some nodes get overloaded, leading to CPU bottlenecks and sluggish performance. How to fix it: Use the _cat/shards API or OpenSearch Dashboards to […]| The Logz.io Blog — DevOps, Logging, Metrics, Tracing, and Security
Automatically detect and fix network documentation drift with NetBox Assurance, available as an add-on for NetBox Enterprise.| netboxlabs.com
Honeycomb recently hosted Observability Day London. Read a recap from Ken as he goes over all the talks and key takeaways from the day.| Honeycomb
You shipped your latest release. You tested it on emulators, QA devices, and the latest OS versions. But now it’s live and running on thousands or millions of real devices, across a jungle of screen sizes, hardware specs, OS versions, and network conditions. A user reports a crash on an old Samsung device over 3G. Someone else complains the app feels “sluggish” after updating. You dig through logs. Rebuild test cases. Ping the backend team. Try to reproduce. Yet, still no answers.| Honeycomb
The CSP response header can capture valuable reports that provide visibility into violations of your site's intended security policies.| Honeycomb
For the first time, Logz.io users can now create dashboards that bring together logs, metrics, and traces in a single unified view — making it easier than ever to monitor performance, detect issues, and troubleshoot incidents without switching tools or losing context. This launch is more than just a product update. It’s a clear signal […]| The Logz.io Blog — DevOps, Logging, Metrics, Tracing, and Security
Discover how Logz.io AI Agents are redefining observability with intelligent automation—resolving incidents, analyzing logs, and providing real-time answers for developers, engineers, and SREs when it matters most.| Logz.io
Modernize your app's observability on Heroku Fir. OpenTelemetry integration provides seamless monitoring, troubleshooting, and performance insights.| Heroku
I’m sitting in the Hungarian Railway Museum’s amazing park, under the shadow of buckeye trees, in the middle of a chirping concert from at least a dozen different birds. Halftime of the 2-day Craft Conference, I arrived a bit early to be able to finish this| Péter Szász
Explore the Prometheus design and see which components consume the most resources. Find out why it happens, what affects it, and how you can optimize your setups to get the best performance in monitoring.| blog.palark.com
Standard HTTP logs miss crucial details like request and response bodies, hindering debugging. Our article offers solutions for complete HTTP logging, ensuring you have all the necessary information for effective web management.| Kalvad
Capture user app behavior for use with Fullstory.| Retool Changelog
GitHub Actions lacks observability, so we compared off-the-shelf observability solutions to find the best CI/CD monitoring platform. The post Tracking the Signal in the Metrics – Level up your GitHub Actions with Observability appeared first on balena Blog.| balena Blog
Embrace spent time at KubeCon discovering that people are using OpenTelemetry in all sorts of ways, but maybe not around users.| Embrace
Learn the unique challenges of mobile environments so you can make the best decision when implementing observability for mobile apps.| Embrace
Debugging effectively requires a nuanced approach, similar to using tongs that tightly grip the problem from both sides. While low-level tools have their place in system-level service debugging, today's focus shifts towards a more sophisticated segme...| Java, Debugging, DevOps & Open Source
Announcing Sentry's OpenFeature integration for enhanced feature flag observability| OpenFeature Blog
Groan. Well, it’s not like I wasn’t warned. When I first started teasing out the differences between the pillars model and the single unified storage model and applying “2.0” to the latter, Christi…| charity.wtf
Embrace combines open-source SDKs with an analysis dashboard to help the entire engineering team understand exactly what is disrupting mobile user experiences.| Embrace
Discover how FreeBSD’s SO_SPLICE enables efficient kernel TCP proxying, reducing data copying overhead and improving performance.| Klara Systems
"Explainable AI" (xAi) or "explainability" is when you design and build systems that can explain their decisions. Turns out I do this right now.| Shattered Illusion by Chris Kenst
Uncover the secrets of ZFS space accounting and why available storage can appear lower than expected—see what impacts space calculation.| Klara Systems
Help us make feature observability better for everyone!| OpenFeature Blog
We caught up with Michael Garski, Director of Platform Engineering at Fender, to hear how things are going with Honeycomb for Frontend Observability.| Honeycomb
OTel was created to help collect and analyze observability data at scale. In this episode of Makers, Morgan McLean, its co-creator, explores the roadmap.| The New Stack
GKE users get access to an awesome new tool this week: the Kubernetes History Inspector. This product, released as open source, parses Kubernetes and GKE logs to generate a timeline with all events in the cluster. Kubernetes is a complicated system with multiple objects, and various automated pro| William Denniss
These days, systems and applications evolve at a rapid pace. This makes analyzing the internal performance of applications complex. Observability emerges as a path to efficient and effective operational insights. Imagine a team of doctors monitoring a patient’s vitals—heart rate, temperature, blood pressure. These readings, combined with observation of symptoms, paint a picture of the […] The post Introduction to Observability appeared first on pingdom.com.| Blog Posts Archive - Pingdom
I’ve had a wish list for a few years now of observability-related things I’d love to see someday in community/open-source Postgres. A few items from my wish list: Wait event counters an…| Ardent Performance Computing
Learn how to integrate OpenTelemetry with Kubernetes to enhance observability using Logz.io. This step-by-step guide covers logs, metrics, and traces setup, OTel demo deployment, and leveraging AI-powered insights for smarter monitoring and troubleshooting. Perfect for beginners and pros alike!| Logz.io
Struggling to scale your DIY ELK stack? Discover how migrating to a SaaS observability platform can simplify operations, reduce costs, and unlock advanced features like AI-powered insights. Learn when and how to make the shift for seamless scalability.| Logz.io
Distributed computing is hard, distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds additional observability features for your Dask clusters and processes them to help users understand their workflows better.| Blog
While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step - breaking down this time to understand if it could be faster - has historically been a much more arduous task that required a lot of intuition and legwork, for novice and expert users alike. We wanted to change that.| Blog
Hendrik Makait, 2023-05-16| Blog
Hazel Weakly, you little troublemaker. As I whined to Hazel over text, after she sweetly sent me a preview draft of her post: “PLEASE don’t post this! I feel like I spend all my time trying to help bring clarity and context to what’s happening in the market, and this is NOT HELPING. Do you […]| charity.wtf
AI is transforming how we monitor, manage, and secure digital environments. Learn more about how AI log analysis can shape the future of observability.| Logz.io
Error monitoring for apps is now generally available on Retool Cloud. It will be available in Self-hosted Retool 3.114 Edge and a subsequent stable release.| Retool Changelog
In this webinar, product experts showed how the Logz.io AI Agent can transform root cause analysis, optimize performance, and de-risk deployments.| Logz.io
In this article, Charity Majors goes over the simple, technical distinction between observability 1.0 and observability 2.0.| Honeycomb
How we scrape callstack information from the LuaJIT engine for profiling| Debug Daily. Optimize Always | Polar Signals
Cisco boosts its cybersecurity and AI ambitions with $28B acquisition of Splunk| SiliconANGLE
Effective cloud migration is about steady progress, proper monitoring, and adjusting to new insights.| The New Stack
For insights on what developers should consider when using AI with DevOps, we collected perspectives from DevOps experts and developers.| The New Stack
Making sense of mobile data, and finding useful signals, requires accounting for the effects of time.| The New Stack
Observability teams can leverage Logz.io AI Agent to automate RCA investigation and benefit from advanced, AI-powered data analytics.| Logz.io
Discover how AI is transforming cloud observability from manual monitoring to autonomous systems. This blog post explores the challenges faced by technical teams, the stages of maturity in observability, and the potential of generative AI to enhance performance and reduce MTTR.| Logz.io
When it comes to production-ready systems, we need a way to know what’s going on inside them, so that we can debug them when the time comes.| Alexandru Burlacu
The Observability Crisis is an article from Jaya Gupta & Ashu Garg from Foundation Capital, a Silicon Valley based venture capital (VC) firm investing in tech startups. TLDR: Companies in the first wave of the observability space (such as Splunk, AppDynamics, Datadog and New Relic) focused on solving data storage and analysis problems. However, with […]| Shaun Abram
In this post we’ll show you how to visualize the cluster metrics in a web browser, and we’ll also set up alerting so that we are notified when, for example, a drive needs to be replaced or runs out of space.| MinIO Blog
In this post, I focus on a middleware technique to add span links between request traces on ASP.NET Core during internal redirects on .NET 9.| Steve Gordon - Code with Steve
GenAI promises evolutionary changes in how we use observability tools, but meeting expectations means heeding the lessons of our AIOps mistakes.| Logz.io
In this post, I share a solution to programmatically disable the recording (exporting) of an Activity when instrumenting code for OpenTelemetry.| Steve Gordon - Code with Steve
Written by: Nicole Fagen. The backbone of global business operations runs through mainframes, which handle a staggering 30 billion transactions daily, hold 80% of the world’s business data, and process 90% […]| Planet Mainframe
Engineering has come a long way since the days of delivering discrete, point-in-time products that were often packaged on a CD and shipped to customers. The days of physical media and long development cycles are long gone. The advent of cloud computing and the rise of Software-as-a-Service (SaaS) transformed the landscape, creating a new model of continuous development and service delivery. This shift has not only revolutionized how software is developed, but has also redefined the engineer...| Honeycomb