I have added a static version of my CatVector demo app for visualizing embedding vectors: tanelpoder.com/catvector/ Previously you had to clone the CatBench repo and run the little Python webserver locally, but now it’s just a publicly accessible website. It loads a few multi-megabyte JSON files into arrays in your browser and then dynamically looks up a specific pet photo that you hover your mouse over in the UI. Depending on your Internet connection, it might take a bit of time for each p...| tanelpoder.com
I just added another tool into my 0x.tools toolset: With xstack, you can passively sample both kernel stacks and user stacks (as long as framepointers are present) with a pretty minimalistic tool, with no direct impact to your critical application processes! While xstack itself is just a data extraction tool, you probably want to summarize/profile all those thread states and stack profiles. Here’s an example of piping xstack output to flamelens to immediately display a (terminal) flamegraph...| Tanel Poder Consulting
The two screencasts below are a preview of what’s coming next week, when I formally introduce the latest xtop and xcapture v3 to the world. On next Tuesday, 19th August at 1pm EDT, I’ll upload the latest (beta) open source code to 0x.tools repo and run a 1-hour “hacking session” style webinar. It’s gonna be all demos of how I use these tools for my troubleshooting work. Go to the end of this page for the Zoom link. - Linux, Oracle, SQL performance tuning and troubleshooting - consul...| tanelpoder.com
TL;DR: I’m doing a CatBench Vector Search on AlloyDB webinar (hacking session style!) with Google on Wednesday, July 23th at 9am EDT. I was having some chats with Kiran Tailor (of Google) and one thing led to another: We are announcing a webinar/hacking session where I stress-test the latest (Postgres-compatible) Google AlloyDB Omni release using my CatBench (cat benchmarking) vector search stress-test suite! I created the CatBench Vector Search Playground mainly because I wanted to learn s...| tanelpoder.com
Today, my blog turned 18 years old! I published my first blog entry on June 18, 2007 and have since published over 300 posts. The first blog entry, like most of my others, was about systematic troubleshooting and performance optimization of various (Oracle) database workloads and especially their touchpoint with underlying Unix/Linux operating systems, storage and network I/O. The articles covering interactions between multiple layers of your application stack and how to navigate troubleshoot...| tanelpoder.com
CatBench is a Postgres + PgVector demo application that uses 25k pet photos from a Kaggle dataset for demonstrating how the similarity search features work together with the rest of your application code & schema in the backend (HammerDB Order Entry). [Jump to the new stress test + monitoring section] CatBench Interactive UI You can navigate around the recommendation engine for Cat Purchases UI, click on cat photos and find product recommendations, based on what other similar-looking cats hav...| tanelpoder.com
Here’s a useful tool for quickly testing whether a disk (or a cloud block store volume) is a good candidate for your database WAL/redo logs and any other files that require low latency writes. The pg_test_fsync tool is bundled with standard Postgres packages, so no extra installation is needed. You don’t actually have to use Postgres as your database, this tool’s output is universally valuable for any workload requiring fast writes. - Linux, Oracle, SQL performance tuning and troublesho...| tanelpoder.com
I’m finalizing things for launching my next-gen Linux Performance & Troubleshooting and Advanced Oracle Troubleshooting video courses in June, so expect much more blogging (and new scripts) to show up here. By the way, this week I’m running a spring-cleaning sale with unprecedented discounts! Oracle’s SQL*Net break/reset to client wait event shows up whenever Oracle (SQL, PL/SQL call, etc) returns an ORA- error back to the client instead of the normal expected results (and ORA-0 under t...| Posts on Tanel Poder Consulting
When dealing with disks and I/O things on Linux, you’d regularly run commands like lsblk, lsscsi, nvme list, etc. All of them tend to report a different set of information, so I ended up running multiple commands and correlating their output based on the device name or number. And then I had to run commands like these, to get extra info about the current OS-level configuration settings for specific disks: - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.| tanelpoder.com
In this post I will introduce a much more efficient method for accounting block I/O latencies with eBPF on Linux. In my stress test, the “new biolatency” accounting method had 59x lower CPU and probe latency overhead compared to the current biolatency approach. So I couldn’t help it and ended up putting 21 NVMe SSDs into one of my homelab servers. 8 of them are PCIe5 and the remaining 13 are PCIe4. - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.| tanelpoder.com
To capture your attention, this is what you get: Table of contents What is xCapture? xCapture with modern eBPF Modular and decoupled at multiple levels Dimensional performance data analysis with DuckDB Top wall-clock time used by all threads Regular expressions in the filter Syscalls against network socket file descriptors What is the performance overhead? Performance overhead of active tracking probes Userspace processing overhead Running xcapture What’s next? What is xCapture? 0x. - Linux...| tanelpoder.com
What is this? When learning a new (human) language like Spanish or French, I’ve found that not knowing the various common “connecting words” of a sentence, like “from, to, it, at, with, without, under, on, less, more, is, not” and so on is a bottleneck for me. I might know the words like “dog” “cat” “food”, but when just starting my learning, not remembering the common “connecting words” always adds a recalling “tax” or bottleneck for me when trying to speak. - L...| tanelpoder.com
TLDR; Starting from Linux kernel version 6.9 on x86_64, there’s a new config option CONFIG_X86_FRED enabled and it adds 16 bytes to the starting point of a task’s kernel stack area, so you’ll need to account for this extra padding in your “raw” kernel stack & pt_regs lookup code. Introduction I’ve been using Ubuntu 24.04 as my main eBPF development and testing platform without issues since its release. It is shipped with Linux kernel version 6. - Linux, Oracle, SQL performance tun...| tanelpoder.com
This should be something useful for Linux kernel explorers and eBPF nerds! Last year I released a tool called syscallargs that walked through the Linux /sys/kernel/debug/tracing/events/syscalls directory tree and allowed you to query and list available system calls in your current system from the command line. And it printed out syscall argument names and their datatypes, so it has helped me avoid opening man-pages (or web-pages) a few times. Later I realized that since many other kinds of ke...| tanelpoder.com
Last year, I published a simple demo app that visualizes embedding vectors of a bunch of cat, dog and airplane photos as heatmaps. It’s a simple concept and does not have much practical use. I built it mostly for fun, but also for some “learning by doing” too. Today I’m releasing a completely new version of my CatBench Vector Search Playground app! https://github.com/tanelpoder/catbench With this new app, you can interactively navigate through multiple Postgres-based vector similarity...| tanelpoder.com
Last year I wrote an article about visualizing embedding vectors of a variety of different pictures as heatmaps. I used TorchVision with ImageNet1K_V1 model for encoding a bunch of different cat, dog and plane images into their embeddings, 1000 floating-point values in each resulting vector. I used the generate_embeddings.py script for that. This whole exercise is just for my own learning (and fun), so it’s ok if there’s no other practical value coming from this work. - Linux, Oracle, SQL...| tanelpoder.com
Back in 2018 I wrote about how to apply Brendan Gregg’s FlameGraphs for visualizing SQL plan execution time 1, using the plan node-level context and timing data that some database engines provide. This approach is not designed to replace the traditional plan display methods that are packed with lots of additional metrics and details. Nevertheless, in case of large execution plans, SQL plan flamegraphs give you a quick overview of both the plan structure, tables and operations involved and w...| tanelpoder.com
Disclaimer: I’m not an ML expert and not even a serious ML specialist (yet?), so feel free to let me know if I’m wrong! It seems to me that we have hit a bit of an “on-premises” vs. “on-premise” situation in the ML/AI and vector search terminology space. The majority of product announcements, blog articles and even some papers I’ve read use the term vector embeddings to describe embeddings, but embeddings already are vectors themselves! - Linux, Oracle, SQL performance tuning an...| tanelpoder.com
Here’s a 12-minute video of our chat with Silk VP of Product Tom O’Neill about my recent testing of the Silk Platform in Google Cloud. In this interview we cover some high level points, conclusions and talk a little bit about the future. If you want to read all the technical details and some interesting references, see the links below: Testing the Silk Platform in 2024: Achieving 20 GiB/s I/O Throughput in a Single Cloud VM (2024) Testing The Silk Platform - Hands-On Technical Analysis of...| tanelpoder.com
Hands-on technical analysis of a novel data platform for high-performance block I/O in the cloud, tested by Tanel Poder, a database consultant and a long-time computer performance nerd. Index Background and motivation Scalable Architecture by Design Enterprise Features Testing Results I/O Latency 1.3 Million IOPS Lessons Learned Summary Background and Motivation Back in 2021, my old friend Chris Buckel (@flashdba) asked me if I would like to test out the Silk Data Platform in the cloud and se...| tanelpoder.com
Following my earlier Visualizing and Comparing Embedding Vectors as Heatmaps article, here are the launch & demo videos. You might want to read this article first, these short demo videos should make more sense then! Visualizing Vision Transformer Embedding Vectors as HeatMaps In this video I’ll peek around cat vectors and then find outliers from both the cat photo dataset and aircraft dataset. Installing CatBench Vector Visualizer App and Technical Details The CatBench GitHub repo is here:| Posts on Tanel Poder Consulting
If you already know what embeddings are, you can jump right to: Vector Heatmap Visualization Background I am currently testing various RDBMS vector search options and their usability & performance. Next week I will publish an early version of my CatBench toolset that will eventually evolve into a full end-to-end test (and fun!) suite for high performance AI/ML pipelines, with things like streaming image recognition using Vision Transformers on GPUs, followed by vector searches in target datab...| Posts on Tanel Poder Consulting
Here’s my talk from eBPF Summit 2024. Video (10 minutes) The 0x.tools Extended Task State Sampling approach provides a new observability signal - wall-clock time of all active threads’ activity in your system. You can see both total workload demand of your apps and also drill down deep into individual threads’ activity when needed. This method and toolset give you a single data source with many directly linked fields of information about what your application & OS threads are doing and ...| Posts on Tanel Poder Consulting
Starting from Python 3.12, you can instruct Python to enable direct perf profiling support for your Python code. This feature currently works only on Linux, as it communicates the relationships between Python-level interpreted function calls and machine-code level return pointers (for stack unwinding) via perf map files. The Linux perf tool knows how to read these files and maps the return pointers in a thread’s stack to the corresponding Python-level symbols (similar to what JVM JIT code p...| tanelpoder.com
Yes! Various traps like page faults may cause your user process to be switched into kernel codepath even when the process is minding its own business in the userspace. It just needs to do something that causes such a trap, like touch a new page of memory in its virtual memory address space, that hasn’t been “fully materialized” yet in the kernel memory structures. For example, after starting up an Oracle database instance that uses a large amount of shared memory, there’s one process ...| tanelpoder.com
On Tuesday I will publish the code for my next big thing I have been working on for the past 5 years. No, this is not about AI or a yet another startup - I’m still a big fan of NI (natural intelligence) and what I reveal is open source and free for everyone to use. I’m not gonna sell you any products, only ideas! You will see a new approach and a supporting toolset for understanding system behavior and its performance across all your applications and data center & cloud footprint. - Linux...| tanelpoder.com
This blog post is not about tracing program system calls, but about programmatically extracting all system calls and their accepted arguments in your current system. If you want to read more about tracing system calls, here’s my tutorial using strace for troubleshooting a real life scenario: Troubleshooting Linux SSH Login Delay - Why does logging in always take 10 seconds?1 I am building the next version of my 0x.tools xcapture tool for continuous OS-wide profiling of process activity in L...| tanelpoder.com
Thanks to Tomasz Sroka’s pull request for my hint.sql in the TPT repo and blog entries by Sayan 1 and Jonathan 2, we now have a HINT_SCOPE field in my hint.sql script too! SQL> @hint merge NAME VERSION VERSION_OUTLINE INVERSE HINT_SCOPE ----------------------- ----------- --------------- -------------- -------------- MERGE_CONST_ON 8.0.0 STATEMENT MERGE_AJ 8.1.0 8.1.7 QBLOCK MERGE_SJ 8.1.0 8.1.7 QBLOCK MV_MERGE 9.0.0 QBLOCK MERGE 8.1.0 10.1.0 NO_MERGE QBLOCK OBJECT NO_MERGE 8.0.0 10.1.0 MER...| Posts on Tanel Poder Consulting
I switched my blog’s commenting system to use Giscus. It’s still backed by Github (Discussions) as “storage”, but it doesn’t request wide Github permissions from commenters (like Utterances used to do). I didn’t migrate old comments over, but they’re still available here: https://github.com/tanelpoder/blog-comments/issues/ For current/latest comments, just scroll to the bottom of the blog entries (or browse the corresponding GitHub Discussions page): https://github.com/tanelpode...| tanelpoder.com
One evening back in January I finally had enough of thermal issues within my homelab server. You know, every time the computer fans make more noise than I think they should, I can’t help but investigate! Also, the RTX4090 is so thick that it takes 3 PCIe slots worth of space on a typical motherboard - and you also need space for the airflow! I was using the same machine that achieved 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation back in 2020. - Linux, Oracle, SQL performance tu...| tanelpoder.com
Over time, Oracle has been adding more contextual information into each allocated heap memory chunk, to make it easier to see what for your heap memory is used. This instrumentation is used for private (PGA,UGA,etc) heaps too, but this article focuses only on shared pool heaps. A few examples from past are: Library cache object hash values added to allocated shared pool chunk names (2010) The above technique is used for some other allocation types too, like KKSSP^NNN allocations that are used...| tanelpoder.com
Back in 2007, my third blog entry was about a parallel index building gotcha, a warning about building & rebuilding indexes in parallel. After completing the build, Oracle left the parallel degree in place for that index in data dictionary. Thus, after building a great new index over some weekend, many of your OLTP queries could have picked a parallel plan afterwards! Whenever creating/rebuilding any indexes, I had a habit of writing up both the CREATE INDEX . - Linux, Oracle, SQL performance...| tanelpoder.com
You probably are already familiar with my ash_wait_chains scripts that use CONNECT BY for walking through the blocking_session fields in ASH and report “wait chains”, signatures of complex waits involving other database sessions. Since ASH is “just” a history of V$SESSION, it’s sensible to assume that one could sample the current wait chains just by querying the GV$SESSION views, instead of summarizing history from ASH. And indeed it’s possible - here’s my Session-Wait-Chains sc...| tanelpoder.com
About 14 years ago I published the OstackProf script that ran on Windows sqlplus clients (the Oracle server could run on any platform that supported ORADEBUG SHORT_STACK). I don’t use Windows almost at all nowadays, so it’s about time to publish the same functionality for MacOSX, Linux, Unix sqlplus clients. This is mostly useful when hacking dev systems, troubleshooting test systems and in rare cases, carefully troubleshooting non-fatal processes in production too. - Linux, Oracle, SQL p...| tanelpoder.com
Starting from v8.0.27, the MySQL daemon shows its thread names at the OS level too, for better troubleshooting. The MySQL 8.0.27 release notes say this: To assist monitoring and troubleshooting, the Performance Schema instrumentation is now used to export names of instrumented threads to the operating system. This enables utilities that display thread names, such as debuggers and the Unix ps command, to display distinct mysqld thread names rather than “mysqld”. - Linux, Oracle, SQL perfor...| tanelpoder.com
Here’s the High Performance Block IO on Linux hacking session video. It’s based on my recent article about achieving 11M IOPS & 66 GB/s IO on a single ThreadRipper workstation. Enjoy! Towards the end of this video, I also explain how I got up to 11.5M IOPS without having to keep CPUs 100% busy, with batched I/O submission and completion checks I reduced the CPU usage pretty noticeably. Also, with properly configuring I/O polling queues in the Linux kernel, I got close to 10M IOPS by using...| tanelpoder.com
Update: I have added an interview with Jonathan Lewis to YouTube too. We talk a bit about the last 10 years and Jonathan describes what he’ll be talking about in his virtual conference talk: I will add a few more videos in the coming days. The conference interview playlist URL is here. You can sign up for the conference here. See you soon! - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.| tanelpoder.com
I have uploaded the 3-hour hacking session video into my YouTube channel in 2 parts. Introduction and using ASH + wait chains for troubleshooting performance spikes A case study of a complex Oracle performance problem The slides are in SpeakerDeck: https://speakerdeck.com/tanelpoder/troubleshooting-complex-oracle-performance-problems FYI there were a few questions about using such techniques without Oracle’s built-in ASH (due to licensing and Oracle Standard Edition). You can look into the ...| tanelpoder.com
As promised, I will run one more hacking session in the end of January. I aim to make this one more practical than some of the past internals geek-dives :-) The topic: Troubleshooting Complex Oracle Performance Problems with Tanel Poder Date & Time: Thursday 30 January 2020 (today!) 12pm-1:30pm ET Registration: https://attendee.gotowebinar.com/register/7847585427420817931 Description: In this session I will first present some slides about a complex performance issue where the initial symptoms...| tanelpoder.com
I have uploaded the 3 hour hacking session video about Oracle instance state objects and how to read Oracle systemstate dumps to my YouTube channel: Note that as I just uploaded the video, you may want to wait for a few hours until YouTube publishes HD version (currently it’s available only at 360p, but that should change soon). The slides are in Slideshare: https://www.slideshare.net/tanelp/oracle-state-objects-and-system-state-dumps/ Enjoy! :-) I will run another hacking session in the en...| tanelpoder.com
I have created a new page & uploaded videos of my Linux Process Snapper hacking session: Resources Homepage: https://tp.dev/psnapper GitHub: https://github.com/tanelpoder/psnapper Videos Video 1: Exploring /proc Filesystem & System Calls Hacking Session Video 2: Linux Process Snapper Demo Slide(s) The only slide I showed during the hacking session is this: As I didn’t get to cover some planned Linux tracing topics today, I guess this means we’ll have another hacking session some day! - Li...| tanelpoder.com
As I’m having fun preparing for my Linux troubleshooting training, I’ll also do another informal hacking session to give you an idea of what’s coming up! This week’s hacking session is deliberately without any slides, planning or much structure, but with lots of fun and … hacking! This session will be about some Linux tools I use for my everyday work and I’ll even publish an early version of a new open source Linux performance tool! - Linux, Oracle, SQL performance tuning and trou...| tanelpoder.com
As I announced last year, I’m running a new online Practical Linux Performance and Application Troubleshooting training class, starting on 29. April 2019. The emphasis of this class is on practical application troubleshooting and Linux OS performance tuning techniques. We’ll be using scripts and tools that you can use on any Linux machine whenever a problem happens, without having to first install a complex layer of monitoring tools. At the same time, we will go way beyond what the classi...| tanelpoder.com
Update: Starting from 2022, I am delivering all my classes in a new format, via my new Learning Platform. Check out what are Learning Weeks and the new Continuous Learning method here: https://learn.tanelpoder.com More details about the class contents here: Advanced Oracle SQL Tuning training Practical Linux Performance & Application troubleshooting training Advanced Oracle Troubleshooting training Here are some example slides from the AST class for eye-candy :) Execution plan data flow Readi...| tanelpoder.com
I was recently involved in a discussion around why would a regular Oracle foreground session wait for the log file sequential read wait event. I then realized that I had never published my Hotsos Symposium 2014 talk about a relatively complex (treacherous?) Oracle performance issue involving multiple problems that “collide”. So I decided to run this presentation as a webinar next week + will demo some of my newer related scripts too! - Linux, Oracle, SQL performance tuning and troubleshoo...| tanelpoder.com
Update: Just as I started blogging about the Optane technology, it turns out that Intel is shutting down the Optane business (after bringing the latest-gen “Crow Pass” Optane memory to market - probably just for satisfying existing commitments). While not exactly the same thing, keep an eye on the new CXL standard developments to see where the data center/cloud infrastructure world is going (CXL is basically cache-coherent disaggregated memory access protocol via PCIe5. - Linux, Oracle, S...| tanelpoder.com
Update: Just as I started blogging about the Optane technology, it turns out that Intel is shutting down the Optane business (after bringing the latest-gen “Crow Pass” Optane memory to market - probably just for satisfying existing commitments). While not exactly the same thing, keep an eye on the new CXL standard developments to see where the data center/cloud infrastructure world is going (CXL is basically cache-coherent disaggregated memory access protocol via PCIe5. - Linux, Oracle, S...| tanelpoder.com
It’s almost the beginning of September, time to get back to school, everyone! I recently decided to do my Oracle SQL Tuning and Performance & Troubleshooting training classes again. So I will also run some free hacking sessions for promoting these classes, but also for sharing cool new stuff with people who can’t attend the training and of course for fun too! The hacking sessions do not follow a formal training structure, have very few or no slides at all, just live demos of putting perfo...| tanelpoder.com
I was recently involved in troubleshooting an interesting performance issue with some non-intuitive background process & wait event behavior. I reproduced the problem in my own custom-tailored environment (Oracle 18.3 on Linux), here’s the starting point: After starting a transaction-heavy benchmark, soon after the transaction rate drops and the database sessions are mostly waiting for some Configuration wait class events, instead of getting work done. Just looking into the high level wait ...| tanelpoder.com
The Oracle SQL Developer team has released version 18.3 of this tool. My favorite new feature in this version is its ability to save SQL Monitoring reports in HTML format! Adobe Flash is finally not needed for sharing graphical SQL Monitoring reports. Here’s an example output, it looks pretty nice and clean: The report is reasonably navigatable, has tooltips (useful for identifying wait classes for example) and you can also rearrange columns by dragging them left or right with your mouse. -...| tanelpoder.com
Introduction Brendan Gregg invented and popularized a way to profile & visualize program response time by sampling stack traces and using his FlameGraph concept & tools. This technique is a great way for visualizing metrics in nested hierarchies, what stack-based program execution uses under the hood for invoking and tracking function calls. If you don’t know what FlameGraphs are, I suggest you read Brendan’s explanation first. In this blog post I won’t be doing traditional stack profil...| tanelpoder.com
When I delivered my Advanced Oracle SQL Tuning training last week, I jumped into a free-form demo to show that the PARALLEL hints don’t really force anything other than adjusting the cost of parallelizable operations for CBO evaluation as usual. If a serial plan (or branch) still has a lower cost than the respective parallel processing cost, then serial execution would be used there. But when demoing this, I got a surprise. - Linux, Oracle, SQL performance tuning and troubleshooting - consu...| tanelpoder.com
As I’m delivering my Linux Troubleshooting training soon, I am going to blog about some typical issues and techniques we’ll troubleshoot in the class too. I’ll start from a relatively simple problem - logging in to a server via SSH always takes 10 seconds. The delay seems to be pretty constant, there don’t seem to be major network problems and the server is not overloaded. Yet, remote logins always take 10 seconds. - Linux, Oracle, SQL performance tuning and troubleshooting - consulti...| tanelpoder.com
I talked about a new diagnostic event wait_event[] back at OakTableWorld 2014. Oracle introduced this feature in version 12.1. But since there’s only a camera recorded video of that talk, I’ll document some examples here. Attaching Actions to Wait Events The new wait_event[] event name allows you to attach actions, like tracing, to wait events. The action runs whenever a wait event ends. It behaves like Oracle’s SQL Trace that writes out trace lines for wait events only when the wait en...| tanelpoder.com
This post is written mostly for fun, not for practical everyday usability (enjoy!). Introduction There was a question on Twitter about alternative options for retrieving the SQL execution plan line ID that Oracle is currently executing. Normally Oracle’s Active Session History shows this info in its PLAN_LINE* columns as I have explained in my asqlmon.sql post. does anyone knows if I can select the equivalent from ASH.SQL_PLAN_LINE_ID for current session? (something like v$session or x$. - ...| tanelpoder.com
As I promised in my previous post, I’m going to blog more frequently for a change. So here’s a blog entry about some “new” Oracle execution plan displaying scripts that I’ve had since 2011 or so - I just tidied them up recently and added some improvements too. My aim in this blog post is not to go deep into SQL tuning topics, but just show what these scripts can do. - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.| tanelpoder.com
If you use my TPT scripts for Oracle troubleshooting, I have some good news for you. I had created a help.sql script years ago for showing me a quick index of my scripts, with syntax examples, straight from sqlplus command line. BUT, it was just an empty template, I never got to populating help.sql to cover my most used scripts. Recently Tomasz Sroka, who had attended my Oracle troubleshooting training, mentioned that he had taken matters into his own hands and documented about 100 of my scri...| tanelpoder.com
As running Oracle databases as (partially) managed services in the cloud has become quite popular, I thought to start a little series about troubleshooting Oracle (performance) stuff when not having OS access. This will help with such cloud services, but also in cases where your team just doesn’t have convenient OS access due to some separation of privileges reasons. The first example is my schedlat.sql script that uses X$KSO_SCHED_DELAY_HISTORY to list sampled process scheduling latency de...| tanelpoder.com
My ASH wait chains scripts started returning the ORA-12850 error on Oracle 19c RAC installations: SQL> @ash/dash_wait_chains program2||event2 1=1 sysdate-1 sysdate -- Display ASH Wait Chain Signatures script v0.5 BETA by Tanel Poder ( https://tanelpoder.com ) , REPLACE(SYS_CONNECT_BY_PATH(program2||event2, '->'), '->', ' -> ')||CASE WHEN CONNECT_BY_ISLEAF = 1 AND d.blocking_session IS NOT NULL THEN ... ERROR at line 63: ORA-12850: Could not allocate slaves on all specified instances: 2 needed...| tanelpoder.com
I hereby announce that I will retire in 2030 and replace myself with an AI bot! I am serious. This is probably the clearest mission statement I’ve ever come up with in my career! This is essentially my statement of direction for the next 10 years. There will be a series of posts about what I am up to and where I’m going. My plan for the next 10 years When the year 2020 arrived, I spent a month thinking about the future and what I want to be working on for the next decade. - Linux, Oracle,...| tanelpoder.com
As you may know, I like to use low tech scripts when possible, ideally such ones that don’t even require you to be root. I prefer simple and “boring” tools as a starting point simply because of the production reality at my customers in traditional enterprise IT. Systems where RHEL6 (and clones) seem to be the most common distros (with Linux kernel 2.6.32) and it’s not too unusual to see RHEL5 occasionally either. - Linux, Oracle, SQL performance tuning and troubleshooting - consulting...| tanelpoder.com
There was a discussion at Oracle-L about shared pool chunk eviction and I thought I’d show how to see a shared pool memory chunk’s position in the LRU list yourself. This is a post mostly about Oracle internals and less about practical every day troubleshooting. There’s a view X$KGHLU (@kghlu.sql script) that shows you a summary of shared pool (sub)heap allocations and how many of these chunks are in the “transient list” (used once) vs. - Linux, Oracle, SQL performance tuning and tr...| tanelpoder.com
The current state of affairs in the entire world have radically changed since last month when we announced the next Oracle SQL Optimization event in the virtual conference series. We’ve been keeping an eye on the trajectory of the developments and decided to make some changes too. So, here’s the update: we are going to run the Oracle SQL Optimization in 2020 virtual conference twice! Additional dates These are the existing + new dates for the event: - Linux, Oracle, SQL performance tuning...| tanelpoder.com
When talking about Oracle background processes, there’s a term/qualifier “fatal” background process. This means that when one of these background processes crashes, then whoever detects the process disappearance (PMON or LGWR or CLMN possibly), will shut down the instance as it cannot function normally anymore. Not all background process crashes take down the whole instance, for example processes like J000 and P000 are technically background processes (daemons disconnected from network)...| tanelpoder.com
Update: I have uploaded the videos of this hacking session to YouTube. I will run another hacking session that has been in my mind for a while. It will be a pretty narrow deep-dive into one of the SQL Monitoring report’s columns that is not well explained: Oracle SQL Monitoring - Understanding the Execution Timeline column In this hacking session, I will explain how to read the “Execution Timeline” column in the SQL Monitoring reports. - Linux, Oracle, SQL performance tuning and trouble...| tanelpoder.com
As I promised in the end of the last hacking session (about Oracle SQL Monitoring, I will run another one on 2. June 2020. This one will be a deep dive into Oracle hint usage and various scenarios of hint (in)validity. I have too much material in my full-week Advanced Oracle SQL Tuning training, so I’m moving some “narrow deep dives” out and make available for free, so that we could spend more time actually tuning SQL during the class. - Linux, Oracle, SQL performance tuning and trouble...| tanelpoder.com
In this post I will show you how to break down Linux system load by the load contributor or reason. You can drill down into the “linux system load in thousands” and “high system load, but low CPU utilization” problem patterns too. Introduction - terminology Troubleshooting high system load on Linux Drilling down deeper - WCHAN Drilling down deeper - kernel stack How to troubleshoot past problems Summary Further reading Introduction - Terminology The system load metric aims to represen...| tanelpoder.com
Here’s a list of reasons why SELECT * is bad for SQL performance, assuming that your application doesn’t actually need all the columns. When I write production code, I explicitly specify the columns of interest in the select-list (projection), not only for performance reasons, but also for application reliability reasons. For example, will your application’s data processing code suddenly break when a new column has been added or the column order has changed in a table? - Linux, Oracle, ...| tanelpoder.com
FYI: I will be speaking about similar topics at P99Conf in Oct 2023 (online, free event) There are plenty of JVM profilers out there, but before attaching with a profiler, you can get a high-level overview by just sampling the Linux /proc/PID/comm and /proc/PID/stat fields to see which JVMs threads are most active and whether they’re burning CPU or are blocked by some I/O. When creating threads in Java, you can name them using the following syntax: Thread t = new Thread("MyThread") or threa...| tanelpoder.com
Here’s a little SQL query that lists all Oracle wait events where Oracle kernel developers have added additional “human readable” descriptions. Since the original wait event names shouldn’t be changed for compatibility with existing tools, then a new column DISPLAY_NAME was added: SQL> SELECT name, display_name FROM v$event_name WHERE name != display_name; NAME DISPLAY_NAME ----------------------------------------- ---------------------------------------------------------------- contr...| tanelpoder.com
TL;DR Modern disks are so fast that system performance bottleneck shifts to RAM access and CPU. With up to 64 cores, PCIe 4.0 and 8 memory channels, even a single-socket AMD ThreadRipper Pro workstation makes a hell of a powerful machine - if you do it right! Introduction In this post I’ll explain how I configured my AMD ThreadRipper Pro workstation with 10 PCIe 4.0 SSDs to achieve 11M IOPS with 4kB random reads and 66 GiB/s throughput with larger IOs - and what bottlenecks & issues I fixed...| tanelpoder.com
TL;DR: Some Linux distros and even different kernel versions within a distro have disabled IRQ time accounting. In such case, your monitoring tools will report zero time spent in hardware interrupt handlers (shown as %hi, %irq, hiq, etc). It’s easy to check how your kernel is behaving by looking at /proc/stat and you can still measure interrupt CPU usage using perf and a little creativity. Index Index Problem Explanation Kernel configuration options Measuring IRQ time with perf Different di...| tanelpoder.com
Over the years of troubleshooting performance problems in the Unix/Linux world, I have seen multiple cases where a regularly used command line tool in a customer server just stops working for some reason. The tool just returns immediately, doing absolutely nothing. No output printed, no coredumps and the exit code is zero (success!). This article walks you through a couple of such incidents and in the end I explain how I avoid accidentally doing bad stuff in production in general. - Linux, Or...| tanelpoder.com
Learn to troubleshoot some of the most complex Oracle performance problems - real life case studies by Tanel Põder I already announced my latest virtual conference in social media, but will write a short blog entry here too (people who are following my blog via mailing list or RSS will get the update too). This year’s topic is: Troubleshooting Very Complex Oracle Performance Problems. I will deliver 4 case studies (based on real life events) where I will dive very deep, even deeper than in...| tanelpoder.com
Hands-on technical analysis of a novel data platform for high-performance block I/O in the cloud, tested by Tanel Poder, a database consultant and a long-time computer performance geek. Index Background and motivation Architecture and internals of the Silk Platform I/O throughput for table scanning (for your big data warehouses) IOPS, I/O latency and write performance (OLTP and data loads) Scaling out I/O performance in the cloud (demo of live operation) Summary and architectural opportunitie...| tanelpoder.com
You may remember my previous article about high-performance I/O for cloud databases, where I tested out the Silk Platform and got a single Azure VM to do over 5 GB/s (gigaBytes/s) of I/O for my large database query workload. The magic was in using iSCSI over the switched cloud compute network to multiple “data VMs” at the virtualized storage layer for high I/O throughput and scalability. Well, the Silk folks are at it again and Chris Buckel (@flashdba) just sent me screenshots of similar ...| tanelpoder.com
Tanel at Mount Denali in Alaska (2021) In the beginning of every year I travel somewhere and take some time to think about the future and what I should be working on next. The 2020/2021 years were full of all kinds of distractions for me (and many of you). I got somewhat sick a couple of times too (although never tested positive for COVID) and the full recovery to 100% took time. - Linux, Oracle, SQL performance tuning and troubleshooting - consulting & training.| tanelpoder.com
In the previous post in this series, I showed how my scripts xb.sql and xbi.sql (eXplain Better and eXplain Better, by sqlId) help you see Oracle execution plan details in a more complete way. Here are the features that I currently plan to cover: More detailed access/filter predicate info next to plan tree data (Part 1) More intelligent row-source level execution statistics presentation (Part 2, this post) Calculate Optimizer Cost misestimation factor and demo its effects (Part 3, coming soon...| tanelpoder.com