Roastidio.us Tagged with elixir

Erlang Ecosystem Foundation - Supporting the BEAM community

Thu, 28 May 2026 00:36:05 +0000

Atom Exhaustion Is Not a Footgun. It's One Third of Our CVEs.

35.8% of CVEs published by the Erlang Ecosystem Foundation CNA fall into the category of uncontrolled resource consumption. In the BEAM ecosystem, a large share of those are caused by one recurring issue: atom exhaustion. You can find the current distribution on the EEF CNA’s Common Weaknesses page.

Atom exhaustion is a denial-of-service vulnerability. Atoms are not garbage collected and are stored in a global atom table with a fixed size limit (1,048,576 entries by default). Once the table fills up, the VM crashes. Creating atoms from an unbounded set of values, especially user-supplied input, is therefore a latent DoS waiting to happen.

This is not limited to obvious calls such as binary_to_atom/1, list_to_atom/1, String.to_atom/1, or List.to_atom/1. Some dangerous patterns are less obvious:

% Erlang: dynamic atom creation through concatenation
list_to_atom("field_" ++ UserInput)

# Elixir: decoding JSON with atom keys
Jason.decode(json, keys: :atoms)

# Elixir: dynamic atom creation through interpolation
:"field_#{user_input}"

What makes this class of vulnerability persistent is not carelessness. It often appears in code where the input was assumed to be controlled or finite. URI schemes are a good example: it may feel like there are only a few schemes to handle, but if the value comes from external input, the set is no longer guaranteed to be finite.

Creating atoms from input is unsafe unless the set of possible values is finite, known, and enforced.

The safest approach is to avoid creating new atoms at runtime entirely. Prefer explicit lookup tables when the accepted values are known:

% Erlang
case Scheme of
  <<"http">>  -> http;
  <<"https">> -> https;
  _           -> error
end
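
The same lookup-table idea can be sketched in Elixir with function clauses. The module and function names here (SchemeLookup, to_atom/1) are hypothetical and chosen only for illustration. The point is that every atom appears as a literal in the source, so untrusted input can never add a new one.

```elixir
defmodule SchemeLookup do
  # Hypothetical helper: maps an untrusted scheme string to a known atom.
  # Every atom below is a compile-time literal, so no input can grow the atom table.
  @spec to_atom(String.t()) :: {:ok, :http | :https} | :error
  def to_atom("http"), do: {:ok, :http}
  def to_atom("https"), do: {:ok, :https}
  def to_atom(_other), do: :error
end

# Known schemes map to atoms, everything else is rejected.
{:ok, :https} = SchemeLookup.to_atom("https")
:error = SchemeLookup.to_atom("gopher")
```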

When a lookup table is not practical, use the safer existing-atom variants, which will raise an error instead of creating a new atom:

% Erlang
binary_to_existing_atom(Value)
list_to_existing_atom(Value)

# Elixir
String.to_existing_atom(value)
List.to_existing_atom(value)
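
Since the existing-atom variants raise ArgumentError for unknown atoms, callers usually want to turn that into a regular return value. Below is a minimal sketch of such a wrapper. SafeAtom and fetch_existing/1 are hypothetical names, not part of any library.

```elixir
defmodule SafeAtom do
  # Hypothetical wrapper: converts a string to an atom only if that atom
  # already exists in the VM, returning :error instead of raising.
  def fetch_existing(value) when is_binary(value) do
    {:ok, String.to_existing_atom(value)}
  rescue
    ArgumentError -> :error
  end
end

# :ok is an atom the VM already knows about.
{:ok, :ok} = SafeAtom.fetch_existing("ok")

# A random, never-before-seen name is rejected without creating an atom.
unknown = "no_such_atom_" <> Integer.to_string(:erlang.unique_integer([:positive]))
:error = SafeAtom.fetch_existing(unknown)
```

Keep in mind that this accepts any atom that already exists anywhere in the VM, not only the ones your code expects, so validating the result against the values you actually handle is still worthwhile.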

Linters can help catch these patterns before they become vulnerabilities. For Elixir projects, consider enabling Credo’s Credo.Check.Warning.UnsafeToAtom, which flags unsafe calls to String.to_atom/1, List.to_atom/1, Module.concat/1,2, and Jason.decode/2 with keys: :atoms. The check is disabled by default.
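
As a rough sketch of what enabling the check can look like, assuming your project has a .credo.exs (for example one generated with mix credo gen.config) that uses the enabled/disabled check lists: the check needs to be listed under enabled, and removed from the disabled list if it appears there. Only the relevant part of the file is shown, and the rest of your configuration may differ.

```elixir
# .credo.exs (fragment, assuming the default "enabled"/"disabled" layout)
%{
  configs: [
    %{
      name: "default",
      checks: %{
        enabled: [
          # Flags unsafe atom creation such as String.to_atom/1.
          {Credo.Check.Warning.UnsafeToAtom, []}
        ]
      }
    }
  ]
}
```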

If you maintain an Erlang or Elixir project, search your codebase for atom creation from binaries, strings, JSON keys, URI components, headers, and configuration values. This is one of the easiest vulnerability classes to fix before it becomes a CVE.

For more detailed guidance, see the EEF Security Working Group’s guide on preventing atom exhaustion.

Highest Random Weight in Elixir | jola.dev

Sat, 23 May 2026 15:49:11 +0000

Consistent hashing is a common building block for distributed Elixir and enables fairly low-complexity, high-value design patterns, like the distributed rate limiter or cache. I’ve written about it before.

The most common way of assigning keys to nodes, ensuring that any node participating in the cluster can figure out which node owns the given key, is Discord’s ExHashRing. This is an incredibly battle-tested and reliable library with excellent performance characteristics, and I’ve only had good experiences with it.

That said, it does have a downside. You have to start and manage the ring processes. It’s not a huge downside, you can give them global names and it’s trivial to look them up, but you still want them set up under your supervision tree and they are stateful persistent things that hang around. That state has to be managed. It’s not a big deal at all, but when I found a stateless alternative it did immediately catch my attention.

Rendezvous hashing

As the Wikipedia page describes it, rendezvous hashing, also called HRW or Highest Random Weight, is both much simpler and more general than consistent hashing. In practice, you can use it very much like you would ExHashRing.

ExHashRing example.

HRW example.

That’s it. No stateful process, no setup. Just pure functional programming with inputs and outputs. Consistent across multiple machines. Avoids unnecessary drift when changing the list of nodes. You can see why it caught my eye!

There’s a downside of course. The time complexity of HRW.owner is linear, O(n), in the number of nodes, so it doesn’t do well with larger lists of nodes. That’s definitely something to take into account when considering using it. But to be honest, looking back at the times I’ve used ExHashRing I’ve never had more than ~14 nodes to worry about. Here’s a comparison of how each algorithm does on my machine for 14 nodes.

ExHashRing is extremely fast, and stays fast as the number of nodes grows. But at a smaller number of nodes, even on a fairly hot path, there’s not much difference here. You’re free to pick whichever one you think reads better.

Basic HRW algorithm

Let’s dig a bit deeper into rendezvous hashing. The basic implementation is actually incredibly small. What you want to do is apply a scoring function on the key together with each of the nodes separately and then return the highest value. Highest Random Weight. For a scoring function you can use any fast hashing function really. :erlang.phash2 is an obvious candidate in the BEAM ecosystem.

Here’s what that looks like.
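
A minimal sketch of the idea just described, under a few assumptions: HRWSketch is a hypothetical module name, hashing the {key, node} tuple with :erlang.phash2 is one possible scoring function, and the node is added as a tie-breaker so the result does not depend on list order. This is an illustration of the technique, not the hrw library's actual code.

```elixir
defmodule HRWSketch do
  @moduledoc "Illustrative rendezvous hashing, not the hrw library's implementation."

  # Score every node against the key and keep the highest scorer.
  # The node itself breaks ties, so the order of `nodes` never matters.
  def owner(key, [_ | _] = nodes) do
    Enum.max_by(nodes, fn node -> {:erlang.phash2({key, node}), node} end)
  end
end

nodes = ["node1", "node2", "node3"]
owner = HRWSketch.owner("user:42", nodes)

# Any node that sees the same list computes the same owner, in any order.
^owner = HRWSketch.owner("user:42", Enum.reverse(nodes))

# Dropping a node that does not own the key leaves the owner unchanged.
others = Enum.reject(nodes, &(&1 == owner))
^owner = HRWSketch.owner("user:42", [owner | tl(others)])
```

The last two matches show why no shared state is needed: ownership is a pure function of the key and the node list, and a key only moves when its own owner leaves or a new node outscores it.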

It’s pretty ingenious!

Linear growth

Just to demonstrate how that affects performance as the number of nodes grows, here’s a benchmark run with 10K nodes. Plain HRW is about 4200x slower than ExHashRing. Although to put things into perspective, it’s still just taking ~2 ms on my machine. Depending on your use case, that might actually be just fine. It’s hard to beat the convenience of a pure function.

But let’s see if we can do better.

HRW skeleton

Our basic HRW implementation, although actually quite fast, doesn’t behave well as the number of nodes grows. This is because every lookup has to hash the key against every node. That same Wikipedia page describes a way around that by arranging the nodes into an efficient data structure, bringing the time complexity of owner down to O(log n).

At a very (very) high level what we want to do is sort the list of nodes and then chunk them into clusters. Each cluster gets an address and instead of hashing the key against every node, we now just need to calculate the address of the cluster, and then we can hash the key against the nodes inside that cluster to find the correct one. This means significantly less effort, bringing us to a much nicer logarithmic complexity.

Using it looks something like this.

Running the same benchmark as above, but with the skeleton created in advance, just like we do for ExHashRing, this is what we get.

##### With input D: 10_000 #####
Name                                ips        average  deviation         median         99th %
ExHashRing.Ring.find_node        2.17 M     0.00046 ms  ±1791.93%     0.00042 ms     0.00058 ms
HRW.owner (skeleton)             0.71 M     0.00141 ms   ±634.18%     0.00138 ms     0.00183 ms
HRW.owner                     0.00047 M        2.13 ms     ±5.03%        2.10 ms        2.53 ms

Comparison:
ExHashRing.Ring.find_node        2.17 M
HRW.owner (skeleton)             0.71 M - 3.06x slower +0.00095 ms
HRW.owner                     0.00047 M - 4615.43x slower +2.13 ms

We’ve gone from ~2 ms per lookup to about 1.4 µs, only ~3x slower than ExHashRing, with no NIFs and no stateful processes to start up. We do have a struct we have to pass around now, and adding and removing nodes is no longer a stable operation. Adding a node pushes everything that comes after it in the sorted list one slot over. I guess nothing in life is free. Still, this is an interesting tradeoff for a lot of use cases.
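
To see why membership changes are less stable with the sorted, chunked layout, here is a small illustration. It assumes clusters are simply fixed-size chunks of the sorted node list, which is a simplification of whatever HRW.Skeleton does internally. The cluster size of 2 and the node names are arbitrary.

```elixir
# Assumption: clusters are fixed-size chunks of the sorted node list.
clusters = fn nodes -> nodes |> Enum.sort() |> Enum.chunk_every(2) end

before_add = clusters.(["b", "c", "d", "e"])
after_add = clusters.(["a", "b", "c", "d", "e"])

# Before: "b" and "c" share a cluster, as do "d" and "e".
[["b", "c"], ["d", "e"]] = before_add

# After adding "a" at the front, every later node shifts one slot,
# so each original node ends up with different cluster neighbours.
[["a", "b"], ["c", "d"], ["e"]] = after_add
```

With plain HRW, by contrast, adding a node only moves the keys that the new node wins.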

Distribution

The other thing you probably want to know about a mechanism for distributing work/keys/load across a set of nodes, is how well it distributes. It wouldn’t be very useful if every key maps to the same node. Here’s a little sample script that demonstrates the distribution.

defmodule Distribution do
  def run do
    keys = Enum.map(1..100_000, fn i -> "key-#{i}" end)

    for n <- [10, 100, 1000] do
      nodes = Enum.map(1..n, &"node#{&1}")
      ideal = div(length(keys), n)

      counts =
        keys
        |> Enum.map(&HRW.owner(&1, nodes))
        |> Enum.frequencies()
        |> Map.values()

      min_c = Enum.min(counts)
      max_c = Enum.max(counts)
      avg = Enum.sum(counts) / length(counts)
      stddev = :math.sqrt(Enum.sum(Enum.map(counts, fn c -> (c - avg) ** 2 end)) / length(counts))

      IO.puts("#{n} nodes, #{length(keys)} keys (ideal #{ideal} per node):")
      IO.puts("  min: #{min_c}  max: #{max_c}  stddev: #{Float.round(stddev, 1)}  (#{Float.round(stddev/avg*100, 2)}% of mean)")
    end
  end
end

Distribution.run()

I extended that to add HRW with MurmurHash3, HRW with skeleton, and ExHashRing, for comparison.

  10 nodes, 100000 keys (ideal 10000 per node):
    phash2 (HRW)           min: 9691  max: 10639  stddev: 249.9  (2.5% of mean)
    murmur3 x86_32 (HRW)   min: 9859  max: 10192  stddev: 112.2  (1.12% of mean)
    murmur3 x64_128 (HRW)  min: 9864  max: 10170  stddev: 98.1   (0.98% of mean)
    HRW.Skeleton           min: 9691  max: 10639  stddev: 249.9  (2.5% of mean)
    ExHashRing             min: 9526  max: 10513  stddev: 338.5  (3.38% of mean)

  100 nodes, 100000 keys (ideal 1000 per node):
    phash2 (HRW)           min: 920   max: 1075   stddev: 29.7   (2.97% of mean)
    murmur3 x86_32 (HRW)   min: 934   max: 1059   stddev: 27.0   (2.7% of mean)
    murmur3 x64_128 (HRW)  min: 902   max: 1072   stddev: 29.2   (2.92% of mean)
    HRW.Skeleton           min: 877   max: 1124   stddev: 46.6   (4.66% of mean)
    ExHashRing             min: 105   max: 1229   stddev: 279.7  (27.97% of mean)

  1000 nodes, 100000 keys (ideal 100 per node):
    phash2 (HRW)           min: 69    max: 132    stddev: 9.9    (9.91% of mean)
    murmur3 x86_32 (HRW)   min: 72    max: 132    stddev: 9.6    (9.65% of mean)
    murmur3 x64_128 (HRW)  min: 67    max: 144    stddev: 9.8    (9.79% of mean)
    HRW.Skeleton           min: 72    max: 141    stddev: 9.9    (9.85% of mean)
    ExHashRing             min: 0     max: 147    stddev: 31.4   (31.42% of mean)

As you can see, we’re doing just fine with :erlang.phash2. Murmur3 is maybe slightly better at smaller node counts, but that’s not the big takeaway from here. It’s that ExHashRing is really struggling at larger node counts on the default settings. The solution is to add more vnodes, but that was unexpected to me!

Announcing HRW, the library

You’re very welcome to try out the hrw library on hex.pm, or why not take a look at the GitHub repository at https://github.com/joladev/hrw. For a very large number of nodes, you’ll want to use ExHashRing or HRW.Skeleton. For anything else, why not stick with plain HRW.owner?

The library comes with additional strategies not described here, like HRW.Weighted, which lets you assign more key space to specific nodes, useful for heterogeneous clusters where some machines are bigger, and HRW.Bounded, which gives you greater precision in how keys are distributed when you know the keys up front.

Let me know how you find it.

Make Friends With Your AI Assistant

Mon, 18 May 2026 10:10:08 +0000

Typography

AI assistants apparently love clean markers. I use ⓪–⑳ markers for numbering, ▸/▹ markers for itemizing, typographically correct quotes (“”/‘’), and em dashes. A Compose key makes entering that stuff easier; an alternative keyboard layout makes it a charm. No tips for poor macOS/Windows users, sorry.

Workflows

I have system-wide workflows for my assistant. They all look like “START_WORD → WORKFLOW_RULES → STOP_WORD.”

Here is an example:

Let’s define the workflow you must follow every time we work on a new task, named NEW_TASK. The rules are as follows:

① The statement “New task!” typed by me starts a new NEW_TASK workflow
② Upon starting a new workflow you must pull the main branch from GitHub and create a new branch named ""
③ At this point you have to wait for me to enter the problem statement
④ You have to analyze the problem statement and immediately ask all the questions about anything that is uncertain or unclear
⑤ After everything is clarified, you must come up with an implementation plan and ask for its amendment and/or approval
⑥ Upon approval, you should start implementing it, asking any questions that come up along the way
⑦ Once done, you must create comprehensive tests for the new functionality, documentation both in the source code and as standalone markdown, a changelog entry, and a brief what-has-been-done note in markdown
⑧ After I accept everything by saying "Task done!", you should commit everything with a descriptive, comprehensive commit message, push it to the remote, and create a pull request on GitHub.

The key insight here is the stop-words. Without them the assistant will cheerfully barrel through all eight steps in one breath, commit untested code to main, and open a pull request titled “feat: everything.” Stop-words are the leash. Love the leash (and the muzzle, your assistant is ill-mannered and disobedient.)

Context Is King (And Your Assistant Has Amnesia)

Every new session your assistant is a freshly hatched duckling. It will imprint on whatever you show it first. Feed it your codebase conventions, your style preferences, your testing philosophy—and do it persistently, through rules and system prompts, not through hopeful repetition in every chat.

The difference between a rule that says “always use :json instead of :jason” and manually correcting the same dependency in every generated mix.exs is the difference between engineering and penance.

Persistent rules are not optional. They are the civilizational layer between you and the primordial chaos of a context window that has never heard of your project. Each workflow (see above) might have different persistent rules attached.

Atomic Tasks, or: Stop Writing Novels

The temptation to say “implement the entire authentication system” is roughly as productive as asking an intern to “make the app work.” Break it down. Then break it down again. If your task description exceeds one screenful, you have written a novel, not a task. If there are possible forks in the road within the task, the assistant will inevitably choose the wrong path.

The assistant will cheerfully attempt your novel and wildly guess the direction at each fork. It will produce something that compiles, passes zero tests, and makes you question your career choices. Whereas “add the verify_token/2 function that pattern-matches on {:ok, claims} and {:error, reason}” is something it can do in its sleep. If models slept.

The rule of thumb: if there is more than one correct architectural path, the task is too large. Decide the architecture yourself; delegate the typing.

Feed It Good Code, Get Good Code

This is not mysticism, it is pattern matching. If you feed your assistant examples of clean, idiomatic code, it will produce clean, idiomatic code. If you feed it your legacy spaghetti written at 3 AM during a production outage, you will get more spaghetti—but with AI-generated parmesan on top.

I keep a curated set of “golden” modules that I attached as context ages ago. The assistant does not need to be told “you are a senior Elixir architect with 300 years of experience.” It needs to see what good Elixir looks like. Show, don’t pep-talk.

Trust, But Verify (Especially the Tests)

The cruelest irony of AI-assisted development: the assistant generates tests that pass. “Green across the board!” you exclaim, and deploy. Three hours later you discover the tests were testing that the function returns something, not that it returns the right something.

Read the generated tests more carefully than the implementation. The implementation at least looks suspicious when it is wrong. A bad test looks exactly like a good test, except it asserts nothing useful. Pattern-match on the actual values; assert result is not a test, it is a prayer.
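
To make the difference concrete, here is a small ExUnit sketch. TokenSketch and its verify_token/2 clauses are invented for this example, including the "valid" token and the claims map. The first test is the prayer: it passes even though the call returned an error tuple, because any tuple is truthy. The second test pins the exact shape and values.

```elixir
ExUnit.start()

defmodule TokenSketch do
  # Hypothetical function under test, invented for this example.
  def verify_token("valid", _secret), do: {:ok, %{sub: "user-1"}}
  def verify_token(_token, _secret), do: {:error, :invalid}
end

defmodule TokenSketchTest do
  use ExUnit.Case, async: true

  test "weak: passes for any truthy result, including an error tuple" do
    result = TokenSketch.verify_token("garbage", "secret")
    # Green, but it says nothing about correctness.
    assert result
  end

  test "strong: pattern-matches on the actual values" do
    assert {:ok, %{sub: "user-1"}} = TokenSketch.verify_token("valid", "secret")
    assert {:error, :invalid} = TokenSketch.verify_token("garbage", "secret")
  end
end
```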

The Art of Saying “No, Try Again”

Your assistant is not your therapist. It does not need encouragement. “This is wrong” followed by silence teaches it nothing. Say “This is wrong because the recursive clause does not handle the empty-list base case, see List.foldl/3 for the pattern I expect” and now you are getting somewhere. Keep your paw right above the Ctrl+C combination and press it every time you see the assistant’s thinking process going bonkers.

You always liked interrupting people in person, feel free to interrupt your deputy.

Know When to Kill the Session

If you have been going back and forth for more than three iterations on the same function, close the session. Start fresh. The context window is polluted with failed attempts, corrections, and mutual disappointment—a couples therapy transcript, essentially.

A new session with a clean description of what you actually want will outperform the seventh round of “no, I meant the other thing” every single time. Sunk cost fallacy applies to token budgets too.

Please be aware that the need to close a session strongly implies your original prompt sucked in the first place. Rewrite it from scratch. Make sure it leaves the assistant no freedom to choose paths. Assistants are good at walking straight roads.

The Checklist

Distilled to a grocery list for the impatient:

▸ Use persistent rules, not per-session rituals
▸ Define workflows with explicit start/stop words
▸ One architectural decision per task, maximum
▸ Attach good code as examples, skip the motivational preamble
▸ Run the full validation pipeline (format, credo, dialyzer, test) after every step, not at the end
▸ Read generated tests as if they were written by someone who wants to trick you (they were not, but the effect is the same)
▸ Three failed iterations = kill the session, rethink the task
▸ Never let the assistant commit without your explicit say-so
▸ If you could not solve the problem yourself, the assistant cannot solve it for you—it can only type faster

The last point deserves a tattoo. An AI assistant is a force multiplier, not a force. Multiply zero by anything you like.

Happy prompting.

Mocks Are Your Friends, Not Your Servants

Sun, 17 May 2026 18:03:59 +0000

There is a peculiar tradition in our industry: the moment someone says “mock,” half the room recoils as if you have just proposed replacing the database with a spreadsheet. The other half nods enthusiastically, having already replaced the database with a spreadsheet.

Both camps are wrong, and for the same reason: they think of “mock” as a verb.

Mock Is a Noun

José Valim wrote a splendid piece on the subject some years ago, and yet the industry continues to collectively ignore its central thesis with the dedication of a cat ignoring an expensive toy. The key insight is disarmingly simple: consider “mock” to be a noun, never a verb. You do not “mock a module.” You create a mock—an entity, a collaborator, a contract participant—that implements a well-defined behaviour.

The difference is not cosmetic. When you “mock” a function (verb), you are lying to your test. You are saying: “pretend this hole in the wall is a door.” When you create a mock (noun), you are building an actual door—one that opens and closes according to the same interface contract as the production door, just perhaps into a smaller room.

Stubs Are for Candles, Not for Software

The prevailing use of mocks in the wild amounts to: “I need this function to return 42 so my other function does not crash.” This is not testing. This is bribery. You have paid off a witness to give the testimony you wanted, and now you are surprised when the real trial goes sideways.

A mock that merely returns canned responses is a stub wearing a fake moustache. It tells you nothing about whether your code actually interoperates with the expected behaviour. It only tells you that, given a universe where everything behaves exactly as you imagined, your code does not crash. Congratulations—that was never the hard part.

The hard part is the contract. Does your code send the right messages? Does it handle the responses defined by the behaviour? Does it respect the protocol, not just survive it? A proper mock enforces these questions. A stub sweeps them under the carpet and charges you for the cleaning.

Promote Your Mocks to Lead Actors

Here is the heresy: mocks should not sit quietly in the corner of your test, responding when spoken to like well-trained waitstaff. They should be protagonists. They should drive the testing narrative.

Consider a finite state machine. It transitions through states, fires callbacks, notifies listeners. In production, the listener might persist data, send emails, ring bells. In tests, you do not care about the bells. You care about: did the FSM reach state X with payload Y after event Z?

A mock-as-protagonist answers this question directly. It is not merely absorbing calls—it is reporting back to the test, asserting that the system did what it promised. The mock becomes your embedded journalist, filing reports from inside the process under test.

This is a fundamentally different posture. The mock is no longer a shift worker you hire to stand in for the real employee. It is a first-class participant in the test, with its own responsibilities, its own assertions, its own voice.

OTP, Race Conditions, and the Debugger You Deserve

Anyone who has tested concurrent OTP systems knows the special joy of a test that passes ninety-nine times and fails on the hundredth, always at a random time, always on CI, never on your machine. The core problem is structural: you fire an asynchronous message and then try to assert about a state that may or may not have been reached yet. Process.sleep(200) is not a solution; it is a prayer in milliseconds.

Mocks offer something better. When a mock is registered as a listener, every state transition sends a message back to the test process. You can then use assert_receive to wait for the transition, with a timeout, deterministically. No sleep. No prayer. No coin flip.

In effect, the mock becomes a breakpoint in a debugger you never had to open. It declares: “I expect to be called with these arguments, in this order,” and it sends a message to the test process confirming each call. You get both the expectation (expect/3) and the synchronization (assert_receive/2) in one mechanism. The mock is simultaneously the probe and the signal.

This is not theoretical. In OTP, where processes communicate via message passing and the scheduler is free to interleave execution in whatever order amuses it most, this pattern transforms flaky tests into deterministic ones. You are no longer guessing when the process reached the desired state. You are being told.
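
The synchronization half of this pattern can be sketched without any mocking library. In the example below, TurnstileSketch is a hypothetical GenServer and the listener is a plain anonymous function standing in for a Mox mock with expectations. It reports each transition to the test process, and the test waits for that report with assert_receive instead of sleeping. The message shape and the 500 ms timeout are arbitrary choices.

```elixir
ExUnit.start()

defmodule TurnstileSketch do
  use GenServer

  # `listener` is a one-argument function called after every transition.
  def start_link(listener), do: GenServer.start_link(__MODULE__, listener)
  def coin(pid), do: GenServer.cast(pid, :coin)

  @impl true
  def init(listener), do: {:ok, %{state: :closed, listener: listener}}

  @impl true
  def handle_cast(:coin, %{state: :closed} = data) do
    # Report the transition before continuing.
    data.listener.(:opened)
    {:noreply, %{data | state: :opened}}
  end

  # Ignore events that are not valid in the current state.
  def handle_cast(_event, data), do: {:noreply, data}
end

defmodule TurnstileSketchTest do
  use ExUnit.Case, async: true

  test "a coin opens the turnstile" do
    test_pid = self()
    listener = fn state -> send(test_pid, {:on_transition, state}) end
    {:ok, pid} = TurnstileSketch.start_link(listener)

    # The cast is asynchronous; the listener tells us when it took effect.
    TurnstileSketch.coin(pid)
    assert_receive {:on_transition, :opened}, 500
  end
end
```

With Mox, the anonymous function would be replaced by a mock implementing the listener behaviour, which adds the expectation half (which calls, in which order) on top of the same signalling.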

Listeners, Visibility, and the Joy of Seeing What Happens

There is a secondary benefit that deserves its own paragraph: visibility.

When you plug a mock in as a listener—not as a replacement for an implementation, but as an observer of one—the test code becomes radically more readable. Each assertion is a statement about what the system did, not about what you had to simulate. The reader does not need to mentally reconstruct the production flow from a pile of stubs. The flow is right there, reported by the mock, step by step.

Test code that reads like a narrative of actual system behaviour is test code that people trust. And test code that people trust is test code that people maintain. And test code that people maintain is test code that catches bugs. The chain of causation is longer than a Dickensian sentence, but every link holds.

Finitomata: A Case Study in Mock-Driven Testing

All of this is not armchair philosophy. The Finitomata.ExUnit module is a working testing framework built entirely on this premise. It uses Mox not as a crutch but as the foundation.

The setup declares a mock listener. The mock is allow-ed to the FSM process and given expectations for after_transition/3 callbacks. Each expectation sends {:on_transition, id, state, payload} back to the test process. The test then walks the FSM through its transitions and asserts each state deterministically via assert_receive.

The result looks like this:

test_path "respectful passenger", %{passengers: initial_passengers} do
  :coin_in ->
    assert_state :opened do
      assert_payload do
        data.passengers ~> ^initial_passengers
      end
    end

  :walk_in ->
    assert_state :closed do
      assert_payload do
        data.passengers ~> one_more when one_more == 1 + data.passengers
      end
    end

  :switch_off ->
    assert_state :switched_off
    assert_state :*
end

No Process.sleep. No polling. No race conditions. Each transition is confirmed by the mock-listener before the test proceeds. And crucially: intermediate states—including those triggered by automatic (bang!) transitions that the test never explicitly fires—are observable and assertable, because the mock reports every single one.

The mock here is not a stand-in. It is the entire testing infrastructure. Remove it, and you are back to Process.sleep(200) and crossed fingers.

The Moral

Mocks have suffered from a branding problem. They were introduced to the mainstream as “a way to avoid calling the real thing,” which is roughly equivalent to introducing a violin as “a way to avoid silence.” Technically correct, but missing the point so thoroughly that it constitutes its own genre of wrongness.

A mock is a collaborator. A contract enforcer. A synchronisation primitive. A visibility layer. A debugger breakpoint. A protagonist.

Treat it accordingly, and your tests will repay you with the one thing no amount of Process.sleep can buy: confidence.

Running local models on an M4 with 24GB memory | jola.dev

Tue, 12 May 2026 02:29:47 +0000

I’ve been experimenting with running local models on and off for a bit and I’ve finally found a setup that seems to work reasonably. It’s nothing like the output of a SOTA model, but the excitement of being able to have a local model do basic tasks, research, and planning, more than makes up for it! No internet connection required! Not to mention that it’s a way of reducing your dependence on big US tech, even if just a tiny bit.

I gotta say though, it’s not easy to get this stuff set up. First you have to choose how you’re running the model: Ollama, llama.cpp or LM Studio. Each one comes with its own quirks and limitations, and they don’t offer all the same models. Then of course, you have to pick your model. You want the best model available that fits in memory and still gives you enough headroom to run your regular assortment of Electron apps, not to mention something where you can have at least a 64K context window, but ideally 128K or more. Most recently I’ve tried Qwen 3.6 Q3, GPT-OSS 20B, and Devstral Small 24B, which all technically fit in memory but were in practice unusable, and Gemma 4B, which would run fine but really struggled with tool use.

Then there’s a plethora of configuration options to tweak. From the more well-known, like temperature, to more esoteric options like K Cache Quantization Type. Many of these tools come with a basic recommended set of options, but the appropriate ones can depend on things like whether you’re enabling thinking or not!

Qwen 3.5-9B (4-bit quant)

qwen3.5-9b@q4_k_s (HuggingFace link) is the best model I’ve gotten working with a reasonable ~40 tokens per second, thinking enabled, successful tool use, and a 128K context window, running on LM Studio. Compared to a SOTA model, it gets distracted more easily, sometimes gets stuck in loops, misinterprets asks, and so on. But it’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for lots of other things running too!

These are the recommended settings for thinking mode and coding work:

Thinking mode for precise coding tasks (e.g., WebDev):

temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

To enable thinking I also had to select the model, go to configuration, scroll to the bottom of the Inference tab, and add {%- set enable_thinking = true %} to the Prompt Template.

I’ve been using it through both pi and OpenCode. I still haven’t quite made up my mind on which one I prefer. Pi feels a bit snappier, but although I really appreciate the idea of the harness building itself and all that customization, I can’t help but wish it came with some sensible defaults. I feel like you could easily end up spending more time tweaking your pi setup to be just right than you do on your actual projects!

Pi setup

Here’s the ~/.pi/agent/models.json:

OpenCode setup

~/.config/opencode/opencode.json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1"
      },
      "models": {
        "qwen3.5-9b@q4_k_s": {
          "name": "Qwen 3.5 9B Q4_K_S",
          "tools": true,
          "context_length": 131072,
          "max_tokens": 32768
        }
      }
    }
  },
  "model": "lmstudio/qwen3.5-9b@q4_k_s"
}

Comparison with SOTA models

A model like Qwen 3.5 9B (Q4) is nowhere near being able to independently solve complex problems over longer periods of time, like SOTA models are. There’s no point in asking it to build an entire app in one go; you’ll just burn your thighs without getting anything out of it.

What does work is a more interactive workflow where you’re clearly communicating with the model step by step, and giving it a lot of guidance. I’m sure that sounds pointless to many of you, why use a model where you have to babysit it as it works, but I actually found that it encouraged me to be more engaged. The downside to working with SOTA models is that they make it too easy to offload all cognitive effort, even where you’re trying to actively prevent that from happening.

With the local model I have to take on a lot more of the thinking and planning, and I have to be a lot more specific, but it will still act as a research assistant, a rubber duck, and a savant with instant recall of a lot of programming language details and command line invocations. It’s not the 10x productivity boost that the big AI companies market, but it’s something, and it’s interesting.

To give you an idea of some things that work and don’t, here are some very basic examples.

Example 1:

I wanted to bump the Elixir linter credo to the latest version and I got warnings out of my code. As an experiment, I asked Qwen to take a look.

me:
  run mix credo --strict and suggest how to solve it, don't edit

Qwen:
    Credo Warnings
    There are 4 warnings in your test files. The issue is using length/1 to check if lists are non-empty. Credo recommends comparing against an empty list instead.
    Warnings:
    1. test/jola_dev_web/controllers/blog_controller_test.exs:11
       - assert length(conn.assigns.posts) > 0
    2. test/jola_dev_web/controllers/blog_controller_test.exs:22
       - assert length(conn.assigns.posts) > 0
    3. test/jola_dev/blog_test.exs:10
       - assert length(posts) > 0
    4. test/jola_dev/blog_test.exs:23
       - assert length(tags) > 0
    How to Fix
    Replace each length(list) > 0 with list != []:
    blog_controller_test.exs:
    # Line 11
    assert conn.assigns.posts != []
    # Line 22
    assert conn.assigns.posts != []
    blog_test.exs:
    # Line 10
    assert posts != []
    # Line 23
    assert tags != []
    This is more idiomatic in Elixir and avoids the overhead of computing the list length.

I asked Qwen to make the edits and it did 4 parallel edits, very clean. Overall this was a very simple task that I could’ve done myself by tabbing back and forth between the terminal and my editor to get the line numbers to edit. Not a big difference in effort, but nice and convenient.

Example 2:

After some dependency bumps I had a dependabot PR with git conflicts that dependabot for some reason was refusing to rebase. I pulled it down, rebased, and asked Qwen to take a look. It was a very simple conflict to resolve, just take the newer version of each, and the model recognized that.

I can see the conflict. Here's what's happening:
HEAD (your current branch):
- sentry: 13.0.1
- tailwind: 0.3.1
Other side (commit being rebased):
- sentry: 12.0.3
- tailwind: 0.4.1
You have divergent dependency bumps. Here are your options:
Option 1: Take both updates (recommended)
- Keep sentry 13.0.1 from HEAD
- Keep tailwind 0.4.1 from the other side
- This gives you the latest of both deps
Option 2: Take only sentry update
- Keep sentry 13.0.1
- Keep tailwind 0.3.1
Option 3: Take only tailwind update
- Keep sentry 12.0.3
- Keep tailwind 0.4.1
Which approach do you want to take?

However, when asked to make the change, it forgot to make the edit and instead just tried to add the changes and continue the rebase, with the conflict markers still in place.

> Thinking: The user wants to take both updates - sentry 13.0.1 and tailwind 0.4.1. I need to resolve the conflict by keeping the HEAD version for sentry and the other side version for tailwind. I'll use git add to stage the resolved file.
----
# Stage resolved mix.lock and continue rebase
$ git add mix.lock && git rebase --continue
Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal
"~/workspace/mixtli/.git/COMMIT_EDITMSG" 32L, 1047BBump tailwind from 0.3.1 to 0.4.1
Bumps [tailwind](https://github.com/phoenixframework/tailwind) from 0.3.1 to 0.4.1.
- [Changelog](https://github.com/phoenixframework/tailwind/blob/main/CHANGELOG.md)
- [Commits](https://github.com/phoenixframework/tailwind/compare/v0.3.1...v0.4.1)

Additionally, it failed to recognize that git rebase --continue opens an editor, and OpenCode then hung, although that might just have been a one-off.

Conclusion

Anyway, LLMs are incredibly flexible and there’s a million things even a local model could do. Your imagination is the limit. Local models have serious tradeoffs, but they come with some pretty attractive benefits:

No internet connection required, you can work on the plane!
The cost is limited to the electricity you’re using, assuming you were gonna buy a computer anyway. No subscription required.
There’s still going to be a serious environmental cost of training these models, but the open model companies are nowhere near the top of the list in environmental impact, and using your own hardware means fewer data centers.
It’s fun to tinker.

LLMs have had a huge impact on our world, and much of it not great, but it’s obvious that they’re here to stay. Experimenting with local models feels like a more sustainable and positive way to interact with this technology. And honestly, it’s a lot of fun, even when it does the wrong thing!

Elixir versus Python for Data Science - DockYard

Fri, 08 May 2026 19:20:08 +0000

Introduction

Over the last year, thanks to the efforts of the amazing Elixir community, the Elixir machine learning ecosystem has grown at an impressive rate, with more and more libraries filling more and more gaps in the Elixir data science and machine learning ecosystem.

A common argument against using Nx for a new machine learning project is its perceived lack of a library/support for some common task that is available in Python. In this post, I'll do my best to highlight areas where this is not the case, and compare and contrast Elixir projects with their Python equivalents. Additionally, I'll discuss areas where the Elixir ecosystem still comes up short, and using Nx for a new project might not be the best idea.

Numerical Computing

The obvious place to start this post is to compare Nx to its Python equivalents. At its core, Nx is intended to serve as a NumPy equivalent with support for automatic differentiation and acceleration via GPUs. In this respect, its main inspiration is JAX—a Python library that supports automatic differentiation and JIT compilation to accelerators via composable function transformations.

The Nx API is (intentionally) considerably smaller than the NumPy API. Because Nx relies on JIT compilation, the API is built around a smaller number of powerful primitive operations which can be used to build out more complex functions.

NumPy does not have the same luxury, instead needing to rely on specialized implementations for most of the functions in its API. JAX also builds on a set of core primitive operations; however, it intentionally provides wrappers around the NumPy API due to its ubiquity in the numerical computing community.

There are pros and cons to having a smaller API. From a learning perspective, a beginner can reasonably pick up and understand 90% of the functions in the Nx API rather quickly.

Unfortunately, this trade-off means Nx implementations can at times be more verbose than their NumPy counterparts. Due to the sheer size of the NumPy API, a few lines of NumPy often translate to considerably more lines in Nx. Additionally, the Nx API falls short of the NumPy API in some areas. For example, Nx PRNG support is not as feature complete as JAX/NumPy, the Nx Linear Algebra module is not as in-depth as NumPy, and Nx does not have support for string data types.

The API shortcomings of Nx are mostly active areas of work. Even with these shortcomings, I've found I can be just as productive writing Nx as I can writing NumPy.

From a performance perspective, if you're using the EXLA compiler, Nx will have (mostly) equivalent performance to JAX. Nx/EXLA relies on XLA, the same JIT compiler that JAX uses. That means that essentially everywhere JAX beats NumPy, Nx will also beat NumPy; and everywhere NumPy beats JAX, NumPy will also beat Nx.

One advantage that Nx has over JAX is its first-class support for pluggable compilers and backends. While the JAX project seems to be moving in the direction of supporting multiple pluggable compilers/runtimes, Nx was built with this flexibility in mind, and thus is positioned for rapid integration with any existing/future tensor compilers and backends.

JAX is ahead in terms of parallelization; however, there are plans to integrate parallel primitives into the Nx API on the roadmap. Given that Nx can build on the same parallelism/sharding implementations as JAX in XLA, Nx can catch up to JAX relatively quickly in this respect.

Deep Learning

One of the initial ambitions for the Nx project was to support creating and training deep-learning-type models in Elixir. This is now possible with the Axon library.

Axon is built directly on top of Nx and thus can take advantage of all of the things Nx offers, including JIT compilation and automatic differentiation. Axon most directly compares to tools like PyTorch and TensorFlow/Keras in the Python ecosystem.

From an API perspective, Axon is somewhat even with both PyTorch and TensorFlow/Keras. Aside from Attention/Transformer layers (which are on the roadmap), Axon has an essentially identical offering of model building blocks. Additionally, with its custom layer API, creating and using new layers is as easy as defining an Nx implementation of the layer. More or less any model you can create and perform inference with in PyTorch/TensorFlow, you can also create and perform inference with in Axon.

Axon also has a robust training API inspired by libraries in the Python ecosystem such as PyTorch Ignite and PyTorch Lightning. The training API supports out-of-the-box callbacks such as model checkpoints, early stopping, and model validation, as well as an API for integrating custom callbacks. Similar to Keras, the Axon training API offers increasing levels of flexibility at increasing levels of complexity. In other words, you can sacrifice simplicity to have more control over training your models.

One area of concern when migrating to Elixir is the ability to make use of pre-trained models. Thanks to AxonOnnx this is possible for (almost) any model you might have.

If you're able to export an ONNX version of your model (e.g., using torch.onnx or tf2onnx), you can probably import your model with Axon. AxonOnnx has even been tested to work with pre-trained transformers from the popular transformers library.

While Axon supports importing pre-trained models, there are still some aspects of working with pre-trained models that need ironing out. For example, fine-tuning, while possible, does not yet have a first-class API in Axon. Additionally, features such as mixed-precision and multi-device training that make training large models possible are not 100% supported in Axon yet.

Traditional Machine Learning

Along with deep learning, gradient boosting and decision tree algorithms are perhaps the most popular machine learning algorithms in use. These classes of algorithms typically outperform deep learning with tabular and time-series data, and are often significantly less expensive to train and deploy.

Unfortunately, this is an area still under active development in the Elixir ecosystem. Python has popular libraries such as XGBoost, but there is still no Elixir equivalent. I expect this to change over the next six months; however, for the time being, Elixir is behind in this area.

Elixir also falls short of Python in other traditional machine learning applications. While Python has the excellent scikit-learn, Elixir has the relatively new Scholar library. Because Scholar is new, it's lacking in features that allow it to be a competitive alternative to sklearn. This is another area of active development on the Nx roadmap, and thus I expect things to look significantly different here in the next six months.

Data Analysis

Essentially any data scientist that has worked with Python for any amount of time is familiar with the pandas library for data analysis. Pandas is a library for working with structured, columnar data. It's popular as a library for any sort of analysis or munging tasks you might need to perform. The Elixir equivalent to Pandas is Explorer. Explorer is built on top of the polars library which implements DataFrames in Rust.

From an API perspective, the Explorer API is different from what you might be used to using with Python. Given that Elixir is a functional language, the Explorer library builds on immutable abstractions, which can feel quite different for somebody migrating from Python and pandas' mutability. Explorer, like Nx, is notably more verbose than its Python counterpart. Despite this, there isn't much you can't do in Explorer that you can do in Pandas.

From a performance perspective, Explorer benefits from the speed of Polars. There are a number of articles that laud Polars as the fastest DataFrame library. Given that Explorer builds on that performance, you might see significant performance improvements migrating from Pandas to Explorer.

Data Presentation/Visualization

In data science, presentations and visualizations are where the money is made. Having a good tool for presenting and visualizing data is a must for any language looking to position itself in the data science sphere. Python has a number of excellent visualization libraries such as Plotly Express and matplotlib. The Elixir equivalent is its VegaLite library which provides bindings around the VegaLite graphics library.

Functionally, you can get essentially equivalent visualizations from both Elixir and Python. The VegaLite API might feel unfamiliar to users coming from Plotly Express and matplotlib; however, the abstractions are incredibly powerful and allow for composing and creating ever more complex graphics with code.

For the most part, I've found it possible to perform equivalent visualizations in both Elixir and Python; however, Python seems to have an edge in network visualizations and geographic visualizations. Elixir has no equivalents to Python's NetworkX and the Folium library. I suspect with companies like Felt using Elixir in the map-making space that we might see geographic visualizations in Elixir improve (fingers crossed).

The Python ecosystem also has a number of libraries concerned with Dashboard creation. Tools such as Dash allow for the creation of interactive demos with a few lines of code. There are no direct equivalents in Elixir just yet; however, the direction of Livebook is promising for the prospects of interactive and shareable demos in Elixir.

Pipelines / Orchestration

Whether it be training large models or creating production-ready data ingest/management solutions, the task of data orchestration and pipelining is an important one for data science/machine learning. There are a large number of Python libraries built specifically to create and orchestrate data pipelines. In Elixir, there are a few; however, this is an area where I would personally argue Elixir has a strong edge over Python. Given that Elixir is built on the BEAM, which is designed for concurrency, the task of Concurrent Data Processing in Elixir is a natural extension of the language. Python is just not designed with concurrency in mind. From simple language-level abstractions such as Task to library-level abstractions such as Flow and Broadway, creating scalable input processing pipelines is incredibly easy with Elixir by default.

That's not to say that Python doesn't have some nice libraries for achieving the same results. Both PyTorch and TensorFlow offer nice data loading abstractions in DataLoader and tf.data, respectively. Additionally, there are a number of libraries designed for building and orchestrating data pipelines at scale (e.g. Prefect and Airflow). The biggest advantage Elixir has in this space is that it is concurrent and fault-tolerant by design. I don't think Python can ever beat Elixir in this regard.

Domain-Specific Libraries

There are a number of "domain-specific" libraries that don't neatly fall into any of the categories I've written so far, but which are worth a brief mention in this article—namely computer vision, natural language processing, and signal processing.

There are a number of computer vision-related libraries in the Python ecosystem that generally streamline the task of working with images. This includes Pillow and OpenCV among others. In the Elixir ecosystem, there is Evision, which provides bindings to OpenCV implementations. From both an API and performance perspective, this means working with images in Evision will be somewhat similar to working with images in Python's OpenCV bindings.

For NLP tasks, Elixir does not have a library that is equivalent to Python's spaCy or NLTK. However, it does offer bindings to Hugging Face's tokenizers, and the ability to import a significant number of Hugging Face models for performing NLP tasks with neural networks.

The Elixir ecosystem still falls behind Python in the area of signal processing; however, with recent work to add FFT support to Nx, and other dedicated efforts, I expect this area to improve in the near future.

Conclusion

This post was meant to serve as a high-level comparison of the Elixir machine learning and data science ecosystem with the Python ecosystem. While there are still many gaps in the Elixir ecosystem, the progress over the last year has been rapid. Almost every library I've mentioned in this post is less than two years old, and I suspect there will be many more projects to fill some of the gaps I've mentioned in the coming months.

If you're interested in helping in any of our areas of active development, join us at the EEF ML Working Group to drive machine learning on the BEAM forward. Until next time!

Passkey Sign In with Elixir and Phoenix

Fri, 08 May 2026 09:29:25 +0000

Passkeys are a replacement for passwords that use public/private cryptographic key pairs for login in a way that can be more user-friendly and resistant to a number of kinds of attacks. Let’s implement account registration and login with passkeys in a simple Phoenix web app.

This work is heavily based on this article by Adam Langley, which provides a great deal of information about implementing passkeys. My goal here is to fill in some more of the details, and provide some Elixir-specific information. As with Adam’s article, I’m not going to use any WebAuthn libraries (even though that may be advisable from a security/maintenance perspective) since I think it’s interesting and helpful to understand how things actually work.

Providing an exhaustive, production-ready implementation is a non-goal of this post. I’m going to make some slightly odd decisions for pedagogical reasons, and leave some things incomplete. That said, I’ll try to note when I’m doing so.

To start, I’m using the default Phoenix template app (less Tailwind) in which I’ve also generated the default, controller-based authentication system. Some parts of the password-specific stuff have been stripped out altogether, others will get changed to fit with the passkey authentication setup we’re going to build.

Database schema

The first thing we’ll need is a backend schema for passkeys. The only data we need to store for a particular passkey is a unique identifier and the public key itself. We also want users to be able to have multiple passkeys associated with their account (since they may want to log in from multiple platforms that don’t cross-sync passkeys), so the users schema will have a one-to-many relationship with the passkeys.

The actual model I’ll call UserCredential, since that’s closer to what the WebAuthn spec calls them. Here’s the schema, it’s pretty simple:

defmodule PhoenixPasskeys.Accounts.UserCredential do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key {:id, :binary, []}

  schema "users_credentials" do
    # DER-encoded Subject Public Key Info: https://datatracker.ietf.org/doc/html/rfc5280#section-4.1.2.7
    field :public_key_spki, :binary
    belongs_to :user, PhoenixPasskeys.Accounts.User
    timestamps()
  end

  def changeset(credential, attrs) do
    credential
    |> cast(attrs, [:id, :public_key_spki, :user_id])
  end
end

There are a couple things to note here:

First, we’re explicitly specifying that the primary key is a binary, since that’s what the WebAuthn API provides as the credential ID.
The public_key_spki field contains the data for the public key itself and the algorithm being used. We don’t care about the specific format, though, since we don’t have to parse it ourselves.

To the generated User schema, I also added the has_many side of the relationship. There’s also a migration to create the credentials table. I won’t show it here, since it’s exactly what you’d expect—just make sure the id column is a binary.

Registration JavaScript

WebAuthn, being a modern web API, is a JavaScript API. This is a distinct disadvantage if your users expect to be able to log in from JavaScript-less browsers. Nonetheless, we proceed with the JS. Here’s the first bit:

document.addEventListener("DOMContentLoaded", () => {
    const registrationForm = document.getElementById("registration-form");
    if (registrationForm) {
        registrationForm.addEventListener("submit", (event) => {
            event.preventDefault();
            registerWebAuthnAccount(registrationForm);
        });
    }
});

If we find the registration <form> element (note that I’ve added an ID (and also removed the password field)), we add a submit listener to it which prevents the form submission and instead calls the registerWebAuthnAccount function we’ll create next.

Before we get there, we’ll also write a brief helper function to check whether passkeys are actually available:

async function supportsPasskeys() {
    if (!window.PublicKeyCredential || !PublicKeyCredential.isConditionalMediationAvailable) {
        return false;
    }
    const [conditional, userVerifying] = await Promise.all([
        PublicKeyCredential.isConditionalMediationAvailable(),
        PublicKeyCredential.isUserVerifyingPlatformAuthenticatorAvailable(),
    ]);
    return conditional && userVerifying;
}

The conditional mediation check establishes that WebAuthn conditional UI is available. This is what we’ll use for login, and is part of what makes using passkeys a nice, slick experience. Conditional UI will let us start a request for a passkey that doesn’t present anything until the user interacts with the browser’s passkey autofill UI.

The user-verifying platform authenticator check establishes that, well, there is a user-verifying platform authenticator. The platform authenticator part means an authenticator that’s part of the user’s device, not removable, and the user-verifying part means that the authenticator verifies the presence of the user (such as via biometrics).

In the registerWebAuthnAccount function, the first thing we need to do is check that both these conditions are met and passkeys are supported by the browser. If not, we’ll just bail out and registration won’t be possible.

async function registerWebAuthnAccount(form) {
    if (!(await supportsPasskeys())) {
        return;
    }
}

Next, we’ll set up a big ol’ options object that we’ll pass to the WebAuthn API to specify what sort of credential we want. There’s going to be a whole lot of stuff here, so let’s take it one piece at a time.

The RP is the Relying Party—that is, us, the party which relies on the credentials for authentication. The ID is the domain of the RP—just localhost for this example, though in reality you’d need to use your actual domain in production. The name is just a user-facing name for the RP.

Next up is some info about the user who the credential is for. The user’s “name” will just be the email that they entered in the form. The displayName value is required, per the spec, but we don’t have any other information so we just leave it blank and let the browser display the name only. The ID is where this gets a little weird, since we’re just generating a random 64-byte value:

function generateUserID() {
    const userID = new Uint8Array(64);
    crypto.getRandomValues(userID);
    return userID;
}

If you were adding a passkey to an existing user account, you could use a server-specified value for the user ID and the browser would replace any existing credentials with the same user ID and RP ID. But, since the user is registering a new account at this point, we assume they don’t want to do that (and, moreover, we don’t have any ID we can use yet). The user ID will also be returned upon a successful login, which would let us look up the user that’s logging in. However, since we’re only going to allow a credential to belong to a single user, we can use the credential’s ID to uniquely determine the user.
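
As a minimal sketch of the RP and user entries described above, assuming the field names from the WebAuthn creation options dictionary: the buildIdentityOptions helper and the "Phoenix Passkeys" display name are hypothetical and only here for illustration.

```javascript
// Hypothetical helper: builds the `rp` and `user` entries of the options
// object that is later passed to navigator.credentials.create().
function buildIdentityOptions(email, userID) {
    return {
        rp: {
            id: "localhost", // use your real domain in production
            name: "Phoenix Passkeys", // assumed user-facing name
        },
        user: {
            id: userID, // random bytes, e.g. from generateUserID()
            name: email, // the email entered in the form
            displayName: "", // required by the spec, left blank here
        },
    };
}

// Example with a zero-filled ID standing in for generateUserID().
const identity = buildIdentityOptions("user@example.com", new Uint8Array(64));
```

Keeping these two entries in one small function makes it easy to swap the RP ID when moving from localhost to a real domain.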

Next up, we specify the types of public keys that we’ll accept. We’re going to support Ed25519, ES256, and RS256, since those are recommended by the WebAuthn spec:
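
A sketch of what that list could look like, using the COSE algorithm identifiers that WebAuthn expects in pubKeyCredParams. The ordering (most preferred first) is my assumption.

```javascript
// COSE algorithm identifiers, most preferred first.
const pubKeyCredParams = [
    { type: "public-key", alg: -8 },   // EdDSA (Ed25519)
    { type: "public-key", alg: -7 },   // ES256 (ECDSA with P-256 and SHA-256)
    { type: "public-key", alg: -257 }, // RS256 (RSASSA-PKCS1-v1_5 with SHA-256)
];
```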

During the login process, the server will generate a challenge value that the client will sign and the server will verify was signed with the user’s private key. The spec also requires that we provide a challenge when creating a credential, but we’re not going to use it for anything (since the user is just now creating the credential, we have nothing trusted that we can verify the initial challenge against), so we just provide an empty value:
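
Taking "empty value" literally, a sketch of the registration challenge could be a zero-length byte array. The variable name is hypothetical, and I'm assuming here that a zero-length buffer is acceptable for this unused field.

```javascript
// The spec requires the field, but nothing verifies it during registration
// in this walkthrough, so it carries no data.
const registrationChallenge = new Uint8Array(0);
```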

Lastly, we give our requirements for authenticators that we want to use:

We want only the platform authenticator, not anything else, like removable security keys. We also specify that we want a resident key. This usage of “resident” is deprecated terminology that’s enshrined in the API. Really it means we want a discoverable credential, so that the authenticator will surface it during the login process without us having to request the credential by ID. This is important to note, since it’s what prevents needing a separate username entry step.
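
A sketch of the authenticator requirements described above, assuming the standard authenticatorSelection field names. Setting both residentKey and requireResidentKey is my choice to cover older and newer browsers, not something the text specifies.

```javascript
const authenticatorSelection = {
    authenticatorAttachment: "platform", // built-in authenticator only
    residentKey: "required",             // i.e. a discoverable credential
    requireResidentKey: true,            // older spelling of the same requirement
};
```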

Now that we (finally) have all the configuration options in place, we can actually proceed with the credential creation. We pass the options to navigator.credentials.create to actually create the WebAuthn credential. If that fails, we’ll just take the easy way out and alert to inform the user (in an actual service, you’d probably want better error handling).
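
The create-or-alert step could be sketched like this. The createPasskey name and its injectable credentials and notify parameters are hypothetical; they default to the browser's navigator.credentials and alert, and only exist so the function can be exercised outside a browser.

```javascript
// Attempts to create a WebAuthn credential. Returns the credential, or null
// after notifying the user if creation fails or is cancelled.
async function createPasskey(
    publicKeyOptions,
    credentials = globalThis.navigator?.credentials,
    notify = globalThis.alert,
) {
    try {
        return await credentials.create({ publicKey: publicKeyOptions });
    } catch (error) {
        notify(`Could not create a passkey: ${error.message}`);
        return null;
    }
}
```

In the registration function you would call this with the assembled options and return early when the result is null.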

From the credential we get back, we need a few pieces of information. First is the decoded client data, which is an object that contains information about the credential creation request that occurred.

The clientDataJSON field of the response object contains the JSON-serialized object in an ArrayBuffer, so we decode that to text and then parse the JSON. With the decoded object, we do a consistency check with a couple pieces of data: the type of the request, whether or not it was cross-origin, and the actual origin being used.

If it was not a creation request, the request was cross-origin, or the origin doesn’t match, we bail out. Note that in production, the origin should be checked against the actual production origin, not localhost. And again, in reality you’d want better error handling than just an alert.
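
Here's a minimal sketch of the decode-and-check step. The checkClientData name is hypothetical, and http://localhost:4000 is an assumed development origin (the Phoenix default port), not a value taken from the text.

```javascript
// Decodes clientDataJSON (bytes of UTF-8 JSON) and runs the three
// consistency checks. Returns the parsed object, or null if any check fails.
function checkClientData(clientDataJSON, expectedType, expectedOrigin) {
    const clientData = JSON.parse(new TextDecoder().decode(clientDataJSON));
    const consistent =
        clientData.type === expectedType &&
        !clientData.crossOrigin &&
        clientData.origin === expectedOrigin;
    return consistent ? clientData : null;
}

// Example with a hand-built payload; a real one comes from
// credential.response.clientDataJSON.
const sampleClientData = new TextEncoder().encode(JSON.stringify({
    type: "webauthn.create",
    crossOrigin: false,
    origin: "http://localhost:4000",
}));
const checked = checkClientData(sampleClientData, "webauthn.create", "http://localhost:4000");
```

A null result is the point at which the registration function would alert and bail out.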

Next, we need to get the authenticator data which is encoded in a binary format and pull a few pieces of data out of it. You can see the full format of the authenticator data in the spec, but the parts we’re interested in are the backed-up state and the credential ID, which is part of the attested credential data.

async function registerWebAuthnAccount(form) {
    // ...
    const authenticatorData = new Uint8Array(credential.response.getAuthenticatorData());
    const backedUp = (authenticatorData[32] >> 4) & 1;
    const idLength = (authenticatorData[53] << 8) | authenticatorData[54];
    const id = authenticatorData.slice(55, 55 + idLength);
}

We get the backed-up bit from the flags byte at offset 32. Then we get the length of the credential ID, which is encoded as a big-endian, 16-bit integer at bytes 53 and 54. The ID itself immediately follows the length, thus starting at byte 55.

Before proceeding, we check that the backed-up bit is set, indicating that the credential is backed-up and safe from the user losing the current device. If it’s not, we won’t let the user register with this passkey. I choose to do this since it’s recommended by Adam Langley’s blog post, but whether it’s actually necessary may depend on your specific circumstances.
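
To make the offsets concrete, here's a small worked sketch that runs the same parsing against a synthetic byte array and then applies the backed-up check. The parseAuthenticatorData name and the fake data are made up for illustration.

```javascript
// Pulls the backed-up flag and credential ID out of raw authenticator data.
function parseAuthenticatorData(authenticatorData) {
    const backedUp = ((authenticatorData[32] >> 4) & 1) === 1;
    const idLength = (authenticatorData[53] << 8) | authenticatorData[54];
    const id = authenticatorData.slice(55, 55 + idLength);
    return { backedUp, id };
}

// Synthetic example: 55 header bytes followed by a 3-byte credential ID.
const fakeAuthenticatorData = new Uint8Array(58);
fakeAuthenticatorData[32] = 0x10; // flags byte with only bit 4 (backed up) set
fakeAuthenticatorData[54] = 3;    // low byte of the big-endian ID length
fakeAuthenticatorData.set([0xaa, 0xbb, 0xcc], 55);

const parsed = parseAuthenticatorData(fakeAuthenticatorData);
if (!parsed.backedUp) {
    // In the browser this is where registration would be refused.
    console.log("Credential is not backed up; refusing registration.");
}
```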

The last piece of data we need out of the credential is the actual public key:

async function registerWebAuthnAccount(form) {
    // ...
    const publicKey = credential.response.getPublicKey();
}

And with all that in place, we can actually initiate the registration. We’ll assemble a form data payload with all of the requisite values:

async function registerWebAuthnAccount(form) {
    // ...
    const body = new FormData();
    body.append("_csrf_token", form._csrf_token.value);
    body.append("email", form["user[email]"].value);
    body.append("credential_id", arrayBufferToBase64(id));
    body.append("public_key_spki", arrayBufferToBase64(publicKey));
}
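
The arrayBufferToBase64 helper isn't defined in the snippets here. One possible implementation, as a sketch, builds a binary string byte by byte and hands it to btoa:

```javascript
// Encodes an ArrayBuffer (or typed array) as standard, padded base 64.
function arrayBufferToBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    let binary = "";
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return btoa(binary);
}
```

This produces standard base 64 with padding, which is what a plain base 64 decoder on the server side expects.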

We’ll use the body in a POST request to the registration endpoint. The response we get back will contain a status value to indicate whether the request was successful or not. If it was successful, the backend will have set the session cookie, and we can redirect to the home page and the new user will be logged in. If registration failed, we’ll alert the user.
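
A sketch of that request and response handling follows. The submitRegistration name, the /users/register path, and the injectable fetchImpl, redirect, and notify parameters are all assumptions for illustration; adjust the path to wherever your registration route actually lives.

```javascript
// Posts the registration payload and reacts to the JSON status.
// Returns the status string so callers can inspect it.
async function submitRegistration(
    body,
    url = "/users/register", // assumed endpoint path
    fetchImpl = globalThis.fetch,
    redirect = (path) => { globalThis.location.href = path; },
    notify = globalThis.alert,
) {
    const response = await fetchImpl(url, { method: "POST", body });
    const { status } = await response.json();
    if (status === "ok") {
        redirect("/"); // session cookie is set, so the user lands logged in
    } else {
        notify("Registration failed.");
    }
    return status;
}
```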

And with that, we’re done with JavaScript (for now) and can move on to the backend part of registering an account.

Creating a user

In the UserRegistrationController module that comes with the Phoenix auth template, we’ll change the create function. By default, it registers the user using the parameters from the signup form and then redirects to the homepage. Instead, we’re going to register using the passkey we created on the client and then respond with the JSON that our JavaScript is expecting.

The first thing we need to do is extract the values that were sent by the frontend and decode the base 64-encoded ones.

defmodule PhoenixPasskeysWeb.UserRegistrationController do
  def create(conn, %{
        "email" => email,
        "credential_id" => credential_id,
        "public_key_spki" => public_key_spki
      }) do
    credential_id = Base.decode64!(credential_id)
    public_key_spki = Base.decode64!(public_key_spki)
  end
end

Those get passed to the Accounts.register_user function, which we’ll update shortly to accept them. If account creation succeeded, we’ll still send the confirmation email as the existing code did. After that, instead of redirecting, we’ll log the user in by setting the session cookie and then respond with the “ok” status for the frontend. If account creation fails, we’ll just respond with the “error” status so the frontend can alert the user.

defmodule PhoenixPasskeysWeb.UserRegistrationController do
  def create(...) do
    # ...
    case Accounts.register_user(email, credential_id, public_key_spki) do
      {:ok, user} ->
        # Send confirmation email...

        conn
        |> UserAuth.log_in_user_without_redirect(user)
        |> json(%{status: :ok})

      {:error, _changeset} ->
        json(conn, %{status: :error})
    end
  end
end

Let’s update the register_user function. Instead of just creating a changeset for the user and then inserting it, we also need to create the credential. To avoid potentially leaving things in a broken state, we wrap both of these in a database transaction.

defmodule PhoenixPasskeys.Accounts do
  def register_user(email, credential_id, public_key_spki) do
    Repo.transaction(fn ->
      user =
        %User{}
        |> User.registration_changeset(%{email: email})
        |> Repo.insert()
        |> case do
          {:ok, user} -> user
          {:error, changeset} -> Repo.rollback(changeset)
        end
    end)
  end
end

First, we create a user with the given email. If the user creation fails, we abort and roll back the transaction. Then, we can create a credential belonging to the new user with the credential ID and public key we received from the client. As with the user, if creating the credential fails, we roll back the transaction.

defmodule PhoenixPasskeys.Accounts do
  def register_user(email, credential_id, public_key_spki) do
    Repo.transaction(fn ->
      # ...
      %UserCredential{}
      |> UserCredential.changeset(%{
        id: credential_id,
        public_key_spki: public_key_spki,
        user_id: user.id
      })
      |> Repo.insert()
      |> case do
        {:ok, _credential} -> nil
        {:error, changeset} -> Repo.rollback(changeset)
      end
    end)
  end
end

We don’t need to do anything with the newly created credential, so we can just ignore it once it’s been created.

And finally, the transaction function returns the user as its last expression, so Repo.transaction gives the caller {:ok, user} on success.

The last thing we need to do to complete the registration flow is to set the new user’s session cookie so that they’re logged in immediately. The UserAuth module that’s generated as part of the Phoenix auth template has a log_in_user function that does exactly this. But it also redirects the connection to another endpoint. We don’t want to do that, since we’re sending a JSON response, so I’ve split the function into two: one that only sets the session, and the existing function that sets the session and then redirects.

defmodule PhoenixPasskeysWeb.UserAuth do
  def log_in_user(conn, user, params \\ %{}) do
    user_return_to = get_session(conn, :user_return_to)
    conn
    |> log_in_user_without_redirect(user, params)
    |> redirect(to: user_return_to || signed_in_path(conn))
  end

  def log_in_user_without_redirect(conn, user, params \\ %{}) do
    token = Accounts.generate_user_session_token(user)
    conn
    |> renew_session()
    |> put_token_in_session(token)
    |> maybe_write_remember_me_cookie(token, params)
  end
end

And with that, everything is in place and you can now create an account with a passkey!

Login form

Now that the user’s got an account, they need to be able to log in with it. That means once again interacting with the WebAuthn API and writing a bunch of JavaScript. But before we get there, we need some slight changes to the backend.

In the HTML for the login page, we’ll add an ID to the <form> element so that we can find it from JS. We’ll also remove the password field, which is obviously no longer necessary. Lastly, but certainly not least, we need to send a challenge to the client.

The challenge is a value that the user’s device will cryptographically sign with their private key. The result will get sent back to the server, and we’ll verify it against the public key we have stored, thus authenticating them. We’ll just send the challenge in a hidden form field:

<input type="hidden" id="challenge" value={@webauthn_challenge} />

In the session controller, we’ll need to generate and assign the challenge to the connection.

WebAuthn expects the challenge to be a value up to 64 bytes long, so we’ll use the Erlang crypto module to generate one of that length. The value is encoded as URL-safe base 64 (the same as normal base 64, but with dash and underscore rather than plus and slash) without padding. We encode it this way since that’s the format in which it will later be returned as part of the clientDataJSON, so when we extract that value we can directly compare it to the challenge value we generated.

defmodule PhoenixPasskeysWeb.UserSessionController do
  defp put_webauthn_challenge(conn) do
    challenge =
      :crypto.strong_rand_bytes(64)
      |> Base.url_encode64(padding: false)

    conn
    |> put_session(:webauthn_challenge, challenge)
    |> assign(:webauthn_challenge, challenge)
  end
end

Note that the challenge string is also stored in the session, so that we can later check that the challenge that the client signed matches the challenge we generated. It’s safe to store this in the session, even though it’s also sent to the client, because the session cookie is signed (and encrypted too, if an encryption salt is configured), so the client can’t tamper with it.

Login JavaScript §

With the first part of the backend changes taken care of, it’s time for more JavaScript, baby!

We’ll follow a similar outline to the registration setup (and the same caveat applies about error handling).

document.addEventListener("DOMContentLoaded", () => {
    // ...
    const loginForm = document.getElementById("login-form");
    if (loginForm) {
        loginWebAuthnAccount(loginForm);
    }
});

async function loginWebAuthnAccount(loginForm) {
    if (!(await supportsPasskeys())) {
        return;
    }
}

The first thing we need to do is grab the challenge from the hidden form field, then we can construct the options object for getting the credential (which is, thankfully, much simpler than for creation).

In the options, we specify that we want conditional mediation. As noted before, this means that the browser won’t display any UI, except for autofill, for this credential request until the user accepts the autofill suggestion. In the public key options, we also give the decoded challenge value and specify our Relying Party ID (again, this would need to be the actual domain in production).
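
The code that builds these options isn’t shown here, so the following is only a minimal sketch of what it might look like. The helper name base64URLToArrayBuffer matches the one used later in this post, but its body is an assumed implementation, and buildGetOptions is a hypothetical name introduced purely for illustration.

```javascript
// Decode URL-safe, unpadded base 64 into an ArrayBuffer (assumed implementation).
function base64URLToArrayBuffer(value) {
    // Map the URL-safe alphabet back to the standard one and restore the padding.
    const base64 = value.replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const binary = atob(padded);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
    }
    return bytes.buffer;
}

// Hypothetical helper: builds the argument for navigator.credentials.get().
function buildGetOptions(challenge) {
    return {
        mediation: "conditional",
        publicKey: {
            challenge: base64URLToArrayBuffer(challenge),
            rpId: "localhost", // the real domain in production
        },
    };
}

// In the page, the challenge would come from the hidden form field:
// const getOptions = buildGetOptions(document.getElementById("challenge").value);
```

Keeping the decoding in its own function means the same helper can be reused wherever the server hands the client a URL-safe base 64 value.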

Now, we can actually make the credential request and then, if we get a credential back, encode and send all the values to the backend. We need to send the ID of the credential, so that the backend can find its public key and the corresponding user. We also need the client data JSON, which we send as text decoded from the ArrayBuffer it’s returned as. We also need to send the authenticator data as well as the signature itself.

async function loginWebAuthnAccount(loginForm) {
    // ...
    const credential = await navigator.credentials.get(getOptions);
    if (!credential) {
        alert("Could not get credential");
        return;
    }

    const clientDataJSON = new TextDecoder().decode(credential.response.clientDataJSON);

    const body = new FormData();
    body.append("_csrf_token", loginForm._csrf_token.value);
    body.append("raw_id", arrayBufferToBase64(credential.rawId));
    body.append("client_data_json", clientDataJSON);
    body.append("authenticator_data", arrayBufferToBase64(credential.response.authenticatorData));
    body.append("signature", arrayBufferToBase64(credential.response.signature));
}

We can then send a request to the login endpoint. If the backend request fails (or if we failed to get the credential) we just alert the user (again, you’d probably want something better in reality). If the login attempt was successful, the server will have set the session cookie, and so we can just redirect to the homepage and the user will be logged in.
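
The request itself isn’t shown here. As a rough sketch, it could look something like the following. It assumes the login endpoint lives at /users/log_in (adjust this to whatever your router defines) and that it answers with JSON whose status field is "ok" on success, which is what the backend code in the next section returns. submitLogin is a hypothetical name, and the fetchImpl parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: POSTs the credential data and reports whether login succeeded.
// The "/users/log_in" path is an assumption, not something confirmed by this post.
async function submitLogin(body, fetchImpl = fetch) {
    try {
        const resp = await fetchImpl("/users/log_in", { method: "POST", body });
        if (!resp.ok) {
            return false;
        }
        const respJSON = await resp.json();
        return respJSON.status === "ok";
    } catch (err) {
        // A network failure or a non-JSON response counts as a failed login.
        return false;
    }
}

// Inside loginWebAuthnAccount, after building `body`:
// if (await submitLogin(body)) {
//     window.location.href = "/";
// } else {
//     alert("Could not log in");
// }
```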

With that in place, let’s move on to the backend half of the login request.

Validating a login attempt §

As with signup, we’ll modify the existing log in endpoint to actually validate the WebAuthn login attempt.

The first step is extracting all of the information the frontend provides in the params and decoding it:

defmodule PhoenixPasskeysWeb.UserSessionController do
  def create(conn, params) do
    id = params |> Map.get("raw_id") |> Base.decode64!()
    authenticator_data = params |> Map.get("authenticator_data") |> Base.decode64!()
    client_data_json_str = params |> Map.get("client_data_json")
    signature = params |> Map.get("signature") |> Base.decode64!()
  end
end

Next, we’re going to validate all of the information we got from the client. Before we get to that, there are a handful of helper functions we’ll use. First, looking up a credential by its ID:

Next, a function in the controller that verifies the signature against the provided data:

defmodule PhoenixPasskeysWeb.UserSessionController do
  defp verify_signature(credential, client_data_json_str, authenticator_data, signature) do
    with {:ok, pubkey} <- X509.PublicKey.from_der(credential.public_key_spki),
         client_data_json_hash <- :crypto.hash(:sha256, client_data_json_str),
         signed_message <- authenticator_data <> client_data_json_hash,
         true <- :public_key.verify(signed_message, :sha256, signature, pubkey) do
      true
    else
      _ ->
        false
    end
  end
end

X509 comes from the x509 package, which is the only third-party piece of code we’re using. It’s a fairly thin wrapper around the Erlang public_key and crypto modules, and mostly serves to save me from having to deal with Erlang records in my code. Its from_der helper function is used to parse the public key from the encoded format.

Next, we hash the client data JSON and append that hash to the authenticator data from the client. This value is what should match the signature using the public key we’ve got, so finally we check that. If all these steps succeeded, we return true, and false otherwise.
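
To make the shape of the signed message concrete, here is the same check sketched in JavaScript for Node.js. This is an illustration only: the backend in this post does the real verification in Elixir. The sketch assumes an ECDSA P-256 credential with a DER-encoded signature, and it uses a throwaway key pair as a stand-in for the authenticator. The function names are hypothetical.

```javascript
const nodeCrypto = require("node:crypto");

// The message an authenticator signs: authenticatorData || SHA-256(clientDataJSON).
function buildSignedMessage(authenticatorData, clientDataJSON) {
    const clientDataHash = nodeCrypto.createHash("sha256").update(clientDataJSON).digest();
    return Buffer.concat([authenticatorData, clientDataHash]);
}

// Verify a DER-encoded ECDSA signature over that message using SHA-256.
function verifyAssertion(publicKey, authenticatorData, clientDataJSON, signature) {
    const message = buildSignedMessage(authenticatorData, clientDataJSON);
    return nodeCrypto.verify("sha256", message, publicKey, signature);
}

// Stand-in for an authenticator: a throwaway P-256 key pair created just for this demo.
const { publicKey, privateKey } = nodeCrypto.generateKeyPairSync("ec", {
    namedCurve: "prime256v1", // OpenSSL's name for P-256
});
const authenticatorData = Buffer.alloc(37, 1); // placeholder bytes, not real authenticator data
const clientDataJSON = JSON.stringify({ type: "webauthn.get", challenge: "abc" });
const signature = nodeCrypto.sign(
    "sha256",
    buildSignedMessage(authenticatorData, clientDataJSON),
    privateKey
);

console.log(verifyAssertion(publicKey, authenticatorData, clientDataJSON, signature)); // true
console.log(verifyAssertion(publicKey, authenticatorData, clientDataJSON + " ", signature)); // false
```

Changing even one byte of the client data changes its hash, so the signature no longer verifies. That is what ties the signature to the challenge embedded in the client data.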

The last helper function will receive the decoded client data and make sure it’s got all of the values that we expect. If the crossOrigin value is present and is not false, the client data is invalid and the login attempt will be rejected.

Otherwise, we check that the data has the expected type and origin, and we extract the challenge value (note that we’re checking the origin again here, and this would need to change in production):

And lastly, if neither of the previous patterns matched, the client data fails validation:
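
The three cases can be summarised with a short sketch. The backend implements them in Elixir. The version below is written in JavaScript purely as an illustration of the logic. checkClientData is a hypothetical name, and the default origin (Phoenix’s usual development address) is an assumption that would have to be the real origin in production.

```javascript
// Hypothetical illustration of the three cases described above.
function checkClientData(clientData, expectedOrigin = "http://localhost:4000") {
    // Case 1: crossOrigin is present and is anything other than false, so reject.
    if ("crossOrigin" in clientData && clientData.crossOrigin !== false) {
        return { ok: false };
    }
    // Case 2: expected type and origin, with a challenge to hand back to the caller.
    if (
        clientData.type === "webauthn.get" &&
        clientData.origin === expectedOrigin &&
        typeof clientData.challenge === "string"
    ) {
        return { ok: true, challenge: clientData.challenge };
    }
    // Case 3: anything else fails validation.
    return { ok: false };
}
```

The order matters: the crossOrigin check runs first so that an otherwise well-formed cross-origin response is still rejected.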

Now, let’s put all those parts together and validate the login attempt.

defmodule PhoenixPasskeysWeb.UserSessionController do
  import Bitwise # provides the &&& operator used in create/2
  def create(conn, params) do
    # ...
    with credential when not is_nil(credential) <- Accounts.get_credential(id),
         true <- verify_signature(credential, client_data_json_str, authenticator_data, signature),
         {:ok, client_data_json} <- Jason.decode(client_data_json_str),
         {:ok, challenge} <- check_client_data_json(client_data_json),
         true <- challenge == get_session(conn, :webauthn_challenge),
         true <- :binary.part(authenticator_data, 0, 32) == :crypto.hash(:sha256, "localhost"),
         true <- (:binary.at(authenticator_data, 32) &&& 1) == 1 do
      conn
      |> delete_session(:webauthn_challenge)
      |> UserAuth.log_in_user_without_redirect(credential.user)
      |> json(%{status: :ok})
    else
      _ ->
        json(conn, %{status: :error})
    end
  end
end

Here’s everything that we’re doing:

Look up the credential with the given ID.
Use the public key we had stored to verify the signature on the authenticator data and client data JSON.
Decode the client data JSON.
Check that all the values in the client data are what we expect, and extract the challenge that was signed.
Ensure the challenge that the user signed matches what we previously generated.
Extract the hash of the Relying Party ID (the first 32 bytes) from the authenticator data and ensure it matches the hash of our own RP ID (this would not be localhost in production).
Check that the authenticator data has the user presence bit set (indicating that a person was actually present on the client’s end).

If all of those steps succeeded, we remove the old challenge value from the session (since it’s no longer needed), actually log the user in, and then respond with the “ok” status that the JavaScript is expecting. If any step failed, we’ll respond with the “error” status and the frontend will alert the user.
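
The last two checks depend on the byte layout of the authenticator data: the first 32 bytes are the SHA-256 hash of the RP ID, and the next byte holds flags whose lowest bit means “user present.” The sketch below shows the same two checks in JavaScript for Node.js, purely as an illustration. checkAuthenticatorData is a hypothetical name, and the example bytes are fabricated.

```javascript
const nodeCrypto = require("node:crypto");

// Expects a Buffer. Bytes 0-31: SHA-256 of the RP ID. Byte 32: flags (bit 0 = user present).
function checkAuthenticatorData(authenticatorData, rpId) {
    if (authenticatorData.length < 37) {
        return false; // too short to hold the hash, the flags, and the 4-byte counter
    }
    const rpIdHash = authenticatorData.subarray(0, 32);
    const expectedHash = nodeCrypto.createHash("sha256").update(rpId).digest();
    const userPresent = (authenticatorData[32] & 0x01) === 0x01;
    return rpIdHash.equals(expectedHash) && userPresent;
}

// Fabricated example: hash of "localhost", then flags 0x01, then a zeroed counter.
const example = Buffer.concat([
    nodeCrypto.createHash("sha256").update("localhost").digest(),
    Buffer.from([0x01, 0, 0, 0, 0]),
]);
console.log(checkAuthenticatorData(example, "localhost")); // true
```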

Since there’s a lot going on here, it’s worth being clear about what exactly in this process lets us authenticate and prove that the user is who they claim to be. Since the signature verification step is using the public key that we stored during the registration process, we know that anyone who can produce a signature that verifies against that public key must be the user (or at any rate, have their private key). The value that they’re signing is, essentially, the challenge: a securely generated random value. The user isn’t directly signing the challenge, but this is still safe, since the challenge value is included in the client data JSON, the hash of which is included in the signed message.

So: the challenge value that was signed by the user must be in the client data, and the challenge value in the client data must be the one we generated. Given that, we know that the user whose key was used to sign the message is the one trying to log in now. That we’re verifying with the stored public key prevents an attacker from using an arbitrary key to sign the login attempt. And that the signed challenge matches the challenge the server generated means an attacker can’t reuse a previous response to log in (a replay attack).

At long last, we finally have the ability to log in to our application using a passkey. Only a few minor things to go, so let’s forge ahead.

Handling login if the user enters an email §

Although we’re presenting the conditional UI, there’s nothing preventing the user from typing their email into the field and then clicking “Sign in,” so we should probably handle that too. This can be done fairly simply by reusing our existing code for conditional login.

We’ll change loginWebAuthnAccount to take an additional parameter, conditional, which will be a boolean indicating whether this login attempt is to set up the conditional UI or was triggered by submitting the login form.

If it’s false, we won’t request conditional mediation and instead we’ll look up the credentials corresponding to the email the user entered and ask WebAuthn for one of those:

async function loginWebAuthnAccount(loginForm, conditional) {
    // ...
    let allowCredentials = [];
    if (!conditional) {
		const email = loginForm["user[email]"].value;
		const resp = await fetch(`/users/log_in/credentials?email=${encodeURIComponent(email)}`);
		const respJSON = await resp.json();
		allowCredentials = respJSON.map((id) => {
			return {
				type: "public-key",
				id: base64ToArrayBuffer(id),
			};
		});
    }

    const getOptions = {
		mediation: conditional ? "conditional" : "optional",
		publicKey: {
			challenge: base64URLToArrayBuffer(challenge),
			rpId: "localhost",
			allowCredentials,
		}
    };
    // ...
}

The “optional” value for the mediation option means that the authenticator isn’t required to display UI, but will do so if its policies dictate that. The allowCredentials array contains objects describing all of the credentials that we want to accept—specifically, their binary IDs as ArrayBuffers.

We look up the user’s credentials so that the one we actually request from the authenticator matches the account that the user is trying to log in with. To handle this, we’ll also wire up an additional route on the backend that returns the base 64-encoded IDs of all the credentials belonging to the user with a given email.

The get_credentials_by_email function is quite simple. It just looks up a user by email, preloading any credentials they have and then returning them:

Back in the JS, we can tweak the setup code to pass true for the conditional parameter in the initial request and also register a submit handler on the login form that will invoke it with false:

document.addEventListener("DOMContentLoaded", () => {
    // ...
	const loginForm = document.getElementById("login-form");
	if (loginForm) {
		loginWebAuthnAccount(loginForm, true);
		loginForm.addEventListener("submit", (event) => {
			event.preventDefault();
			loginWebAuthnAccount(loginForm, false);
		});
	}
});

And so we’ve handled the case where the user ignores the conditional UI and still types in their email to log in.

Passkey reset §

A not-infrequent argument against passkeys is that if your phone falls into a lake, you lose access to all of your passkey-backed accounts. One could argue that this isn’t true because the definition of passkeys means that they’re backed up and thus protected from this sort of event (and indeed, earlier we only permitted registering with backed-up credentials). I think the argument isn’t particularly interesting, however, because you can still have a “Forgot my passkey” option that works just like it does now with passwords.

This is less secure than a passkey implementation that has no “Forgot” option. But it’s no less secure than current password-based systems, and I think the UX/security tradeoff here falls on the UX side—people will, inevitably, lose access to their passkeys while retaining access to their email.

Implementing it isn’t too complicated, fortunately, since we can reuse much of the registration code. First, the JavaScript. The only change necessary is attaching the registration function to the reset form as well.

document.addEventListener("DOMContentLoaded", () => {
    // ...
	const resetForm = document.getElementById("reset-passkey-form");
	if (resetForm) {
		resetForm.addEventListener("submit", (event) => {
			event.preventDefault();
			registerWebAuthnAccount(resetForm);
		});
	}
});

In the HTML for the reset form, we need to include the email in the same form field as the registration form (and also add an ID to the <form> element):

<input type="hidden" name="user[email]" value={@user.email} />

In the function for the reset route (which is changed to POST from PUT, to match the signup route), we take the credential ID and public key and use them to update the user’s credentials:

defmodule PhoenixPasskeysWeb.UserResetPasswordController do
  def update(conn, %{
        "credential_id" => credential_id,
        "public_key_spki" => public_key_spki
      }) do
    credential_id = Base.decode64!(credential_id)
    public_key_spki = Base.decode64!(public_key_spki)
    case Accounts.reset_user_credentials(conn.assigns.user, credential_id, public_key_spki) do
      :ok ->
        conn
        |> put_flash(:info, "Passkey reset successfully.")
        |> UserAuth.log_in_user_without_redirect(conn.assigns.user)
        |> json(%{status: :ok})

      {:error, _} ->
        conn
        |> put_flash(:error, "Error resetting passkey")
        |> json(%{status: :error})
    end
  end
end

After updating, we log the user in if applicable and return JSON with the appropriate status.

The reset_user_credentials function works very similarly to the original reset password function that was part of the template: it deletes all the user’s existing sessions, and then removes their existing credentials and creates a new one:

defmodule PhoenixPasskeys.Accounts do
  def reset_user_credentials(user, credential_id, public_key_spki) do
    Ecto.Multi.new()
    |> Ecto.Multi.delete_all(
      :old_credentials,
      from(a in UserCredential, where: a.user_id == ^user.id)
    )
    |> Ecto.Multi.insert(
      :new_credential,
      UserCredential.changeset(%UserCredential{}, %{
        id: credential_id,
        public_key_spki: public_key_spki,
        user_id: user.id
      })
    )
    |> Ecto.Multi.delete_all(:tokens, UserToken.user_and_contexts_query(user, :all))
    |> Repo.transaction()
    |> case do
      {:ok, _} -> :ok
      {:error, _, changeset, _} -> {:error, changeset}
    end
  end
end

It’s worth noting that this does have slightly weaker security properties than the phx.gen.auth reset password implementation. With that approach, leaking a password reset token does not necessarily result in an account takeover, since whoever obtained the leaked token may not know the target user’s email address. Since the auth template forces the user to re-login after a reset, this prevents someone without the email address from gaining access even if they change the password.

But since logging in with a passkey is functionally a single factor, resetting it means gaining access to the account. So, a leaked reset token gives the bearer control over the account. This is an argument against having a reset option, but whether this is a concern in practice depends on your specific circumstances.

Conclusion §

As noted, this is not a complete implementation. There are a handful of places where I’ve left things unfinished since this isn’t meant to be production-level code. There are also a few places where there are security decisions that need to be made on a more contextual basis, that I’ve tried to note. And, of course, you wouldn’t really want to only permit signing in with passkeys and wholesale drop the passwords column from your database.

Nonetheless, I hope this has been a helpful look at how to implement passkeys in an Elixir/Phoenix application. You can find the complete repo here, and it may also be useful to look at the specific commit where passkey support was added.

From legacy code to verifiable specifications with Surveyor

Wed, 29 Apr 2026 20:10:39 +0000

What is legacy code?

Sometimes that's the fifteen-year-old monolith. Sometimes it's the service that grew complicated faster than the team could keep up with. Sometimes it's that project that started as a shortcut and turned into a roadblock. Legacy code has been exposed to real world demands long enough to gather fixes for edge cases we are not aware of anymore.

And — increasingly — sometimes it's the repo your CEO vibe-coded last weekend and that they proudly present on Monday. It runs. Now you have to own it.

Legacy software is the code we lost confidence in

That last category is genuinely new and growing fast. Code written once, never reviewed, tested a bit, deployed one day. Or, worse, released as a library and never maintained after that.

A new kind of technical debt has evolved: "The code we should review thoroughly one day."

"Can we rewrite this?"

The actual problem is not the writing of the new code, but the discovery of what the old code does, and the certainty that the new one does the same thing.

We've been building two small Elixir tools to help with both halves of that problem.

Surveyor scans a codebase in any language and produces an architectural model.
Assay runs the same plain-text behavioral specs against both the legacy and the rewrite, so you can prove they behave the same.

Together they cover the workflow: figure out the structure, capture the behavior, rewrite, prove the rewrite works.

TL;DR

Surveyor produces an automatically verifiable behavioral specification of a legacy system.

How we got here

Before going into either tool, it's worth walking through the shape of the problem. The picture builds up step by step, and the final diagram only makes sense once you've seen the holes in the earlier ones.

What we don't want

A direct rewrite also copies bugs, dead code and unused features.

Point an agent at a legacy codebase, ask it to produce a new one, hope for the best. Call it "the Claude zombie rewrite." You end up with a target codebase that nobody understands either, plus a generation gap between the old assumptions and the new. What has the agent overlooked, misunderstood, assumed? A black box gets replaced with a different black box. That is not a rewrite, that is a transcoding.

What we want instead

The features of the legacy codebase need to be revised

The valuable artifact in the middle is not new code; it's a readable specification of what the old code actually does. Plain text, human-reviewable, version-controllable. The kind of thing you need when you want to ask the product owner whether a feature is really wanted that way, or whether it is just a legacy quirk.

If you have it, you can use it for estimates. It can be your map to plan the implementation and might even give you a hint about that stakeholder that you might have forgotten about otherwise.

Organised and illustrated

The specification's organisation should mirror the codebase

A single big text file describing a whole legacy system is only useful for very small projects. You want the spec broken down by module — by bounded context — and the modules themselves arranged on a map of the architecture. Once each module has its own spec associated to it, two things become possible: you can divide the work, and you can reason about coverage.

Job done! Or is it?

We now have an organised map of the system and a set of specifications. We have an overview, and the details. We could now talk to the stakeholders, the development team, and set to work.

However, we have not yet verified our findings.

What we are looking at is not yet a specification of the legacy system — it is a speculation of the legacy system. A document that claims to describe what the code does, written by reading the code (or by asking an LLM to read the code). Plausible, well-organised, reviewable. But unverified.

1. Verifiability

This is speculation as specification, and it's the trap most "let's document the legacy system" projects fall into. A Word document of "what the system does" can be worse than no document at all, because people start trusting it. But even if the specification is done as a group effort by all stakeholders involved, and is thoroughly verified, it still lacks one critical property:

2. Reproducibility

What we actually want is a verifiable specification: Not "verified once," but verifiable on demand, any time someone asks the question. You could verify a paper spec by hand, of course — sit down, read it, run the system, tick off the scenarios — but manual verification is expensive enough that it gets done once at sign-off and never again.

A verification you don't actually do is verification that doesn't exist. And unless the legacy codebase is a museum exhibit, it's a moving target:

The team is still shipping fixes, the system is still drifting, and a paper spec written on Monday is no longer accurate by Friday.

Runnable specs

The fix is to make the specs executable — to wire them into something that can run them against the actual legacy system, on demand, and tell you whether they still hold.

That is the move from speculation to verifiable specification: not a one-shot audit, but a button you can press on every commit, every nightly, every time anyone asks "is this still true?"

A green run proves the spec is still true. A red run is a useful signal: either the legacy drifted, or your spec was wrong. Either way, you find out before the rewrite, not during it.

That magic is Assay

Assay is the runner. It parses the .assay specs Surveyor produced, matches each step against bindings you write, and exercises the real legacy system through whatever interface it actually exposes — HTTP, CLI, message queue, file drop, whatever. The framework is small.

The runtime contract is "your bindings + your assertions + a deterministic runner."

The systemic approach

Open up that "parts you fill in" box and you find the actual surface area of work: pattern recognition for step matching, a binding per target system, and Given/When/Then automations for setup, action, and verification.

The end state has a property that nothing in the earlier pictures had: the problem space has been dissected into manageable pieces. Architecture, behavior, target adapters, assertions — each is small, each can be reviewed independently, and each can be picked up by a human or an agent without having to hold the whole system in their head at once.

That's the goal of the toolkit. Everything below is the detail of how each piece is built.

Surveyor — discovering the architecture you inherited

Surveyor is a Mix project that takes a path to a codebase and produces a Structurizr DSL workspace file. It runs in three phases mapping to the C4 model: C1 (system context), C2 (containers), C3 (components per container).

Crucially, Surveyor does not parse code. There are no language-specific parsers, no AST walkers, no fragile grammar files for thirty different ecosystems.

It scans the filesystem to identify what's there (languages, frameworks, project files, entry points, deployment manifests) and then feeds meaningful chunks to an LLM. That sounds trivial, but it is the same mechanism that commercial coding agents use when they analyse your code.

Surveyor's intelligence is in the prompts and the chunking, not in a stack of half-broken language adapters.

The CLI is interactive at every phase.

$ surveyor ./legacy-monolith --phase c1

Phase 1 — System Context
  Scanning codebase...
  Querying LLM...

  System: "Order Management System"
  Description: Manages the full order lifecycle

  Actors:
    ✓ Customer — places and tracks orders via Web UI
    ✓ Warehouse Staff — manages fulfillment via Back Office
    ? Admin — found references in auth config (confidence: low)

  External Systems:
    ✓ Stripe — payment processing (REST API)
    ? SendGrid — found API key in config (confidence: low)

[a]ccept  [e]dit  [r]etry with more context  [q]uit
>

The LLM is asked to flag uncertainty. Anything tagged confidence: low shows up with a ? and a short reasoning string for human review. That's not just nice ergonomics — it's the bit that makes the tool trustworthy. An LLM that confidently invents a PaymentReconciliation container that doesn't exist is worse than no model at all. An LLM that says "I see a SENDGRID_API_KEY in .env.example but no Sendgrid client in the source, please confirm" is doing the right kind of work.

Each phase is resumable. Results are saved as JSON in ./surveyor/, so a thirty-container system doesn't have to finish in one sitting. You can hop in, accept C1, take a break, come back tomorrow, and resume at C2.

The end product is a workspace.dsl you can render in Structurizr. But more importantly, it's a workspace.dsl whose components are decorated with assay.specs and assay.schema properties, pointing at the behavioral specs and the domain schemas for each bounded context.

orderLifecycle = component "Order Lifecycle" "Orders" "Bounded Context" {
    properties {
        "assay.specs" "specs/order-lifecycle"
        "assay.schema" "schemas/order_lifecycle.ex"
    }
}

DDD pattern annotations from the LLM (customer-supplier, anticorruption-layer, …) ride along as properties on the relationship. Every component gets assay.specs and assay.schema properties — the contract the next tool reads from.

Examples

The System Context of a Medical Application

Surveyor identified the actors and the external systems of the application. Without knowing the code, we would already know which actor types there are.

The System Context of a medical application

An example of an assay spec created in the second phase:

The Account Creation Specs of a medical application

Assay — proving the rewrite behaves the same

Assay is a minimal behavioral spec runner. Specs are plain text and look like this:

Component: [orderLifecycle] Order Lifecycle
Context: Order Placement

Definitions:
  - "a valid customer" means:
      a Customer with status Active, verified email,
      and a credit limit greater than zero

Invariants:
  - total must not exceed customer credit limit
  - at least one line item required

Rule: Customers can place orders for in-stock items

  @critical @phase-1
  Scenario: [OL-001] Place a simple order
    Given a valid customer "Alice" with credit limit €10,000
    And product "Widget" is in stock with 50 units available
    When Alice places an order for 3 units of "Widget"
    Then the order status becomes "Placed"
    And stock for "Widget" is reduced to 47 units

The bracketed [orderLifecycle] is a workspace.dsl identifier — the same identifier Surveyor wrote into the architecture model. The bracketed [OL-001] on the scenario is an optional free-form id that survives across rewrites of the spec text and lets you cross-reference from tickets or audit trails.

If that looks like Cucumber or Gherkin, it does — with one large difference. There is no glue layer, no abstraction over what kind of system you are testing, no framework opinions about HTTP, databases, or browsers. The bindings are just Elixir.

defmodule Targets.Legacy.OrderLifecycle do
  use Assay.Binding, component: "orderLifecycle"

  @base System.get_env("LEGACY_API_URL")

  step :given, ~r/a valid customer "(?P<name>.+)" with credit limit €(?P<limit>[\d,.]+)/ do
    {:ok, resp} = Req.post("#{@base}/test/customers", json: %{
      name: params().name,
      credit_limit: parse_money(params().limit),
      status: "Active"
    })
    assign(customer_id: resp.body["id"])
  end

  step :action, ~r/.+ places an order for (?P<qty>\d+) units? of "(?P<product>.+)"/ do
    {:ok, resp} = Req.post("#{@base}/orders", json: %{
      customer_id: var(:customer_id),
      lines: [%{product_id: var(:product_id), quantity: to_int(params().qty)}]
    })
    assign(order_id: resp.body["id"], http_status: resp.status)
  end

  step :expect, ~r/the order status becomes "(?P<status>.+)"/ do
    {:ok, resp} = Req.get("#{@base}/orders/#{var(:order_id)}")
    assert resp.body["status"] == params().status
  end
end

A binding is just a module that brings whatever it needs — Req for an HTTP API, AMQP for a message broker, System.cmd for a batch processor, File for a directory-watching pipeline. Assay itself does not ship an HTTP client or a database adapter.

The framework provides parsing, step matching, scenario lifecycle, and assertions; the rest is your code. That's what makes the same spec runnable against an HTTP API, a CLI tool, a batch job, or a message queue, depending on what the legacy system happens to be.

The crucial property is target swappability:

assay run specs/order-lifecycle/ --target legacy
assay run specs/order-lifecycle/ --target new

Same specs. Different bindings. Different systems. The legacy binding hits the old SOAP API; the new binding hits the new Phoenix endpoint. If both go green, the rewrite is behavior-equivalent for the things the spec covers — which, after a few months of writing specs, is most of the things that matter.

There is no separate .exs file generated from the specs. The runner parses .assay files, matches each step against a regex on a binding function, executes them, and prints pass/fail. That's it.

How Assay actually works

The runtime is around six hundred lines of Elixir. Five design choices do most of the work.

A two-pass parser

The first pass lifts triple-quoted doc strings out of the file (so the line tokenizer doesn't have to reason about them). The second pass dispatches on the first keyword on each line.

def parse_string(content) do
  {filtered, doc_strings} =
    content
    |> String.split("\n")
    |> Enum.with_index(1)
    |> extract_doc_strings()

  Process.put(:assay_doc_strings, doc_strings)

  filtered
  |> Enum.reject(fn {line, _} ->
    trimmed = String.trim(line)
    trimmed == "" or String.starts_with?(trimmed, "#")
  end)
  |> parse_lines(%Spec{})
end

Output is a plain %Spec{} struct. No AST nodes, no string interpolation.
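
To make the first pass concrete, here is a minimal sketch of what a doc-string extraction step could look like. The module name DocStringPass and the choice to key each doc string by the line number of its opening delimiter are assumptions for illustration; Assay's actual extract_doc_strings/1 is not shown here and may work differently.

```elixir
defmodule DocStringPass do
  # Takes [{line, line_number}] and returns {remaining_lines, doc_strings}.
  # doc_strings maps the line number of the opening delimiter to the body.
  # An unterminated doc string is silently dropped in this sketch.
  def extract_doc_strings(indexed_lines) do
    {kept, docs, _open} =
      Enum.reduce(indexed_lines, {[], %{}, nil}, fn {line, no} = entry, {kept, docs, open} ->
        delimiter? = String.trim(line) == ~s(""")

        cond do
          # Opening delimiter: start collecting body lines.
          delimiter? and open == nil ->
            {kept, docs, {no, []}}

          # Closing delimiter: store the collected body.
          delimiter? ->
            {start, body} = open
            doc = body |> Enum.reverse() |> Enum.join("\n")
            {kept, Map.put(docs, start, doc), nil}

          # Inside a doc string: accumulate the line.
          open != nil ->
            {start, body} = open
            {kept, docs, {start, [line | body]}}

          # Ordinary line: pass it through to the second pass.
          true ->
            {[entry | kept], docs, nil}
        end
      end)

    {Enum.reverse(kept), docs}
  end
end
```

Because the surviving lines keep their original line numbers, the second pass can still report accurate positions in error messages.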

The step macro generates real module functions

Each step :given, ~r/.../ do ... end call expands into a uniquely-named function plus a %StepBinding{} record on an accumulating attribute.

defmacro step(type, regex, do: block)
         when type in [:given, :action, :expect] do
  fn_name = :"__step_#{:erlang.unique_integer([:positive])}__"

  quote do
    @assay_steps {unquote(type), unquote(regex), &__MODULE__.unquote(fn_name)/0}

    def unquote(fn_name)() do
      unquote(block)
    end
  end
end

At compile time, __before_compile__ exposes __steps__/0 and __component__/0 for the runner to read. There is no string codegen, no eval — just normal Elixir function definitions.
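
For readers who have not written this kind of DSL before, the following is a self-contained sketch of the same pattern: an accumulating attribute, a generated function per step, and a __before_compile__ hook that exposes the collected steps. The names MiniSteps, @mini_steps, and OrderBindingSketch are hypothetical. As a simplification, the sketch stores the pattern as a plain string and the function name as an atom rather than a regex and a capture, so it is not a copy of Assay's implementation.

```elixir
defmodule MiniSteps do
  defmacro __using__(_opts) do
    quote do
      import MiniSteps, only: [step: 3]
      Module.register_attribute(__MODULE__, :mini_steps, accumulate: true)
      @before_compile MiniSteps
    end
  end

  # Expands into one uniquely named zero-arity function per step, plus an
  # entry on the accumulating @mini_steps attribute.
  defmacro step(type, pattern, do: block) when type in [:given, :action, :expect] do
    fn_name = :"__step_#{:erlang.unique_integer([:positive])}__"

    quote do
      @mini_steps {unquote(type), unquote(pattern), unquote(fn_name)}
      def unquote(fn_name)(), do: unquote(block)
    end
  end

  # Runs just before the using module is compiled and turns the
  # accumulated attribute into a plain function the runner can call.
  defmacro __before_compile__(env) do
    steps = env.module |> Module.get_attribute(:mini_steps) |> Enum.reverse()

    quote do
      def __steps__, do: unquote(Macro.escape(steps))
    end
  end
end

defmodule OrderBindingSketch do
  use MiniSteps

  step :given, "an empty cart" do
    :cart_created
  end

  step :expect, "the cart is empty" do
    :ok
  end
end
```

A runner can then call OrderBindingSketch.__steps__() and invoke each generated function with apply/3, with no string code generation or eval involved.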

Component-scoped step matching

Anyone who has used Cucumber has hit the global step-definition problem: the same step text means different things in different contexts, but Gherkin treats step definitions as one shared namespace. Assay scopes bindings by component:

defp run_spec(spec, bindings, target, tags, exclude_tags) do
  component_bindings =
    Enum.filter(bindings, fn b -> b.component == spec.component end)

  # ... run each scenario against component_bindings only
end

This is the direct reason Surveyor and Assay use the same identifier convention. The architecture model and the spec runner share a vocabulary, and step text never collides across bounded contexts.

Pre-check, then execute

Before running any step, the runner walks the whole scenario and finds a binding for every step. If anything is unbound, the scenario fails before any side effects.

def run_scenario(scenario, bindings) do
  matched_steps =
    Enum.map(scenario.steps, fn step ->
      case find_binding(step, bindings) do
        nil -> {:unbound, step}
        binding -> {:bound, step, binding}
      end
    end)

  if Enum.any?(matched_steps, &match?({:unbound, _}, &1)) do
    %ScenarioResult{status: :error, error: "Unbound steps found"}
  else
    execute_matched_steps(scenario, matched_steps, bindings)
  end
end

This is what stops a scenario from running half its Givens, hitting an unbound When, and leaving a dangling test customer in the database.

Per-scenario state in the process dictionary

Each scenario gets a fresh context — a tiny module that stores variables, params, doc strings, and data tables in the process dictionary.

def init do
  Process.put(@vars_key, %{})
  Process.put(@params_key, %{})
end

def get_var(name) do
  Process.get(@vars_key, %{}) |> Map.get(name)
end

def set_vars(keyword) do
  vars = Process.get(@vars_key, %{})
  Process.put(@vars_key, Enum.into(keyword, vars))
end

The runner can save/0 and restore/1 the snapshot, which is how Assay tests itself by running its own .assay specs through itself in the same process.
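
The article does not show save/0 and restore/1 themselves, so the following is only a sketch of how such a snapshot could work on top of the process dictionary. The module name ContextSketch and the key atoms are hypothetical, and the real context also tracks doc strings and data tables, which are left out here.

```elixir
defmodule ContextSketch do
  @vars_key :sketch_vars
  @params_key :sketch_params

  # Gives the current process a fresh, empty context.
  def init do
    Process.put(@vars_key, %{})
    Process.put(@params_key, %{})
    :ok
  end

  def set_vars(keyword) do
    vars = Process.get(@vars_key, %{})
    Process.put(@vars_key, Enum.into(keyword, vars))
    :ok
  end

  def get_var(name), do: Process.get(@vars_key, %{}) |> Map.get(name)

  # Captures the current state so that a nested run can be undone later.
  def save, do: {Process.get(@vars_key, %{}), Process.get(@params_key, %{})}

  # Puts a previously saved snapshot back in place.
  def restore({vars, params}) do
    Process.put(@vars_key, vars)
    Process.put(@params_key, params)
    :ok
  end
end
```

An outer scenario can call save/0, let an inner run call init/0 and overwrite everything, and then call restore/1 to get its own variables back, all within one process.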

There is no plugin system, no dependency injection, no scenario hooks beyond cleanup. Adding any of those would push complexity into the framework and out of the binding, which is the wrong direction. The framework exists to be small enough that you can read it on a Friday afternoon and trust it on Monday.

How they fit together

The flow is straightforward and it maps onto the five phases laid out in the Assay handbook.

Discovery (Surveyor). Run Surveyor against the legacy codebase. Walk through the C1/C2/C3 output interactively. Edit, retry, accept. Produces workspace.dsl.
Behavioral extraction. For each component in the model, write .assay specs and an Elixir schema. The assay.specs and assay.schema properties on every Structurizr component tell you exactly where they go: specs/<context>/ and schemas/<context>.ex.
Validation. Write bindings against the legacy system in targets/legacy/. Run assay run specs/ --target legacy. Iterate until green. You now have a verified, executable specification of the legacy system's behavior.
Stabilization. Review what's covered, what's missing, what's flagged as ambiguous. Get sign-off on scope.
Rewrite. Build the new system. Write bindings against it in targets/new/. Run assay run specs/ --target new. When that's green, the rewrite is done — at least for the surface area the specs cover.

The reason the two tools sit next to each other is that the architecture model is what gives the spec work structure. Without a model, "write specs for the legacy system" is an open-ended task with no obvious stopping point. With the model, every component has a specs/ directory and the question becomes "is each context's behavior covered?" That's tractable.

The cross-target story is what makes the rewrite verifiable. The same parsed .assay scenarios are dispatched, step by step, through two different binding modules at two different points in the project's life.

Same spec, two bindings, two systems. The runner doesn't know or care which target it's dispatching to — that's the property that makes the rewrite a measurable thing rather than a leap of faith.

Four audiences, one artifact

Four very different parties end up reading — or executing — the same .assay file: the product owner, the developer, the coding agent, and the test runner. The interesting observation is that they don't conflict on content — they conflict on form.

Product owners want to know what the system does in business terms, what would break if you changed X, and where the risk lives. They don't want UML. They want to read scenarios in their domain vocabulary, in plain language, and trust them.

Developers want to know how the system is structured, where the seams are, and what invariants they mustn't break. And — crucially — they want the documentation to be wrong less often than the code is. The moment docs lie, developers stop reading them.

Coding agents want machine-parseable, unambiguous, addressable artifacts with stable identifiers. They want to be able to say "the assertion at assay://orderLifecycle/order-placement/OL-001/step-3 failed" and have that mean something durable. They want types, schemas, and graphs they can traverse.

Automated tests want executable bindings — the layer Assay provides — and they want failures that point back to the spec, not just to a line of code.

Same facts, four presentations.

The .assay file gives the product owner readable scenarios in domain language. The workspace.dsl plus the schema modules give the developer the structural truth. The bracketed identifiers (Component: [orderLifecycle], Scenario: [OL-001]) give the agent its stable addresses. The runner turns the whole thing into a regression suite. None of these audiences is asked to read the others' representation; they all read the same spec, surfaced at the level of detail they need.

Where the LLM lives, and where it doesn't

A reasonable question at this point: with all the recent enthusiasm for LLMs, why isn't more of this LLM-driven?

The split is deliberate.

Surveyor uses an LLM because architectural discovery is genuinely a language task — you're reading config files, route definitions, deployment manifests, dependency lock files, and inferring "this is the API, that's the worker, that's the read model." That's exactly the kind of fuzzy, evidence-weighing job an LLM does well, especially when you tell it to flag uncertainty rather than guess.

Assay does not use an LLM at runtime. The runner is deterministic: parse the spec, match the regex, execute the binding, assert. A behavioral spec that sometimes passes and sometimes does not, depending on which model variant happened to answer, is not a spec.

LLMs are very welcome on the authoring side — drafting .assay files from legacy source is a great agent task, and the handbook has explicit guidance for coding agents about how to do that without inventing behavior. But the green/red signal at the end has to come from a deterministic runner, or it is not really a signal.

How you use them on a project

In practice, the first day of a legacy rewrite looks something like this:

Clone the repo. Run Surveyor with --phase c1 to get the system in context. Discuss the actors and external systems with the client — this alone surfaces "wait, who is the Admin role?" conversations that would otherwise happen three months in.
Run --phase c2 to map containers. This is where you discover the data plane that nobody documented, the cron job running on a forgotten server, the queue that two services share. Surveyor flags confidence: low items; the human resolves them.
Run --phase c3 per container. By the end you have a workspace.dsl that is, for the first time in the project's history, an accurate picture of what is deployed.
Pick the highest-risk bounded context. Read the legacy source. Draft .assay specs for its behavior. Write a legacy binding. Run it. Iterate until green.
Repeat per context, in priority order, until coverage is good enough to start the rewrite.
Build the new system, one bounded context at a time, with targets/new/ bindings going green as each context comes online.

The thing both tools are optimized for is the same: making the work legible. A legacy rewrite is a long project with shifting personnel and a nervous client. Both Surveyor and Assay produce artifacts — a workspace.dsl and a directory of .assay files — that survive turnover, show progress, and that the client can actually read.

Status and next steps

Both tools are ready to be tested in real-world scenarios. This is where we need your help:

If you have a legacy system staring you down and a rewrite on the roadmap, get in touch. We would love to hear from you!

Elixir: Introduction

Sun, 26 Apr 2026 14:28:15 +0000

Elixir: Introduction

Head of the Agents and Assistants Department

Thu, 23 Apr 2026 10:20:28 +0000

Let me state upfront: my attitude toward AI assistants cannot be expressed as a boolean value. If you need an answer to the question posed point-blank—“New York Yankees or Boston Red Sox?”—I do not watch baseball at all; I’m a Barça fan. That said, I find AI assistants a perfectly legitimate and even liquid asset. The text below is an account of what made my work with agents pleasant and reduced their errors and rough edges to an acceptable minimum.

A little under a year ago I began working on Cure, a programming language with dependent types, finite state machines as first-class citizens, SMT verification, and other niceties, compiling to the BEAM.

My first approach to the apparatus ended in ignominious failure. I got tangled in my own architectural decisions, started piling on crutches wherever they fit, turned the code completely into an Italian restaurant menu, and lost heart. In a fit of idiocy I had an assistant generate the website and showed it to the public—the ideas themselves were intriguing enough, dependent types and SMT solvers on the BEAM are hardly redundant, and I harboured a quiet hope for community interest. The community correctly identified the language’s site as slop generated by a language model and received my attempt with arctic indifference. I got a great many “nothing works” responses and not a single coherent suggestion for improvement (no fault of the community’s—against the backdrop of the slop-flood of those days, my project would not have looked like Noah’s Ark even to an extremely charitable observer).

I stepped back, examined my creation from all angles, and was forced to admit: I had produced a monster. I saw no chance of licking it into shape, even through a global refactor. I bought a ream of paper and some pencils and started drawing, in order to understand exactly where I had gone wrong. (Spoiler: I had been so enchanted by the idea itself, and so desperate to get something to launch and run, that I had done literally everything wrong.)

I had no intention of giving up—and before me, full height, loomed the necessity of rewriting everything from scratch without repeating the mistakes. By that point I knew that artificial assistants could significantly accelerate the actual writing of code, so I started by erecting scaffolding around the future project. It was obvious to me then (I can now confirm that former intuition with experience) that all those prompts along the lines of “You are a genius architect with three hundred years of experience designing languages with dependent types and SMT solvers” work no better than the morning pep talk to an intern at standup: “You are a great programmer who has written three hundred million billion lines of code without a debugger.” If an assistant tasked with writing coherent code to a spec needed motivational gibberish to function, it would not be worth using under any circumstances whatsoever. Seriously, think about it: a model is asked to implement a GCD module using the Euclidean algorithm—are you really suggesting its deeply baked-in internal rules will not guide it down the correct branches of the conditional operators without first being told it has the soul of a prima ballerina and an avant-garde poet? What on earth does “You are a senior architect” actually mean? Do the people who advocate this believe that without such a preamble the training pathways via the memoirs of axolotl breeders will activate instead?

So, instead of all those skills/agents/whatever, I started by feeding this mechanised beast the source code of all my own libraries, lovingly written by hand, with the note: “Here are examples of good code. Write like this. Not like that—do not write like that.” I know it is immodest, but it is my assistant. You are welcome to feed yours your own code.

Then I reclaimed the wasteland: thus were born Metastatic, implementing MetaAST for different languages across different paradigms, and Ragex—a RAG built on AST rather than plain text (my hunch that AST fits into a context window far more easily and is far better structured than raw source code turned out to be correct).

My task was building a new language; constructing a new ecosystem from scratch was not part of the brief. So I analysed existing solutions—Rust, Go, Elm, Gleam—and chose the one I considered most mature (I never promised the project would be neutral with respect to my tastes and preferences). I simply copied the Elixir ecosystem and added to it what I had personally found lacking over the last ten years. Thanks to standing on the shoulders of giants in this regard, the model wrote almost the entire ecosystem for me; I simply told it: “Look how beautifully new project creation is handled in Elixir—do the same for Cure.” Language models are strong at translation, and Elixir is considerably more intelligible to them compared to almost every other language.

So, before the first line of code, my backpack already held: the right AST—a language in which the assistant and I can communicate far more easily than in homespun English—a handcrafted RAG, and a clear understanding that every step must be a simple, atomic change. The fewer choices the assistant has to make between two paths, the cleaner the result. This principle outweighs the quality of all prompts combined.

Next I had to solve the problem of validating the produced code. My eyes are sharp, but they occasionally miss non-obvious bad decisions in review. Thus was born oeditus_credo—a set of nearly forty additional credo checks covering vulnerabilities, anti-patterns, and the like. The library also ships a mix oeditus_assistant_rules command, a rules generator for the soulless assistant. To those rules I also added: after each stage, verify that mix format && mix credo --strict && mix dialyzer && mix test passes; update all documentation; add regression tests for new code; then run all regression tests and confirm they are still as green as my face the morning after a party.

Every reasonably significant stage also ends with creating an “example” in the examples folder. Something for people to look at, and the regression tests never idle.

At this point I felt ready to start writing actual code. Cure has certain critical parts I wrote by hand from scratch. Every other pull request gets manual edits from me. Every bug I find through manual testing I fix by hand (the obvious ones aside). And yet I reached the desired result considerably faster than if I had written every line in Vim.

Over the course of working on Cure I learned to keep the number of errors—and consequently manual edits—to a minimum. A fairly substantial release, v0.26.0, required not a single correction, for example. Here is the distillate of my rules for communicating with an assistant, in case it proves useful to anyone:

the task must be self-contained but not too large; “first define the types for numeric, then we will write the converters” does not work
inside the task there must be no undefined ambiguities that will derail our T9; there must be exactly one path to the solution
before tackling any task, demand an implementation plan and edit it until all ambiguities have vanished
the task must be roughly solved in your own head before turning to the assistant; otherwise the odds of agreeing with an incorrect solution are uncomfortably high
generated code must be comprehensible and elegant, “rewrite this nicely, I fed you three gigabytes of rules” does not work; if the solution is aesthetically repellent, there is a problem with the task formulation—close the session and start over
setting a task and going for coffee is the direct path to the infinite iterations described in the previous point; the stream of unconsciousness must be watched and any attempt to stray from the planned path killed without mercy
finally, if it seems to you that the cognitive load is decreasing and that any cook could now implement such a project—you need pharmaceutical intervention; your fingers tire less, yes, but if you could not have implemented the project from scratch in a text editor, the LLM is no help to you; it will generate something suspiciously twitching, no argument there, but the first halfway-serious complexity requiring a considered architectural decision will put a cross on the whole enterprise.

That is my experience. Yours may differ; I can bear with you on that matter.

Try https://cure-lang.org and maybe you will like it. The site now has a playground where you can experiment with types in real time, and an almost real console where you can play with the REPL without installing anything locally.

Happy curing!

Scotty, I need warp speed in three minutes - Hauleth

Mon, 20 Apr 2026 23:02:18 +0000

In my last larger gig I worked on a fascinating project - a Postgres connection pooler written in Elixir. Unfortunately, due to various circumstances, this project burned me out to the ground. However, what doesn't kill you ~~is crap not a weapon~~ can become a great learning experience.

Most of my achievements in this project were related to performance. This project contains a very tight loop in the form of a query handler that needed to run hundreds of thousands of times per second per user connection. That means these functions are very sensitive to even the slightest performance changes. And that was my task - to find potential improvements that could make this codebase much faster.

After departing from Supabase I liked the project so much (mostly as a learning ground) that I created my own fork, where, unrestrained by the business side of the project, I could focus purely on squeezing out as much performance as I can. This project now lives as Ultravisor - it is still nowhere near being done in a way that I like, but I still go back to work on it from time to time to find potential performance improvements.

This is a story of things that I have done and learned during that journey.

Beware: It is a retrospection, so in some places my memory may not be the best.

Here I need to provide some explanation first, about how Ultravisor works with database connections. It provides 2 modes of operation:

session - where each connection from a user to Ultravisor checks out one connection from Ultravisor to the database. It checks out once, at the start of the connection, and then holds that connection until the end;
transaction - where nothing is done on connection. The client connects to Ultravisor and can keep that connection indefinitely without ever bothering the database. A database connection is checked out only when there is a request from the user, and it is returned to the pool as soon as the result of that query is returned and the DB is ready for the next one.

While session mode is quite on par with other implementations of connection pooling for Postgres, transaction mode is where performance is lacking and where the main focus is put. Throughout this article (unless mentioned otherwise) I will speak about the transaction mode of Ultravisor.

Lesson: Flame graphs and call tracing are essential

A pretty obvious thing, but still a valuable lesson for any performance optimisation endeavour. For that, great thanks go to Trevor Brown and his awesome project eFlambè. It helped a lot in tracing hot spots in the running code.

Unfortunately this project seems to be less active recently and has some missing features, like listening for a given duration instead of a function call count. This can be partially worked around by simply listening for a count of calls to the handle_event/4 function a given number of times and then running cat *.bggg to concatenate all files into a larger trace. That has disadvantages, but at least it was workable within Speedscope, which I also highly recommend to anyone who needs to work on such optimisation.

While flame graphs are awesome, there is a cost to gathering them with eFlambè - it greatly affects performance. Fortunately Erlang has some built-in tools that have a lesser performance impact, and the "most modern" of these is tprof. This tool is pretty easy to use, but is less detailed than eFlambè. Even with that limitation, it provides superb insight into the stuff that has the greatest impact on performance, and it makes it easier to work on long-running processes: it works asynchronously, so you can "manually" decide how long you want to trace your process.

Summary: Knowing where your bottlenecks are is essential for performance optimisations.

Lesson: Doing less can improve performance

An obvious thing that needs to be stated - doing nothing is faster than doing something. Extracting the amount of data sent over a given socket using an :inet.getstat/2 call is fast, but not free. It involves some waiting for a response from either the port or the process handling the connection, which introduces a slowdown. Two possible solutions there are:

Do not gather that metric at all - sensible, but not feasible, especially when you use that metric to charge your users.
Gather that data less often.

The approach I have taken there is obviously the second one, and the solution is dumb simple - a debouncer.

Debouncing is an interesting technique often used in user interfaces where you accept some event, and then for some period you ignore repeated events. The reason for that is that our interfaces may have flaws that send repeated events one after another.

In this case Ultravisor tries to store the amount of sent data after each query, but that can get expensive for many short queries. Instead I have implemented a simple per-process debouncer.

It stores the returned data in the process dictionary (per-process mutable space with quick access), and only if there was no call within the given time period do we process the data again. This is a safe way to do it, as :inet.getstat/2 always returns the amount of data the socket has processed since it started, so data sent between calls is still accounted for.
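
The original code is not reproduced here, so the following is only a minimal sketch of a per-process debouncer along the lines described. The module name DebounceSketch, the key scheme, and the function signature are assumptions, not Ultravisor's actual code.

```elixir
defmodule DebounceSketch do
  # Runs `fun` at most once per `interval_ms` for a given key in the calling
  # process. In between, the previously stored result is returned instead.
  # State lives in the process dictionary, so no message passing is involved.
  def debounce(key, interval_ms, fun) when is_function(fun, 0) do
    now = System.monotonic_time(:millisecond)

    case Process.get({__MODULE__, key}) do
      # Still inside the debounce window: reuse the stored value.
      {last_run, value} when now - last_run < interval_ms ->
        value

      # First call, or the window has elapsed: run and remember the result.
      _ ->
        value = fun.()
        Process.put({__MODULE__, key}, {now, value})
        value
    end
  end
end
```

In the scenario from the text, the wrapped function would be something like fn -> :inet.getstat(socket, [:send_oct]) end, where socket is whatever socket the calling process owns. Since the counters are cumulative, skipping reads loses no data; it only delays when the data is recorded.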

Before:

tps = 79401.392762 (without initial connection time)

After (10ms of debouncing):

tps = 80069.646510 (without initial connection time)

After (100ms of debouncing):

tps = 80568.825937 (without initial connection time)

Summary: Doing nothing is more performant than doing something. Sometimes doing nothing can be quite easy.

Lesson: Telemetry is not free

When working on most projects, especially Phoenix-based ones, one can slap :telemetry.execute/3 calls everywhere and notice no performance degradation¹. Unfortunately, when you do hundreds of thousands of calls a second, that is not the case.

¹ For unaware readers - Telemetry is an Erlang event dispatching system for observability events.

In this project the metrics are exposed in Prometheus/OpenMetrics format, which means that there needs to be a collection system within the application. In BEAM applications the standard way to implement that is to use ETS tables to store recorded values. Fortunately there are libraries to handle that for you, and for the longest time the "gold standard" for it was the telemetry_prometheus_core library created by the Telemetry core team.

While for most projects that library is performant enough (because metrics aren't recorded in such tight loops), in the case of this project it was not. There, metrics gathering is still one of the hottest spots in the codebase, even with all the improvements that have been done.

Excerpt from tprof profile:

Function	Calls count	Per call (μs)	Percentage
`Peep.EventHandler.store_metrics/5`	1421911	0.13	4.21%
`Peep.Storage.Striped.insert_metric/5`	904852	0.26	5.11%

This is with the awesome library Peep by Richard Kallos. When using telemetry_prometheus_core, metrics gathering was simply the most expensive thing in the whole loop. Just replacing the metrics gathering library with Peep gave us about a 2x bump in TPS.

Summary: Telemetry handler can matter in tight loops. Fast metrics gathering isn't easy.

Lesson: Records instead of maps or structs

Elixir uses structs for structured data. That gives a lot of nice features wrt. hot code reloads, compilation graph dependencies, and others. However, because structs are maps, there is a cost. Maps have O(log n) access time to the fields; this is how maps are constructed in memory. While smaller maps have slightly different (better in most cases) characteristics, there is a strict requirement that you keep your structure to no more than 31 fields², and it still has a slight memory overhead. The alternative is to use records. These have better performance characteristics (always constant) irrespective of the number of fields, at the cost of being slightly more rigid (records are tuple based) and less convenient to use (experience may vary). An additional advantage, in my opinion, is that it is harder to add an incorrect field by using the Map module.

² The current (OTP 28) limit for a small map is 32 keys, but Elixir uses one key for the struct name, hence 31 fields is the limit.

Before you run off and change all structs in your system to records, just remember - most of the time the difference doesn't matter - just use structs.
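
For readers who have not used records from Elixir, here is a minimal sketch using the standard Record module. The record name conn_state and its fields are hypothetical and are not taken from Ultravisor.

```elixir
defmodule ConnStateSketch do
  require Record

  # Defines the conn_state/0,1,2 macros. At runtime a record is just the
  # tagged tuple {:conn_state, mode, sent_bytes}, so field access is a
  # constant-time tuple lookup resolved at compile time.
  Record.defrecord(:conn_state, mode: :transaction, sent_bytes: 0)

  # Builds a record with the default field values.
  def new, do: conn_state()

  # Reads one field and returns an updated copy of the record.
  def add_sent(state, bytes) do
    conn_state(state, sent_bytes: conn_state(state, :sent_bytes) + bytes)
  end
end
```

A misspelled field name in conn_state(state, sent_byte: 1) is rejected when the macro expands at compile time, which is the rigidity (and the safety) mentioned above.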

Before:

tps = 81765.266264 (without initial connection time)

After:

tps = 82147.855889 (without initial connection time)

Summary: Records are super handy when you need to squeeze out every bit of performance. Each one doesn't provide much, but these add up.

Lesson: ETS tables are super fast, but not always

ETS is Erlang's built-in module for storing key-value data in a mutable way. Like a built-in Redis. This structure allows for sharing data in a way that is easy to access from different parts of the system. One example of a system that uses ETS for storing its information is Telemetry (mentioned above).

While for 99% of the use cases Telemetry will be fast enough, it has some problems with tight loops. The main problem is that an ETS lookup always copies data from the table to the caller process. That means it can put high memory pressure on the process that tries to retrieve the data.

Fortunately Erlang supports another mechanism for storing globally accessible data - persistent_term. Of course, there is no such thing as a "free lunch", so it has a substantial disadvantage - it works poorly³ with data that changes often, as removing or changing the data under a key requires a walk through all processes, so that every process still using the old value can copy it into its own memory. However, Telemetry handlers should not change a lot: you should just set them once as soon as your system starts, and then ideally they will never change again.

³ There is a slight optimisation that makes it fast in some cases (single word values, like atoms), but that is not the case there, so we can ignore that.
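
As a small illustration of the write-once, read-often pattern described above, here is a sketch built on :persistent_term. The module name HandlerCacheSketch and the shape of the stored handler list are hypothetical; this is not how Telemetry itself stores its handlers.

```elixir
defmodule HandlerCacheSketch do
  @key {__MODULE__, :handlers}

  # Meant to be called once at boot. Replacing an existing value later is
  # the expensive operation, because the VM then has to scan all processes
  # for references to the old term.
  def install(handlers) when is_list(handlers) do
    :persistent_term.put(@key, handlers)
  end

  # Reads do not copy the stored term onto the caller's heap, unlike an
  # ETS lookup, which makes them cheap enough for a tight loop.
  def handlers, do: :persistent_term.get(@key, [])
end
```

Using a tuple that includes the module name as the key is a common convention to avoid clashes, since persistent terms share one global namespace.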

Before⁴:

tps = 76914.004685 (without initial connection time)

After:

tps = 78479.006634 (without initial connection time)

⁴ If you wonder why these results are lower than in the previous section, it is because test conditions are identical only per section, not across sections. In this particular case I ran the benchmark while collecting metrics (to show the difference made by the persistent_term change), while the others were run without metrics to not pollute the results.

Summary: persistent_term is awesome and super fast, so if you know that you have some data that will probably never change and will be requested constantly, then it may be good place to store that data.

Lesson: Calling your `GenServer`s is fast, but not 90k times per second fast

One of the interesting observations that I have made is that with longer running queries, ones that send more data over the network than just simple short responses, the difference between Ultravisor and "state of the art" tools like PgBouncer or PgDog (which are written in non-managed languages like C and Rust) is much smaller (obviously it is still there, but it is on par, not substantially off).

I needed to dig more to find the cause of such strange behaviour. The reason was found in the place where I least expected it - checking out the database connection to be used.

The flame graph showed that almost a third of the time is spent on checking out database connections, and most of that time is spent in 2 function calls; both of them are internally gen_statem calls, and in both most of the time is spent sleeping (aka waiting for a reply).

Now, this one is a hard thing to optimise, as in Elixir there is no mutability (almost, we will get there). This means that if I want some form of shared queue of processes, then I need to use a separate process to keep the state of the queue for us, and then do GenServer calls to fetch that state. What did I do in such a situation? What any unreasonable Elixir developer obsessed with performance would do - a NIF⁵.

⁵ I wanted to use ETS there, but for that to work it lacks a function like ets:take/2 that would return only one element from tables of type bag or duplicate_bag. Or any other form of just taking out any (possibly random) element from an ETS table in an atomic way.

The implementation is a rather basic wrapper over VecDeques that allows popping a single element from the queue without any message passing. The implementation is very crude, nowhere near production ready. It doesn't provide any form of worker restarts or anything, but it works quite well as a PoC of what is possible.

The new queue also provides a way to store additional "metadata" alongside the worker PID. This allows me to store the DB connection socket next to the connection process, which removes the need for an additional call to extract that data, so requests can be passed directly to the DB without copying data between processes.

Before:

tps = 83619.640673 (without initial connection time)

After:

tps = 94191.475386 (without initial connection time)

Summary: Sometimes one needs to get creative to get around platform limitations. This may require some pesky NIFs though.

Conclusions

Optimising such a project was enormous fun, and I think that in its current state there is nothing extra that can be done to optimise it further without optimising the generated JIT-ed native code or the Erlang scheduler.

There are some flags that affect performance, but as it is currently unclear why these work at all (probably it is related more to the OS scheduler than to Erlang performance), I left them out of this article for now.

Post Scriptum: Good tooling helps a lot

Just after I started that optimisation project after leaving Supabase, I started using Jujutsu for version control. That one thing helped me a lot with being able to have separate branches/PRs for each of the changes, while at the same time being able to work with a mega-merge of them all.

That allows me to profile code with all other noise removed, while still exposing the changes as separate reviewable units. Without that support I would need to decipher what has already been changed and/or removed from the profile.

An additional feature that I heavily used there is "anonymous branching". As I do not need to create a new name for each branch that I want to try when working with JJ, it was way easier to implement one idea, then just do jj new @- (which branches off at the commit that is the parent of the current one) and implement an alternative idea. I used that constantly to compare ideas and reject failed concepts.

∎

Resist Vendor Lock-In With Supabase

Sun, 19 Apr 2026 04:26:32 +0000

In one of his recent YouTube videos, Theo Browne highlights the potential pitfalls that come with using a Platform-as-a-Service.

The video goes over recent work done to fuzz suspected Firebase-backed services looking for leaked credentials. As is shown in the video, this is an all-too-common pattern in these platforms, where the "backend" is directly manipulated from client-side SDKs, such that there's a high likelihood that credentials are carelessly exposed to the client. In addition to potential secret leakage, many of these platforms, including Firestore, do not have great default security settings. This raises obvious MAJOR security concerns and is a horrible practice for the industry to standardize on, yet these platforms remain extremely popular due to their ease of use and how they enable rapid development.

In addition to these security concerns, Firebase has long drawn the ire of many developers due to its tight coupling with Google's proprietary ecosystem, and its cost structure, which has led to more than a few horror stories about unexpectedly high bills (although these issues can often be attributed to poor database practices that lead to exponentially ballooning costs). With much apprehension towards Firebase, many looked for alternatives, and found themselves needing to piece together a solution from many different parts. Here enters Supabase, which pitches itself as the "open-source Firebase alternative." Supabase offers similar solutions to Firebase, such as a realtime database, object storage, user authentication and management, edge functions, and much more. Supabase, however, is built entirely from open-source software (in fact, they stipulate that Supabase will only ever include software that comes with an MIT, Apache 2, or equivalent license).

When asked about his thoughts on Supabase, Theo rightly points out that these problems still exist when using their client SDKs, but he stresses that unlike Firebase, since Supabase is built entirely on open-source software (namely PostgreSQL) you not only can connect directly to the database, but it is actively encouraged by the CEO himself.

cc @chasers @filipecabaco @wenboxie

Alternatively: just connect to Postgres
¯\_(ツ)_/¯
— Paul Copplestone — e/postgres (@kiwicopple) April 19, 2024

Wow! What a breath of fresh air that is! And as it turns out, even when "just" using Supabase in this way, it is still great! You get an authentication table with Row-Level-Security and robust security practices out of the box, access to their wonderful front-end database management, and the ability to easily integrate their other features such as Edge Functions and Object Storage, all while maintaining tight control over your backend. It just makes sense! It offers the rapid development experience that is all too valuable, while not compromising on security.

Still, you might be wondering what this looks like from the developer perspective. Surely if you're writing directly to the database this must be more unwieldy than using those client SDKs, right? Well, not necessarily!

Supabase still offers PostgREST out of the box, which exposes a REST API for database operations, but you have control over where you fire off those requests from. And since you have complete control over your database, you can even do a hybrid approach, which is what I prefer. Since Supabase takes care of the auth.users table (and even makes it read-only through the web frontend), you might want to just use the REST APIs they expose to handle authentication, but then handle everything else yourself.

💡

If you want to have a table NOT be exposed as an API through PostgREST, just add it to a different schema from the public schema. All tables under the public schema will be exposed.

Even though you might want Supabase to handle authentication, you still might want to extend features from that auth.users table. Since they discourage modifying that table directly, the suggested approach is to make a table in the public schema (which is the default schema; the auth schema is managed by Supabase) to manage any additional information related to the user. The documentation goes into much more detail about Managing User Data, but that is the gist of the situation. Well, if you want this new table (let's call it Profiles) to stay in sync with the auth.users table, it's best to have the Profiles table refer to the auth.users table, and add a Postgres trigger to run a function to create a new entry in the Profiles table whenever a new user is registered.

💡

It's also worth noting that in their recent General Availability Launch Week, Supabase also announced new efforts to increase security practices across projects, including a Postgres Linter and new Security Advisor and Performance Advisor dashboards to help you maintain good security posture.

GitHub - supabase/splinter: Supabase Postgres Linter

Supabase Security Advisor & Performance Advisor

Let's walk through an example of what this might look like using my favorite backend language, Elixir (it's also a favorite of Supabase themselves). I'll be using Elixir's very nice ORM, Ecto.

Once you make your new project (using mix new or, perhaps more commonly in this situation, mix phx.new) you'll want to make sure you have Ecto as a dependency, and then you'll want to model the auth.users table as well as the new public.profiles table.

defmodule User do
  @moduledoc """
  This schema represents the default Supabase users table, which is under the 'auth' schema.

  Since we don't actually manage this schema, we will not make any migrations for it.

  This is mainly for convenience when unmarshalling data and working with users, so we
  can refer to the User struct rather than a generic map.

  Notice that we specify the primary key which will be referred to later

  Ecto schemas do not have to have 1-to-1 fields match the table in the database, so we can use whatever minimal fields we want to mirror in the profiles table (`id` at a minimum).
  """
  use Ecto.Schema
  import Ecto.Changeset

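  # id is a UUID
  @primary_key {:id, :binary_id, autogenerate: false}
  # The table lives in the `auth` schema, so set the prefix instead of
  # putting it in the table name (Ecto would quote "auth.users" as one name)
  @schema_prefix "auth"
  schema "users" do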
    field :created_at, :naive_datetime_usec
    field :updated_at, :naive_datetime_usec
  end

end

defmodule Profile do
  @moduledoc """
  This schema holds extra information about users
  """
  use Ecto.Schema
  import Ecto.Changeset

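  # The primary key is the `id` column that also references `auth.users`,
  # so disable Ecto's default integer `id` and let `belongs_to` define it
  @primary_key false
  schema "profiles" do
    field :first_name, :string
    field :last_name, :string

    embeds_one :settings, Settings do
      field :default_portfolio, :string
      field :theme, Ecto.Enum, values: [dark: "Dark", light: "Light", system: "System"]
    end

    # This is the most important line and the one that is required to
    # properly link this table to `auth.users`
    # Make sure to set the type to :binary_id, which is what the Supabase
    # auth uses, and to point the foreign key at the `id` column created in
    # the migration (the default would be `user_id`)
    belongs_to :user, User,
      foreign_key: :id,
      type: :binary_id,
      references: :id,
      primary_key: true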

    timestamps()
  end
end

Now we would create the necessary migrations, starting with creating the public.profiles table.

defmodule CreateProfiles do
  use Ecto.Migration

  def change do
    create table(:profiles, primary_key: false) do
      # notice the `prefix`, since `public` is the default prefix
      # notice that the type is specified to match the Supabase defaults
      # make sure to set this as the primary key
      add :id, references(:users, on_delete: :delete_all, prefix: "auth", type: :uuid),
        primary_key: true

      # These fields should match what you have in your schema
      add :first_name, :string
      add :last_name, :string
      add :settings, :map

      # This represents the `inserted_at` and `updated_at` fields in the 
      # schema, and are required by default
      timestamps()
    end

    # You might also want to add indexes to improve performance and ensure data integrity
    create index(:profiles, [:id])
  end
end

Now we add a migration to add the trigger:

defmodule CreateProfilesTrigger do
  use Ecto.Migration

  def up do
    # Function to insert a new profile
    execute """
    CREATE OR REPLACE FUNCTION public.create_profile_for_new_user()
    RETURNS TRIGGER AS $$
    BEGIN
      INSERT INTO public.profiles (id, inserted_at, updated_at)
      VALUES (NEW.id, now(), now());
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;
    """

    # Trigger to call the function after a user is inserted
    execute """
    CREATE TRIGGER trigger_create_profile_after_user_insert
    AFTER INSERT ON auth.users
    FOR EACH ROW
    EXECUTE FUNCTION public.create_profile_for_new_user();
    """
  end

  def down do
    execute "DROP TRIGGER IF EXISTS trigger_create_profile_after_user_insert ON auth.users;"
    execute "DROP FUNCTION IF EXISTS create_profile_for_new_user;"
  end
end

And that's it, you now have extended the ability to store information for the users while keeping the security provided by the protected auth.users table. But now how do you use it?

Well, let me show you how in fewer than 90 lines of code:

defmodule UserManagement do
  @req Req.new(
         base_url: Application.compile_env(:myapp, [:supabase, :base_url]),
         headers: [apikey: Application.compile_env(:myapp, [:supabase, :api_key])],
         url: "/auth/v1/:action"
       )

  def get_current_user(bearer_token) do
    Req.get!(
      @req,
      auth: {:bearer, bearer_token},
      path_params: [action: "user"]
    )
    |> Map.get(:body)
  end

  def signup_with_username_and_password(email, password) do
    Req.post!(
      @req,
      path_params: [action: "signup"],
      json: %{email: email, password: password}
    )
    |> Map.get(:body)
  end

  def login_with_email_and_password(email, password) do
    Req.post!(
      @req,
      path_params: [action: "token"],
      params: [grant_type: "password"],
      json: %{email: email, password: password}
    )
    |> Map.get(:body)
  end

  def send_password_recovery_email(email) do
    Req.post!(
      @req,
      path_params: [action: "recover"],
      json: %{email: email}
    )
    |> Map.get(:body)
  end

  def update_user(bearer_token, data \\ %{}) do
    {email, data} = Map.pop(data, "email")
    {password, data} = Map.pop(data, "password")

    body = %{
      "data" => data
    }

    body = if email, do: Map.put(body, "email", email), else: body
    body = if password, do: Map.put(body, "password", password), else: body

    Req.put!(
      @req,
      path_params: [action: "user"],
      auth: {:bearer, bearer_token},
      json: body
    )
    |> Map.get(:body)
  end

  def logout(bearer_token) do
    Req.post!(
      @req,
      auth: {:bearer, bearer_token},
      path_params: [action: "logout"]
    )
    |> Map.get(:body)
  end

  def send_email_invite(bearer_token, email) do
    Req.post!(
      @req,
      auth: {:bearer, bearer_token},
      json: %{email: email},
      path_params: [action: "invite"]
    )
    |> Map.get(:body)
  end
end

Pretty simple right? You could surely condense this more too if you so choose.

💡

The wonderful Req library takes care of a lot of the tedious parts of these requests, such as JSON-encoding and Bearer authentication. I highly recommend it over other HTTP clients for these reasons.

As long as you set your API key and Supabase instance URL into the application environment, this will have you ready to perform all of your user management tasks, and upon registration have the new user reflected in the auth.users table as well as the public.profiles table.
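As a minimal sketch of that application environment, the config below uses the same keys that the Application.compile_env/2 calls in UserManagement read. Both values are placeholders, and because compile_env reads them at compile time, this assumes they live in config/config.exs rather than config/runtime.exs.

```elixir
# config/config.exs
import Config

# Placeholder values: substitute your own project URL and API key.
# These keys are read at compile time by Application.compile_env/2
# in the UserManagement module.
config :myapp, :supabase,
  base_url: "https://your-project-ref.supabase.co",
  api_key: "your-supabase-api-key"
```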

And lastly, make sure you connect to your database using the connection string (or you can specify each field if you'd like).

import Config

config :myapp, MyApp.Repo,
  url: "myconnectionstring"

This barely scratched the surface of what you can do with Supabase, but I hope it at least demonstrates how quick it is to get started with it and how flexible it is to have complete control over your stack.

Of course this example was in Elixir, but you could extend this to any other backend language and get the same benefits, and avoid falling victim to vendor lock-in!

Elevate Your Elixir With Sigils

Sun, 19 Apr 2026 04:26:32 +0000

Motivation

In a previous article of mine, I wrote about using NimbleOptions to add extremely powerful option handling to your Elixir applications. One of the custom validations I was using was a function in_range that would check if an option fell within a real-valued interval. This differs from Elixir's built-in Range in that it needed to be real-valued (rather than discrete integer steps). Additionally, mostly due to aesthetic and personal opinion, I wanted to be able to express the intervals using mathematical notation such as (0,1] to mean "allow any value greater than 0 and less than or equal to 1". I find Elixir to be such a beautiful language with a unique capacity for extensions that it felt wrong to use a function such as in_range or in_interval. Additionally, some implementations I've come across have somewhat unintuitive APIs, such as the following spec:

@spec in_range(float, float, float, boolean, boolean) :: boolean
@doc """
  * `:value` - Value to test for inclusion
  * `:min` - Minimum value in range
  * `:max` - Maximum value in range
  * `:left` - Whether the left boundary is inclusive (true) or exclusive (false)
  * `:right` - Whether the right boundary is inclusive (true) or exclusive (false)
"""
def in_range(value, min \\ 0, max \\ 1, left \\ true, right \\ true)

There's nothing expressly wrong with this implementation, but with my use cases and as Elixir is being used more in the domain of Machine Learning which deals with these intervals quite often, I wanted a solution that felt a bit more integrated.

Solution

This led me to create a small 1-file library called Exterval which is available on Hex and can be installed with:

def deps do
  [
    {:exterval, "~> 0.1.0"}
  ]
end

To make the interval feel more native to Elixir, I implemented it as a sigil that implements the Enumerable protocol, which gives you several nice benefits:

Takes advantage of the member?/2 function which means we can use the in keyword to check for membership
Allows for checking of sub-interval membership (with some caveats)
Implements an optional step parameter that allows you to iterate/reduce over the interval
Implements a size function (remember, size refers to the ability to count the number of members without reducing over the whole structure, whereas length implies a need to reduce).
Allows for :infinity and :neg_infinity to be specified in the interval

This lets us write more succinct checks like:

iex> import Exterval
iex> ~i<[1, 10)//2>
[1, 10)//2
iex> ~i<[1, 10)//2> |> Enum.to_list()
[1.0, 3.0, 5.0, 7.0, 9.0]
iex> ~i<[1, 10)//2> |> Enum.sum()
25.0
iex> ~i<[-1, 3)//-0.5> |> Enum.to_list()
[2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0]
iex> ~i<[1, 10]> |> Enum.count()
:infinity
iex> ~i<[1, 10)//2> |> Enum.count()
4
iex> ~i<[-2,-2]//1.0> |> Enum.count()
1
iex> ~i<[1,2]//0.5> |> Enum.count()
3
iex> ~i<[-2,-1]//0.75> |> Enum.count()
2
iex> 1 in ~i<[1, 10]>
true
iex> 1 in ~i<[1, 10)//2>
true
iex> 3 in ~i<(1, 10)//2>
true
# You can even do variable substitution using string interpolation syntax, since the sigil parameter is just a string
iex> min = 2
iex> 3 in ~i<(#{min + 1}, 10)//2>
false

Design Details

The decision to implement the interval as a sigil was not as straightforward as it might seem. As I mentioned before, Elixir is an extremely extensible language with superior support for meta-programming, so implementing this as a macro was my first instinct. I considered commandeering the opening brackets ( and [ to trigger the macro, or something similar with the comma , , but fortunately, I hit a brick wall with that effort. I say fortunately because it would have been a bad idea from a design perspective, and it would also have been a messier, overly complicated implementation that made the code less clear. I appreciate the usage of the sigil ~i because it makes it clear that the interval that follows is not to be confused with the built-in Range.

💡

You can read more about Elixir sigils here and see their syntax reference here. Of note, you can use any of the allowed delimiter pairs that it lists to capture your sigil. I chose < and > so as to not conflict with the brackets used in the interval. You could also use something like ~i|[0,1)| if you prefer.

Once I decided on the usage of the Enumerable protocol, I knew I wanted to allow some way for an optional step size to be specified so that reduce could be used on the structure. Elixir sigils allow modifiers to be passed after the closing delimiter, so initially I considered passing in the step size that way, but since a modifier can only consist of zero or more ASCII letters and digits, this would prohibit having floats as step sizes. Another constraint to consider when using sigils is that string interpolation is only allowed when using a lowercase sigil. Sigils start with ~ and are followed by one lowercase letter or by one or more uppercase letters, immediately followed by one of the allowed delimiter pairs. Within the context of my own use case, I happen to be able to get by without string interpolation since the intervals are hardcoded, but the library becomes much more useful if intervals can be defined dynamically, so this limits how the sigil is named.
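The difference between lowercase and uppercase sigils can be seen with Elixir's built-in ~s and ~S string sigils. This small sketch uses them only as stand-ins, and the variable name step is illustrative:

```elixir
step = 2

# Lowercase sigils interpolate, so the value of `step` ends up in the string.
lower = ~s<[1, 10)//#{step}>

# Uppercase sigils are raw: the interpolation syntax is kept as literal text.
upper = ~S<[1, 10)//#{step}>

IO.inspect(lower)
IO.inspect(upper)
```

The first string ends in //2, while the second still contains the uninterpolated placeholder, which is why a dynamically defined interval needs a lowercase sigil name.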

Another major design decision was how to actually parse the sigil. I ultimately landed on the straightforward answer of just using a regex, but I had a decent back-and-forth with my friend Paulo from the Elixir-Nx core team regarding other options. He provided some nice proofs of concept using binary pattern matching as well as NimbleParsec, but I decided on a regex due to my familiarity, its ability to reduce the amount of code, and because I was not too concerned with performance concerns with what will typically be short patterns.

One of the last design details finalized was how to treat the step size and its effect on item membership. Paulo and I discussed whether it should support ranges where the min and max values did not necessarily have to be in the correct order (e.g. ~i<[1,-1]//0.5>), which would essentially imply that any iteration would start at 1 in this instance and work towards -1 in steps of 0.5. This was discussed since it can be seen in some implementations in other ecosystems. We decided that the clearest solution, and the one that fit best within the spirit of the library, was to enforce that the first value specified be less than or equal to the second value, with any desire to iterate starting from the max value expressed using a negative step size.

Implementation Details

Creation

An interval is stored as a struct with the following fields:

left - the left bracket, either [ or (.
right - the right bracket, either ] or ).
min - the lower bound of the interval. Can be :neg_infinity or any number.
max - the upper bound of the interval. Can be :infinity or any number.
step - the step size of the interval. If nil, the interval is continuous.

To define a sigil, you create a function with the name of the sigil prefixed by sigil_, so since I wish to use this sigil using ~i I define it as

def sigil_i(pattern, []) do
end

The second parameter holds the modifiers to the sigil that I mentioned earlier. For now these are unused.

I parse the input to the sigil using the following regex:

^(?P<left>\[|\()\s*(?P<min>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:neg_infinity)\s*,\s*(?P<max>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:infinity)\s*(?P<right>]|\))(?:\/\/(?P<step>[-+]?(?:[1-9]+|\d+\.\d+)(?:[eE][-+]?\d+)?))?$

Using the named capture groups I perform some additional validation such as ensuring that the interval goes from the minimum value to the maximum.
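To make the parse-then-validate flow concrete, here is a minimal sketch. It is not the library's code: the module name IntervalSketch is hypothetical, and the regex is a deliberately simplified version of the one above (plain decimals only, no exponents or leading +).

```elixir
defmodule IntervalSketch do
  # Hypothetical, simplified stand-in for the library's parser (not its actual code).
  defstruct [:left, :right, :min, :max, :step]

  def sigil_i(pattern, []) do
    # Simplified pattern: plain decimals only, no exponents or leading "+".
    regex =
      ~r{^(?<left>\[|\()\s*(?<min>-?\d+(?:\.\d+)?|:neg_infinity)\s*,\s*(?<max>-?\d+(?:\.\d+)?|:infinity)\s*(?<right>\]|\))(?://(?<step>-?\d+(?:\.\d+)?))?$}

    case Regex.named_captures(regex, pattern) do
      nil ->
        raise ArgumentError, "invalid interval: #{inspect(pattern)}"

      %{"left" => left, "right" => right, "min" => min, "max" => max, "step" => step} ->
        interval = %__MODULE__{
          left: left,
          right: right,
          min: parse_bound(min),
          max: parse_bound(max),
          # An optional group that did not match is captured as "".
          step: if(step == "", do: nil, else: parse_bound(step))
        }

        validate_order!(interval)
    end
  end

  defp parse_bound(":neg_infinity"), do: :neg_infinity
  defp parse_bound(":infinity"), do: :infinity

  defp parse_bound(text) do
    {number, ""} = Float.parse(text)
    number
  end

  # Only numeric bounds can be compared; infinite bounds are accepted as-is.
  defp validate_order!(%__MODULE__{min: min, max: max} = interval)
       when is_number(min) and is_number(max) do
    if min <= max, do: interval, else: raise(ArgumentError, "min must be <= max")
  end

  defp validate_order!(interval), do: interval
end
```

After import IntervalSketch, writing ~i<[1, 10)//2> calls this same function with the text between the delimiters as pattern and an empty modifier list.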

Enumerable – Size / Count

The first function I need to implement for the protocol is the Enumerable.count/1 function. Logically, there are three conditions to account for. First are the instances where the size is either zero or infinity. Since Enumerable.count/1 must return a number on success, I choose to return {:error, Infinity} from Enumerable.count/1 when I wish to return :infinity. The error tuple would normally name a module which can perform a reduction to compute the count, but we can instead make a simple helper module whose reduce returns :infinity directly:

defmodule Infinity do
  @moduledoc false
  def reduce(%Exterval{}, _, _), do: {:halt, :infinity}
end

Now I can get my desired behavior. I implement these cases with the following:

def size(interval)
def size(%__MODULE__{step: nil}), do: {:error, Infinity}
def size(%__MODULE__{max: :neg_infinity}), do: 0
def size(%__MODULE__{min: :infinity}), do: 0

def size(%__MODULE__{min: min, max: max})
    when min in [:infinity, :neg_infinity] or max in [:infinity, :neg_infinity],
    do: {:error, Infinity}

Lastly, I separate the cases where the step size is negative and where it's positive, since the logic is different.

def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step < 0 do
  case {left, right} do
    {"[", "]"} ->
      abs(trunc((max - min) / step)) + 1

    {"(", "]"} ->
      abs(trunc((max - (min - step)) / step)) + 1

    {"[", ")"} ->
      abs(trunc((max + step - min) / step)) + 1

    {"(", ")"} ->
      abs(trunc((max + step - (min - step)) / step)) + 1
  end
end

def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step > 0 do
  case {left, right} do
    {"[", "]"} ->
      abs(trunc((max - min) / step)) + 1

    {"(", "]"} ->
      abs(trunc((max - (min + step)) / step)) + 1

    {"[", ")"} ->
      abs(trunc((max - step - min) / step)) + 1

    {"(", ")"} ->
      abs(trunc((max - step - (min + step)) / step)) + 1
  end
end
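As a worked instance of the closed-interval clause above, the same arithmetic can be pulled into an anonymous function (the name closed_size is only for illustration) and applied to three of the intervals from the earlier iex session:

```elixir
# The {"[", "]"} clause from above, extracted into an anonymous function.
closed_size = fn min, max, step -> abs(trunc((max - min) / step)) + 1 end

IO.inspect(closed_size.(1, 2, 0.5))    # ~i<[1,2]//0.5>
IO.inspect(closed_size.(-2, -2, 1.0))  # ~i<[-2,-2]//1.0>
IO.inspect(closed_size.(-2, -1, 0.75)) # ~i<[-2,-1]//0.75>
```

This prints 3, 1 and 2, matching the Enum.count/1 results shown earlier for those intervals.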

Enumerable – Reduce

The implementation for reduce is a great example of how Elixir's pattern matching in function headers can reduce visual complexity and even the implementation itself. First, if step is nil there is nothing to iterate over, so the reduction finishes immediately.

def reduce(%Exterval{step: nil}, {_tag, acc}, _fun) do
  # acc arrives tagged (e.g. {:cont, value}), so unwrap it before returning
  {:done, acc}
end

Next, we again have different clauses depending on whether the step is positive or negative, since that dictates in which direction, with respect to the interval, the reduction occurs.

def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
    when step > 0 do
  case left do
    "[" ->
      reduce(min, max, right, acc, fun, step)

    "(" ->
      reduce(min + step, max, right, acc, fun, step)
  end
end

def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
    when step < 0 do
  case right do
    "]" ->
      reduce(min, max, left, acc, fun, step)

    ")" ->
      reduce(min, max + step, left, acc, fun, step)
  end
end

Notice that these clauses of the reduce/3 implementation delegate to a separate reduce/6 function which is private to our module.

Next we handle conditions where the reduction is halted or suspended:

defp reduce(_min, _max, _closing, {:halt, acc}, _fun, _step) do
  {:halted, acc}
end

defp reduce(min, max, closing, {:suspend, acc}, fun, step) do
  {:suspended, acc, &reduce(min, max, closing, &1, fun, step)}
end

Next we handle edge cases involving :infinity and :neg_infinity where we have no way to begin the reduction since we cannot move step increments away from either of these when they are our starting point:

defp reduce(:neg_infinity, _max, _closing, {:cont, acc}, _fun, step) when step > 0 do
  {:done, acc}
end

defp reduce(_min, :infinity, _closing, {:cont, acc}, _fun, step) when step < 0 do
  {:done, acc}
end

Interestingly, these are cases where the size of the intervals would be :infinity but we cannot reduce over them at all, as opposed to other infinitely sized intervals where we can begin iteration which will never end, such as ~i<[0,:infinity]//1> which would effectively be an infinite stream starting at 0 and incrementing by 1.
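For comparison, the built-in way to express that kind of never-ending sequence is Stream.iterate/2. This is only an analogy using the standard library, not the library's own code:

```elixir
# Built-in equivalent of stepping from 0 upwards by 1 forever.
naturals = Stream.iterate(0, &(&1 + 1))

# Enum.take/2 halts the reduction, so only the first few elements are computed.
first_five = Enum.take(naturals, 5)
IO.inspect(first_five)
```

Halting is what makes an infinite enumerable usable at all, which is why the {:halt, acc} clause above matters.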

Next we add all of the main logic for the "typical" cases:

defp reduce(min, max, "]" = closing, {:cont, acc}, fun, step)
     when min <= max do
  reduce(min + step, max, closing, fun.(min, acc), fun, step)
end

defp reduce(min, max, ")" = closing, {:cont, acc}, fun, step)
     when min < max do
  reduce(min + step, max, closing, fun.(min, acc), fun, step)
end

defp reduce(min, max, "[" = closing, {:cont, acc}, fun, step)
     when min <= max do
  reduce(min, max + step, closing, fun.(max, acc), fun, step)
end

defp reduce(min, max, "(" = closing, {:cont, acc}, fun, step)
     when min < max do
  reduce(min, max + step, closing, fun.(max, acc), fun, step)
end

And lastly we add the final case where the condition that min < max (or min <= max depending on the brackets) is no longer met, which means the reduction is complete:

defp reduce(_, _, _, {:cont, acc}, _fun, _up) do
  {:done, acc}
end

Just like that, the reduce/3 implementation is complete! As I mentioned before, there are some opinions inherent in this implementation having to do with :infinity and :neg_infinity bounds as well as empty intervals, but I tried to keep the behavior consistent throughout.
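The same clause structure (halt, suspend, continue, done) can be seen in isolation in a much smaller enumerable. The Stepper struct below is hypothetical and only handles a positive step with an inclusive upper bound; it is a sketch of the pattern, not a replacement for the implementation above.

```elixir
defmodule Stepper do
  # Hypothetical struct: yields first, first + step, ... while <= last (step > 0).
  defstruct [:first, :last, :step]

  defimpl Enumerable do
    # Returning {:error, module} tells Enum to fall back to module.reduce/3.
    def count(_), do: {:error, __MODULE__}
    def member?(_, _), do: {:error, __MODULE__}
    def slice(_), do: {:error, __MODULE__}

    def reduce(%Stepper{first: first, last: last, step: step}, acc, fun),
      do: do_reduce(first, last, step, acc, fun)

    # The caller asked to stop early (e.g. Enum.take/2).
    defp do_reduce(_cur, _last, _step, {:halt, acc}, _fun), do: {:halted, acc}

    # The caller wants to pause (e.g. Enum.zip/2); hand back a continuation.
    defp do_reduce(cur, last, step, {:suspend, acc}, fun),
      do: {:suspended, acc, &do_reduce(cur, last, step, &1, fun)}

    # Emit the current value and advance by one step.
    defp do_reduce(cur, last, step, {:cont, acc}, fun) when cur <= last,
      do: do_reduce(cur + step, last, step, fun.(cur, acc), fun)

    # Past the upper bound: the reduction is complete.
    defp do_reduce(_cur, _last, _step, {:cont, acc}, _fun), do: {:done, acc}
  end
end
```

With these four clauses in place, Enum functions such as Enum.to_list/1, Enum.take/2 and Enum.zip/2 work on the struct without any further code.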

Enumerable – Membership

Now on to the part that I was most interested in, which is interval membership. First, let's add support for checking membership between two intervals, which is essentially a check for one interval being a sub-interval of another.

A sub-interval must satisfy the following to be a subset:

The minimum value of the subset must belong to the superset.
The maximum value of the subset must belong to the superset.
The step size of the subset must be a multiple of the step size of the superset.

If the superset has no step size, then only the first two conditions must be satisfied.

If the superset has a step size and the subset doesn't, then membership is false.

def member?(%Exterval{step: nil} = outer, %Exterval{} = inner) do
  res = inner.max in outer && inner.min in outer
  {:ok, res}
end

def member?(%Exterval{}, %Exterval{step: nil}) do
  {:ok, false}
end

def member?(%Exterval{} = outer, %Exterval{} = inner) do
  res = inner.max in outer && inner.min in outer && :math.fmod(inner.step, outer.step) == 0
  {:ok, res}
end

Then that just leaves the main implementation for membership checks, which is basically just a case statement which changes the output depending on the brackets supplied. Additionally, if the interval contains a step then the value being checked must be a multiple of the step.

def member?(%Exterval{} = rang, value) when is_number(value) do
  res =
    if Exterval.size(rang) == 0 do
      false
    else
      case {rang.left, rang.min, rang.max, rang.right} do
        {_, :neg_infinity, :infinity, _} ->
          true

        {_, :neg_infinity, max_val, "]"} ->
          value <= max_val

        {_, :neg_infinity, max_val, ")"} ->
          value < max_val

        {"[", min_val, :infinity, _} ->
          value >= min_val

        {"(", min_val, :infinity, _} ->
          value > min_val

        {"[", min_val, max_val, "]"} ->
          value >= min_val and value <= max_val

        {"(", min_val, max_val, "]"} ->
          value > min_val and value <= max_val

        {"[", min_val, max_val, ")"} ->
          value >= min_val and value < max_val

        {"(", min_val, max_val, ")"} ->
          value > min_val and value < max_val

        _ ->
          raise ArgumentError, "Invalid range specification"
      end
    end

  res =
    unless is_nil(rang.step) || rang.min == :neg_infinity || rang.max == :infinity do
      res && :math.fmod(value - rang.min, rang.step) == 0
    else
      res
    end

  {:ok, res}
end
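The step-alignment part of that check can be tried on its own. The helper name aligned? is hypothetical; it mirrors the :math.fmod/2 expression used above, and the last call illustrates a floating-point caveat to keep in mind with fractional steps.

```elixir
# Hypothetical helper mirroring the step check used in member?/2 above.
aligned? = fn value, min, step -> :math.fmod(value - min, step) == 0 end

IO.inspect(aligned?.(3, 1, 2))     # 3 is exactly one step of 2 away from 1
IO.inspect(aligned?.(4, 1, 2))     # 4 falls between two steps
IO.inspect(aligned?.(0.3, 0, 0.1)) # float rounding: 0.3 is not an exact multiple of 0.1
```

The first call returns true and the other two return false, so fractional step sizes that cannot be represented exactly in binary floating point may reject values that look like members on paper.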

Inspect

Lastly, to make the user experience a bit better, it's not too difficult to implement the Inspect protocol to provide a cleaner output:

defimpl Inspect do
  import Inspect.Algebra
  import Kernel, except: [inspect: 2]

  def inspect(%Exterval{left: left, right: right, min: min, max: max, step: nil}, opts) do
    concat([string(left), to_doc(min, opts), ",", to_doc(max, opts), string(right)])
  end

  def inspect(%Exterval{left: left, right: right, min: min, max: max, step: step}, opts) do
    concat([
      string(left),
      to_doc(min, opts),
      ",",
      to_doc(max, opts),
      string(right),
      "//",
      to_doc(step, opts)
    ])
  end
end

Future Plans

Currently I am weighing the options between adding more functionality to the library or keeping it as thin as it currently is. The main additions could be more robust set operations on the intervals, but I currently do not have a need for it so it will probably not make it into the library in the near future.

For now, I hope this provided a detailed look at the process of identifying a problem, and subsequently designing and implementing the solution. I found this to be an elegant solution to the problem, but as I mentioned it was not a straight-line path. I would be interested to hear about any other solutions people have seen!

From Python to Elixir Machine Learning

Sun, 19 Apr 2026 04:26:32 +0000

As Elixir's Machine Learning (ML) ecosystem grows, many Elixir enthusiasts who wish to adopt the new machine learning libraries in their projects are stuck at a crossroads of wanting to move away from their existing ML stack (typically Python) while not having a clear path of how to do so. I would like to take some time to talk about WHY I believe now is a good time to start porting Machine Learning code over to Elixir, and HOW I went about doing just this for two libraries I wrote: EXGBoost (from Python XGBoost) and Mockingjay (from Python Hummingbird).

Why is Python not Sufficient?

There's a common saying in programming languages that no language is perfect, but that different languages are suited for different jobs. Languages such as C, Rust, and now even Zig are known for targeting systems development, while languages such as C++, C#, and Java are more commonly used for application development, and obviously there are the web languages such as JavaScript/TypeScript, PHP, Ruby (on Rails), and more. There are gradations to these rules of course, but more often than not there are good reasons that languages tend to exist within the confines of particular use cases.

Languages such as Elixir and Go tend to be used in large distributed systems because they place an emphasis on having great support for common concurrency patterns, which can come at the cost of supporting other domains. Go, for example, has little (if any?) support for machine learning libraries, but it's also not trying to cater to that as a target domain. For a long time, the same could have been said about Elixir, but over the past two or so years, there has been a massive concerted push from the Elixir community to not only have support for machine learning, but to push the envelope by maintaining state-of-the-art libraries that are beginning to compete with the other dominant machine learning languages, namely Python.

Python has long been the gold standard in the realm of machine learning. The breadth of libraries and the low entry barrier make Python a great language to work with, but it does create a bit of a bottleneck. Any application that wishes to integrate machine learning has historically had only a couple of options: have a Python component, or reach directly into the underlying libraries that power many of the Python libraries. Despite all the good parts of Python I mentioned before, speed and support for concurrency are not on that list. Elixir-Nx is striving to give another option: one that can take advantage of the native distributed support that Elixir and the BEAM VM have to offer. Nx's Nx.Serving construct is a drop-in solution for serving distributed machine-learning models.

How to Proceed

Sean Moriarity, the co-creator of Nx, creator of Axon, and author of Machine Learning in Elixir, has talked many times about how the initial creation of Nx and Axon involved hours upon hours of reading source code from reference implementations of libraries in Python and C++, namely the TensorFlow source code. While I was writing EXGBoost and Mockingjay, much of my time, especially towards the beginning, was spent referencing the Python and C++ implementations of the original libraries. This built a solid fundamental understanding of the libraries and taught me how to identify patterns in Python and C++ and find the Elixir pattern that could express the same ideas. This skill is invaluable, and the better I got at it the faster I could write. Below is a summary of, and key takeaways from, my process of porting Python / PyTorch to Elixir / Nx.

Workflow Overview

Before I get to the examples from the code bases, I would like to briefly explain the high-level cyclical workflow I established while working on this effort, and what I would recommend to anyone pursuing a similar endeavor.

Understand the Macro System

Much like how there's a common strategy to reading comprehension which involves reading through the entire document once to get a high-level understanding and then doing subsequent shorter reads to gain more in-depth understanding with the added context of the entire piece, you can consider doing the same when reading code. My first step was to follow the logical flow from the call of hummingbird.ml.convert to the final result. You can use tools such as function tracers and callgraph generators to accelerate this part of the process, or manually trace depending on the extent of the codebase. I felt in my case that it was manageable to trace myself.

Read the Documentation

Once you have a general understanding of the flow and process of the original system, you can start referring to the documentation for some additional context. In my case, this led me to the academic paper Taming Model Serving Complexity, Performance and Cost: A Compilation to Tensor Computations Approach, which was the underlying groundwork and basis for their implementation. I could write a whole other blog post about the process of transcribing algorithms and code from academic papers and pseudocode, but for now just know that these are some of the most important pieces you can refer to while re-implementing or porting over a piece of source code.

Read the Source Code in Detail

This is the point at which you want to disambiguate the higher-level ideas from the first step and really gain a fine, high-resolution understanding of what is happening. There might even be some points at which you need to reconcile the source code with its documentation and/or paper reference. In those cases, the source code almost always wins, and if not, then you likely have a bug report you can file. If you see things you don't fully understand, you don't necessarily need to address them here, but you should make note of them and keep them in mind while working in case new details help resolve them.

Implement the New Code

At this point, you should feel comfortable enough to start implementing the code. I found this to be a very iterative process, meaning I would think I had a grasp on something, then would start working on implementing it, then would realize I did not understand it as well as I had thought and would work my way back through the previous steps.

Example

💡

In case you would like to follow along going forward, the Python code I will be referencing is the Microsoft Hummingbird source code (specifically their implementation of Decision Tree Compilation), and the Elixir code is from the Mockingjay source code.

Class vs. Behaviour

As a result of reading through the Hummingbird code base, I realized fairly early on that my library was going to have some key differences. One of the main reasons was that Hummingbird was built as a retroactive library that needed to cater to APIs that already existed throughout the Python ecosystem. They chose to only add support for converting decision trees according to the SKLearn API. I, conversely, chose to write Mockingjay in such a way that it would be incumbent upon the authors of decision tree libraries to implement a protocol to interface with Mockingjay's convert function. This difference meant that I could establish a Mockingjay.Tree data structure that I would use throughout my library, rather than having to reconstruct tree features from various other APIs as is done in Hummingbird.

Next, Hummingbird approaches its pipeline in a very object-oriented manner, as makes sense when using Python. Here we are focusing on the implementation of the three decision tree conversion strategies: GEMM, Tree Traversal, and Perfect Tree Traversal. It implements the following base class for tree conversions as well as PyTorch networks.

💡

Since they're inheriting from torch.nn.Module, they must also implement the forward method.

class AbstracTreeImpl(PhysicalOperator):
    """
    Abstract class definig the basic structure for tree-base models.
    """

    def __init__(self, logical_operator, **kwargs):
        super().__init__(logical_operator, **kwargs)

    @abstractmethod
    def aggregation(self, x):
        """
        Method defining the aggregation operation to execute after the model is evaluated.

        Args:
            x: An input tensor

        Returns:
            The tensor result of the aggregation
        """
        pass

class AbstractPyTorchTreeImpl(AbstracTreeImpl, torch.nn.Module):
    """
    Abstract class definig the basic structure for tree-base models implemented in PyTorch.
    """

    def __init__(
        self, logical_operator, tree_parameters, n_features, classes, n_classes, decision_cond="<=", extra_config={}, **kwargs
    ):
        """
        Args:
            tree_parameters: The parameters defining the tree structure
            n_features: The number of features input to the model
            classes: The classes used for classification. None if implementing a regression model
            n_classes: The total number of used classes
            decision_cond: The condition of the decision nodes in the x <cond> threshold order. Default '<='. Values can be <=, <, >=, >
        """
        super(AbstractPyTorchTreeImpl, self).__init__(logical_operator, **kwargs)

They then proceed to inherit from these base classes and have different classes for each of the three decision tree strategies as well as their gradient-boosted counterparts, leaving them with three classes for each strategy (1 base class per strategy, 1 for ensemble implementations, and 1 for normal implementations) and nine total classes.

I chose to approach this using a behaviour:

defmodule Mockingjay.Strategy do
  @moduledoc false
  @type t :: Nx.Container.t()

  @callback init(data :: any(), opts :: Keyword.t()) :: term()
  @callback forward(x :: Nx.Container.t(), term()) :: Nx.Tensor.t()
  ...
end

init performs the setup for a given strategy and returns the parameters that will need to be passed to forward later on. This allows for a very simple top-level API. The whole top-level mockingjay.ex file can fit here:

  def convert(data, opts \\ []) do
    {strategy, opts} = Keyword.pop(opts, :strategy, :auto)

    strategy =
      case strategy do
        :gemm ->
          Mockingjay.Strategies.GEMM

        :tree_traversal ->
          Mockingjay.Strategies.TreeTraversal

        :perfect_tree_traversal ->
          Mockingjay.Strategies.PerfectTreeTraversal

        :auto ->
          Mockingjay.Strategy.get_strategy(data, opts)

        _ ->
          raise ArgumentError,
                "strategy must be one of :gemm, :tree_traversal, :perfect_tree_traversal, or :auto"
      end

    {post_transform, opts} = Keyword.pop(opts, :post_transform, nil)
    state = strategy.init(data, opts)

    fn data ->
      result = strategy.forward(data, state)
      {_, n_trees, n_classes} = Nx.shape(result)

      result
      |> aggregate(n_trees, n_classes)
      |> post_transform(post_transform, n_classes)
    end
  end

As you can see, the use of a behaviour here allows a strategy-agnostic approach to generating a prediction pipeline. In the object-oriented implementation, each class implements init, forward, aggregate, and post_transform. We get the same result from a functional pipeline approach, where each step generates the needed information as input parameters for the next step. So, instead of storing intermediate results as object properties or values in an object's __dict__, we just pass them along in the pipeline. I would argue this creates a much simpler and easier to follow implementation (but I am also quite biased).

PyTorch to Nx

For these examples, we will be looking at porting the implementations of the forward function for the three conversion strategies from Python to Nx.

GEMM

Next, let's look at the forward function implementation for GEMM, one of the three conversion strategies. In Hummingbird, they implemented the forward step in the base class for each strategy. So given three GEMM classes with the signatures of GEMMTreeImpl(AbstractPyTorchTreeImpl), GEMMDecisionTreeImpl(GEMMTreeImpl), and GEMMGBDTImpl(GEMMTreeImpl), the forward function is defined in the GEMMTreeImpl class, since both ensemble and non-ensemble decision tree models share the same forward step.

def forward(self, x):
      x = x.t()
      x = self.decision_cond(torch.mm(self.weight_1, x), self.bias_1)
      x = x.view(self.n_trees, self.hidden_one_size, -1)
      x = x.float()

      x = torch.matmul(self.weight_2, x)

      x = x.view(self.n_trees * self.hidden_two_size, -1) == self.bias_2
      x = x.view(self.n_trees, self.hidden_two_size, -1)
      if self.tree_op_precision_dtype == "float32":
          x = x.float()
      else:
          x = x.double()

      x = torch.matmul(self.weight_3, x)
      x = x.view(self.n_trees, self.hidden_three_size, -1)

Now, here is the Nx implementation:

  @impl true
  deftransform forward(x, {arg, opts}) do
    opts =
      Keyword.validate!(opts, [
        :condition,
        :n_trees,
        :n_classes,
        :max_decision_nodes,
        :max_leaf_nodes,
        :n_weak_learner_classes,
        :custom_forward
      ])

    _forward(x, arg, opts)
  end

  defnp _forward(x, arg, opts \\ []) do
    %{mat_A: mat_A, mat_B: mat_B, mat_C: mat_C, mat_D: mat_D, mat_E: mat_E} = arg

    condition = opts[:condition]
    n_trees = opts[:n_trees]
    n_classes = opts[:n_classes]
    max_decision_nodes = opts[:max_decision_nodes]
    max_leaf_nodes = opts[:max_leaf_nodes]
    n_weak_learner_classes = opts[:n_weak_learner_classes]

    mat_A
    |> Nx.dot([1], x, [1])
    |> condition.(mat_B)
    |> Nx.reshape({n_trees, max_decision_nodes, :auto})
    |> then(&Nx.dot(mat_C, [2], [0], &1, [1], [0]))
    |> Nx.reshape({n_trees * max_leaf_nodes, :auto})
    |> Nx.equal(mat_D)
    |> Nx.reshape({n_trees, max_leaf_nodes, :auto})
    |> then(&Nx.dot(mat_E, [2], [0], &1, [1], [0]))
    |> Nx.reshape({n_trees, n_weak_learner_classes, :auto})
    |> Nx.transpose()
    |> Nx.reshape({:auto, n_trees, n_classes})
  end

Do not be distracted by the length of this code snippet, as many of the lines are taken up by validating arguments. Let's look at a more stripped-down version without that:

  @impl true
  deftransform forward(x, {arg, opts}) do
    _forward(x, arg, opts)
  end

  defnp _forward(x, arg, opts \\ []) do
    mat_A
    |> Nx.dot([1], x, [1])
    |> condition.(mat_B)
    |> Nx.reshape({n_trees, max_decision_nodes, :auto})
    |> then(&Nx.dot(mat_C, [2], [0], &1, [1], [0]))
    |> Nx.reshape({n_trees * max_leaf_nodes, :auto})
    |> Nx.equal(mat_D)
    |> Nx.reshape({n_trees, max_leaf_nodes, :auto})
    |> then(&Nx.dot(mat_E, [2], [0], &1, [1], [0]))
    |> Nx.reshape({n_trees, n_weak_learner_classes, :auto})
    |> Nx.transpose()
    |> Nx.reshape({:auto, n_trees, n_classes})
  end

Let's take a look at some obvious differences:

The Nx code does not have to transpose in the first step since Nx.dot/4 allows you to specify the contracting axes.
You can use Nx.dot/6 to get the same behavior as torch.matmul
- torch.matmul does a lot of wizardry with broadcasting to make this instance work
We use functions such as Nx.equal to fit into the pipeline rather than using the == operator (which would work outside of a pipeline)
torch.view is equivalent to Nx.reshape
Nx uses the :auto atom where torch uses -1 to infer the size of an axis
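To make the first point in this list concrete, here is a minimal pure-Python sketch (no PyTorch or Nx needed) of why the transpose disappears. Contracting axis 1 of both operands, which is what Nx.dot(mat_A, [1], x, [1]) asks for, should give the same numbers as transposing x and then doing an ordinary matrix multiply, as torch.mm(self.weight_1, x.t()) does. The helper names and the sample matrices are made up for illustration and do not come from either library.

```python
def transpose(m):
    """Swap the two axes of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Ordinary matrix product: contracts axis 1 of `a` with axis 0 of `b`."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def dot_contract_axis1(a, b):
    """Contracts axis 1 of `a` with axis 1 of `b`; no transpose required."""
    return [[sum(p * q for p, q in zip(row_a, row_b)) for row_b in b] for row_a in a]


# Hypothetical data: 2 decision nodes x 3 features, and a batch of 4 samples.
weights = [[1, 0, 2],
           [0, 3, 1]]
batch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [1, 0, 1]]

via_transpose = matmul(weights, transpose(batch))  # the PyTorch-style route
via_axes = dot_contract_axis1(weights, batch)      # the Nx.dot/4-style route

print(via_transpose == via_axes)  # True
print(via_axes)                   # [[7, 16, 25, 3], [9, 21, 33, 1]]
```

Both routes produce a 2 x 4 result (decision nodes by samples), which is the shape the later reshape steps expect.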

Outside of these differences, the code translates fairly easily. Let's take a look at a bit of a more complex instance.

Tree Traversal

Here is the Python implementation:

def _expand_indexes(self, batch_size):
        indexes = self.nodes_offset
        indexes = indexes.expand(batch_size, self.num_trees)
        return indexes.reshape(-1)

def forward(self, x):
        batch_size = x.size()[0]
        indexes = self.nodes_offset
        indexes = indexes.expand(batch_size, self.num_trees).reshape(-1)

        for _ in range(self.max_tree_depth):
            tree_nodes = indexes
            feature_nodes = torch.index_select(self.features, 0, tree_nodes).view(-1, self.num_trees)
            feature_values = torch.gather(x, 1, feature_nodes)

            thresholds = torch.index_select(self.thresholds, 0, indexes).view(-1, self.num_trees)
            lefts = torch.index_select(self.lefts, 0, indexes).view(-1, self.num_trees)
            rights = torch.index_select(self.rights, 0, indexes).view(-1, self.num_trees)

            indexes = torch.where(self.decision_cond(feature_values, thresholds), lefts, rights).long()
            indexes = indexes + self.nodes_offset
            indexes = indexes.view(-1)

        output = torch.index_select(self.values, 0, indexes).view(-1, self.num_trees, self.n_classes)

And here is the Nx implementation:

defn _forward(x, features, lefts, rights, thresholds, nodes_offset, values, opts \\ []) do
    max_tree_depth = opts[:max_tree_depth]
    num_trees = opts[:num_trees]
    n_classes = opts[:n_classes]
    condition = opts[:condition]
    unroll = opts[:unroll]

    batch_size = Nx.axis_size(x, 0)

    indices =
      nodes_offset
      |> Nx.broadcast({batch_size, num_trees})
      |> Nx.reshape({:auto})

    {indices, _} =
      while {tree_nodes = indices, {features, lefts, rights, thresholds, nodes_offset, x}},
            _ <- 1..max_tree_depth,
            unroll: unroll do
        feature_nodes = Nx.take(features, tree_nodes) |> Nx.reshape({:auto, num_trees})
        feature_values = Nx.take_along_axis(x, feature_nodes, axis: 1)
        local_thresholds = Nx.take(thresholds, tree_nodes) |> Nx.reshape({:auto, num_trees})
        local_lefts = Nx.take(lefts, tree_nodes) |> Nx.reshape({:auto, num_trees})
        local_rights = Nx.take(rights, tree_nodes) |> Nx.reshape({:auto, num_trees})

        result =
          Nx.select(
            condition.(feature_values, local_thresholds),
            local_lefts,
            local_rights
          )
          |> Nx.add(nodes_offset)
          |> Nx.reshape({:auto})

        {result, {features, lefts, rights, thresholds, nodes_offset, x}}
      end

    values
    |> Nx.take(indices)
    |> Nx.reshape({:auto, num_trees, n_classes})
  end

Here there are some much more striking differences, namely the use of Nx's while expression compared to a for loop in Python. We use while in this case since it can achieve the same purpose as the Python for loop and it is supported by Nx within a defn expression. Otherwise, we might have to perform some of the calculations within a deftransform, as we will see in the next example. Another obvious difference is that in the Nx implementation, we have to pass the required variables around throughout these operations, whereas Python can use stored class attributes.

Still, the conversion is quite straightforward. I hope you are beginning to see that this is not an impossible effort, and can be accomplished given you have a firm understanding of the source material.
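If the tensor operations obscure what the loop is doing, the following pure-Python sketch walks one sample through two tiny trees using the same flattened-array idea. It is only an illustration under stated assumptions: the arrays are invented, child pointers are assumed to be local to each tree (hence the nodes_offset addition), and leaves are assumed to point at themselves so that running a fixed number of iterations is harmless once a leaf has been reached.

```python
# Two trees, each padded to 3 node slots: tree 0 uses slots 0..2, tree 1 uses 3..5.
# All values below are hypothetical.
features = [0, 0, 0, 1, 0, 0]                 # feature index tested at each node
thresholds = [0.5, 0.0, 0.0, 2.0, 0.0, 0.0]
lefts = [1, 1, 2, 1, 1, 2]                    # local child index; leaves self-loop
rights = [2, 1, 2, 2, 1, 2]
values = [0.0, 10.0, 20.0, 0.0, -1.0, 1.0]    # output stored at each node
nodes_offset = [0, 3]                         # where each tree starts in the flat arrays
max_tree_depth = 2                            # fixed iteration count, as in the loop above


def predict(x):
    """Return one output per tree for a single sample `x`."""
    indexes = list(nodes_offset)              # start every tree at its root
    for _ in range(max_tree_depth):
        next_indexes = []
        for tree, idx in enumerate(indexes):
            go_left = x[features[idx]] <= thresholds[idx]
            local = lefts[idx] if go_left else rights[idx]
            next_indexes.append(local + nodes_offset[tree])
        indexes = next_indexes
    return [values[i] for i in indexes]


print(predict([0.3, 5.0]))  # [10.0, 1.0]
print(predict([0.9, 1.0]))  # [20.0, -1.0]
```

Every iteration does the same work on arrays of the same shape, which is why this strategy fits naturally into a while expression inside defn.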

Perfect Tree Traversal

Finally, let's look at the last conversion strategy. This conversion is slightly more complex again, but hopefully seeing this example will help you in your case:

def forward(self, x):
        prev_indices = (self.decision_cond(torch.index_select(x, 1, self.root_nodes), self.root_biases)).long()
        prev_indices = prev_indices + self.tree_indices
        prev_indices = prev_indices.view(-1)

        factor = 2
        for nodes, biases in zip(self.nodes, self.biases):
            gather_indices = torch.index_select(nodes, 0, prev_indices).view(-1, self.num_trees)
            features = torch.gather(x, 1, gather_indices).view(-1)
            prev_indices = (
                factor * prev_indices + self.decision_cond(features, torch.index_select(biases, 0, prev_indices)).long()
            )

        output = torch.index_select(self.leaf_nodes, 0, prev_indices).view(-1, self.num_trees, self.n_classes)

And the Elixir implementation:

defnp _forward(
          x,
          root_features,
          root_thresholds,
          features,
          thresholds,
          values,
          indices,
          opts \\ []
        ) do
    prev_indices =
      x
      |> Nx.take(root_features, axis: 1)
      |> opts[:condition].(root_thresholds)
      |> Nx.add(indices)
      |> Nx.reshape({:auto})
      |> forward_reduce_features(x, features, thresholds, opts)

    Nx.take(values, prev_indices)
    |> Nx.reshape({:auto, opts[:num_trees], opts[:n_classes]})
  end

  deftransformp forward_reduce_features(prev_indices, x, features, thresholds, opts \\ []) do
    Enum.zip_reduce(
      Tuple.to_list(features),
      Tuple.to_list(thresholds),
      prev_indices,
      fn nodes, biases, acc ->
        gather_indices = nodes |> Nx.take(acc) |> Nx.reshape({:auto, opts[:num_trees]})
        features = Nx.take_along_axis(x, gather_indices, axis: 1) |> Nx.reshape({:auto})

        acc
        |> Nx.multiply(@factor)
        |> Nx.add(opts[:condition].(features, Nx.take(biases, acc)))
      end
    )
  end

You can see that in this case, we have a function defined in a deftransform within our forward pipeline. Why is this so? Well, when writing definitions within defn, you give up the default Elixir Kernel in favor of the Nx.Defn.Kernel module. If you want full access to all of the normal Elixir modules, you need to use a deftransform. We needed to use Enum.zip_reduce in this instance (rather than Nx's while like before) since the features and thresholds lists are not of uniform shape. Each entry's shape is the number of nodes at a given depth of a binary tree, so together they form a nested list with lengths [1,2,4,8...]. This is an optimization over normal TreeTraversal, but it required a somewhat different approach from the Python implementation, which took advantage of torch.nn.ParameterList to build out the same lists. You might also notice the use of Tuple.to_list on lines 25 and 26. This was required since we needed features and thresholds to be stored in an Nx.Container when passed into the deftransform, and Tuple implements the Nx.Container protocol, while lists do not. Even so, once you know the intricacies of defn and deftransform, the final ported solution is very similar to the reference solution.
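The index arithmetic is easier to see on a single sample and a single tree. The pure-Python sketch below uses invented data for a perfect binary tree of depth 2, stored level by level, and a hypothetical convention in which the condition returns 1 for "take the right child". It leaves out the per-tree offset (tree_indices / indices) that the real implementations add for ensembles.

```python
# One perfect binary tree of depth 2, stored one list per level (hypothetical data).
root_feature, root_threshold = 0, 0.5
level_features = [[1, 1]]           # levels below the root: lengths 2, 4, 8, ...
level_thresholds = [[2.0, 4.0]]
leaf_values = [10.0, 20.0, 30.0, 40.0]
FACTOR = 2                          # each node has exactly two children


def condition(value, threshold):
    """Assumed convention for this sketch: 1 means 'go right', 0 means 'go left'."""
    return 1 if value > threshold else 0


def predict(x):
    """Walk the tree for a single sample `x` using only index arithmetic."""
    index = condition(x[root_feature], root_threshold)
    # The per-level lists have different lengths, so we zip over plain Python
    # lists instead of looping over one uniformly shaped tensor.
    for feats, thresh in zip(level_features, level_thresholds):
        index = FACTOR * index + condition(x[feats[index]], thresh[index])
    return leaf_values[index]


print(predict([0.3, 1.0]))  # 10.0
print(predict([0.3, 3.0]))  # 20.0
print(predict([0.9, 3.0]))  # 30.0
print(predict([0.9, 5.0]))  # 40.0
```

Because the children of node i on one level always sit at positions 2i and 2i + 1 on the next, no left or right pointer arrays are needed. The ragged per-level lists are what rule out a single while over one tensor.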

Conclusion

In this post, I tried to accomplish several things at once, and perhaps that led to a cluttered article, but I felt the need to address all of these points together. I do not mean to suggest that Machine Learning has no place in Python or that Python will not continue to be the most dominant player in Machine Learning, but I think some healthy competition is a good thing, and that perhaps Python does have some shortcomings that might give other languages valid reasons to coexist in the space.

Next, I wanted to address some specifics as to what Elixir has to offer to the machine learning space. I think it is uniquely positioned to be quite competitive considering the large community push to support more and more libraries, as well as the large application development community that can benefit from an in-house solution.

Lastly, I wanted to share some practical tips for those looking to move on from Python to Elixir, but feeling somewhat helpless in the process. I think that Sean Moriarity's book that I mentioned at the beginning of this article is an invaluable resource and a great step in the education of machine learning for Elixir developers, but it can nonetheless feel daunting to seemingly throw out existing working solutions for new-fangled, perhaps not as well respected solutions. I hope I showed how anybody can approach this problem, and that any existing Elixir developer can be a machine learning developer going forward. The groundwork has been laid, and the tools are available. Thank you for reading (especially if you made it to the end)!

Cure, Four Releases Deep: From FSMs to Furniture

Sat, 18 Apr 2026 11:56:20 +0000

There is a particular kind of intellectual cowardice endemic to programming-language design, which consists of writing a feature that almost works, convincing oneself that it works enough, and then moving on to the next item on the roadmap before the cracks start to show. I have indulged in this pastime for years. Cure, for the first dozen or so releases, indulged in it with me. The last four tags—v0.16.0, v0.17.0 with its two patch siblings, v0.18.0, and v0.19.0 followed by v0.19.1—are my belated attempt to stop. Each of them picks one area of the language that had previously been garnished with a thin coat of varnish and strips it down to bare wood.

What follows is the tour. I will try to explain both what changed and why it had to. If the tone reads as exasperated in places, that is because I am mostly exasperated at the person who wrote the earlier versions, who was, as luck would have it, me.

v0.16.0: the turnstile that could not contain itself

For fifteen releases Cure had finite state machines as a language primitive, and for fifteen releases the canonical FSM example—the turnstile—looked like this: four lines of beautifully declarative transition graph in turnstile.cure, followed by a hundred and twenty lines of Elixir GenServer plumbing in a wrapper module that had nothing to do with turnstiles and everything to do with bridging the gap between a gen_statem process and the rest of the application. The FSM definition said what you meant. The wrapper said what the runtime demanded. You read one of them to understand the domain and the other to make the program run. I cannot in good conscience call that a first-class primitive. It was more of a first-class primitive’s slightly embarrassed cousin, the one who turns up at family dinners and talks about his crypto portfolio.

Finitomata, my Elixir library for finite automata that has been in production for years, got this right on the first attempt by insisting that the graph and the transition handler belong in the same module and behind the same abstraction. v0.16.0 borrows that insight wholesale, and then some. The turnstile now reads:

fsm Turnstile with Integer
  Locked   --coin-->  Unlocked
  Unlocked --push-->  Locked
  Unlocked --coin-->  Unlocked
  Locked   --push-->  Locked

  on_transition
    (:locked, :coin, _payload, data)   -> %[:ok, :unlocked, data + 1]
    (:unlocked, :push, _payload, data) -> %[:ok, :locked,   data]
    (:unlocked, :coin, _payload, data) -> %[:ok, :unlocked, data + 1]
    (_, _, _, data)                    -> %[:ok, :__same__, data]

The four transition lines on top are the graph you always wrote. The on_transition block underneath takes pattern-matching clauses of shape (current_state, event, event_payload, state_payload) and returns either %[:ok, next_state, new_payload] or %[:error, reason]. When the compiler sees an on_transition block it silently changes mode: instead of generating raw gen_statem Erlang abstract forms, it produces a GenServer-based Elixir module with an embedded transition table, pre-dispatch validation, compiled do_on_transition/4 clauses, and the optional lifecycle hooks on_enter, on_exit, on_failure, and on_timer. If you do not write on_transition, the FSM compiles the way it always did, through Erlang abstract forms to gen_statem. No flags, no configuration—the compiler decides based on whether you asked for the new thing.
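For readers who do not yet read Cure, the shape of those clauses can be modelled in a few lines of Python. This is a rough analogy and not what the compiler generates: the function names are invented, tuples stand in for %[...], and it assumes that :__same__ means "stay in the current state", which the fallback clause suggests but the text above does not spell out.

```python
SAME = "__same__"  # stand-in for the :__same__ marker (meaning assumed, see lead-in)


def on_transition(state, event, _payload, data):
    """Mirror of the four clauses: returns (tag, next_state, new_payload)."""
    if (state, event) == ("locked", "coin"):
        return ("ok", "unlocked", data + 1)
    if (state, event) == ("unlocked", "push"):
        return ("ok", "locked", data)
    if (state, event) == ("unlocked", "coin"):
        return ("ok", "unlocked", data + 1)
    return ("ok", SAME, data)  # catch-all clause


def step(state, data, event, payload=None):
    """Hypothetical driver: apply one event and resolve the SAME marker."""
    _tag, next_state, new_data = on_transition(state, event, payload, data)
    if next_state == SAME:
        next_state = state
    return next_state, new_data


state, coins = "locked", 0
for event in ["push", "coin", "coin", "push"]:
    state, coins = step(state, coins, event)

print(state, coins)  # locked 2
```

The Integer payload plays the role of a running coin counter here, which matches the data + 1 in the coin clauses.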

Finitomata also contributed two conventions that I refused to live without once I tried them. A hard event, written with a trailing exclamation mark, must be the sole outgoing event of its source state, and fires automatically on arrival, so initialisation chains and guaranteed progressions stop requiring baby-sitting from the caller. A soft event, written with a trailing question mark, silently leaves the state alone if it cannot transition, instead of logging a warning and invoking on_failure. Health checks, optimistic polling, any kind of “try it, and if it does not apply this tick, never mind”—all become one-liners. The lexer grew a small counter that tracks whether it is currently inside an arrow --...-->, and only inside an arrow does it absorb the trailing ! or ? into the identifier. Everywhere else, ! stays reserved for effect annotations and ? for predicates and holes. The verifier enforces the hard-event rule; the compiler uses {:continue, ...} from the GenServer return tuple to fire hard events without yielding.

The turnstile’s wrapper is now fifty lines rather than one-hundred-and-twenty, and the fifty it retains are real application logic: counting how many people went through today. The twelve turnstile tests pass without modification, which is the only part of this release I will allow myself to take unironic pride in.

v0.17.0: toward Idris, at last

For seven versions Cure had been marketing itself as a dependently-typed language while behaving, in every practical respect, like a vaguely refinement-typed one. You could write Vector(T, n) in a signature. The checker would nod politely. It would not, however, verify that the length you promised was the length you produced, because the machinery to do so did not exist. v0.17.0 stops nodding.

Three type shapes that should have been there from the start arrive simultaneously. Sigma types pair a value with a type that depends on it; Sigma(n: Nat, Vector(T, n)) is “a natural number together with a vector of exactly that length”. Pi types let a function’s return type depend on its arguments, so the canonical example

fn append(xs: Vector(T, m), ys: Vector(T, n)) -> Vector(T, m + n)

now actually means what it reads as: at each call site the checker substitutes the concrete arguments into the return type, normalises with a tiny terminating reducer called Cure.Types.Reduce, and resolves the result. Closed type-level arithmetic never troubles the SMT solver—Vector(T, 2 + 3) is syntactically the same type as Vector(T, 5) before Z3 is even woken up. Equality types arrive with the single constructor refl(x) : Eq(T, x, x) and the single eliminator rewrite eq in expr, both erased at codegen to the atom :cure_refl. The standard library gains a Std.Equal module exposing refl, sym, trans, and cong, which is to say the usual suspects.

The second big arrival is implicit arguments with proper first-order unification. Write fn id({T}, x: T) -> T = x, call id(42), and at the call site an occurs-check-equipped unifier resolves T from the explicit argument. When resolution fails—and it does, more often than one would like—the pipeline emits a :unification_trace event carrying the argument, the position, and the substitution that killed it. The LSP renders the trace in hover; the CLI prints it in error output. No more staring at “cannot infer T” the way one stares at a Rorschach blot. Implicit parameters are erased at codegen, so they cost exactly nothing at runtime.

What ties the dependent-type stack together, in practical terms, is hole-driven development. Write

fn safe_head(xs: List(T)) -> T = ?body

and compile, and the compiler does not merely tell you “?body is missing”: it tells you ?body : T in scope: xs : List(T). Anonymous holes (??) get numbered ?_1, ?_2, in source order. Every encountered hole emits a :hole_goal event with the goal type and the local context, and the REPL’s new :holes meta-command lists everything recorded during the last evaluation. This is how Idris programmers write programs. It is now, finally, how Cure programmers can.

Totality joins the party, gently. Cure.Types.Totality classifies every function as :total, :partial, or :unknown, combining pattern-coverage (via Cure.Types.PatternChecker) with a structural-recursion check. By default totality is report-only; decorating a function with #[total] promotes the classification to a hard error if it fails:

#[total]
fn factorial(n: Int) -> Int
  | 0 -> 1
  | n -> n * factorial(n - 1)

Only direct structural recursion is caught in v0.17.0. Mutual recursion has to wait for v0.19.0.

Refinement types grow a backbone. Path-sensitive refinement flows along if and match guards, so inside

if x != 0 then 100 / x else 0

the then branch sees x : {x: Int | x != 0}, and the division is safe without an explicit refinement annotation. The new Std.Refine module ships drop-in refinements one does not wish to keep rewriting: NonZero, Positive, Negative, NonNegative, Percentage, Probability, plus predicate helpers. These are the kinds of tiny conveniences that, had they been present in v0.10.0, would have saved everyone a small amount of grief each day for a year.

The tooling catches up in the same release. Cure.REPL is a complete rewrite: multi-line input (terminated by a blank line or ;;), the meta-commands :t, :doc, :effects, :load, :reload, :use, :holes, :env, :reset, :fmt, :help, :quit, and command history persisted to ~/.cure_history. A watch mode, invoked as cure watch lib/ --action check, recompiles, type-checks, or tests on every save with a 200 ms debounce, and works without :file_system thanks to a small polling fallback. The LSP server, Cure.LSP.Server, acquires inlay hints, signature help, formatting via a round-trip-tested source-preserving printer, prepare-rename and rename, code lenses, semantic tokens, and workspace symbols—seven capabilities in one commit, which is the sort of thing that happens when one has been putting off the LSP for three releases in a row. Doctests arrive, too: any ## or ### docstring immediately above a function whose body contains cure> / => pairs is executed by cure test --doctests.

The patch releases tidied up. v0.17.1 stopped the stdlib preloader from polluting the code path, fixed two LSP crashes around inlay hints and semantic tokens, and retargeted the vicure grammar at v0.17.0 syntax. v0.17.2 re-enabled LSP formatting once the new source-preserving formatter proved it would not eat users’ comments for breakfast—a promise the old AST-based formatter had never quite been willing to make.

v0.18.0: pattern matching grows up

Here is a question I did not want to answer for a long time. If one writes

match value
  %{list: [h | t]} -> handle(h, t)
  _                -> default

and the subject value is a map whose list field is not present, what does the old compiler do? The correct answer is “fails the match and falls through to the wildcard”. The answer the old compiler produced is “succeeds, because the map pattern is miscompiled into Erlang’s construction form (K => V) rather than the match form (K := V), so it accepts any subject, and h and t are bound to whatever happens to be lying around, possibly nothing sensible”. There were tests pointing straight at this. What there was not, was a pattern compiler willing to recurse.

v0.18.0 replaces the pattern layer wholesale. The headline is that this now actually works:

match value
  %[_, %{list: [head | tail]}, _]          -> handle(head, tail)
  Person{name, address: Address{city}}     -> greet(name, city)
  [Ok(v) | _]                              -> v
  _                                        -> default

Every sub-pattern compiles as a real pattern. Nested map patterns use exact matching and actually require the key. Record patterns check the struct tag and each field. The cons pattern binds v through the Ok(...) constructor. The wildcard mops up the rest. There is no magic here; merely a module (Cure.Compiler.PatternCompiler) that does the one thing its name promises, and which every other codegen path—compile_multi_clause_function, compile_pattern_match, compile_assignment, compile_comprehension, compile_catch_and_finally—now routes through rather than reinventing pattern handling locally. The original sin, I discovered after far too long, was treating patterns and expressions uniformly because their AST nodes have the same shape. Patterns are not evaluated; they are matched against a subject with bindings as side effects. Nothing good comes of pretending otherwise.

Alongside the compiler rewrite, the type checker’s bind_pattern_vars/3 was rewritten to thread the scrutinee type through every pattern shape, so nested pattern variables now carry the tightest type the structure allows rather than the old :any. Tuple patterns zip element-wise. List and cons patterns bind head and tail at the correct element and list types respectively. Map patterns look up each key through the scrutinee’s schema when it is a known record, or through the map value type otherwise. Record patterns resolve every field against the schema registered at rec time, and unknown fields emit a warning under the new error code E021. The practical consequence is that match p { Person{age: a} when a > 17 -> ... } finally knows a : Int, so subsequent refinement fires against the right type.

Field punning arrives in both directions: Point{x, y} desugars to Point{x: x, y: y} in pattern position and in construction position. It is purely a parser change, and if you have ever found yourself writing Person{name: name, age: age, email: email, ...} seven fields deep you will be glad of it. Repeated occurrences of the same variable in one pattern now do what a reasonable person would expect: the first occurrence binds, later ones lower to equality guards. %[x, x] matches exactly the pairs whose slots are equal, which I should have supported five releases ago.

The pin operator (^name) lands behind --experimental-pin as the official escape hatch for “compare against this already-bound value”:

let target = get_tag()
match event.tag
  ^target -> :hit
  _       -> :miss

Internally it lowers to a fresh variable plus a V_fresh =:= V_target guard—the same transformation one used to write by hand. Zero runtime cost, considerably fewer characters on screen. It is promoted to default in v0.19.0.
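For readers who know the pin operator from Elixir, the lowering described above can be sketched there. This is an Elixir analogue and not Cure's generated code; the module and function names are hypothetical.

```elixir
defmodule PinDemo do
  # With the pin operator: the clause matches only when `tag` equals `target`.
  def with_pin(tag, target) do
    case tag do
      ^target -> :hit
      _ -> :miss
    end
  end

  # Hand-written lowering: bind a fresh variable, then compare it in a guard.
  def with_guard(tag, target) do
    case tag do
      fresh when fresh === target -> :hit
      _ -> :miss
    end
  end
end
```

Both functions should agree on every input, which is the sense in which the pin is sugar over a guard.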

Exhaustiveness checking grows a second pass. The old flat classifier (:wildcard | :empty_list | :cons | {:literal, ...} | {:constructor, ...} | {:tuple, n}) is still there, because it is fast and covers the common case; a new Maranget-style column walker, Cure.Types.PatternChecker.check_nested/2, now descends into tuple scrutinees whose element types are enumerable and emits concrete, source-shaped witnesses for missing patterns:

Warning: match expression has nested non-exhaustive cases (E025)
  missing: %[Error(_), _]

Five new error codes—E021 through E025—land in Cure.Compiler.Errors for unknown record fields in patterns, record field type mismatches, non-literal map-pattern keys, unbound pin variables, and non-exhaustive nested matches. cure explain E0xx works for every one.

There is a caveat worth naming explicitly. Map patterns that used to silently succeed against arbitrary subjects no longer do so. If your code relied on the old broken behaviour—for example, a map pattern whose key was not in the subject but which the compiler accepted anyway—you will now see either a runtime badmatch or a compile-time non-exhaustive warning. Fix the pattern. The compiler is right and you are wrong; I say this with the humility of the person who wrote the earlier compiler.

v0.19.0: the furniture arrives

After the pattern engine settled, a queue of previously-slated features was ready to land. v0.19.0 is called “Bring the Furniture” because every item on it is the kind of thing one should not have to talk about in a release announcement; one should merely discover, pleasantly, that it is there.

Propositions acquire a syntactic home. A proof container is a module-shaped block whose bindings must return Eq(...) or a refinement witness:

proof Std.Proof
  fn plus_zero(_n: Int)       -> Eq(Int, n, n)        = :cure_refl
  fn append_nil(_xs: List(T)) -> Eq(List(T), xs, xs)  = :cure_refl

The compiler enforces the shape under E026. Runtime values are plain :cure_refl atoms; the checker does the interesting work; the resulting BEAM module is ordinary-looking code you can load alongside any other. assert_type expr : T arrives as a zero-cost compile-time assertion: the checker verifies the type, the codegen strips the wrapper, and mismatches surface as E027. fn doubled(n: Int) -> Int = assert_type n * 2 : Int does what it looks like.

Records gain field defaults, which is the sort of feature whose absence one does not forgive oneself:

rec Person
  name: String = "Anonymous"
  age: Int = 0
  active: Bool = true

Omitted fields fall back to the declared defaults at construction time; any caller-supplied value always wins. Type mismatches between the default and the declared field type are caught as E028. In the same spirit, @derive(Show, Eq, Ord) is finally wired end-to-end. The Cure.Types.Derive module had been sitting on disk since v0.12.0 without a codegen path; v0.19.0 plumbs it through so that decorating

@derive(Show, Eq, Ord)
rec Point
  x: Int
  y: Int

synthesises plain show/1, eq/2, and compare/2 exports. The accompanying example, examples/derived_show.cure, constructs two Point values, asks eq(p, q), and returns one if they compare equal—which, naturally, they do.

Property-based testing shows up via two cooperating stdlib modules. Std.Gen ships tiny stateless generators (int_in, bool, list_int, list_of_int, one_of, constant) backed by :rand. Std.Test.forall(gen, property, runs) runs the property against samples and returns :ok or raises :property_failed:

mod Laws
  use Std.Gen
  use Std.Test

  fn test_plus_zero() -> Atom =
    forall(fn(_) -> int_in(-100, 100), fn(n) -> n + 0 == n, 100)

Shrinking, histograms, and stateful generators are future work; this is deliberately the minimum viable property tester, and it is enough to catch the kind of bug one always catches with a property tester in the first week.
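To make the shape of such a minimal tester concrete, here is a sketch in Elixir. It is not the Std.Test source: MiniProp and its functions are hypothetical names, the generator is assumed to be a zero-arity function, and a failure is returned as a tuple rather than raised.

```elixir
defmodule MiniProp do
  # Runs `property` against `runs` samples drawn from `gen` (a zero-arity
  # function). Returns :ok, or {:failed, sample} for the first counterexample.
  def forall(gen, property, runs) when is_integer(runs) and runs >= 0 do
    Enum.reduce_while(1..runs//1, :ok, fn _, :ok ->
      sample = gen.()
      if property.(sample), do: {:cont, :ok}, else: {:halt, {:failed, sample}}
    end)
  end

  # Uniform integer in [lo, hi], backed by :rand.
  def int_in(lo, hi) when lo <= hi, do: lo + :rand.uniform(hi - lo + 1) - 1
end
```

Without shrinking, the counterexample reported is simply the first failing sample, which is usually enough to start debugging.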

Std.Iter is the matching minimal lazy iterator protocol. Constructors empty/0, from_list/1, and range/2; consumers fold/3, take/2, to_list/1. take/2 stops before materialising the tail, so unbounded ranges are safe as long as you only peek at a prefix:

use Std.Iter

fn sum_range(n: Int) -> Int =
  let it = range(1, n)
  let add = fn(x) -> fn(acc) -> acc + x
  fold(it, 0, add)
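The claim that take/2 keeps unbounded sources safe is easiest to see with Elixir's Stream module, used here as a stand-in because it has the same lazy behaviour. LazyDemo is a hypothetical module, not part of Std.Iter.

```elixir
defmodule LazyDemo do
  # Unbounded source: Stream.iterate/2 computes nothing until consumed.
  def naturals, do: Stream.iterate(1, &(&1 + 1))

  # Pulls only the first `n` elements, so it terminates on an infinite stream.
  def first(n), do: Enum.take(naturals(), n)

  # Folds a bounded range with an accumulator, like sum_range in the Cure snippet.
  def sum_range(n), do: Enum.reduce(1..n//1, 0, fn x, acc -> acc + x end)
end
```

Calling a consumer that needs the whole stream, such as Enum.to_list/1 on naturals/0, would never return; only prefix-taking consumers are safe on unbounded input.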

The package-registry groundwork lands as a version parser and a dependency resolver. Cure.Project.Version handles SemVer plus compound constraints (~>, >=, <=, <, >, ==, combined with and), and accepts MAJOR.MINOR as shorthand for MAJOR.MINOR.0. Cure.Project.Resolver.resolve/2 is a deterministic backtracking resolver over a local registry that picks the newest compatible version and surfaces conflicts as E030. A remote index service, signing, and Hex.pm cross-publishing are what v0.20.0 is for.
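The "newest compatible version" rule can be sketched with Elixir's built-in Version module, which understands the same ~> and and constraint syntax. This is an illustration only: NewestCompatible is a hypothetical module, it handles a single package with no backtracking, and it says nothing about how Cure.Project.Resolver is written.

```elixir
defmodule NewestCompatible do
  # Returns the highest version in `available` that satisfies `requirement`,
  # or :error when nothing matches (the situation a resolver reports as a conflict).
  @spec pick([String.t()], String.t()) :: {:ok, String.t()} | :error
  def pick(available, requirement) do
    available
    |> Enum.filter(&Version.match?(&1, requirement))
    # Sort by semantic version, not by string: "1.10.0" ranks above "1.9.3".
    |> Enum.sort({:desc, Version})
    |> case do
      [newest | _] -> {:ok, newest}
      [] -> :error
    end
  end
end
```

For example, NewestCompatible.pick(["1.1.0", "1.4.2", "1.5.0", "2.0.0"], "~> 1.2 and < 1.5.0") returns {:ok, "1.4.2"}. Note that Elixir's Version requires full MAJOR.MINOR.PATCH strings for versions, unlike the MAJOR.MINOR shorthand described above.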

Totality catches up with multi-function cycles. Cure.Types.Totality.check_mutual/1 runs Tarjan’s SCC algorithm on the module call graph, then classifies each non-trivial strongly-connected component as either :ok (structural decrease proven on at least one path) or :suspect (E029). It is not a full termination checker—nothing running at compile time on the BEAM is going to be—but it is strictly more than nothing, and it catches the obvious cases.

One last syntactic nicety: multi-head cons patterns,

match xs
  [a, b, c | rest] -> a + b + c
  _                -> 0

desugared by the parser into right-associated cons cells. Works in pattern and construction position. It is the sort of feature that every functional language grows eventually; we might as well have grown ours here.

Five new error codes (E026 through E030) round out the error catalog for the release, all of them available via cure explain.

v0.19.1: Dialyzer and the small sins

The point release should never be the interesting one, and v0.19.1 honours that convention. Its job was to add Dialyzer to the CI matrix (.github/workflows/ci.yml), resolve the backlog of specs it turned up across Cure.Compiler, Cure.Compiler.Codegen, Cure.FSM.Compiler, Cure.Types.Env, and friends, and tidy a handful of things that were unclean but not broken: the mix cure.escript task got proper CLI integration, mix.exs lost a dead dependency, and the stdlib preloader started force-compiling in tests instead of hoping the beams from the last run would still be fresh. None of this is exciting. All of it needed to happen, because a language without Dialyzer in CI is a language whose maintainer is lying to themselves about how robust its internals are.

The small print, for anyone upgrading. The lexer rule for trailing ? on identifiers introduced in v0.17.0 means x?y now tokenises as x? followed by y; add whitespace or parentheses if you need the old behaviour. Map patterns that relied on the pre-v0.18.0 construction-form bug will now fail matches honestly. None versus None() inside a record pattern now emits a warning when you probably meant the nullary constructor. And Cure.lock lockfiles produced by v0.19.0 and v0.19.1 remain source-compatible with v0.17.0 and later, but a mix cure.compile_stdlib && mix cure.check after pulling is cheap insurance.

What all this adds up to

If you have read this far you may have noticed a pattern. Each of these releases took a subsystem that the earlier Cure treated as decorative and made it load-bearing. FSMs stopped being a transition graph with a separately-authored runtime and became a language primitive one can actually define in one place. The dependent-type core stopped nodding politely at Vector(T, m + n) and started checking it. The pattern engine stopped accepting whatever came past and started matching. The furniture release filled in the items one would notice the absence of but would not necessarily think to name: defaults, derive, property tests, a lazy iterator, a version parser, mutual-recursion totality. And the point release added the tool (Dialyzer) that would have prevented some of these embarrassments from landing in the first place.

There is still a long queue. Full bitstring pattern specifiers. Refinement narrowing through nested record and map patterns. The remote package-registry index service with its signing and Hex.pm cross-publishing. True dependent types in their full glory—fn len(l: Vector(_, n)) -> NonNegInt = n remains, for the moment, a thing I write in the v0.2x plan rather than in actual Cure. None of this will be done quickly. All of it will be done, if the last four releases are any indication, by ripping out the almost-works version and replacing it with one that does.

Clone it, build it, break it:

git clone https://github.com/am-kantox/cure-lang.git
cd cure-lang
mix deps.get && mix test
mix escript.build
./cure version
./cure run examples/destructuring.cure
./cure run examples/derived_show.cure
./cure run examples/lazy_iter.cure

The repository is at github.com/am-kantox/cure-lang, the site is at cure-lang.org, and the furniture, as of v0.19.1, is indoors.

SAFE: Bringing Real Static Analysis to the BEAM - Erlang Solutions

Sat, 18 Apr 2026 03:56:15 +0000

In recent years, software security has become a hot topic due to regulatory pressure (NIS2, EU Cyber Resilience Act, etc.). Beyond regulations, software communities and open source maintainers have also put security into focus, because open source libraries are often part of commercial software supply chains. This has reached the Erlang/Elixir ecosystem as well, and that is a good thing.

In this post, the SAFE team takes us through SAFE, Erlang Solutions’ security analysis tool for the BEAM, covering what it does, how it works, and what makes it effective in practice.

The BEAM is secure by default (up to a point)

The BEAM’s architecture eliminates entire classes of bugs for free. Isolated process memory means processes can’t manipulate each other’s state; they communicate only through messages. Immutable data structures rule out a whole category of aliasing bugs. These are real wins, and they come without any effort from the developer.

But “secure by default” only goes so far. Application-level vulnerabilities, e.g., XSS, SQL injection, CSRF, unsafe deserialization, atom exhaustion, are just as possible in Erlang and Elixir as anywhere else. That’s the gap SAFE exists to cover.

What is SAFE?

SAFE (Security Analysis for Erlang/Elixir) is Erlang Solutions’ static analysis tool for the BEAM. It analyses compiled BEAM files rather than source code, so it works consistently across Erlang, Elixir, and Phoenix, including mixed-language codebases. SAFE is free for open source projects (subject to approval) and commercially available for other use cases.

It is developed in collaboration with academic research on static analysis from Eötvös Loránd University, and is aligned with the security recommendations of the Erlang Ecosystem Foundation (EEF).

SAFE detects a broad range of vulnerabilities, including:

Cross-Site Scripting (XSS)
SQL injection
Command injection
Remote Code Execution
Denial of Service (e.g. atom exhaustion)
Unsafe serialisation
Cross-Site Request Forgery (CSRF)
Session hijacking, fixation, and information leakage
Content Security Policy (CSP) misconfigurations

Data-flow analysis: the core of what makes SAFE different

The central feature that sets SAFE apart is data-flow analysis. Most static analysis tools work by pattern matching: they look for known dangerous function calls and flag them. The problem is that not every call to a dangerous function is actually dangerous. Without understanding the possible values flowing through the code, a tool has no way to tell the difference, and the result is a high rate of false positives.

SAFE takes a different approach. Data-flow analysis tracks what values variables can hold at each point in the program. This information is then used to filter the initial list of vulnerability candidates, eliminating findings where the data can be proven safe, and surfacing only the ones that represent real risk.

The practical impact of this is significant. In our tests across 7 popular open source BEAM projects (~70,000 lines of code), SAFE produced a false positive rate of 7.78%, and that number continues to improve as we refine our analysis. We also manually review findings during development to further sharpen the filtering.

How data-flow analysis eliminates false positives

Example 1 — guarded atom creation

A common pattern in Elixir is to convert a binary to an atom only after validating it against an allowlist:

     
    def safe_to_atom(binary, allowed) do
      if Enum.member?(allowed, binary), do: String.to_atom(binary)
    end

A pattern-matching tool sees String.to_atom/1 and flags it. SAFE’s data-flow analysis traces the possible values of binary at the point of the call and determines that it is always a member of a finite, controlled list, so it eliminates the finding entirely.
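Where the allowlist is known at compile time, the conversion can avoid String.to_atom/1 altogether by looking the binary up in a map of literal atoms. A minimal sketch, with a hypothetical module name and a made-up set of allowed values:

```elixir
defmodule SchemeAtoms do
  # Hypothetical allowlist: every atom here is a compile-time literal,
  # so the lookup below can never add an entry to the atom table.
  @allowed %{"http" => :http, "https" => :https, "ws" => :ws}

  # Returns {:ok, atom} for an allowed binary and :error for anything else.
  @spec fetch(String.t()) :: {:ok, atom()} | :error
  def fetch(binary) when is_binary(binary), do: Map.fetch(@allowed, binary)
end
```

Here SchemeAtoms.fetch("https") returns {:ok, :https}, while an unknown value such as "gopher" returns :error instead of creating a new atom.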

Example 2 — finite compile-time atom generation

Metaprogramming is common in Elixir. Consider this pattern where atoms are generated at compile time:

    @variants [:case_a, :case_b, :case_c]
    #
    # ...
    #
    for var <- @variants do
      defp unquote(var)() do
        env = Application.get_env(:my_app, :environment)
        if env == "test" do
          unquote(Macro.escape(Module.get_attribute(__MODULE__, :"test_#{var}")))
        else
          unquote(Macro.escape(Module.get_attribute(__MODULE__, var)))
        end
      end
    end

The number of atoms created here is strictly bounded by the length of @variants, a compile-time constant. SAFE calculates this and correctly determines that the atom count is finite, hence there is no vulnerability. A tool without data-flow analysis cannot make this determination.

What SAFE catches in practice

Session management vulnerabilities

Session management vulnerabilities allow attackers to gain unauthorised access to user sessions, which can lead to data theft, unauthorised actions, and account takeover. SAFE detects session hijacking, session fixation, and session information leakage. Session hijacking occurs when an attacker gains access to cookie contents. To prevent this, the http_only and secure attributes should both be set to true when setting a cookie. Below is a vulnerable example:

    @spec set_cookie(Plug.Conn.t()) :: Plug.Conn.t()
    def set_cookie(conn) do
      Plug.Conn.put_resp_cookie(conn, "my_cookie", "true",
        http_only: false,
        max_age: @max_age # an integer
      )
    end

Session fixation is an attack where a malicious user plants a session ID for a victim to use, then hijacks their account after login. Plug.Conn provides the configure_session/2 function for renewing the session ID of a session managed by Plug.Session. When this function is misconfigured by setting the renew option to false, session fixation can occur.

Session information leakage can be prevented by encrypting cookie contents. Encryption can be enabled by setting the encryption_salt in Plug.Session. For non-session cookies, encryption can be enabled via the encrypt option in Plug.Conn.put_resp_cookie/4.

Content Security Policy misconfigurations

When it comes to Content Security Policy (CSP), any policy is better than none. Using :put_secure_browser_headers without a custom policy won’t be enough on its own:

    plug :put_secure_browser_headers, %{
      "content-security-policy": "[Your Policy]"
    }

There are also dedicated plugs for this, such as PlugContentSecurityPolicy:

   plug PlugContentSecurityPolicy,

SAFE inspects the policy content itself and will flag an overly permissive default. You should define your own policy, keeping it as restrictive as possible to avoid unintentionally whitelisting too much. Beyond that, SAFE checks that all pipelines accepting HTML have CSP protection at all, a gap that is easy to miss during development.

Results from the field

The top three vulnerability types are XSS, DoS, and CSP. Notably, a large share of DoS findings trace back to unguarded String.to_atom/1 calls, a known footgun that is consistently flagged in Erlang/Elixir documentation, yet still appears frequently in practice.

SAFE found a total of 90 vulnerabilities, of which 7 proved to be false positives after manual investigation. That is a false positive ratio of 7/90 ≈ 7.78%, and reducing it remains an area of active improvement.

The projects tested are anonymized to avoid identification, since the disclosed vulnerabilities may still be present in production systems. While this limits full reproducibility, responsible disclosure takes priority. Contact us at safe@erlang-solutions.com to discuss the methodology in detail.

Closing notes

Security analysis on the BEAM is a genuinely hard problem. The same flexibility that makes Erlang and Elixir so expressive also makes it easy to introduce subtle vulnerabilities without realising it. SAFE is built specifically for this environment, grounded in academic research, and designed to give you signals you can act on rather than noise you have to filter.

If you maintain an open-source project, SAFE is free: reach out to us at safe@erlang-solutions.com and after a short approval process you’ll receive a licence at no cost. For commercial use or a third-party security review of your system, get in touch with the team.

The Long Road to Cure

Thu, 16 Apr 2026 09:19:20 +0000

Just under a year ago I set out to fulfil a long-standing ambition: building a language that compiles to BEAM and implements two capabilities essential to my daily work—dependent types resolved and verified at compile time, and verifiable finite state machines as a first-class language primitive.

I chose Erlang as the implementation language because it seemed to me that writing a compiler would be easier that way. I piled together a heap of attractive little baubles, many of which were added simply “because it is fun to have them.” Quite quickly I lost my way, and before long I lost control of the resulting spaghetti. I stopped understanding what was breaking and why whenever I added something that looked innocuous. I brought in an LLM, and the blasted thing informed me that the codebase had been authored by a drunk lumberjack with a primary-school education—someone who could not be trusted with anything more complex than bubble sort. Not in those exact words, but close enough.

That upset me. I lost my temper, demanded that the wretched language model knock together a presentation site in five minutes on the spot (without specifying any details whatsoever) and shipped it. The announcements, predictably, were received with polite indifference. People liked the ambition behind the idea (naturally!), but calling the implementation good was something not even my mother would have attempted.

I shelved the coding—and recoding—for several months and set about thinking about what I had done wrong.

First: over the last decade I had worked primarily with Elixir, and choosing Erlang had been a momentary, entirely unjustified whim. One can call :compile.forms/2 perfectly well from Elixir. Moreover, the ecosystem allowed me to abandon make-files and a rather crude build setup.

Second: on my first approach to the apparatus I was trying to hunt ducks, hares, and wild boar simultaneously. There was nothing systematic about that approach—I just kept adding and adding new things, propping up the collapsing frame with sticks as I went. Adding a new operator could break module parsing entirely. In the current version I followed the plan strictly: better slow and coherent than fast and obscure.

And most importantly: in the first version I was in such a hurry that I repeated the mistake of almost every existing language—the AST was present as an annexe, a shed around the back. The lexer and parser could hand control directly to the compiler. That turned out to be the critical error.

Having reconsidered all this, I decided to start the rebuilding from the foundations. That is how the metastatic library came into being. Put another way: I not only placed the AST at the centre of things but made it tower above every other entity in the system.

Metastatic is a library that provides a unified MetaAST (Meta-level Abstract Syntax Tree) intermediate representation for parsing, transforming, and analyzing code across multiple programming languages using a three-layer meta-model architecture. Build tools once, apply them everywhere. Create a universal meta-model for program syntax that enables cross-language code analysis, transformation, and tooling. Metastatic provides the foundation—the MetaAST meta-model and language adapters. Tools that leverage this foundation (mutation testing, purity analysis, complexity metrics) are built separately.

If you look closely, MetaAST bears a strong resemblance to Elixir’s own AST, because there is simply no better way to represent a tree than {node, meta, children}. In any case, I invested considerable effort in creating and debugging this new AST. It became the foundation of my second attempt at a “dependently-typed programming language for the BEAM virtual machine with first-class finite state machines and SMT-backed verification,” as it reads on the landing page.

After that I opened the casket marked “things from the previous century,” pulled out an actual pen and a notebook, and wrote out all the features from the first version that I had managed to implement, however imperfectly. I grouped them by importance, utility, and complexity. At this first stage I decided to forgo support for true dependent types—fn len(l: Vector(_, n)) -> NonNegInt = n—but a great deal of interesting things survived.

FSMs as first-class citizens are still in a fairly embryonic state, but I am in no hurry now, and I shall bring them to completion steadily and carefully.

Give https://cure-lang.org a try—perhaps you will like it.

Avoiding Platform Lock-In in Regulated Environments

Mon, 13 Apr 2026 05:37:46 +0000

Platform lock-in is often discussed as a commercial issue. Organisations adopt infrastructure that works well initially and later realise that moving away from those services becomes expensive or operationally disruptive.

For platforms that run continuously under heavy demand, the consequences appear somewhere else first. They appear in architecture.

Infrastructure choices influence how systems scale, how faults are contained, and how easily the platform can evolve as requirements change. In regulated environments those decisions often remain in place for years, which means architectural flexibility matters as much as technical capability.

When infrastructure becomes tightly coupled to a particular provider, systems may still perform well day to day. The real impact usually surfaces later when workloads grow, regulations change, or operational expectations increase. At that point platform lock-in risks begin to affect reliability as well as flexibility.

Why Platform Lock-in Matters in Regulated Environments

These architectural constraints become particularly visible in regulated industries where infrastructure decisions cannot be changed casually.

Financial services platforms must maintain traceable transactions and strict audit trails. Betting platforms process large volumes of activity during live sporting events. Streaming platforms deliver real-time content to global audiences who expect uninterrupted interaction.

Systems supporting these environments often remain active for long periods, which means infrastructure decisions made early in the system’s lifecycle can shape how the platform changes years later.

The Operational Impact of Vendor Lock-in

Many organisations already recognise the risks associated with vendor lock-in. The 2024 Flexera State of the Cloud Report found that 89% of organisations now operate multi-cloud strategies, with reducing infrastructure dependency and avoiding vendor concentration cited as key motivations.

The concern goes beyond procurement strategy. When platforms rely heavily on provider-specific services for messaging, orchestration, or event processing, those dependencies begin shaping how the system behaves under load.

In regulated environments that dependency can become a reliability concern. Infrastructure decisions that once simplified development may later restrict how systems scale, evolve, or respond to operational change.

Distributed Systems Architecture and Long-Running Platforms

The reason platform lock-in becomes particularly serious in regulated environments is tied to how many of these platforms operate: as long-running distributed systems.

Large-scale entertainment services rarely behave like short-lived workloads that restart frequently. Messaging layers, real-time interaction systems, and event pipelines maintain persistent connections while processing continuous streams of activity.

Why Long-Running Systems Behave Differently

Gaming platforms illustrate this clearly. Competitive environments host thousands of players interacting simultaneously, all of whom expect consistent state across the system. Betting platforms experience similar behaviour during major sporting events when users react instantly to changing odds. Streaming platforms see comparable spikes as audiences interact during live broadcasts.

These platforms rely on distributed systems architecture that must coordinate large numbers of connections and events while remaining continuously available.

Research published by ACM Queue examining large-scale distributed systems highlights how persistent connections and real-time workloads increase coordination pressure across system components, particularly during sudden spikes in concurrency.

When coordination layers rely heavily on platform-specific services, architectural dependency gradually builds. Over time the system begins to inherit those infrastructure constraints.

Reliability Requirements in High Reliability Systems

Systems operating under these conditions often prioritise stability over rapid iteration. Platforms designed as high reliability systems must remain available while managing constant traffic, evolving workloads, and unpredictable user behaviour.

Infrastructure decisions therefore have long-term consequences. When coordination, messaging, or state management rely on proprietary platform services, architectural flexibility narrows over time.

Why Gaming, Betting and Streaming Platforms Reveal Infrastructure Limits

Systems built as long-running distributed environments face their toughest tests during moments of concentrated demand. Entertainment platforms provide a clear example.

Large audiences often react simultaneously. A football match entering extra time can trigger thousands of betting transactions within seconds. A major esports tournament can bring large numbers of players online at once. Streaming platforms experience bursts of interaction as viewers respond together during live broadcasts.

Traffic Spikes and Scalable Distributed Systems

Systems supporting these environments must function as scalable distributed systems capable of handling sudden increases in activity without losing consistency or responsiveness.

Instead of steady growth, activity often arrives in waves. Large numbers of users connect, interact, and generate events within very short timeframes. The system must coordinate these interactions across multiple nodes while maintaining reliable communication between services.

Infrastructure that appears sufficient under normal conditions can struggle during these spikes if the surrounding architecture relies too heavily on provider-specific services.

Real-World Example: BET Software

These architectural pressures are particularly visible in betting platforms where activity surges during live sporting events.

BET Software operates large-scale betting technology platforms where thousands of users interact with markets simultaneously. During major sporting events systems must process rapid updates, recalculate market information, and distribute new data to users in real time.

Their distributed systems illustrate how reliability and responsiveness become essential in environments where activity concentrates around shared moments.

Architectures designed with flexibility across infrastructure layers tend to scale and recover more predictably than those tightly coupled to provider-specific services.

Architectural Patterns to Avoid Vendor Lock-in

Recognising the risks of vendor lock-in is useful only if it leads to better architectural decisions. Systems that remain adaptable across infrastructure layers often share several structural characteristics.

Decoupling Infrastructure Dependencies

Architectures designed to avoid vendor lock-in typically separate application logic from infrastructure services wherever possible. This allows teams to evolve system components independently without redesigning the entire system.

Designing Fault Tolerant Systems

Platforms that must operate continuously also benefit from architectures designed as fault tolerant systems, where failures can be contained locally rather than cascading across the entire platform.

Common patterns include:

Decoupled services that scale independently
Communication through open protocols rather than proprietary messaging layers
Distributed state management instead of provider-specific coordination services
Horizontal scaling across nodes
Infrastructure abstraction layers separating application logic from provider-specific implementations

These patterns help ensure that infrastructure choices support the system rather than define its limitations.
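As a rough illustration of the abstraction-layer pattern from the list above, the sketch below uses an Elixir behaviour as the boundary between application logic and a messaging provider. Every name in it (`MessageBus`, `MessageBus.InMemory`, `Odds`) is hypothetical and chosen only for this example; a provider-specific adapter would implement the same callback.

```elixir
defmodule MessageBus do
  @moduledoc "Hypothetical port: the only messaging contract application code sees."
  @callback publish(topic :: String.t(), payload :: term()) :: :ok | {:error, term()}
end

defmodule MessageBus.InMemory do
  @moduledoc "Hypothetical adapter that delivers to the calling process; handy in tests."
  @behaviour MessageBus

  @impl true
  def publish(topic, payload) do
    send(self(), {:published, topic, payload})
    :ok
  end
end

defmodule Odds do
  @moduledoc "Application logic: it receives the adapter as an argument and never names a provider."
  def update(bus, market, price) when is_atom(bus) do
    bus.publish("markets." <> market, %{market: market, price: price})
  end
end

# Swapping providers means passing a different adapter module here.
:ok = Odds.update(MessageBus.InMemory, "match-42", 2.5)
```

Because `Odds` depends only on the behaviour, replacing the in-memory adapter with one backed by a managed queue would not touch the application module.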

Where Elixir Supports High Reliability Systems

Technology choices also influence how easily distributed systems can maintain reliability while remaining adaptable.

Languages built on the Erlang virtual machine, including Elixir, were designed for environments where systems must remain available while handling large numbers of concurrent processes. The runtime emphasises process isolation and supervision structures that allow failures to be contained locally rather than cascading across the system.
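A minimal sketch of what process isolation means in practice, using only standard Elixir primitives. The module name `IsolationDemo` is invented for this example. A worker that crashes ends with an error reason, while the process that started it simply receives a notification and carries on.

```elixir
defmodule IsolationDemo do
  # Runs `fun` in a separate, unlinked process and reports how that process ended.
  def run_isolated(fun) when is_function(fun, 0) do
    {pid, ref} = spawn_monitor(fun)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end

# The crashing worker does not take the caller down with it.
crash_reason = IsolationDemo.run_isolated(fn -> raise "boom" end)
normal_reason = IsolationDemo.run_isolated(fn -> :ok end)
IO.inspect({crash_reason, normal_reason})
```

The first call returns the exception together with its stacktrace, and the second returns `:normal`. In both cases the calling process is still alive to decide what to do next.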

Building Fault Tolerant Systems for Long-Running Platforms

These characteristics make the platform particularly well suited for high reliability systems that must remain active while managing heavy concurrency.

The advantage lies in the runtime model rather than any single infrastructure provider. Systems built around resilient distributed behaviour are easier to evolve because they remain stable even as infrastructure decisions change around them.

Designing Systems That Reduce Platform Lock-in

Looking across these examples reveals a consistent pattern.

Platform lock-in becomes most visible in systems that must operate continuously while adapting to changing demand. Regulated environments amplify the challenge because infrastructure decisions often remain in place for years while platforms continue to evolve.

Gaming, betting, and streaming services make these limits easier to see. Sudden spikes in activity quickly expose architectural weaknesses, and systems designed with flexible infrastructure tend to scale and recover more predictably.

If you are building platforms where reliability and long-running distributed workloads matter, it may be worth assessing how your architecture handles platform lock-in. To explore these challenges further, get in touch with the Erlang Solutions team.

The post Avoiding Platform Lock-In in Regulated Environments appeared first on Erlang Solutions.

Reliability is a Product Decision

Mon, 13 Apr 2026 05:37:45 +0000

Reliability is often treated as something that can be improved once a system is live. When things break, the focus shifts to monitoring, incident response, and recovery, with the belief that resilience can be strengthened over time as scale reveals weaknesses.

In reality, most of it is set much earlier.

Long before a system faces sustained demand, its underlying design has already shaped how it will respond under pressure. Choices about service boundaries, data handling, deployment models, and fault management influence whether a problem stays contained or spreads.

The conversation is gradually moving from reliability to resilience because distributed systems rarely operate without failure. The more useful question is how a platform continues running when parts of it inevitably fail. The sections that follow explore how early architectural decisions shape that outcome, why their impact becomes more visible at scale, and what it means to build resilience from the beginning rather than react to it later.

Early Decisions Create Long-Term Behaviour

Large-scale failures rarely emerge without warning. What appears sudden at scale is often the predictable outcome of structural decisions made earlier, when different commercial pressures shaped priorities.

In the early stages of a product, the focus is understandably on delivering value quickly, reducing development friction, and validating the market. These are rational business decisions. However, architecture chosen primarily for speed can quietly define the operational ceiling of the system, setting limits that only become visible once demand increases.

Systems Behave as They Were Built to Behave

Outages are often described as “unexpected events,” but distributed systems typically respond to pressure in ways that reflect their design. How services communicate, how state is shared, where dependencies sit, and how failure is managed all influence whether disruption remains contained within a single component or spreads across the wider platform.

Research from Google’s Site Reliability Engineering work shows that around 70% of outages are caused by changes to a live system, such as configuration updates, deployments, or operational changes, rather than by hardware failures. Similarly, the Uptime Institute’s Annual Outage Analysis identifies configuration errors and dependency failures as leading causes of major disruption.

These findings are unsurprising. In distributed environments, dependencies increase and recovery paths become harder to trace, which means that architectural shortcuts that once seemed minor can have disproportionate impact under sustained load. Systems tend to fail along the structural lines already drawn into them, and those lines are shaped by early design decisions, even when those decisions were commercially sensible at the time.

Trade-offs That Compound Over Time

Architectural decisions are rarely made under ideal conditions. Early on, speed to market matters, simplicity reduces friction, and shipping is the priority. A tightly coupled service can help teams move faster, a single-region deployment keeps things straightforward, and limited observability may feel acceptable when traffic is still modest.

But over time, these trade-offs compound.

Limited isolation between services makes it easier for problems in one area to affect others.
Shared infrastructure can create hidden dependencies that only become visible under heavy demand.
Concentrated regional deployments increase the impact of a local outage or cloud disruption.
Observability that felt sufficient at launch can fall short when trying to understand complex behaviour at scale.

At a smaller scale, these constraints can go largely unnoticed. As usage increases and demand becomes less predictable, they start to shape how the system responds under pressure. What once felt manageable begins to show its limits.

This is rarely about a lack of technical ability. It is simply what happens as complexity builds over time. Every system reflects the trade-offs made in its early stages, whether those choices were deliberate or just practical at the time.

When Architecture Becomes Business Exposure

As systems grow in scale and complexity, the way they are built starts to show up in practical ways. When services are tightly connected, recovery takes longer. When failures are not well contained, a problem in one area can disrupt others. Incidents become harder to resolve and more expensive to manage.

The cost of disruption is not abstract. ITIC’s 2023 Hourly Cost of Downtime Survey reports that more than 90% of mid-size and large enterprises estimate a single hour of downtime costs over $300,000, and roughly 41% place that figure between $1 million and $5 million per hour. At that level, even short-lived incidents carry material financial impact.

For organisations that rely on digital platforms to generate revenue, those numbers represent missed transactions, operational strain, and damage to customer trust. At that point, system design is no longer just an engineering decision. It becomes a business decision with measurable financial consequences.

When Failure Is Public

Some systems fail quietly, disrupting internal workflows or back-office processes with limited external visibility. Others operate in real time, where performance issues are experienced directly by customers, investors, and partners.

In sectors such as entertainment, demand is often synchronised and predictable. Premieres, sporting events, ticket releases, and major launches concentrate traffic into specific windows, placing simultaneous pressure on application layers, databases, and third-party services. These moments are not unusual spikes; they are built into the operating model. Platforms designed for large-scale engagement are expected to handle peak demand as part of normal business activity.

That expectation changes the stakes. When performance degrades in these environments, it is noticed immediately and often publicly. Frustration spreads quickly, confidence can shift in hours, and what might have been an operational issue becomes a visible business problem.

In this context, resilience shapes whether a high-demand event reinforces confidence in the platform or exposes its limits. When failure is experienced directly by users, it moves beyond internal metrics and becomes part of the customer experience itself.

Designing for Resilience

If failure is inevitable in distributed systems, then resilience has to be built in from the start. It cannot be something added later when the first serious incident forces the issue.

Resilient systems are structured so that problems stay contained. A fault in one component should not automatically take others down with it, and services should be able to keep operating even when parts of the system are degraded. External dependencies will fail. Traffic will spike. The design needs to account for that reality.

This way of thinking shifts the focus. Instead of trying to prevent every possible issue, teams concentrate on limiting the impact when something goes wrong. Speed still matters, but so does the ability to grow without introducing instability.

Technology choices can support that approach. The Elixir programming language, running on the BEAM, was designed for environments where downtime had real consequences. Its structure reflects that:

Applications are made up of many small, independent processes rather than large, tightly connected components.
Failures are expected and handled locally.
Supervision and recovery are built into the runtime so the wider system keeps running.

No language guarantees reliability, but tools built around fault tolerance make it easier to create systems that continue operating under pressure.
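The supervision idea can be sketched with the standard `Supervisor` and `Agent` modules. The `Counter` worker and the `RestartDemo` helper are hypothetical names used only for this illustration; the helper kills the worker to simulate a fault and then waits until the supervisor has registered a replacement.

```elixir
defmodule Counter do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> 0 end, name: __MODULE__)
  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

defmodule RestartDemo do
  # Kills the process registered under `name`, then polls until a new pid appears.
  def crash_and_await(name, attempts \\ 100) do
    old = Process.whereis(name)
    Process.exit(old, :kill)
    await_new(name, old, attempts)
  end

  defp await_new(_name, _old, 0), do: {:error, :not_restarted}

  defp await_new(name, old, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old ->
        {:ok, pid}

      _ ->
        Process.sleep(10)
        await_new(name, old, attempts - 1)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([Counter], strategy: :one_for_one)

Counter.increment()
{:ok, _new_pid} = RestartDemo.crash_and_await(Counter)
Counter.value()  # the replacement worker starts from its initial state, 0
```

The restarted worker comes back with fresh state. Anything that must survive a crash has to live outside the process, which is itself a design decision worth making early.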

To conclude

By the time serious issues appear at scale, most of the important decisions have already been made.

Failure is part of running distributed systems. What matters is whether problems stay contained and whether the platform keeps operating when something goes wrong.

Thinking about resilience early makes growth easier later. It helps protect revenue, maintain trust, and avoid the instability that forces costly redesigns.

If you are building distributed platforms where reliability directly affects performance and reputation, now is the time to treat resilience as a core design decision. Get in touch to discuss how to build it into your architecture from the start.

The post Reliability is a Product Decision appeared first on Erlang Solutions.

The Always-On Economy: Fintech as Critical Infrastructure

Mon, 13 Apr 2026 05:37:45 +0000

We are living in an economy that rarely sleeps. Payments clear late at night. Payroll runs in the background. Businesses expect every digital touchpoint to work when they need it.

Most people do not think about this shift. They assume the systems behind it will hold.

That assumption carries weight.

Fintechs, including small and mid-sized ones, now sit inside the basic infrastructure of how money moves. Their uptime affects cash flow. Their stability affects trust. When something breaks, real businesses feel it immediately.

Expectations changed faster than most systems did.

This follows on from our earlier piece, From Prototype to Production, which explored how early technical shortcuts surface as systems scale. Here, we look at what happens next, when those systems become part of the infrastructure businesses rely on every day.

When “tech downtime” becomes infrastructure failure

Outages no longer feel contained. A single failure can affect services that millions of people and businesses depend on.

Large providers experience this as much as small ones. When a shared platform falters, the impact spreads quickly. More than 60 percent of outages now come from third-party providers rather than internal systems, highlighting how tightly connected the ecosystem has become.

The true cost is trust

For fintechs, the impact is immediate because money is involved.

A payment delay blocks cash flow. A failed identity check stops onboarding. A stalled platform damages credibility.

For an SME, this can play out over the course of a single day. Payroll does not process in the morning. Supplier payments stall in the afternoon. Customer support queues fill up while teams wait for systems to recover. Even short interruptions create knock-on effects that last far longer than the outage itself.

And the numbers reflect that risk:

£25,000 is the average cost of downtime for SMEs.
40% of customers consider switching providers after a single outage.
Fintech revenue losses account for approximately US$37 million of downtime-related costs each year.

At this level of dependency, downtime stops being a technical issue. It becomes an infrastructure failure.

Fintechs have become infrastructure whether they intended to or not

Fintech services have moved beyond convenience. They now underpin everyday economic activity for businesses that depend on constant access to money, credit, and financial data.

This shift shows up in uptime expectations. Platforms that handle financial activity are measured against standards once reserved for mission-critical systems. Even brief disruption can have outsized consequences when services are expected to remain available throughout the day.

Customers do not adjust expectations based on company size or stage. If money flows through a service, users expect it to be available around the clock. When it is not, the failure feels systemic rather than technical. What breaks is not just functionality, but confidence.

That is the environment fintech leaders are operating in now.

Where resilience typically breaks down

Most fintech systems do not fail because the idea was weak. They fail because early decisions prioritised speed over durability.

Teams optimise for launch. They prove demand. They ship quickly. That works early on, but systems designed for experimentation often struggle once demand becomes constant rather than occasional.

Shortcuts that felt harmless early on start to surface under pressure. Shared components become bottlenecks. Manual processes turn into operational risk. Integrations that worked at low volume become fragile at scale. Recent analysis shows average API uptime has fallen year over year, adding more than 18 hours of downtime annually for systems dependent on third-party APIs.

Common pressure points include:

Shared components that act as single points of failure
Manual operational work that cannot keep up with growth
Third-party dependencies with limited visibility or control
Architecture built for bursts of usage instead of continuous demand

These are not accidental outcomes. They are the result of trade-offs made under pressure. Funding milestones, launch timelines, and growth targets shape architecture as much as technical skill does.

When outages happen, they rarely trace back to a single bug. They trace back to earlier choices about what mattered and what could wait.

From product thinking to infrastructure grade systems

Product thinking is about features and speed. Infrastructure thinking is about continuity.

Infrastructure-grade systems assume failure will happen. They are built to contain it, recover quickly, and keep the wider platform running. The goal is not perfection. The goal is staying available.

Continuous availability is now expected in financial services. Systems are updated and maintained without noticeable downtime because users do not tolerate interruptions when money is involved.

This approach does not slow teams down. It reduces risk. Deployments feel routine instead of stressful. Engineering effort shifts away from incident response and toward steady improvement.

Over time, this changes how organisations operate. Teams plan differently. Roadmaps become more realistic. Reliability becomes part of delivery rather than a separate concern.

Elixir and the always-on economy

The Elixir programming language is designed for systems that are expected to stay available. It runs on the BEAM virtual machine, which was built in environments where downtime carried real consequences.

That background shows up in how Elixir applications are structured. Systems are composed of many small, isolated processes rather than large, tightly coupled components. When something fails, it fails locally. Recovery is expected. The wider system continues to operate.

Elixir in Fintech

This matters in fintech, where failure is inevitable and interruptions are costly. External services misbehave. Load changes without warning. Elixir applications are built to absorb those conditions and recover quickly without cascading outages.

Teams working in Elixir tend to spend less time managing fragile behaviour and more time improving core functionality. Systems evolve instead of being replaced. Reliability becomes part of the foundation rather than a promise teams struggle to maintain.

For fintechs operating in an always-on economy, that approach aligns with the expectations already placed on them.

Reliability as a competitive advantage

Reliable systems can completely change how a fintech operates day to day.

For growing fintechs, uptime supports trust and regulatory confidence. Customers stay because the platform behaves predictably. Growth becomes steadier because teams are not constantly reacting to incidents. Downtime costs make this real, especially for small businesses that lose revenue when systems are unavailable.

For larger providers, reliability reduces operational strain. Fewer incidents mean fewer emergency fixes and fewer difficult conversations with partners and regulators. Teams spend more time improving core services and less time managing fallout.

Reliability also shapes perception. Platforms that stay up become easier to trust with deeper integrations and higher volumes. Over time, that trust compounds and turns stability into a real advantage, even if it is rarely visible from the outside.

To conclude

The always-on economy creates real opportunity for fintechs, but it also raises expectations that many platforms were not originally built to meet.

The question is whether your system has the resilience to operate as infrastructure day after day. If you are a fintech and want to build with reliability in mind, get in touch.

Designing for resilience early makes it far easier to scale without introducing fragility later on.

The post The Always-On Economy: Fintech as Critical Infrastructure appeared first on Erlang Solutions.

Building a performance evaluation toolkit and a dataplane PoC for atproto

Mon, 13 Apr 2026 04:05:45 +0000

There is a new open social ecosystem emerging around atproto. Never heard of it? You should check it out, it's cool tech with an ethos.

The open system is a dream for builders: from day one you can tap into the social graph of more than 40 million Bluesky users. Building on this userbase, you can create everyone's new favorite social network on atproto. It's no surprise that there are already a number of alternative communities emerging: Northsky, Eurosky, Blacksky, to name a few.

This is a blog post though, so you already know there is a problem.

The Bluesky dataplane

Bluesky runs almost entirely on open source components, which enables alternatives to get started quickly.

There is one notable exception: the part of the stack that processes the event stream of the atproto firehose, the AppView, needs one component to digest and store data. This part is called the dataplane, and it does most of the heavy lifting in processing the event stream. However, the open source dataplane implementation is not very performant. Bluesky uses a closed source dataplane implementation in production for this reason.

The open source dataplane works on a fan-in principle:

events are streamed into the system (e.g. "new post")
the dataplane stores and indexes the events in ordinary Postgres tables
on user request, data is queried and presented to the user (fan-in)

This approach leads to the classic Twitter timeline problem, and in fact limits the number of users that can safely be served to 100k, often fewer.

Fan-out to ETS dataplane

The trick to solve this problem is to change the approach. Instead of querying the data on user request, we prepare data in a dedicated place for each user on write (fan-out). When the user makes a request, the data is already sitting there, just waiting to be served to the user.

This is also the approach followed by the Bluesky closed source dataplane, which builds on ScyllaDB. From all publicly available information, it seems to be pretty optimized to their hardware, and pretty expensive to run.

As you know, we are big fans of Elixir at bitcrowd, so we've built a Proof of Concept of a dataplane with ETS taking on the role of ScyllaDB. The idea here is that we can build something that runs on commodity hardware but scales way better than Bluesky's open source dataplane implementation.
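To make the fan-out idea concrete, here is a toy sketch of a timeline store on top of ETS. It is not the PoC's code: the module name `FanOutTimeline`, the table layout (one row per user holding a list of post ids), and the tiny cap of three items are all assumptions made for illustration.

```elixir
defmodule FanOutTimeline do
  @moduledoc "Toy fan-out-on-write timeline store backed by a single ETS table."

  # Small cap so the trimming is visible; a real value would be far larger.
  @max_items 3

  def new do
    :ets.new(:timelines, [:set, :public])
  end

  # Write path: push the post into every follower's precomputed timeline.
  def publish(table, post, followers) do
    Enum.each(followers, fn follower ->
      timeline =
        case :ets.lookup(table, follower) do
          [{^follower, posts}] -> posts
          [] -> []
        end

      :ets.insert(table, {follower, Enum.take([post | timeline], @max_items)})
    end)
  end

  # Read path: a single key lookup, with no joins or sorting at request time.
  def get_timeline(table, user) do
    case :ets.lookup(table, user) do
      [{^user, posts}] -> posts
      [] -> []
    end
  end
end

table = FanOutTimeline.new()
FanOutTimeline.publish(table, "post-1", ["alice", "bob"])
FanOutTimeline.publish(table, "post-2", ["alice"])
FanOutTimeline.get_timeline(table, "alice")  # newest first: ["post-2", "post-1"]
```

The cost moves from reads to writes: a post by an account with many followers touches many rows. The read-modify-write in `publish/3` is also not atomic, so a real implementation would need to serialise writers per user or use a different table layout.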

A peek into performance

To evaluate the PoC, we measured reading and writing directly to the dataplane. We compared a Postgres backed version (representing Bluesky's open source implementation) with an ETS backed version.

We simulated write traffic and let a number of users make requests to load their timeline (get_timeline). In the dashboards below, you can see the number of simultaneously active users ("Active Sessions"), the write traffic ("Posts created"), the latencies of get_timeline requests, and the throughput of get_timeline requests.

Under heavy load, the Postgres backed fan-in implementation showed significantly larger latencies and lower throughput in comparison to the ETS backed fan-out implementation.

Dashboard showing data for Postgres backed fan-in implementation

Dashboard showing data for ETS backed fan-out implementation

Advanced simulations

We extended this PoC into a performance toolkit you can use to simulate traffic via atproto events and user requests.

As a basic principle, we want to be able to repeatedly simulate scenarios. Therefore, you provide the configuration for the scenario as a file that can be repeatedly loaded into the system.

We provide three components:

Base data

You can quickly create base data, such as users, to prepare your application instance for the simulated scenario. It makes a difference for your app's performance whether you already have millions of users in your database or none.

Traffic simulation

Based on the simulation plan, we perform requests or emit events to create read and write traffic.

Measuring performance

Based on the simulated traffic, we can measure the performance users would experience. For instance, how long would it take for their timeline to load?
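As a rough idea of what such a measurement can look like, the sketch below times a function repeatedly with `:timer.tc/1` and reports a nearest-rank percentile. The `LatencyProbe` module is hypothetical and is not part of the toolkit; the function being timed stands in for a real `get_timeline` request.

```elixir
defmodule LatencyProbe do
  # Runs `fun` `n` times and returns each wall-clock duration in microseconds.
  def sample(fun, n) when is_function(fun, 0) and is_integer(n) and n > 0 do
    for _ <- 1..n do
      {micros, _result} = :timer.tc(fun)
      micros
    end
  end

  # Nearest-rank percentile over a non-empty list; `p` is an integer from 1 to 100.
  def percentile(samples, p) when samples != [] and is_integer(p) and p in 1..100 do
    sorted = Enum.sort(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, rank - 1)
  end
end

# Stand-in workload; replace with the request you actually want to time.
samples = LatencyProbe.sample(fn -> Enum.sum(1..10_000) end, 50)
IO.inspect(%{p50: LatencyProbe.percentile(samples, 50), p95: LatencyProbe.percentile(samples, 95)})
```

Percentiles are usually more telling than averages here, because a timeline that loads quickly for most users can still be painfully slow for the unlucky few in the tail.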

A Score for an Invisible Orchestra

Sun, 12 Apr 2026 18:34:09 +0000

Imagine a five-storey building with no lift, erected in the late fifties somewhere on the outskirts of Avtozavodskaya—or better still, in Kupchino. Every floor speaks its own language. Not figuratively but in the most literal sense: the ground floor communicates in Cyrillic, the second in Latin script, the third in ideograms, the fourth in cuneiform, and the fifth, in the manner of Wittgenstein, maintains a principled silence on the grounds that whereof one cannot speak, thereof one must be silent. The postman, delivering the correspondence, is obliged to carry five copies of one and the same letter, translated into each of these tongues, and to knock on the door every time, hoping the addressee has not moved to another floor.

That is precisely how the world of programming is arranged—if one looks at it from the wings rather than the stalls. Every language has its own internal representation of code. Python stores its AST the way a thrifty housewife stores dry goods: in tidy labelled containers—BinOp, FunctionDef, Name. Elixir, with characteristic self-assurance, uses triples {atom, metadata, children} and calls them quoted expressions, as though what we have before us is not a syntax tree but a collection of quotations from William Blake. Ruby stockpiles its S-expressions the way an antiquarian bookseller stockpiles yellowing volumes, and Erlang, faithful to tradition, converses in tuples and atoms intelligible only to Ericsson engineers and, by a curious coincidence, doctoral students at a handful of Swedish universities.

The problem is obvious to anyone who has ever attempted to build a code-analysis tool. Suppose you have written a superb cyclomatic-complexity analyser for Python. It is magnificent: it finds nested conditionals, counts branching points, draws control-flow graphs. Then a colleague comes along and asks, “Could you do one for Ruby?” And it transpires that all your work—all those tree walkers, all that pattern matching over Python’s AST—must be rewritten from scratch. From zero. For a different tree, with different nodes, different semantics, and different booby traps. And then a third colleague will turn up and request the same for Haskell.

Imagine a conductor forced to relearn musical notation every time a new instrument joins the orchestra. Violin—one system of writing. Cello—another. Oboe—a third, with reversed polarity, no less. The trumpet flatly refuses to acknowledge the existence of the staff and insists on a tablature of its own invention. Absurd, of course. In the real world every instrument reads the same score. Notes, rhythm, dynamics are universal. Only the technique of execution differs.

MetaAST is that score.

Before we turn to the details (and they deserve attention in the way a well-constructed detective plot deserves it), permit me a brief digression into theory. Fear not: no formulae, only an analogy. Though one formula will appear after all—but it is so elegant that failing to cite it would be a crime against aesthetics.

In the early two-thousands—when mobile telephones already existed but had not yet taken charge of our lives—the OMG consortium (Object Management Group, bearing no relation whatsoever to the divine or the exclamatory) released a standard called MOF: Meta-Object Facility. Its essence fits into four lines, yet it took the industry two decades to understand those four lines. MOF defines a four-level hierarchy of models:

M⁰ is running code. There it goes, spitting out results, crashing with errors, consuming memory. This is reality.

M¹ is a model of reality. For programs, it is the AST: the abstract syntax tree. Python’s BinOp(op=Add(), left=Name('x'), right=Num(5)) is M¹. Elixir’s {:+, [context: Elixir], [{:x, [], Elixir}, 5]} is also M¹. Every language describes its own code in its own M¹, the way every painter paints an apple in their own way.

M² is the model of models. The meta-model. It defines what a node of any AST can be. Not a concrete node of a concrete language, but the concept of a node. A binary operation is neither Python’s BinOp nor Elixir’s {:+, ...}. A binary operation is the idea that two operands are connected by an operator. UML lives at the M² level. And so does MetaAST.

M³ is the meta-meta-model. That which defines what meta-models themselves can be. The type system, the rules of composition. MOF lives here. In the context of MetaAST, this role is played by Elixir’s type system—@type and @spec.

The fundamental difference between MetaAST and LLVM IR, Java bytecode, or any other intermediate representation lies precisely here: all of those are models (M¹). They describe concrete code in a concrete format. MetaAST is a meta-model (M²). It describes what descriptions of code can be. The difference is roughly the same as between a dictionary and a language: a dictionary catalogues words, whereas a language defines the rules by which those words are possible in the first place.

Metastatic is an Elixir library that implements this idea in code. The name, as befits a respectable technical project, is charged with a double meaning: Met(a)-AST-atic, that is, “pertaining to the meta-level of AST.” The medical connotations are the house’s treat.

The architecture is three-layered, and this is not caprice but a consequence of theory:

M².1 — Core. Concepts present in every programming language on the planet. Literals, variables, binary operations, conditionals, function calls, assignments. Nothing exotic here. x + 5 in Python, Elixir, Ruby, Erlang, and Haskell is one and the same thing. An identical MetaAST representation. Literally:

{:binary_op, [category: :arithmetic, operator: :+],
  [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

Five languages. One tree. One analysis tool. A score legible to any instrument in the orchestra.

M².2 — Extended. Constructs that exist in most languages but not all. Loops, lambdas, collection operations, pattern matching, exception handling. Haskell knows nothing of imperative loops—so be it: it has recursion at the M².1 level. Ruby knows nothing of guards—no matter: the adapter’s metadata will preserve the context.

M².3 — Native. The emergency exit for constructs that resist generalisation. Rust’s lifetimes, Haskell’s type classes, Elixir’s metaprogramming. They are wrapped in {:language_specific, :rust, ...}—like fragile porcelain in bubble wrap—and travel through the system without losing their identity, yet without claiming universality either.

Every MetaAST node is a triple: type, metadata, children (or value). The format is deliberately borrowed from Elixir’s quoted expressions, because if you already know how to write macros in Elixir, you already know how to work with MetaAST. The difference is semantic: where Elixir uses :+ (the operator itself), MetaAST uses :binary_op (the concept of a binary operation) and tucks the operator into metadata. Where Elixir inlines literals, MetaAST wraps them in {:literal, [subtype: :integer], 42}, ensuring structural uniformity.
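The semantic shift is easy to see in a few lines. The module below is a hand-rolled toy, not Metastatic's adapter; the name `ToyAbstraction` and its three clauses are assumptions that cover only arithmetic operators, variables, and integer literals. It turns an Elixir quoted expression into a triple of the shape described above.

```elixir
defmodule ToyAbstraction do
  @arithmetic [:+, :-, :*, :/]

  # Binary arithmetic: in quoted form the node type is the operator itself;
  # here the operator moves into metadata and the type becomes the concept.
  def abstract({op, _meta, [left, right]}) when op in @arithmetic do
    {:binary_op, [category: :arithmetic, operator: op], [abstract(left), abstract(right)]}
  end

  # A variable in quoted form is {name, meta, context} with an atom context.
  def abstract({name, _meta, context}) when is_atom(name) and is_atom(context) do
    {:variable, [], Atom.to_string(name)}
  end

  # Elixir inlines integer literals; wrap them for structural uniformity.
  def abstract(int) when is_integer(int) do
    {:literal, [subtype: :integer], int}
  end
end

IO.inspect(quote(do: x + 5))
IO.inspect(ToyAbstraction.abstract(quote(do: x + 5)))
```

The first line printed is Elixir's own representation, with `:+` as the node type. The second is the uniform triple, with `:binary_op` as the type and the operator tucked into metadata.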

Language adapters form the bridge between M¹ and M². The Python adapter takes Python’s AST (obtained via subprocess) and abstracts it up to MetaAST. The Elixir adapter takes quoted expressions and does the same. The Ruby adapter—likewise. The inverse operation—reification—turns MetaAST back into the native AST of the target language. This pair of operations, abstraction and reification, constitutes what mathematicians call a Galois connection:

Adapter_L = (alpha_L, rho_L)

alpha_L: AS_L -> MetaAST x Metadata    (abstraction: M¹ -> M²)
rho_L:   MetaAST x Metadata -> AS_L    (reification: M² -> M¹)

What does this mean in practice? Here is a scenario. You have written a function-purity analyser. It operates on MetaAST: traverses the tree, looks for side effects (I/O, state mutation, calls to random-number generators), and renders its verdict. This analyser was written once. Once. And it works with Python, Elixir, Ruby, Erlang, and Haskell. Because it analyses not the Python or Elixir AST but the meta-level. You write:

{:ok, doc} = Metastatic.Adapter.abstract(Python, "print('hello')", :python)
{:ok, result} = Metastatic.Analysis.Purity.analyze(doc)
result.pure?    # => false — side effect: I/O
result.effects  # => [:io]

And precisely the same code, unchanged, works if you replace Python with Ruby and print('hello') with puts 'hello'. Because both calls are {:function_call, [name: "print"], [{:literal, [subtype: :string], "hello"}]} at the M² level.
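A toy version of such an analyser fits in a few lines once the tree has this uniform shape. The sketch below is an assumption-laden illustration, not Metastatic's implementation: the module name PuritySketch and the list of I/O function names are invented here, and only the :io effect is detected.

```elixir
defmodule PuritySketch do
  # Hypothetical set of function names treated as I/O (an assumption of this sketch).
  @io_functions ["print", "puts"]

  # A function call contributes its own effect plus those of its arguments.
  def effects({:function_call, meta, args}) do
    own = if Keyword.get(meta, :name) in @io_functions, do: [:io], else: []
    Enum.uniq(own ++ Enum.flat_map(args, &effects/1))
  end

  # Any other node with a list of children: collect effects from the children.
  def effects({_type, _meta, children}) when is_list(children) do
    children |> Enum.flat_map(&effects/1) |> Enum.uniq()
  end

  # Leaves (nodes carrying a plain value, or bare values) have no effects.
  def effects(_leaf), do: []

  def pure?(tree), do: effects(tree) == []
end

call = {:function_call, [name: "print"], [{:literal, [subtype: :string], "hello"}]}
IO.inspect(PuritySketch.effects(call))  # prints [:io]
IO.inspect(PuritySketch.pure?(call))    # prints false
```

Nothing in the walker mentions Python or Ruby; it only sees node types and metadata, which is the point of analysing at the M² level.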

Let us return to MOF and the four-level hierarchy. Why is any of this necessary? Why erect meta-models when one could simply write a converter from one AST to another?

The answer is simple and brutal, like the truth of life. A converter works pairwise. For five languages you need twenty converters (5 × 4). For ten—ninety. For twenty—three hundred and eighty. A meta-model works through a hub: each language connects once, via its own adapter. Five languages—five adapters. Ten—ten. Twenty—twenty. Linear growth instead of quadratic. A mathematician who had spent half his life as a reporter would appreciate the irony.
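The arithmetic is easy to check. The sketch below counts directed converters, one for each ordered pair of languages, against one adapter per language; the module name ConverterCount is made up for the example.

```elixir
defmodule ConverterCount do
  # Pairwise design: one converter for every ordered pair of distinct languages.
  def pairwise(n), do: n * (n - 1)

  # Hub design: one adapter per language.
  def hub(n), do: n
end

for n <- [5, 10, 20] do
  IO.puts("#{n} languages: #{ConverterCount.pairwise(n)} converters vs #{ConverterCount.hub(n)} adapters")
end
# prints:
# 5 languages: 20 converters vs 5 adapters
# 10 languages: 90 converters vs 10 adapters
# 20 languages: 380 converters vs 20 adapters
```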

But it is not only a matter of combinatorics. A meta-model provides standardisation. Every tool written for MetaAST is guaranteed to work with any language for which an adapter exists. This is not “Python support” and “Ruby support” as separate features. It is one feature: MetaAST support. Everything else follows as a consequence.

OMG understood this in 2002 when it released MOF. The entire UML industry is built on the same principle: the meta-model defines what models can be, and concrete diagrams are merely instances of that meta-model. MDA (Model-Driven Architecture) took the idea further: transformations between models are defined at the meta-level and applied automatically to any instances.

Metastatic does the same, but not for class diagrams—for the syntax trees of programs. It is not an IR, not a compiler, not a transpiler. It is the foundation upon which all of the above can be built—once, and for every language at once.

There is an old story about a farmer that I heard from a stranger on a train. A man spends his entire life breeding eggplant varieties. Decades later, when the farmer is already old and blind, his fields produce an unprecedented harvest. He summons every agronomist in the world; they come, and the most famous one says, “These aubergines are adorable!” The farmer dies of frustration. He does not know what aubergines are, but he is certain that nobody would call a great eggplant an ‘aubergine.’ He considers his life wasted.

The ASTs of different languages are the eggplants. MetaAST is the knowledge that they all grow on the same field. The difference between them is terminological, not semantic. Python’s BinOp(op=Add()), Elixir’s {:+, [], [...]}, and Ruby’s s(:send, ..., :+, ...) are different names for one and the same thing: {:binary_op, [category: :arithmetic, operator: :+], [left, right]}.

One could, of course, spend an entire life rewriting tools for every new language. One could relearn musical notation each time, bow before every new instrument in the orchestra, translate letters into five tongues for five floors of one and the same building. But why, when one can ascend a single level of abstraction—to the place where eggplants and aubergines are indistinguishable, and the score is one for all?

Dropping Cloudflare for bunny.net | jola.dev

Tue, 07 Apr 2026 18:13:31 +0000

TL;DR my motivation and experience for moving my blog from Cloudflare to bunny.net

I’ve been a long-time Cloudflare user. They offer a solid service that is free for the vast majority of their users, which is very generous. Their infrastructure is massive and their feature set is undeniably incredible.

One of my biggest concerns though is around how easily I could become heavily dependent on this one single company that then can decide to cut me off and disable all of my websites, for any arbitrary reason. It’s a single point of failure for the internet. Every Cloudflare outage ends up in the news. And I can’t help but feel that the idea of centralizing the internet into a single US corporation feels off. Not to mention the various scandals that have surrounded them. So I was open to alternatives.

Bunny.net

Bunny.net (affiliate link because why not, raw link here) is a Slovenian (EU) company that is building up a lot of momentum. Their CDN-related services rival Cloudflare already, and although their PoP network is smaller than Cloudflare’s, they score highly on performance and speed across the globe. It’s a genuinely competitive alternative to Cloudflare.

It has the additional benefit of being a European company, and I like the idea of growing and supporting the European tech scene.

What I was moving away from

I’ve been using various different services, but focusing on this blog, the first thing was Cloudflare as the registrar for the domain name. I did some research on alternative registrars, but I just didn’t find any good European options. The closest I found was INWX, but their lack of free WHOIS Privacy made them a non-option. I ended up with Porkbun. They run on Cloudflare infrastructure, but they have better support. So the remaining thing Cloudflare was doing for me was the “Orange Cloud”: automatic caching, origin hiding, and optional protection features.

So that’s what we’re moving over! I’m gonna walk you through how to set up the bunny.net CDN for your website, with some sensible defaults.

Step by step

Setting up your bunny.net account is quick and you get $20 worth of free credits to play around with, valid for 14 days. You don’t need to give them a credit card up front to try things out, but if you do, you get another $30 worth of credits. You do need to confirm your email, though, before you can start setting things up. Once you’re out of the trial, you pay per use, which in most cases comes to cents a month. However, note that bunny.net requires a minimum payment of $1 per month.

I guess a cheap price to pay to stop being the product and start becoming the customer.

Creating your pull zone

The pull zone is the main mechanism for enabling the CDN for your website. You’ll find them under CDN in the left navigation bar. Here’s how to set one up:

Fill in the pull zone name. Just make it something meaningful to you, for example the website name.
For origin type, select Origin URL.
Fill in your Origin URL. This would be the address for directly accessing your server. In my case, it’s the public IP of my server.
If you’re running multiple apps on your server, for example using Dokploy, coolify, or self-hosted PaaSs like that, you’ll want to pass the Host header as well. Here you put in the domain of your app. In my case, that’s jola.dev.
For tier, select Standard.
Finally you can select your pricing zones. Note that some zones are more expensive, so you can choose to disable them. This just means that people in those areas will get redirected to the closest zone you do have enabled.

And you’re done with the first part!

Configuring your pull zone

Now that you’ve set up the pull zone, it’s time to hook it up to your website and domain. Go to the pull zone you created. You’ll see a “hostnames” screen. Time to connect things.

Under “Add a custom hostname” fill in your website domain name.
You’ll get a modal with some instructions. You need to follow them to set up the DNS name to point your website to go through the CDN.
Go to where you manage your domain name and add a CNAME record pointing your domain to the CNAME value given in the modal, something like website.b-cdn.net.
Once you’ve done that, wait a few minutes to let it propagate, and then click “Verify & Activate SSL”.
If it says success, you’re done. Your website is now running through the bunny.net CDN, similar to the Cloudflare orange cloud.

Configuring caching

This is the part where bunny.net will really shine through!

If your website is set up to return the appropriate cache headers for each resource, things will just work. Bunny defaults to respecting the cache control headers when pointing a pull zone at an origin site. To verify, go to Caching → General and check that “Respect origin Cache-Control” is set under “Cache expiration time”. Note that if you set no-cache, bunny will use that and will not cache at the edge.

Alternatively, if you don’t have cache headers set up, and you don’t want to control that yourself, you can instead enable Smart Cache. This will default to caching typically cached resources like images, CSS, JS files etc, while avoiding caching things like HTML pages. This will work for most cases!

But I wanted to go faster. If you’ve read my post about building this website, here’s how I’ve set up my cache headers: I added a new pipeline in the router called public and added an extra middleware to it. I technically have everything using this pipeline, but leaving the standard browser pipeline that comes out of the box with Phoenix keeps my options open to add authenticated (uncached) pages in the future.

pipeline :public do
  plug :accepts, ["html"]
  plug :put_root_layout, html: {JolaDevWeb.Layouts, :root}
  plug :put_secure_browser_headers, @secure_headers
  plug :put_cdn_cache_header
end

defp put_cdn_cache_header(conn, _opts) do
  put_resp_header(conn, "cache-control", "public, s-maxage=86400, max-age=0")
end

You can see the whole router here https://github.com/joladev/jola.dev/blob/main/lib/jola_dev_web/router.ex.

This setup means I even cache the HTML pages, which makes this ridiculously fast. Here’s the landing page response time from various locations, using the Larm response time checker tool:

Because I’m caching the HTML pages, if I publish a new post I do need to purge the pull zone to reset the cached HTML files.
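To see why that header caches at the edge but not in the browser, here is a small sketch of how the two kinds of cache read it. It assumes the CDN edge behaves like a standard shared cache, where s-maxage takes precedence over max-age; the module CacheControlSketch is a hypothetical helper, not part of Phoenix or bunny.net.

```elixir
defmodule CacheControlSketch do
  # Parses a Cache-Control value into a map of directive => value (or true).
  def parse(header) do
    header
    |> String.split(",", trim: true)
    |> Map.new(fn directive ->
      case directive |> String.trim() |> String.split("=", parts: 2) do
        [name, value] -> {String.downcase(name), value}
        [name] -> {String.downcase(name), true}
      end
    end)
  end

  # Freshness lifetime in seconds. Shared caches (such as a CDN edge) prefer
  # s-maxage and fall back to max-age; browsers only look at max-age.
  def ttl(directives, :shared) do
    to_seconds(directives["s-maxage"] || directives["max-age"])
  end

  def ttl(directives, :browser), do: to_seconds(directives["max-age"])

  defp to_seconds(nil), do: nil
  defp to_seconds(value), do: String.to_integer(value)
end

directives = CacheControlSketch.parse("public, s-maxage=86400, max-age=0")
IO.inspect(CacheControlSketch.ttl(directives, :shared))   # prints 86400
IO.inspect(CacheControlSketch.ttl(directives, :browser))  # prints 0
```

So the edge may keep a page for a day, while browsers revalidate on every visit and pick up a fresh copy as soon as the pull zone is purged.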

Setting some sensible defaults

All of these are optional, but nice to have!

On your pull zone page, under General → Hostnames, go toggle “Force SSL” on for your domain to ensure that all requests use SSL. SSL/TLS is pretty standard these days, and many TLDs and websites use HSTS to enforce it, but no harm in enabling it here too.

DDoS protection comes out of the box, but we can set some other things up. First of all, go to Caching and then Origin Shield in the left menu on your pull zone, and activate Origin Shield. Select the location closest to your origin. This reduces load on your server, as bunny.net will cache everything in the Origin Shield location, and all edge locations will try that location first before hitting your server.

Next, go to Caching → General and scroll down. At the bottom of the page you can select Stale Cache: While Origin Offline and While Updating. This means bunny will keep serving cached content even if it is stale, if it can’t reach your origin, and that it will serve stale content while fetching the latest version. Both are nice to haves, nothing you have to enable, but provide a slightly better service to your users!

Next, let’s set up an Edge rule to redirect any requests to our automatically generated pull zone domain to our actual domain, to avoid confusing crawlers. On your pull zone, in the left menu, click Edge rules.

Add edge rule.
Name it “Default domain redirect”.
Under actions, select Redirect.
For URL, input your URL plus the path variable. Eg for me it’s https://jola.dev{{path}} .
Status code: use the default 301.
For conditions, pick Match any and Request URL Match any.
Input *://<slug>.b-cdn.net/* replacing <slug> with the name given to your pull zone.
Save edge rule!

Now you should be able to go to https://slug.b-cdn.net for your pull zone and get redirected to your proper domain!

Conclusion

This post just covers the very basics of getting set up on bunny.net. I haven’t even scratched the surface of edge rules, cache configuration, the Shield features for security and firewalls, video hosting and streaming, edge scripting and edge distributed containers, and much more.

I especially appreciate the great statistics, logs, and metrics you get out of the dashboard. You can even see every single request coming through to help you investigate issues, and clear feedback on what’s getting cached and not. I’m actively moving everything else over and I’m excited for the upcoming S3 compatible storage!

You should give bunny.net a try!

Newres Al Haider

Mon, 06 Apr 2026 18:15:11 +0000

Growing Yggdrasil, the World Tree, with Ash

2026-04-05

Declarative programming can be a powerful paradigm for organizing software systems. By defining the business processes once, we ensure there is a single source of domain knowledge. From this foundation, we can derive other parts of the system such as API endpoints, database schemas, and even user interfaces. This approach reduces repetition and helps prevent bugs caused by misaligned domain models.

The Ash Framework seems like an excellent way to see declarative programming in action. Written in Elixir, it allows you to describe your domain in a consistent and expressive way, from which it can automatically generate data layers, REST or GraphQL APIs, and admin interfaces. With Ash, you define what your application should do, and the framework takes care of how to make it happen. It can derive JSON REST endpoints, handle validation, manage persistence, and provide authorization logic, all from the same declarative definitions.

I am new to Ash and I tend to learn best by writing things out, so this article is as much for me as it is for you. Rather than trying to understand everything up front, I prefer to get hands-on quickly with a small, self-contained project. We’ll start with the basics here, and if things go well, expand on it in a follow-up or two.

Given the name Ash, it felt appropriate to build something inspired by the gigantic ash tree of Norse mythology: Yggdrasil, the World Tree.

[Image: Yggdrasil, the World Tree, as a cybernetic ash tree.]

Yggdrasil is said to connect the Nine Worlds of Norse cosmology, though the exact number and nature of these worlds vary between sources. Each world has its own nature, inhabitants, and relationships with the others, making it an ideal metaphor for exploring how Ash models resources, attributes, and relationships. In this project, we will create a domain model for these concepts with Ash and derive a JSON REST API from them.

The first step is getting started with a basic Ash project, for which we will use the Igniter tool. (I will assume Elixir is already installed; if not, see the Elixir Install page for instructions.) Igniter is used for project setup and code generation, which will help us get started a lot quicker.

To start off, the following command will install Igniter:

mix archive.install hex igniter_new

Once we have Igniter, the next step is creating a new project in the yggdrasil directory, adding Ash, and moving into it.

mix igniter.new yggdrasil --install ash && cd yggdrasil

This will land us in a new Elixir project directory with Ash installed. From this seed we will evolve our application to represent the worlds and characters of Norse mythology.

The newly created project comes with a hello function in the lib/yggdrasil.ex module. Let's try it out in iex, the Elixir interactive shell, which we can start with:

iex -S mix

Running it will bring us into the shell, where we can run the hello function in the Yggdrasil module:

iex(1)> Yggdrasil.hello()
:world

We now have a hello world, but in yggdrasil we want to represent the worlds of Norse mythology that are linked by Yggdrasil. The first thing we need for this is a Domain. This will function as a container for the various concepts, such as the worlds, that we will introduce later.

For simplicity's sake, first we will replace the contents of lib/yggdrasil.ex with the following:

defmodule Yggdrasil do
  @moduledoc """
  The Yggdrasil domain — acts as the trunk of the tree
  and organizes all resources like World and Character.
  """

  use Ash.Domain

  resources do
    # Resources will be registered here
    resource Yggdrasil.World
  end
end

We create a file lib/resources/world.ex with the following contents:

defmodule Yggdrasil.World do
  @moduledoc """
  A resource representing a world in Yggdrasil.
  """

  use Ash.Resource,
    # in-memory store
    data_layer: Ash.DataLayer.Ets,
    domain: Yggdrasil

  actions do
    create :create do
      accept [:name, :description]
    end

    update :update do
      accept [:description]
    end

    # Provide default actions
    defaults [:read, :destroy]
  end

  attributes do
    # Primary key
    uuid_primary_key :id

    # World name and description
    attribute :name, :string, allow_nil?: false, public?: true
    attribute :description, :string, public?: true
  end
end

And finally, we add the following line to the project configuration (typically config/config.exs):

config :yggdrasil, :ash_domains, [Yggdrasil]

With these files in place, we now have a minimal Ash domain containing a single resource: World. Let’s take a moment to unpack what we just created before moving on.

At the top level, Yggdrasil acts as our domain, the trunk of our system. It brings together all the resources that make up the application and defines how they relate to each other. Right now, our domain only includes one resource, Yggdrasil.World, but we’ll add more later.

The Yggdrasil.World module itself is declared as a resource. In Ash, a resource is the fundamental building block. It describes a specific type of data and what can be done with it. Instead of writing separate schemas, changesets, and controllers, we declare everything about a resource in one place, and Ash takes care of the details.

Our World resource uses the Ash.DataLayer.Ets data layer, which stores data in Elixir’s in-memory ETS tables. This setup is fast and simple, making it perfect for early experimentation, though data won’t persist between runs. Later, this can be swapped out for a different data layer to gain full persistence. The argument domain: Yggdrasil connects the resource back to the domain we just defined so that the framework knows where it belongs.

Inside the actions block, we declare what operations are available for this resource. The create action accepts a name and a description, the update action accepts only a description, and the defaults [:read, :destroy] line automatically adds the standard read and destroy actions. There’s no need to write any manual CRUD logic; Ash generates it for us.

The attributes block defines the structure of each world. Every world has a UUID primary key (:id) and two fields, :name and :description. The :name attribute is made required with allow_nil?: false, ensuring that each world must have one. Both attributes are marked public?: true so they appear in APIs and outputs.

Finally, the configuration line we added tells Ash which domains to load when the application starts. Without this, the framework wouldn’t know about our new resource.

At this point, our small Ash tree has already taken root. We’ve declared the first piece of our domain, and Ash now knows how to create, read, update, and delete worlds. Let’s see that in action next by exploring our resource interactively in iex.

First, start the interactive shell from your project root:

iex -S mix

Once inside iex, we want to create our first world Asgard, the shining realm of the gods, with the following command:

asgard = (
  Yggdrasil.World
  |> Ash.Changeset.for_create(:create, %{
       name: "Asgard",
       description: "A shining realm of order and power, suspended high above the clouds."
     })
  |> Ash.create!()
)

which would return the world as such:

14:29:14.665 [debug] Creating Yggdrasil.World:

Setting %{id: "726b678e-6cb6-4277-b291-85ecfa313d3a", name: "Asgard",
 description: "A shining realm of order and power,...}

%Yggdrasil.World{
  id: "726b678e-6cb6-4277-b291-85ecfa313d3a",
  name: "Asgard",
  description: "A shining realm of order and power, suspended high above the clouds.",
  __meta__: #Ecto.Schema.Metadata<:loaded>
}

There are multiple things happening here, so let's unwrap them step by step.

First we start off with the Ash resource that we have defined, Yggdrasil.World. In Ash, resources describe the structure of our data, including attributes like name and description, as well as the actions that can be performed on them.

Next we use the pipe operator, |>, to pass this resource to our next function. Elixir’s pipe operator takes the result of the expression on the left and passes it as the first argument to the function on the right.

For example instead of writing:

function(value, a, b)

with the pipe operator we can equivalently write:

value |> function(a, b)

This allows us to write a sequence of operations in a readable, step-by-step style. In our code example, it means Yggdrasil.World is passed into the Ash.Changeset.for_create function, as the first parameter. This function also takes the identifier of our create action :create, as well as the structure representing Asgard, with its name and description.
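A small self-contained example, using only the Elixir standard library and made-up input, shows the same computation written both ways:

```elixir
# Nested calls read inside-out...
nested = Enum.join(Enum.map(String.split("asgard midgard"), &String.capitalize/1), ", ")

# ...while the piped version reads top to bottom, one step per line.
piped =
  "asgard midgard"
  |> String.split()
  |> Enum.map(&String.capitalize/1)
  |> Enum.join(", ")

IO.inspect(piped)            # prints "Asgard, Midgard"
IO.inspect(nested == piped)  # prints true
```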

What this function returns is a changeset, a data structure representing the intended change to a resource in Ash (e.g. creating or updating it). This is especially useful when it comes to validation and error checking, as we will see later down the line. For now we take this changeset and pipe it into the function that executes the actual creation: Ash.create!().

The resulting value is a %Yggdrasil.World{} struct, which represents the newly created world. Ash also automatically generated a UUID for the id field, which uniquely identifies this world inside the system.

Before returning the struct, Ash logs the operation it performed. That is why we see the debug output:

[debug] Creating Yggdrasil.World

The final line is the Elixir struct that was created:

%Yggdrasil.World{
  id: "726b678e-6cb6-4277-b291-85ecfa313d3a",
  name: "Asgard",
  description: "A shining realm of order and power, suspended high above the clouds.",
  __meta__: #Ecto.Schema.Metadata<:loaded>
}

This struct is also stored in the variable asgard, so we can reference it later in the session.

Now that we understand how creating a world works, let’s add another one.

midgard = (
  Yggdrasil.World
  |> Ash.Changeset.for_create(:create, %{
       name: "Midgard",
       description: "The realm of humans, bound to the earth and everyday struggles."
     })
  |> Ash.create!()
)

This follows the exact same pattern as before. We build a changeset that describes the creation of Midgard, and then execute it with Ash.create!(). Much simpler than the mythological creation of Midgard, which involved the slaying of the giant Ymir.

Now that we have some worlds, let's read them using the read action:

worlds = (
  Yggdrasil.World
  |> Ash.Query.for_read(:read)
  |> Ash.read!()
)

which would give us our list of worlds:

[
  %Yggdrasil.World{
    id: "some-uuid-1",
    name: "Asgard",
    description: "A shining realm of order and power, suspended high above the clouds."
  },
  %Yggdrasil.World{
    id: "some-uuid-2",
    name: "Midgard",
    description: "The realm of humans, bound to the earth and everyday struggles."
  }
]

As one can expect, we can also do an update call. For example, let’s change the description of Asgard:

asgard = (
  asgard
  |> Ash.Changeset.for_update(:update, %{
       description: "The fortified realm of the Aesir, ruled by Odin."
     })
  |> Ash.update!()
)

There are a few things to note here. Instead of starting from the Yggdrasil.World module, we now start from the existing asgard struct. This is because we are modifying a resource that already exists.

The function for_update creates a changeset that describes the intended update. Just like with creation, the changeset itself does not perform the update, it only represents the change we want to make.

We then pass this changeset into Ash.update!(), which executes the update. Ash applies the changes, runs any validations, and returns the updated %Yggdrasil.World{} struct.

We can verify the change by reading the list of worlds again:

worlds = (
  Yggdrasil.World
  |> Ash.Query.for_read(:read)
  |> Ash.read!()
)

which would give us a result such as:

[
  %Yggdrasil.World{
    id: "6b62b3ea-b08b-4387-8539-37e645e53026",
    name: "Midgard",
    description: "The realm of humans, bound to the earth and everyday struggles.",
    __meta__: #Ecto.Schema.Metadata<:loaded>
  },
  %Yggdrasil.World{
    id: "d2646509-6c92-4049-a2db-0555612fc365",
    name: "Asgard",
    description: "The fortified realm of the Aesir, ruled by Odin.",
    __meta__: #Ecto.Schema.Metadata<:loaded>
  }
]

An interesting thing we could try out is updating the name of a world instead:

asgard2 = (
  asgard
  |> Ash.Changeset.for_update(:update, %{
       name: "Asgard2"
     })
  |> Ash.update!()
)

We get the following error:

** (Ash.Error.Invalid)
Invalid Error

* No such input `name` for action Yggdrasil.World.update

The attribute exists on Yggdrasil.World, but is not accepted by Yggdrasil.World.update

Perhaps you meant to add it to the accept list for Yggdrasil.World.update?


Valid Inputs:

* description

This is because when we were defining our update action in our module, the only attribute we accept is :description, see fragment below:

update :update do
  accept [:description]
end

In other words, while the name attribute exists on the resource, it is not allowed to be modified through the update action. This is a domain modelling decision, and gives us fine-grained control over how our data can change. In this case, we decided that a world’s name is fixed after creation, while its description can evolve over time.
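The idea behind an accept list can be pictured in plain Elixir. The sketch below is only a conceptual stand-in: AcceptSketch and check_inputs/2 are hypothetical names, and this is not how Ash implements or reports the check.

```elixir
defmodule AcceptSketch do
  # Conceptual illustration of an accept list; NOT Ash's implementation.
  # Any input key outside the accepted list causes the whole input to be rejected.
  def check_inputs(params, accepted) do
    case Map.keys(params) -- accepted do
      [] -> {:ok, params}
      rejected -> {:error, {:no_such_input, rejected}}
    end
  end
end

IO.inspect(AcceptSketch.check_inputs(%{description: "Ruled by Odin."}, [:description]))
# prints {:ok, %{description: "Ruled by Odin."}}

IO.inspect(AcceptSketch.check_inputs(%{name: "Asgard2"}, [:description]))
# prints {:error, {:no_such_input, [:name]}}
```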

Finally we get to deletion, where we destroy asgard, our Ragnarok action if you will. We can do this with the following:

Ash.destroy!(asgard)

Ash.destroy! takes a resource struct, in this case asgard, and removes it from the data store. Since we’re using an in-memory ETS store, it deletes it from memory immediately. The function should return :ok on success. We can double check this by requesting our list of worlds again by our usual means:

worlds = (
  Yggdrasil.World
  |> Ash.Query.for_read(:read)
  |> Ash.read!()
)

which returns only Midgard:

[
  %Yggdrasil.World{
    id: "6b62b3ea-b08b-4387-8539-37e645e53026",
    name: "Midgard",
    description: "The realm of humans, bound to the earth and everyday struggles.",
    __meta__: #Ecto.Schema.Metadata<:loaded>
  }
]

At this point, we’ve taken the first steps in modeling our little piece of Yggdrasil. We have a domain, a resource, and a way to create, read, update, and delete worlds, enough to bring about a small Ragnarok.

Next, we will explore how we can start connecting resources together. After all, the worlds need their heroes and villains to really come alive.

How Many Paradigms Does It Take to Screw In a Lightbulb?

Mon, 06 Apr 2026 12:52:41 +0000

A developer who knows only one programming paradigm resembles a carpenter whose entire toolbox contains a single hammer. Naturally, a hammer will drive a nail with admirable precision. Or a screw, if sufficient enthusiasm is applied. But try to saw or plane a board with that hammer, and it becomes immediately clear—assuming you’ve encountered a saw or a plane at least once in your life—that the instrument has been chosen poorly. So it is with paradigms: knowledge of nothing but imperative programming, or nothing but object-oriented design, transforms a developer into a mechanical executor of tasks, incapable of seeing an elegant solution even when it lies on the surface, waiting to be noticed.

The narrowness of a programmer trapped in a single paradigm manifests in everything. They will erect loops where a single higher-order function would suffice. They will breed classes and inheritance where a pure function and composition would have been more than enough. They will attempt to verify the correctness of an algorithm with a debugger and tests instead of proving it formally at the type level. Such a developer resembles a tourist who knows exactly one word of the foreign language and is attempting, with its help, to explain a route across the entire city to a taxi driver. And it’s a small mercy if the word isn’t obscene.

Let us, for a start, walk through the principal paradigms and see what instruments each offers for solving problems. We’ll begin with the most ancient and familiar—the imperative paradigm.

Imperative programming is the world of instructions and mutable state. The programmer tells the machine: do this, then that, change this variable, repeat five times. A classical example in C:

int sum = 0;
for (int i = 0; i < 10; i++) {
    sum += i;
}

Here we explicitly manage the state of the variable sum, accumulating the result step by step. This is natural for the machine, but tedious for the human. Every step must be spelled out, every mutation tracked. The imperative style serves well when the task reduces to a sequence of actions with side effects: write to a file, update a database, print to the screen. But as soon as the task grows in complexity, the code devolves into a tangle of interrelated variables and conditions.

Procedural programming is the imperative approach enriched with structures and functions. We group instructions into procedures to avoid repetition and improve readability. The same example:

int calculate_sum(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

Now the logic is packaged into a function that can be reused. The procedural style dominated the era of Pascal and early C. It taught programmers to think in modules and structure their code, but it never freed them from the problems of mutable state and side effects.

Object-oriented programming (in Gosling’s understanding, not Kay’s) promised to solve all problems at once: encapsulation, inheritance, polymorphism—the three pillars upon which the entire world supposedly rests. Data and methods unite into objects, objects assemble into class hierarchies. It sounds splendid, until you begin to examine how the code actually works:

class Counter {
    private int value = 0;

    public void increment() {
        value++;
    }

    public int getValue() {
        return value;
    }
}

State lives inside the object, convenient methods form the API, full encapsulation achieved. So it would seem, but the state hasn’t gone anywhere—it has merely relocated into a class field. And along with it relocated all the old afflictions: data races in multithreading, the difficulty of testing, the unpredictability of behavior. The object-oriented approach serves well for modeling a domain when you need to describe entities and their interactions. But it transforms into a nightmare when class hierarchies sprawl to dozens of inheritance levels, and half the methods exist solely to pass a call further down the chain.

Functional programming looks at the task from an entirely different angle. Here there is no mutable state, no loops, no side effects. There are only functions that receive data and return results. The same summation example in Haskell:

sum = foldl (+) 0 [0..9]

One line instead of five. No loops, no intermediate variables. The function foldl takes (1) an addition operation, (2) an initial value, and (3) a list, returning the result. The code reads like a mathematical expression, not a sequence of commands. The functional style is particularly well suited for working with collections, for building data-processing pipelines, for parallel computation. When there is no mutable state, there is no need for locks and synchronization. Functions can be safely launched simultaneously on different processor cores. Though for the domain of Accounting for a liquor store in the suburbs—it’s a rather dubious ally.
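For comparison, the same left fold can be written in Elixir, where Enum.reduce/3 plays the role of foldl. This is just a side-by-side illustration with the same range as the Haskell example:

```elixir
# Fold the range 0..9 with addition, starting from the accumulator 0.
total = Enum.reduce(0..9, 0, fn i, acc -> acc + i end)

IO.inspect(total)           # prints 45
IO.inspect(Enum.sum(0..9))  # prints 45 (the built-in shortcut for this fold)
```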

Logic programming overturns one’s very notion of how to write code. Instead of explaining how to solve a task, the programmer describes what they want to obtain. The system finds the solution on its own. Prolog is the classical representative of this paradigm:

parent(tom, bob).
parent(tom, liz).
parent(bob, ann).

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

We described kinship relations and a rule for determining grandparents. Now we can pose the question: grandparent(tom, ann)?—and the system will answer “yes,” having found the path through the facts. Logic programming is indispensable in certain corners of artificial intelligence, expert systems, and task planning. I even dragged it into the consistency validation of finite automata in one of my libraries. But an attempt to write a web server in Prolog would look rather like an attempt to hammer a mole with a microscope.
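A rough feel for what the Prolog engine does with this rule can be had from an Elixir comprehension that joins the list of facts with itself. This is only an analogy under a simplifying assumption (a fixed, in-memory list of facts); it has none of Prolog's backtracking or unification machinery.

```elixir
# The facts, as {parent, child} tuples.
parents = [{:tom, :bob}, {:tom, :liz}, {:bob, :ann}]

# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# The pinned ^y keeps only the facts whose parent is the child from the first fact.
grandparents = for {x, y} <- parents, {^y, z} <- parents, do: {x, z}

IO.inspect(grandparents)                  # prints [tom: :ann]
IO.inspect({:tom, :ann} in grandparents)  # prints true
```

The list prints as [tom: :ann] because Elixir displays a list of two-element tuples with atom keys as a keyword list.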

Declarative programming is a general term for approaches where the programmer describes the desired result rather than the sequence of steps. SQL is the textbook example:

SELECT name FROM users WHERE age > 18 ORDER BY name;

We don’t explain how to traverse the table, how to check the condition, how to sort the result. We simply declare: I want the names of users over eighteen, sorted alphabetically. The database will figure out how to do this efficiently on its own. The declarative style dominates in HTML, CSS (for now—I suspect someone will drag recursion into it before long), and configuration files. It allows one to separate the what from the how.
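The query can be tried without installing anything, using the sqlite3 module from the Python standard library. The table layout and the three sample rows below are made up for the illustration; only the SELECT statement comes from the text.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Zoe", 30), ("Adam", 17), ("Bob", 42)],  # invented sample data
)

# We state what we want; SQLite decides how to scan, filter, and sort.
rows = conn.execute(
    "SELECT name FROM users WHERE age > 18 ORDER BY name"
).fetchall()
names = [name for (name,) in rows]
print(names)  # ['Bob', 'Zoe']
```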

Concatenative programming is built on the idea of function composition via a stack. Forth is its most vivid representative:

: square dup * ;
5 square .

The function square duplicates the top element of the stack and multiplies it by itself. The number 5 is placed on the stack, the function square is applied, the result is printed. The code reads left to right in postfix order, like reverse Polish notation: operands come first, the operation follows. Concatenative languages are compact and efficient, but they demand a particular cast of mind. They remain popular in embedded systems and wherever code size and execution speed are critical.
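The stack discipline is easy to imitate. The following Python sketch is a toy evaluator written for this illustration, not a real Forth: it knows only dup, * and the printing word ".", plus a hard-coded square, and it collects printed values in a list instead of writing them out.

```python
def run(program: str) -> list[int]:
    """Evaluate a tiny Forth-like program and return the values it prints."""
    stack: list[int] = []
    output: list[int] = []
    words = {
        "dup": lambda: stack.append(stack[-1]),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        ".": lambda: output.append(stack.pop()),
    }
    # User-defined word, equivalent to ": square dup * ;"
    words["square"] = lambda: (words["dup"](), words["*"]())

    # Tokens are processed one by one; anything unknown is taken as a number.
    for token in program.split():
        if token in words:
            words[token]()
        else:
            stack.append(int(token))
    return output


print(run("5 square ."))  # [25]
```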

Reactive programming focuses on data streams and the propagation of changes. When a data source changes, all dependent computations update automatically. An example in RxJS:

import { fromEvent } from 'rxjs';
import { map } from 'rxjs/operators';

const clicks = fromEvent(document, 'click');
const positions = clicks.pipe(map(event => event.clientX));
positions.subscribe(x => console.log(x));

We create a stream of click events, transform it into a stream of coordinates, and subscribe to changes. Each click automatically produces the coordinate in the output. The reactive style is ideal for interfaces, event handling, and working with asynchronous data sources. It liberates you from callback hell and makes the data flow explicit.
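The mechanics behind such a pipeline fit in a few lines. The Stream class below is a deliberately naive Python sketch invented for this example; it is not how RxJS is implemented and it ignores unsubscription, errors, and completion. Click events are simulated with plain dictionaries.

```python
class Stream:
    """A minimal push-based stream: subscribers are called on every emit."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def emit(self, value):
        for callback in list(self._subscribers):
            callback(value)

    def map(self, fn):
        # The derived stream re-emits every upstream value transformed by fn.
        derived = Stream()
        self.subscribe(lambda value: derived.emit(fn(value)))
        return derived


clicks = Stream()
positions = clicks.map(lambda event: event["clientX"])

seen = []
positions.subscribe(seen.append)

clicks.emit({"clientX": 10})  # simulated click
clicks.emit({"clientX": 42})  # simulated click
print(seen)  # [10, 42]
```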

Aspect-oriented programming addresses the problem of cross-cutting concerns—logging, caching, access control. Instead of smearing these aspects across the entire codebase, they can be described separately:

@Transactional
@Logged
public void updateUser(User user) {
    repository.save(user);
}

The annotations @Transactional and @Logged are aspects. They will be automatically “applied” to the method, wrapping it in a transaction and adding logging. The core code remains clean and comprehensible. The aspect-oriented approach is popular in enterprise development, where cross-cutting concerns permeate the entire system.
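In languages without an aspect framework, a decorator gives a rough approximation of the same idea. The Python sketch below is an assumption-heavy stand-in: logged, call_log, and update_user are names invented here, and a real aspect would usually be woven in by the framework rather than applied by hand.

```python
import functools

call_log = []


def logged(fn):
    """Hypothetical logging aspect: records entry and exit around fn."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {fn.__name__}")
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs whether fn returned normally or raised.
            call_log.append(f"exit {fn.__name__}")

    return wrapper


@logged
def update_user(store, user_id, name):
    # The core logic stays free of logging code.
    store[user_id] = name


users = {}
update_user(users, 1, "Ann")
print(call_log)  # ['enter update_user', 'exit update_user']
```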

Metaprogramming is the programming of programs that write programs. Macros in LISP allow code to be generated at compile time:

(defmacro when (condition &rest body)
  `(if ,condition (progn ,@body)))

The macro when expands into an if construct with a progn block. Metaprogramming grants extraordinary flexibility, enabling the creation of domain-specific languages right inside the host language. But with great power comes great responsibility: poorly written macros turn code into an unreadable mess. If you want to see what metaprogramming looks like when practiced by a sane person—take any of my libraries, or write your own in Elixir. I know of no other language where macros have been done properly.

Dependently-typed programming elevates the type system to a new plane. Types can depend on values, allowing complex invariants to be expressed at the type level.

data Vec (A : Set) : Nat -> Set where
  []   : Vec A zero
  _::_ : {n : Nat} -> A -> Vec A n -> Vec A (suc n)

append : {A : Set} {m n : Nat} -> Vec A m -> Vec A n -> Vec A (m + n)

The type Vec A n is a vector of elements of type A with length n. The function append takes two vectors of lengths m and n and returns a vector of length m + n. The compiler verifies correctness at the type level. It is impossible to write a function that violates the length invariant. Dependent types are used for the formal verification of critical systems, where an error costs far too much.

Theorem-proving as a paradigm is the proof of program correctness by mathematical methods. Lean and Coq allow one to write not merely code, but proofs that the code does precisely what was intended:

theorem add_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp [Nat.zero_add, Nat.add_zero]
  | succ n ih => simp [Nat.succ_add, Nat.add_succ, ih]

This is not simply an addition function—it is a proof that addition is commutative. The compiler doesn’t merely check types; it checks the mathematical proof. This approach is employed in cryptography, compilers, and operating systems—domains where the price of an error is measured not in irritated users, but in human lives or millions of dollars in losses.

The actor model views a program as a collection of independent actors that exchange messages. Each actor has its own mailbox, processes messages sequentially, and can create new actors. Erlang was built upon this idea:

-module(counter).
-export([start/0, loop/1]).

start() ->
    spawn(fun() -> loop(0) end).

loop(N) ->
    receive
        {increment, Pid} ->
            Pid ! {value, N + 1},
            loop(N + 1);
        {get, Pid} ->
            Pid ! {value, N},
            loop(N)
    end.

The actor counter receives increment and get messages, modifies its state, and replies. No shared data, no locks. Actors scale horizontally, failures are isolated. This model is ideal for distributed systems, where failures are the norm rather than the exception.
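The shape of the counter actor can be imitated with a thread and a queue. The Python sketch below is only an approximation under stated assumptions: a thread is far less isolated than a BEAM process, the reply queue stands in for the sender's Pid, and the stop message is an addition of this example that the Erlang code does not have.

```python
import queue
import threading


def counter_actor(mailbox):
    """Process messages one at a time; the count lives only inside this loop."""
    n = 0
    while True:
        message, reply_to = mailbox.get()
        if message == "increment":
            n += 1
            reply_to.put(("value", n))
        elif message == "get":
            reply_to.put(("value", n))
        elif message == "stop":  # not in the Erlang version; added for a clean exit
            break


mailbox = queue.Queue()
threading.Thread(target=counter_actor, args=(mailbox,), daemon=True).start()

replies = queue.Queue()  # plays the role of the caller's own mailbox
mailbox.put(("increment", replies))
mailbox.put(("increment", replies))
mailbox.put(("get", replies))

results = [replies.get(timeout=1) for _ in range(3)]
print(results)  # [('value', 1), ('value', 2), ('value', 2)]

mailbox.put(("stop", None))
```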

Dataflow programming describes computation as a graph of data streams. The nodes of the graph are operations, the edges are data flows between them. A change in one node propagates automatically through the graph. LabVIEW uses visual dataflow programming for hardware control. The approach is intuitive for engineers accustomed to thinking in schematics and diagrams.

Constraint programming describes a task as a set of constraints that must be satisfied. The system searches for a solution by enumerating possibilities and pruning the impossible. MiniZinc is a language for constraint programming:

var 1..9: x;
var 1..9: y;
constraint x + y = 10;
constraint x * y = 21;
solve satisfy;

Two variables, two constraints. The system will find x = 3, y = 7 or x = 7, y = 3. Constraint programming is applied in planning, scheduling, and resource optimization—wherever a task is formulated as finding a solution under constraints.
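The two solutions are easy to confirm by brute force. The Python sketch below simply enumerates every pair in the domain and keeps those that satisfy both constraints; a real constraint solver would prune the search with propagation rather than testing all 81 combinations.

```python
from itertools import product

domain = range(1, 10)  # corresponds to var 1..9

# Keep every (x, y) pair that satisfies both constraints.
solutions = [
    (x, y)
    for x, y in product(domain, repeat=2)
    if x + y == 10 and x * y == 21
]
print(solutions)  # [(3, 7), (7, 3)]
```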

Phew.

Now let us pose the question: why does any of this matter to an ordinary developer? The answer is simple and simultaneously non-obvious. Each paradigm is a way of thinking, an approach to solving problems. A programmer who knows only imperative programming will solve every task with loops and conditionals. They will see a list-processing task and write a for loop with intermediate variables. A programmer acquainted with the functional paradigm will write map or fold—elegantly, concisely, free of side effects. One who has mastered reactive programming will construct an event-processing pipeline where each stage is explicitly described and easily testable.

Knowledge of different paradigms expands one’s arsenal of tools. You won’t write a web server in Prolog or prove theorems in JavaScript. But an understanding of logic programming will help you formulate conditions more precisely and build database queries. Familiarity with dependent types will teach you to think in invariants and express constraints at the type-system level. Experience with actors will show you how to build scalable distributed systems without the headaches of synchronization.

In truth, in the modern world all mature languages have long since become multi-paradigm. Scala combines object-oriented and functional approaches. Rust adds a powerful ownership and borrowing system to the imperative style. Python allows one to write procedurally, in an object-oriented fashion, and functionally. F# unites functional programming with the .NET ecosystem. Swift attempts to incorporate elements of all major paradigms at once. A programmer who understands when an aspect is needed (yes, in any language; for instance, I dragged aspects into Elixir) uses the language to its full power. One who knows only a single paradigm writes in any syntax as though it were PHP.

Paradigms are not a religion where you must choose one true faith and wage war on the heretics. They are tools, and a good craftsman knows when to reach for the hammer, when for the saw, and when for the plane. Need to parse something? Take the functional approach with map and fold. Build a system with thousands of simultaneous connections? Actors are your choice. Formally prove an algorithm’s correctness? Welcome to Lean or Agda. Developing an interface with many interactive elements? Reactive programming will make the code comprehensible.

A programmer trapped in a single paradigm is condemned to solve problems inefficiently. They will drag familiar patterns behind them even when those patterns don’t fit. They will write a class where a function would suffice. They will create mutable state where it could be avoided entirely. They will erect a complex hierarchy where composition would have been enough. They resemble a person who knows only one route from home to work and stubbornly waits at the bus stop every day, even though the road has been torn up for a month and the bus now runs on the next street over.

If a developer claims the badge of mid-level-plus but doesn’t feel at ease in at least the five principal paradigms—they are a pompous fool, and you should show them the door.

Lesson: Calling your `GenServer`s is fast, but not 90k times per second fast