For the last few decades, we have defined science by the publication of peer-reviewed research. That is, a scientist is someone who can write a paper and have it reviewed by two or four other scientists who conclude that the work is credible. Notice how this is a circular definition: it defines a scientist as, … Continue reading Beyond papers: rethinking science in the era of artificial intelligence| Daniel Lemire's blog
In software, we represent real numbers as binary floating-point numbers. Effectively, we represent real numbers as a fixed-precision integer (the significand) multiplied by a power of two. Thus we do not represent exactly the number ‘pi’, but we get a very close approximation: 3.141592653589793115997963468544185161590576171875. The first 16 digits are exact. You can represent the approximation … Continue reading The smallest number that is infinite| Daniel Lemire's blog
When I was an undergraduate student, I discovered symbolic algebra. It was great! Instead of solving for a variable by hand, I could just put all the equations in the machine and get the result.| Daniel Lemire's blog
Our processors execute instructions based on a clock. Thus, a 4 GHz processor has 4 billion cycles per second. It is difficult to increase the clock frequency of our processors. If you go much beyond 5 GHz, your processor is likely to overheat or otherwise fail.| Daniel Lemire's blog
Suppose that you have a long string and you want to insert line breaks every 72 characters. You might need to do this if you need to write a public cryptographic key to a text file.| Daniel Lemire's blog
Earlier this year, both major Web engines (WebKit/Safari and Chromium/Chrome/Edge/Brave) accelerated HTML parsing using SIMD instructions. These ‘SIMD’ instructions are special instructions that are present in all our processors that can process multiple bytes at once (e.g., 16 bytes).| Daniel Lemire's blog
My favorite text editor is Visual Studio Code. I estimate that it is likely the most popular software development environment. Many major software corporations have adopted Visual Studio Code.| Daniel Lemire's blog
Loading data from memory often takes several nanoseconds. While the processor waits for the data, it may be forced to wait without performing useful work. Hardware prefetchers in modern processors anticipate memory accesses by loading data into the cache before it is requested, thereby optimizing performance. Their effectiveness varies depending on the access pattern: sequential reads benefit from efficient prefetching, unlike random accesses.| Daniel Lemire's blog
Last week, I was chatting with a student and I was explaining what SIMD instructions were. I was making the point that, in practice, all modern processors have SIMD instructions or the equivalent. Admittedly, some small embedded processors do not, but they lack many other standard features as well. SIMD stands for Single Instruction, Multiple Data, a type of parallel computing architecture that allows a single instruction to process multiple data elements simultaneously. For example, you can ...| lemire.me
Last week, I was chatting with a student and I was explaining what SIMD instructions were. I was making the point that, in practice, all modern processors have SIMD instructions or the equivalent. Admittedly, some small embedded processors do not, but they lack many other standard features as well. SIMD stands for Single Instruction, Multiple Data, a type of parallel computing architecture that allows a single instruction to process multiple data elements simultaneously. For example, you can ...| Daniel Lemire's blog
Modern-day text in software can be expected to be Unicode. Unicode is stored in two formats: UTF-8 and UTF-16.| lemire.me
“All innovation comes from industry is just wrong, universities invented many useful things.” But that’s not the argument. Nobody thinks that Knuth contributed nothing to software programming. Rather the point is that the direction of the arrow is almost entirely wrong. It is not academia → industry → consumers This is almost entirely wrong. I … Continue reading Innovation starts with consumers, not academia| Daniel Lemire's blog
« Normal science, the activity in which most scientists inevitably spend most all their time, is predicated on the assumption that the scientific community knows what the world is like. Normal science often suppresses fundamental novelties because they are necessarily subversive of its basic commitments. As a puzzle-solving activity, normal science does not aim at … Continue reading Rebels on campus| Daniel Lemire's blog
0One of my most popular blog posts of all times is Data alignment for speed: myth or reality? According to my dashboard, hundreds of people a week still load the old blog post. A few times a year, I get an email from someone who disagrees. The blog post makes a simple point. Programmers are … Continue reading Dot product on misaligned data| Daniel Lemire's blog
Studying productivity is challenging. About 15-20 years ago, I was obsessed over my own productivity.| Daniel Lemire's blog
Universities require their professors to publish research papers. Yet publishing your research has little to do with most of the teaching that goes on in universities. And with online teaching, we can almost completely separate teaching from research. Yet we are typically happy to dismiss these concerns by pointing out that universities have also a research purpose. But this answer is not entirely satisfying: who gets to decide what universities should do beside provide degrees and teaching?| lemire.me
Some compilers align data structures so that if you read an object using 4 bytes, its memory address is divisible by 4. There are two reasons for data alignment:| Daniel Lemire's blog
The Apple M2, introduced in 2022, and the Apple M4, launched in 2024, are both ARM-based system-on-chip (SoC) designs featuring unified memory architecture. That is, they use the same memory for both graphics (GPU) and main computations (CPU). The M2 processor relies on LPDDR5 memory whereas the M4 relies on LPDDR5X which should provide slightly more bandwidth.| Daniel Lemire's blog
JSON, or JavaScript Object Notation, is a lightweight data-interchange format. It is widely used for transmitting data between a server and a web application, due to its simplicity and compatibility with many programming languages. The JSON format has a simple syntax with a fixed number of data types such as strings, numbers, Booleans, null, objects, … Continue reading Just say no to broken JSON| Daniel Lemire's blog
C and C++ compilers like GCC first take your code and produce assembly, typically a pure ASCII output (so just basic English characters). This assembly code is a low-level representation of the program, using mnemonic instructions specific to the target processor architecture. The compiler then passes this assembly code to an assembler, which translates it into machine code—binary instructions that the processor can execute directly.| Daniel Lemire's blog
Back when I started programming, project teams were large. Organizations had dozens of programmers on sprawling projects. Reusing code was not trivial. Sharing code required real effort. Sometimes you had to hand over a disk or recopy code from a printout.| Daniel Lemire's blog
In North America, my home province of Quebec has a slightly higher life expectancy than the rest of the country. It is also a poorer-than-average province, so that is maybe surprising. For 2023, the life expectancy at birth in Ontario was 82.33 years, whereas it was 82.55 years for Quebec. However, if drill down in … Continue reading Life expectancy of men in Canadian provinces| Daniel Lemire's blog
Prescientific and preindustrial thought tied truth to authority and tradition. “We’ve always done it this way.” “The king decrees it.” “We know this is how it’s done.” The scientific and industrial revolutions shattered this mindset. Suddenly, outsiders could challenge entrenched norms. Two brothers in a rundown workshop could build empires to rival the wealthiest lords. … Continue reading Don’t argue: build| Daniel Lemire's blog
Guido van Rossum, Python’s creator, recently said: “We have a huge community, but relatively few people, relatively speaking, are contributing meaningfully.” This highlights a paradox. Software thrives on the network effect, or Metcalfe’s Law, where a system’s value scales with the square of its users. Linux excels because its vast user base fuels adoption, documentation, … Continue reading Metcalfe’s Law against Brooks’ Law| Daniel Lemire's blog
There is an extremely naive model of science and innovation called the linear model: The model postulated that innovation starts with basic research, is followed by applied research and development, and ends with production and diffusion. According to Godin (2006), the model has been falsely attributed to Bush and its dominance is derived rather from … Continue reading Let us bury the linear model of innovation| Daniel Lemire's blog
We often need to quickly classify characters. For example, consider how the binary data that you send by email is converted to an ASCII string made of 64 distinct characters (A-Z, a-z, 0-9, +, /). ASCII characters are stored as 7-bit integer values (from 0 to 128) using on byte per character. During the decoding … Continue reading Fast character classification with z3| Daniel Lemire's blog
The ancient Greeks crafted extraordinary models that continue to resonate. For instance, Ptolemy’s geocentric model, with Earth at the core and planets tracing intricate epicycles, elegantly accounted for celestial motion, much like the ancient ambition behind the Tower of Babel sought to bridge humanity and the heavens through a grand, unifying structure. Ptolemy’s framework dominated … Continue reading Models and science| Daniel Lemire's blog
Suppose that you have an array of N elements and you want to divide it into M chunks. It is a common task when trying to spread N units of work over M threads, for example. You want chunks to be all the same size, plus or minus one element (fair sized). It might be … Continue reading Dividing an array into fair sized chunks| Daniel Lemire's blog
Many programming languages such as the Go programming language are designed to make it easy to return several values at once from a function. In Go, it is often used to return an optional error code. The C++ programming language does not have a built-in support for returning several values. However, several standard types can … Continue reading Returning several values from a function in C++ (C++23 edition)| Daniel Lemire's blog
A few weeks ago, I attended a software engineering seminar focused on the role of large language models in programming. The distinguished software engineering professors in attendance were notably skeptical, frequently dismissing the technology’s potential. During a break, I turned to a senior professor, a longtime colleague, and playfully remarked, “ChatGPT writes better C++ than … Continue reading Producing useful commands on the go using C++ and AI| Daniel Lemire's blog
In C++, templates enable generic programming by allowing functions and classes to operate on different data types without sacrificing type safety. Defined using the template keyword, they let developers write reusable, type-agnostic code, such as functions (e.g., template <typename T> max(T a, T b)) or classes (e.g., std::vector), where the type T is specified at … Continue reading C++20 concepts for nicer compiler errors| Daniel Lemire's blog
In software, we often use key-value data structures, where each key is unique and maps to a specific value. Common examples include dictionaries in Python, hash maps in Java, and objects in JavaScript. If you combine arrays with key-value data structures, you can represent most data. In C++, we have two standard key-value data structures: … Continue reading Streamlined iteration: exploring keys and values in C++20| Daniel Lemire's blog
When trying to write fast functions operating over many bytes, we sometimes use ‘SWAR’. SWAR stands for SIMD Within A Register, and it is a technique to perform parallel computations on multiple data elements packed into a single word. It treats a register (e.g., 64-bit) as a vector of smaller units (e.g., 8-bit bytes) and … Continue reading Detect control characters, quotes and backslashes efficiently using ‘SWAR’| Daniel Lemire's blog
It is often puzzling to encounter organizations run by highly capable and ambitious people… appear dysfunctional. An example that I like are colleges that claim to provide great learning experience… while providing their students with recycled lectures given by contract lecturers… I like to tell the tale of this math class I took at the … Continue reading How can really smart people appear totally incompetent?| Daniel Lemire's blog
Do large language models (AI) make you 3x faster or only 3% faster? The answer depends on the quality of the work you are producing. If you need something like a stock photo but not much beyond that, AI can make you 10x faster. If you need a picture taken at the right time of … Continue reading How helpful is AI?| Daniel Lemire's blog
Random integer generation is a fundamental operation in programming, often used in tasks like shuffling arrays. Go’s standard library provides convenient tools like rand.Shuffle for such purposes. You may be able to beat the standard library by a generous margin. Let us see why. Go’s rand.Shuffle implements the Fisher-Yates (or Knuth) shuffle algorithm, a standard … Continue reading Faster shuffling in Go with batching| Daniel Lemire's blog
Most mobile devices use 64-bit ARM processors. A growing number of servers (Amazon, Microsoft) also use 64-bit ARM processors. These processors have special instructions called ARM NEON providing parallelism called Single instruction, multiple data (SIMD). For example, you can compare sixteen values with sixteen other values using one instruction. Some of the most recent ARM … Continue reading Mixing ARM NEON with SVE code for fun and profit| Daniel Lemire's blog
Let us consider a simple C++ function which divides all values in a range of integers: void divide(std::span<int> i, int d) { for (auto& value : i) { value /= d; } } A division between two integers is one of the most expensive operations you can do over integers: it is much slower than … Continue reading Speeding up C++ code with template lambdas| Daniel Lemire's blog
In practice, the software we write runs on several processors. Unfortunately, much of what we take for granted on a single processor becomes false when there are more than one processor. For example, if two processors modify the same piece of memory, what is the state of the memory after the modifications? It is difficult … Continue reading An overview of parallel programming (Go edition)| Daniel Lemire's blog
Jarred Sumner, the main author of the Bun JavaScript engine, commented a few days ago on X that opening many files on macOS could be slow due to thread contention: “your $5,000 computer is only capable of opening 1 file at a time”. I was curious and I decided to test it out. I wrote … Continue reading How fast can you open 1000 files?| Daniel Lemire's blog
Convention computer instructions operate on a single piece of data at once (e.g., they can negate an integer or add two integers). For better performance, CPU vendors add support for SIMD instructions. SIMD stands for Single Instruction, Multiple Data. It is a type of parallel processing where a single operation is executed simultaneously on multiple … Continue reading AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors| Daniel Lemire's blog
Though current C++ compilers do not yet support reflection, we have access to prototypical compilers. In particular, engineers from Bloomberg maintain a fork of LLVM, the popular compiler framework. Though such an implementation should not be used in production and is subject to bugs and other limitations, it should be sufficient to test the performance: how fast can serialization-based JSON processing be in C++? Importantly, we are not immediately concerned with compilation speed: the Bloomb...| Daniel Lemire's blog
Herb Sutter just announced that the verdict is in: C++26, the next version of C++, will include compile-time reflection.| Daniel Lemire's blog
Concerns persist that artificial intelligence (AI) could render software developers obsolete, particularly with tools like GitHub Copilot and Cursor streamlining certain programming tasks. While these tools undoubtedly boost efficiency—Microsoft’s CEO has estimated that AI could write 30% of all code—the precise impact on productivity remains challenging to quantify. Moreover, productivity gains do not automatically lead to unemployment. In fact, increased efficiency could drive greater...| Daniel Lemire's blog
There are two main types of fixed-precision integers in modern software: unsigned and signed. In C++20 and above, the signed integers must use the two’s complement convention. Other programming languages typically specify two’s complement as well.| Daniel Lemire's blog
A common operation in software is the copy of a block of memory. In C/C++, we often call the function memcpy for this purpose.| Daniel Lemire's blog
Parsing text files is often confusing irrespective of your programming language. It can also be surprising slow.| Daniel Lemire's blog
When optimizing small functions, I often rely on a table lookup: I replace the actual computation with table of precomputed values. It is often surprisingly efficient.| Daniel Lemire's blog
Recent versions of C++ (C++20) have a new feature: concepts. A concept in C++ is a named set of requirements that a type must satisfy. E.g., ‘act like a string’ or ‘act like a number’. In C++, we have two closely related terms: traits and concepts. For example, std::is_floating_point is a type trait that checks if a type is a floating-point type. A concept in C++ is used to specify requirements for template parameters. So std::floating_point is the concept corresponding to the trait s...| Daniel Lemire's blog
Copying data in software is cheap, but it is not at all free. As you start optimizing your code, you might find that copies become a performance bottleneck.| Daniel Lemire's blog
Recently, the two major Web engines (WebKit and Chromium) adopted fast SIMD routines to scan HTML content. The key insight is to use vectorized classification (Langdale and Lemire, 2019): you load blocks of characters and identify the characters you seek using a few instructions. In particular, we use ‘SIMD instructions’, special instructions that are available on practically all modern processors and can process 16 bytes or more at once.| Daniel Lemire's blog
Modern processors have instructions to process several bytes at once. Effectively all processors have the capability of processing 16 bytes at once. These instructions are called SIMD, for single instruction, multiple data.| Daniel Lemire's blog
In software, we often represent strings by surrounding them with quotes ("). What happens if the string itself contains quotes? We then need to escape the string. For example, the quote character (") or the backslash character (\) should be replaced by \" or \\. Most programmers are familiar with this process.| Daniel Lemire's blog
The Go programming language makes it easy to call C code. Suppose you have the following C functions:| Daniel Lemire's blog