The following is an aggregation of a Twitter thread I posted on April 14th, 2022. Ever wondered how to open a UDP socket in @risc_v assembly? Wonder no more! li a0, 2 li a1, 2 li a2, 0 li a7, 198 ecall Let’s walk through it! 👇🏼🧵 The first thing to understand is that we are just a “normal” program running in user space. We don’t have special privileges in the system, and opening a socket is a privileged operation.| danielmangum.com
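For comparison, the same request can be made from a high-level language. This is a minimal sketch, assuming a Linux host where AF_INET and SOCK_DGRAM both equal 2 (the values loaded into a0 and a1 above). The names open_udp_socket and SYSCALL_ARGS are illustrative only and do not come from the original thread.

```python
import socket


def open_udp_socket() -> socket.socket:
    # socket(AF_INET, SOCK_DGRAM, 0): the same three arguments the assembly
    # places in a0, a1, and a2 before issuing ecall.
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)


# On Linux both constants are 2, matching `li a0, 2` and `li a1, 2`.
SYSCALL_ARGS = (int(socket.AF_INET), int(socket.SOCK_DGRAM), 0)
```

The value in a7 (198) selects the socket system call on RISC-V Linux; the Python runtime makes the equivalent call on your behalf.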
Today’s @risc_v Tip: If you cross-compile a dynamically linked program and then execute it using binfmt_misc, you’ll need to both: (1) inform the OS where the dynamic linker resides (or invoke it directly), and (2) inform the dynamic linker where other shared libraries reside (LD_LIBRARY_PATH). Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: The A extension defines 2 types of instructions for atomic operations: load-reserved/store-conditional (LR/SC) and atomic fetch-and-op (AMO). Both support acquire and release bits to impose additional ordering semantics. LR/SC uses them as follows: Note that LR.rl should not be set unless LR.aq is also, and SC.aq should not be set unless SC.rl is also. This is due to the fact that those arrangements would provide no additional ordering guarantees, but co...| danielmangum.com
Today’s @risc_v Tip: The second PPO rule defines “Same-Address Load-Load” ordering. This rule serves to preserve “Coherence for Read-Read pairs” (CoRR), specifying that instructions in same hart reading same byte preserve order if return values were written by different mem ops. Original Tweet| danielmangum.com
Today’s @risc_v Tip: RISC-V defines its memory model (RVWMO) in the context of 3 orderings: Program Order (order of ops performed by a single hart), Global Memory Order (order of ops performed by all harts), and Preserved Program Order (the subset of program order respected globally). This diagram is an abstract example, but the key points are illustrated: memory operations appear to perform sequentially in the context of a single hart, but may not be observed in the same order globally.| danielmangum.com
Tonight’s @risc_v Tip: The Zmmul extension specifies only the multiplication instructions from the M extension, allowing constrained RISC-V implementations to opt not to support division. The spec calls out FPGAs as an example, as many have built-in multiplier hardware. You’ll notice that MULW is only supported on RV64 implementations. This instruction multiplies the lower 32 bits (i.e. “Multiply Word”) of rs1 and rs2 and places the sign-extension of the lower 32 bits of the result in rd.| danielmangum.com
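The MULW behavior described above can be modeled in a few lines. This is a rough sketch that treats 64-bit register values as unsigned Python integers; the function name mulw and the mask constants are illustrative only.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def mulw(rs1: int, rs2: int) -> int:
    """Model RV64 MULW on 64-bit register values held as unsigned ints."""
    # Only the lower 32 bits of each source register take part.
    low = ((rs1 & MASK32) * (rs2 & MASK32)) & MASK32
    # Sign-extend bit 31 of the 32-bit result into the upper half of rd.
    if low & 0x8000_0000:
        low |= 0xFFFF_FFFF_0000_0000
    return low & MASK64
```

For example, mulw(0x7FFFFFFF, 2) yields 0xFFFFFFFFFFFFFFFE, because the 32-bit result 0xFFFFFFFE has its top bit set.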
Today’s @risc_v Tip: Another new extension in the v1.12 Privileged Spec is “NAPOT Translation Continuity” (Svnapot), which allows a PTE to indicate a 64 KiB range with consistent “flags” (bits [5:0]). The lower bits of the VPN are used to replace indicator bits in PPN on read. Original Tweet| danielmangum.com
Today’s @risc_v Tip: A new version of the RISC-V Privileged Architecture (v1.12) was released on Dec 3rd. One of the larger changes is the addition of the “Page-Based Memory Types” extension (Svpbmt). It is configured in leaf PTE bits [62:61] in Sv39, Sv48, and the new Sv57. Original Tweet| danielmangum.com
Today’s @risc_v Tip: The FENCE instruction is defined as part of the base ISA and allows for explicit ordering of instructions prior to (“predecessor set”) and following (“successor set”). Types of instructions to be ordered are specified in each set using the P and S bits. Note that it is common to just see a plain fence in RISC-V assembly, which is actually a pseudoinstruction that maps to fence iorw, iorw.| danielmangum.com
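To make the predecessor and successor sets concrete, here is a small sketch that assembles the 32-bit FENCE encoding by hand. It assumes the base ISA layout (pred in bits 27:24, succ in bits 23:20, opcode 0001111, all other fields zero); the helper name encode_fence is hypothetical.

```python
# Weight of each letter within a 4-bit set: I, O, R, W from high bit to low.
FENCE_BITS = {"i": 8, "o": 4, "r": 2, "w": 1}


def encode_fence(pred: str = "iorw", succ: str = "iorw") -> int:
    """Encode `fence pred, succ` as a 32-bit instruction word."""
    p = sum(FENCE_BITS[c] for c in set(pred))
    s = sum(FENCE_BITS[c] for c in set(succ))
    # fm, rs1, funct3, and rd are all zero for a normal FENCE.
    return (p << 24) | (s << 20) | 0b0001111
```

With the defaults this produces 0x0FF0000F, the word a plain fence assembles to, while encode_fence("rw", "rw") gives 0x0330000F.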
Today’s @risc_v Tip: Regularity is a key design principle of RISC-V and has a large impact on implementation complexity. Fields always reside in the same location when present in an instruction, meaning that generic decode can be performed regardless of the eventual operation. Original Tweet| danielmangum.com
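One way to see this regularity is to pull the common fields out of an instruction word without first looking at the opcode. This is a sketch with a hypothetical helper name (decode_fields), using the base 32-bit field positions.

```python
def decode_fields(insn: int) -> dict:
    """Extract fields that sit at fixed positions in 32-bit encodings."""
    return {
        "opcode": insn & 0x7F,          # bits 6:0
        "rd": (insn >> 7) & 0x1F,       # bits 11:7 (immediate bits in S/B types)
        "funct3": (insn >> 12) & 0x7,   # bits 14:12
        "rs1": (insn >> 15) & 0x1F,     # bits 19:15
        "rs2": (insn >> 20) & 0x1F,     # bits 24:20
    }


ADD_X1_X2_X3 = 0x003100B3   # add x1, x2, x3 (R-type)
SW_X3_0_X2 = 0x00312023     # sw x3, 0(x2)   (S-type)
```

Both words report rs1 = 2 and rs2 = 3 even though one is R-type and the other S-type, which is why register reads can begin before decode finishes.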
Tonight’s @risc_v Tip: Atomicity Physical Memory Attributes (PMAs) specify support for atomic instructions described in the A extension in the Unprivileged Spec for a given range. There are two types: Atomic Memory Operations (AMO) and Load Reserved / Store Conditional (LR/SC). Original Tweet| danielmangum.com
Today’s @risc_v Tip: Today’s tip comes courtesy of the OpenTitan (a @lowRISC project) RISC-V assembly style guide. Using n(reg) offset syntax (even if n=0) when interacting with registers that are storing pointers makes it visually clear that the contents are a memory address. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: RISC-V contains a few interrupt controller specs, each providing additional functionality over basic privilege-based preemption: Core-Local Interrupt Controller (CLIC), Platform-Level Interrupt Controller (PLIC), and Advanced Interrupt Architecture (AIA). The AIA spec encompasses two different interrupt controllers: Advanced Platform-Level Interrupt Controller (APLIC) and Incoming Message-Signaled Interrupt Controller (IMSIC). Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: The Supervisor Trap Base Address CSR (stvec) indicates the location of the Supervisor (S) trap handler(s). The BASE address may be virtual or physical, and two modes are supported, with “Vectored” allowing for different handlers for each interrupt type. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Sv32 supports a 2-level page table, but any Page Table Entry (PTE) may be a leaf. Page tables are the size of a page, so a leaf level 1 PTE corresponds to a “megapage” (4 MiB). 1 Page = 4 KiB; 1 PTE -> 1 Page; 1 Page Table -> 1024 PTEs; 4 KiB * 1024 = 4 MiB. Errata: Small correction to diagram: VPN[0] is used to populate PPN[0], while PPN[1] is populated with PPN in the Level 1 leaf PTE.| danielmangum.com
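The megapage arithmetic and the errata above can be checked with a short sketch. It assumes 4-byte Sv32 PTEs and a correctly aligned megapage leaf; the function name megapage_pa is illustrative only.

```python
PAGE_SIZE = 4 * 1024                          # 4 KiB
PTE_SIZE = 4                                  # Sv32 PTEs are 32 bits wide
PTES_PER_TABLE = PAGE_SIZE // PTE_SIZE        # 1024 entries per page table
MEGAPAGE_SIZE = PAGE_SIZE * PTES_PER_TABLE    # 4 MiB


def megapage_pa(pte_ppn1: int, va: int) -> int:
    """Physical address for a level 1 leaf PTE (megapage) in Sv32.

    PPN[1] comes from the PTE; PPN[0] and the page offset are taken
    straight from the virtual address (VPN[0] and bits 11:0).
    """
    return (pte_ppn1 << 22) | (va & (MEGAPAGE_SIZE - 1))
```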
Tonight’s @risc_v Tip: A 32-bit Page Table Entry (PTE) in Sv32 virtual memory mode contains four bits that indicate access permissions. The R, W, X bits correspond to Read, Write, and Execute access respectively. The U bit indicates whether the page is accessible in User (U) mode. Original Tweet| danielmangum.com
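A short sketch of reading those permission bits, assuming the standard PTE layout where V, R, W, X, and U occupy bits 0 through 4. The helper name pte_permissions is hypothetical.

```python
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U = (1 << n for n in range(5))


def pte_permissions(pte: int) -> str:
    """Render the R/W/X/U bits of a PTE as an ls-style string."""
    flags = (("r", PTE_R), ("w", PTE_W), ("x", PTE_X), ("u", PTE_U))
    return "".join(ch if pte & bit else "-" for ch, bit in flags)
```

For instance, pte_permissions(0b10111) returns "rw-u": a valid, readable, writable, non-executable page that is accessible in U mode.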
Today’s @risc_v Tip: RISC-V has a separate debug spec that defines a Debug Module (DM), Debug Module Interface (DMI), Debug Transport Module (DTM), and Trigger Module (TM). The TM can be implemented independently of a DM, and adds support for setting native hardware breakpoints. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: You can use two convenient pseudo-instructions to load the address of a symbol into a CSR. la t1, main => auipc t1, main[31:12]; addi t1, t1, main[11:0] csrw mepc, t1 => csrrw x0, mepc, t1 Original Tweet| danielmangum.com
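The main[31:12] / main[11:0] notation above is a simplification: addi sign-extends its 12-bit immediate, so the upper part handed to auipc has to be rounded to compensate. Here is a sketch of that split for a PC-relative offset; the function name split_pcrel is illustrative only.

```python
def split_pcrel(offset: int):
    """Split a 32-bit PC-relative offset into (hi20, lo12) for auipc + addi."""
    # Low 12 bits, interpreted as the signed immediate addi will add.
    lo12 = ((offset & 0xFFF) ^ 0x800) - 0x800
    # Whatever remains is a multiple of 4096 and goes into auipc.
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF
    return hi20, lo12
```

For an offset of 0x12345FFF the result is (0x12346, -1): auipc adds 0x12346000 and addi then subtracts 1.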
Today’s @risc_v Tip: Machine (M) mode supports hardware performance monitoring (HPM) with the following CSRs: mcycle (number of clock cycles executed), minstret (number of instructions retired), mhpmcounter[3-31] (event counters), and mhpmevent[3-31] (event selectors). Original Tweet| danielmangum.com
Today’s @risc_v Tip: The satp (Supervisor Address Translation and Protection) CSR contains virtual memory translation information, including MODE. Valid modes for RV32 & RV64 include: RV32: Bare / Sv32; RV64: Bare / Sv39 / Sv48. Bare indicates no translation or protection. Original Tweet| danielmangum.com
Today’s @risc_v Tip: RISC-V systems may support 3 main privilege levels: Machine (M) (Req.), Supervisor (S) (Opt.), and User (U) (Opt.). Execution context changes via traps. A trap that moves to a higher privilege level is a vertical trap; one that does not is a horizontal trap. Original Tweet| danielmangum.com
Today’s @risc_v Tip: Fields of CSRs adhere to one of the following behaviors: WIRI (Reserved Writes Ignored, Reads Ignore Values); WPRI (Reserved Writes Preserve Values, Reads Ignore Values); WLRL (Write/Read Only Legal Values); WARL (Write Any Values, Reads Legal Values). Original Tweet| danielmangum.com
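As a rough illustration of where MODE sits, the sketch below packs satp values for both widths. It assumes the privileged spec layout (RV32: MODE in bit 31, ASID in 30:22, PPN in 21:0; RV64: MODE in 63:60, ASID in 59:44, PPN in 43:0). The helper names are hypothetical.

```python
SATP64_MODES = {"Bare": 0, "Sv39": 8, "Sv48": 9}


def make_satp_rv64(mode: str, asid: int, root_ppn: int) -> int:
    """Pack an RV64 satp value from MODE, ASID, and root page table PPN."""
    return (
        (SATP64_MODES[mode] << 60)
        | ((asid & 0xFFFF) << 44)
        | (root_ppn & ((1 << 44) - 1))
    )


def make_satp_rv32(sv32: bool, asid: int, root_ppn: int) -> int:
    """Pack an RV32 satp value; MODE is a single bit (0 = Bare, 1 = Sv32)."""
    return (int(sv32) << 31) | ((asid & 0x1FF) << 22) | (root_ppn & ((1 << 22) - 1))
```

A root page table at physical address 0x80000000 has PPN 0x80000, so make_satp_rv64("Sv39", 0, 0x80000) gives 0x8000000000080000.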
Today’s @risc_v Tip: You likely think “Hardware Security Module” when you see “HSM”. When working with RISC-V systems you’ll frequently see the term “Hart”, which is a “Hardware Thread”, and in related contexts “HSM” will typically refer to “Hart State Management”. Original Tweet| danielmangum.com
QEMU @risc_v Tip of the Night: Running without -bios specified or with -bios default will automatically load the OpenSBI binary. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: The first rule in RVWMO Preserved Program Order falls under the Overlapping-Address Orderings category and ensures that any loads or stores to a memory address are ordered before a subsequent store to an overlapping memory address. Original Tweet| danielmangum.com
Today’s @risc_v Tip: Instruction-Address-Misaligned exceptions can occur when a control transfer instruction (e.g. branch / jump) provides a misaligned target. The base ISA specifies 4 byte alignment (IALIGN=32), but extensions (such as C) may relax to 2 bytes (IALIGN=16). Original Tweet| danielmangum.com
First @risc_v Tip of 2022! The final new extension added in the v1.12 Privileged Spec is “Fine-Grained Address-Translation Cache Invalidation” (Svinval). It breaks SFENCE.VMA into three distinct instructions, allowing for its operations to be more efficiently pipelined. Original Tweet| danielmangum.com
Today’s @risc_v Tip: The Zifencei extension defines one instruction, FENCE.I, which orders memory writes and instruction fetches. Unlike FENCE, it only synchronizes instructions visible to a single hart. The rd / rs1 / imm[11:0] fields are reserved for future use. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Implementations may define an arbitrary number of Non-Maskable Interrupts (NMIs). NMIs cannot be disabled and must only be used to represent hardware errors. The mcause CSR may be updated to specify a dedicated interrupt code, or otherwise must be 0. Original Tweet| danielmangum.com
Christmas @risc_v Tip: The Zicsr extension defines instructions for clearing and setting CSR bits, which are abstracted by the pseudoinstructions csrc and csrs. Example tandem usage below with bit masks (t0 / t1) to set “Machine Previous Privilege” (mstatus[12:11]): Original Tweet| danielmangum.com
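The clear-then-set pattern can be modeled as plain bit operations. In this sketch it is assumed that t0 holds the full MPP mask and t1 holds the desired mode shifted into place; the original example may differ in detail, and the name set_mpp is illustrative only.

```python
MSTATUS_MPP_MASK = 0b11 << 11   # mstatus[12:11]
MODE_U, MODE_S, MODE_M = 0b00, 0b01, 0b11


def set_mpp(mstatus: int, mode: int) -> int:
    """Model `csrc mstatus, t0` followed by `csrs mstatus, t1`."""
    t0 = MSTATUS_MPP_MASK
    t1 = (mode << 11) & MSTATUS_MPP_MASK
    mstatus &= ~t0   # csrc: clear every bit that is set in t0
    mstatus |= t1    # csrs: set every bit that is set in t1
    return mstatus
```

Clearing first matters: csrs alone can only turn bits on, so moving from M (11) to S (01) would otherwise leave the field at 11.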
Today’s @risc_v Tip: The trap vector BASE address CSRs (mtvec / stvec) can use the lowest two bits for MODE because the BASE address must be 4-byte aligned (i.e. lowest two bits = 00). Note that a non-masked write of a valid address will overwrite MODE to Direct (00). Original Tweet| danielmangum.com
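A small sketch of how BASE and MODE share one register, and how Vectored mode picks a handler. It assumes the standard encoding (MODE 0 = Direct, 1 = Vectored, with interrupts jumping to BASE + 4 * cause); the helper names are hypothetical.

```python
def decode_tvec(value: int):
    """Split an mtvec/stvec value into (BASE, MODE)."""
    return value & ~0b11, value & 0b11


def trap_target(tvec: int, cause: int, is_interrupt: bool) -> int:
    """Address the hart jumps to when a trap is taken."""
    base, mode = decode_tvec(tvec)
    if mode == 1 and is_interrupt:
        return base + 4 * cause   # Vectored: one slot per interrupt cause
    return base                   # Direct mode, or any synchronous exception
```

Writing 0x80000100 without OR-ing in the low bits leaves MODE at 0, which is the overwrite to Direct that the tip warns about.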
Today’s @risc_v Tip: If using ld as your linker, a global symbol defined as start will take precedence over earlier symbols in your .text section for program entry point when explicit ENTRY is not defined in your script or via -e flag on the command line. Original Tweet| danielmangum.com
Today’s @risc_v Tip: Traps become more complicated in a pipelined implementation if exceptions occur after subsequent instructions have been issued. The sepc CSR holds the address of the offending instruction, but an implementation may choose whether it is precise or not. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Understanding marketing material for processors can be confusing. @SiFive recently released its highest performance RISC-V processor: P650. Let’s break down what it broadly means to be a “thirteen-stage, four-issue, out-of-order processor”. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Physical Memory Attributes (PMAs) define the capabilities of a given address range. A range may be mapped to main memory, I/O devices, or may be empty. PMAs include: Access Type, Atomicity, Memory-Ordering, Coherence / Cacheability, and Idempotency. Original Tweet| danielmangum.com
Today’s @risc_v Tip: The address of a Control and Status Register (CSR) encodes access and privilege level in the top 4 bits. CSRs are accessible at their privilege level + all higher privilege levels. The Use bits further segment address space into Standard and Custom CSRs. Original Tweet| danielmangum.com
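The encoding in the top 4 bits can be unpacked as follows. This sketch assumes the privileged spec convention that bits 11:10 equal to 0b11 mark a read-only CSR and bits 9:8 give the lowest privilege level with access; the label used for level 2 and the function name are illustrative only.

```python
PRIV_NAMES = {0: "U", 1: "S", 2: "H", 3: "M"}


def decode_csr_address(addr: int) -> dict:
    """Decode the access and privilege bits of a 12-bit CSR address."""
    return {
        "read_only": ((addr >> 10) & 0b11) == 0b11,
        "lowest_priv": PRIV_NAMES[(addr >> 8) & 0b11],
    }
```

For example, 0x300 (mstatus) decodes as read/write at M level, while 0xC00 (cycle) decodes as read-only and visible from U level upward.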
Tonight’s @risc_v Tip: MIT’s xv6 teaching OS is a great introduction to RISC-V. For example, start() contains most of the logic required to jump from Machine (M) mode to Supervisor (S) mode. Original Tweet| danielmangum.com
Today’s @risc_v Tip: The RISC-V Hypervisor extension recently reached v1.0 and was ratified. Hypervisor (HS) mode extends Supervisor (S) mode with additional CSRs and instructions, allowing for two additional privilege levels: Virtual Supervisor (VS) and Virtual User (VU). Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Come join me (virtually or in person) on Monday to kick off #RISCVSummit! We’ll be talking about @crossplane_io and how open hardware fits into the open cloud 🏗 Schedule Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Supervisor (S) mode has a dedicated memory-management (mm) fence instruction (SFENCE.VMA) that ensures visible stores are ordered before subsequent implicit references to mm data structures. The values of registers in rs1 and rs2 dictate behavior. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Sv32 uses a two-level page table to enable virtual memory. A virtual address contains two Virtual Page Numbers (VPN) and an offset. The Physical Page Number (PPN) of the leaf Page Table Entry (PTE) is combined with the offset to form the physical address. Original Tweet| danielmangum.com
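The address split described above can be sketched directly. This assumes the Sv32 layout of VPN[1] in bits 31:22, VPN[0] in bits 21:12, and a 12-bit page offset, and it covers only the final step for a 4 KiB leaf. The function names are hypothetical.

```python
def split_sv32_va(va: int):
    """Return (VPN[1], VPN[0], offset) for a 32-bit Sv32 virtual address."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def sv32_physical_address(leaf_ppn: int, va: int) -> int:
    """Combine a 4 KiB leaf PTE's PPN with the page offset (34-bit result)."""
    return (leaf_ppn << 12) | (va & 0xFFF)
```

VPN[1] indexes the root page table and VPN[0] indexes the second-level table; only the offset is carried through to the physical address unchanged.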
Bonus Holiday @risc_v Tip: If the SUM bit in the sstatus CSR is 0, a PTE with U=1 means the page is not accessible from Supervisor (S) mode. If SUM=1, supervisor access is permitted. A supervisor may never execute code from a page accessible from User (U) mode. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: The satp CSR configures Supervisor (S) mode address translation and protection. RV32 and RV64 support different sets of virtual addressing schemes, which can be configured by writing MODE bits. Writing an unsupported scheme is ignored (WARL). Original Tweet| danielmangum.com
Today’s @risc_v Tip: Our last PMP addressing mode is Top of Range (TOR). A TOR entry forms an address range from the preceding entry’s pmpaddr (inclusive) to its own pmpaddr (exclusive). If PMP entry 0 is a TOR entry, the bottom bound of its address range is 0. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Physical Memory Protection (PMP) and page-based virtual memory are designed to work together. Accessing virtual memory sometimes results in physical memory accesses. In those situations, the physical memory accesses are checked against PMP entries. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: Looking again at Physical Memory Protection (PMP) address matching modes, Naturally Aligned Powers of Two (NAPOT) addressing allows you to specify ranges with four-byte granularity by encoding the size in the low-order bits of the pmpaddr CSRs. Original Tweet| danielmangum.com
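Here is a sketch of how a NAPOT region can be packed into and recovered from a pmpaddr value. It assumes the privileged spec encoding, in which pmpaddr holds address bits [33:2] and NAPOT regions are at least 8 bytes (4-byte regions use the separate NA4 mode). The function names are illustrative only.

```python
def napot_encode(base: int, size: int) -> int:
    """Encode a naturally aligned power-of-two region into pmpaddr."""
    if size < 8 or (size & (size - 1)) != 0 or base % size != 0:
        raise ValueError("region must be a naturally aligned power of two >= 8")
    # Low-order bits become 0 followed by a run of ones that encodes the size.
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr: int):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    ones = 0
    while (pmpaddr >> ones) & 1:
        ones += 1                     # count the trailing ones
    size = 1 << (ones + 3)
    base = ((pmpaddr >> (ones + 1)) << (ones + 1)) << 2
    return base, size
```

A 4 KiB region at 0x80000000 encodes to 0x200001FF: nine trailing ones mean 2^(9+3) = 4096 bytes.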
Tonight’s @risc_v Tip: Yesterday we talked about the Physical Memory Protection (PMP) unit. PMP entries are defined with separate config and address CSRs. The 8-bit config CSR for an entry dedicates 2 bits (A) to defining the matching mode of its address CSR. Original Tweet| danielmangum.com
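The 8-bit config layout can be unpacked with a few masks. This sketch assumes the standard pmpcfg field positions (R, W, X in bits 0 to 2, A in bits 4:3, L in bit 7); the name decode_pmpcfg is hypothetical.

```python
A_MODES = ("OFF", "TOR", "NA4", "NAPOT")


def decode_pmpcfg(cfg: int) -> dict:
    """Decode one 8-bit PMP configuration field."""
    return {
        "R": bool(cfg & 0x01),
        "W": bool(cfg & 0x02),
        "X": bool(cfg & 0x04),
        "A": A_MODES[(cfg >> 3) & 0b11],   # address-matching mode
        "L": bool(cfg & 0x80),             # lock bit
    }
```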
Tonight’s @risc_v Tip: An optional Physical Memory Protection (PMP) unit is defined that enables memory access control for software on a given hart. This is accomplished through a set of configuration and address CSRs, and applies to all accesses with effective mode of S or U. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: You might assume some exceptions, such as an ecall from User mode (U), are handled in Supervisor mode (S), but all traps are handled in Machine mode (M) by default. However, the medeleg CSR can be used to delegate some traps to a lower privilege level. Original Tweet| danielmangum.com
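Delegation is a per-cause bit in medeleg, indexed by exception code. The sketch below models a trap raised while running in S or U mode (traps taken from M mode are never delegated downward). The helper names are illustrative only; exception code 8 is the environment call from U mode.

```python
ECALL_FROM_U = 8


def delegate(medeleg: int, code: int) -> int:
    """Return medeleg with the bit for the given exception code set."""
    return medeleg | (1 << code)


def handled_in(medeleg: int, code: int) -> str:
    """Mode that handles a synchronous exception raised in S or U mode."""
    return "S" if (medeleg >> code) & 1 else "M"
```

With medeleg = 0 an ecall from U mode lands in M mode; after delegate(0, ECALL_FROM_U) the same ecall traps to S mode instead.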
Tonight’s @risc_v Tip: Continuing with the Trigger Module (TM), the RISC-V debug spec provides access to triggers via the tselect and tdata1 / tdata2 / tdata3 CSRs. A list of supported triggers for a hart can be obtained by a sequence of write / read back operations. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: If you have ever worked with low-level programming, you may be familiar with the ambiguity of the terms exception / interrupt / trap. RISC-V clearly defines each in the unprivileged spec (§1.6), with the definitions adhering to the IEEE-754 standard. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: The GNU assembler (gas) supports a few specific RISC-V directives, including .option, which allows you to modify assembler options inline. A common use case: temporarily disabling relaxation to perform initial global pointer load into gp register. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip (feat. @golang): Go uses a compare and swap (CAS) implementation similar to the example provided in the RISC-V unprivileged spec. The Go assembler always uses acquire access (aq=1) to align with the language’s memory requirements. Original Tweet| danielmangum.com
Tonight’s @risc_v Tip: You can list all the harts on the platform you are debugging with gdb using info threads. Switching to another hart can be done with thread <thread-id>. A quick way to check the current hart is to display the contents of the mhartid CSR. Original Tweet| danielmangum.com
Today’s @risc_v Tip: The mcause (M) / scause (S) CSRs indicate what type of exception caused a trap to the respective privilege level. A single instruction could cause multiple synchronous exceptions, in which case the register indicates event with the following priority: Original Tweet| danielmangum.com
Today’s @risc_v Tip: The 8th bit in the Supervisor (S) Status (sstatus) CSR is called the SPP bit and indicates the hart’s privilege level before entering S mode. 0 indicates User (U) mode, and 1 indicates other. An sret instruction changes mode to U if SPP is 0 and S if 1. Original Tweet| danielmangum.com
GDB tip of the day: info registers will show you the contents of general purpose registers, but info all-registers will expand the output to floating point registers and CSRs. Original Tweet| danielmangum.com