In the previous analysis of sequence parallelism, I covered two papers 1 2. Those are early works on sequence parallelism and did not get much attention, as there was little demand for context parallelism. Now that LLMs are required to support longer contexts, new papers have appeared that tackle the problems of those early works. What are the Problems of Early Sequence Parallelism Works? # Both works follow the traditional attention computation: compute $QK^T$, apply mask.| Better Tomorrow with Computer Science
Perseus has been accepted to SOSP'24. Perseus is for optimizing energy consumption of large model distributed training while minimizing impact to training throughput.| Better Tomorrow with Computer Science
This post explains flash attention 1 2. Further references are also useful for understanding flash attention 3 4 5. Backgrounds # Attention # $$\text{Attention}(Q, K, V)=\text{softmax}(\frac{QK^T}{\sqrt{d_k}})V$$ This equation can be implemented as: class OPTAttention(nn.Module): def forward(...): # hidden states is an input tensor of Attention layer # Calculate Q, K, and V with linear projections to the input # query_states = self.q_proj(hidden_states) # key_states = self.k_proj(hidden_state...| Better Tomorrow with Computer Science
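As a minimal sketch of the attention equation in the entry above, the following computes $\text{softmax}(QK^T/\sqrt{d_k})V$ on plain Python lists. It is not the OPT implementation and not flash attention; the helper names (`matmul`, `transpose`, `softmax`, `attention`) are hypothetical and chosen only for illustration, and masking and multiple heads are left out.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, without masking."""
    d_k = len(k[0])  # dimension of each key vector
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v)
```

With an all-zero query every key gets the same score, so the output is the plain average of the value rows, which is a quick sanity check on the softmax step.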
This post explains how tensor model parallelism and sequence parallelism work, especially on the attention layer, and how they differ. Backgrounds # Attention Layer # Attention calculation with a single sequence of T tokens. d_attn is config.embed_dim // config.num_attention_heads Bold boxes are model parameters, while others are temporarily created tensors. All the other terms are borrowed from HuggingFace Transformers OPT config. Implementation of computing attention:| Better Tomorrow with Computer Science
Recently, many papers have been published on optimizing LLM inference. This post introduces two of them, which focus on improving throughput by exploiting characteristics of batched LLM serving and characteristics of attention. Orca # Orca, published in OSDI'22, proposes two novel techniques: 1. continuous batching (or iteration-level scheduling) 1, and 2. selective batching. Continuous Batching # Before the introduction of continuous batching, static batching starts batch at once and wait al...| Better Tomorrow with Computer Science
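The difference between static batching and iteration-level scheduling in the entry above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not Orca's scheduler: each request is reduced to the number of tokens it still has to generate (a positive integer), every iteration produces one token per running request, and the function names are hypothetical.

```python
from collections import deque


def static_batching_iterations(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        total += max(lengths[start:start + batch_size])
    return total


def continuous_batching_iterations(lengths, batch_size):
    """Free slots are refilled from the waiting queue at every iteration."""
    waiting = deque(lengths)
    running = []  # remaining tokens of each running request
    iterations = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        iterations += 1  # one forward pass generates one token per request
        # Drop requests that just produced their last token.
        running = [left - 1 for left in running if left - 1 > 0]
    return iterations
```

For request lengths `[2, 8, 3, 1]` and a batch size of 2, the static variant needs 8 + 3 = 11 iterations, while the continuous variant finishes in 8 because the short requests are served in the slot next to the long one.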
This post explains the basics of LLM inference, mainly focusing on differences from LLM training. Autoregressive Text Generation # Unlike training, where tokens are parallelized and trained, inference generates tokens one by one. Therefore, to create a full sentence, several forward passes should be executed (# tokens times). The following video from HuggingFace illustrates how it works. Autoregressive token generation. Source: HuggingFace Before generating the first token, LLM first puts all in...| Better Tomorrow with Computer Science
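The one-token-per-forward-pass loop described in the entry above can be sketched as follows. `toy_next_token` is a hypothetical stand-in for a real model's forward pass (it just derives a digit from the tokens seen so far), so the sketch shows only the control flow, not real decoding.

```python
def toy_next_token(tokens):
    """Hypothetical stand-in for an LLM forward pass over all tokens so far."""
    return (sum(tokens) + 1) % 10


def generate(prompt, max_new_tokens, eos=None):
    """Generate tokens one by one, counting the forward passes used."""
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(max_new_tokens):
        next_token = toy_next_token(tokens)  # one forward pass per new token
        forward_passes += 1
        tokens.append(next_token)  # the new token becomes part of the input
        if next_token == eos:
            break
    return tokens, forward_passes
```

Calling `generate([1, 2, 3], 4)` runs four forward passes, one per generated token, which is the cost the entry refers to as "# tokens times".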
Torch fx # torch.fx is a PyTorch module that captures a model and applies transformations for optimization 1. Recently, model optimization has been getting more important. torch.fx enables transparent transformation without touching the original model implementation, allowing fine-grained model optimization. Since PyTorch 2.0, it seems TorchDynamo replaces legacy fx.tracer for tracing the model. This post focuses on existing torch.fx module, and I will post another one reg...| Better Tomorrow with Computer Science
HF Transformers # HuggingFace (🤗) Transformers is a library that makes it easy to download state-of-the-art pretrained models. It is also possible to create and train a model from scratch, after modifying the structure of existing models. Although the library started from transformer-based language models, it became a general community hub and includes other models such as the convolution-based ResNet. It can easily be installed via pip 1: pip install transformers Most code is borrowed fro...| Better Tomorrow with Computer Science
I am going to work as an Intern at Tesla Autopilot Infrastructure team. Can’t wait to experience and work on large scale supercomputers that I have never seen before!| Better Tomorrow with Computer Science
Changed blog theme to Congo. Thanks James Panther for making a good theme!| Better Tomorrow with Computer Science
We exploit the inherent parallelism in the multi-head attention operation to partition the self-attention block (shown in Figure 5b). The key (K), Query (Q), and value (V) matrices can be partitioned in a column-parallel fashion. The output linear layer can then directly operate on the partitioned output of the attention operation (weight matrix partitioned across rows). Deepak Narayanan et al, Efficient large-scale language model training on GPU clusters using megatron-LM, SC'21| Better Tomorrow with Computer Science
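The column-parallel then row-parallel partitioning in the quote above rests on a simple identity: splitting the first weight matrix by columns and the second by rows gives per-worker partial outputs whose sum equals the unpartitioned result. The sketch below checks this on plain Python lists. It is a simplification, not Megatron-LM code: the attention computation between the two linear layers is omitted, the final sum stands in for an all-reduce, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(m, parts):
    """Partition a matrix into `parts` equal column blocks."""
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m]
            for p in range(parts)]


def split_rows(m, parts):
    """Partition a matrix into `parts` equal row blocks."""
    height = len(m) // parts
    return [m[p * height:(p + 1) * height] for p in range(parts)]


def tensor_parallel_forward(x, w_in, w_out, workers):
    """Compute x @ w_in @ w_out with w_in column-split and w_out row-split."""
    result = None
    shards = zip(split_columns(w_in, workers), split_rows(w_out, workers))
    for w_in_shard, w_out_shard in shards:
        hidden = matmul(x, w_in_shard)         # stays local to one worker
        partial = matmul(hidden, w_out_shard)  # partial output of that worker
        if result is None:
            result = partial
        else:  # summing partial outputs plays the role of an all-reduce
            result = [[a + b for a, b in zip(ra, rb)]
                      for ra, rb in zip(result, partial)]
    return result
```

No communication is needed between the two matrix multiplications, because each worker's column block of the first layer lines up with its row block of the second; only the final sum crosses workers.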
This post analyzes transformer models, specifically memory and computation overhead. Many transformer based models just describe themselves as a model with X-B parameters; I wanted to break it down and look into the model structure and how the parameters are stored and used in actual computing hardware. Many illustrations and analysis are based on the following papers 1. Transformer-based Model # Since Google announced attention and transformer models in 2017 2, MLP and image classification models are rapidl...| Better Tomorrow with Computer Science
Distributed deep learning refers to using a distributed system with several workers to perform deep learning inference or training. Since the mid 2010s, people have been thinking about accelerating deep learning with scale-out, and distributed deep learning has been introduced. Parameter server is one of the well-known architectures for distributed deep learning. Recently, many parallelization mechanisms, the way of distributing computation to multiple workers, have been introduced. First ...| Better Tomorrow with Computer Science
This post explains how a basic RPC framework can be implemented by using modern C++ functionalities. The explanation in this post is heavily inspired from: simple-rpc by evenleo buttonrpc-cpp14 by button-chen C++ Features used # Parameter Pack 1 # Parameter pack is similar to C variadic arguments that are used in printf() family: int printf(char *format, ...); which is implemented with va_ variadic function API [src]: int printf(const char* format, .| Better Tomorrow with Computer Science
In embedded CPU (ECPF: Embedded CPU Physical Function) mode of NVIDIA BlueField DPU, Open vSwitch (OvS) is used for packet processing. Once BlueField Linux is installed, several frameworks are installed together as well, and OvS is one of them. # in SmartNIC Linux $ systemctl status openvswitch-switch ● openvswitch-switch.service - LSB: Open vSwitch switch Loaded: loaded (/etc/init.d/openvswitch-switch; generated) Active: active (running) since Sun 2022-01-16 18:17:46 UTC; 1 day 2h ago Docs...| Better Tomorrow with Computer Science
SmartNIC is an emerging type of hardware: a NIC with general-purpose CPU cores. NVIDIA BlueField2 is equipped with 8 Arm Cortex-A72 cores, which can be used to process offloaded functions. These functions are not limited to packet processing, but can also be more complicated applications, e.g., file system, etc. This post talks about how to configure NVIDIA BlueField2 SmartNIC, on CloudLab r7525 machines. Regarding CloudLab, please refer to [this paper](https://www.usenix.org/conference/atc19/presentat...| Better Tomorrow with Computer Science
Libvirt provides virtual networking 1 to VMs with a virtual network switch, which is implemented with a bridge by default: $ ip addr ... 20: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:e9:7b:57 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever libvirt provides several modes for this bridge, two of which will be introduced in this post so that we can directly access VMs via net...| Better Tomorrow with Computer Science
This post explains how I measured Ceph RBD performance with block/network virtualization technology (virtio and vhost), and the result. VM execution is done through qemu-system-x86_64, without using libvirt. Host: Fedora 33 (Linux kernel 5.10.19) Guest: Ubuntu 20.04 LTS Server (Linux kernel 5.4.0.67) # Qemu general configuration qemu-system-x86_64 --machine q35,accel=kvm, --enable-kvm \ -m 2048 -smp 2 -cpu host \ -vnc :0 \ -hda disk.qcow2 \ Performance is measured with fio running on the gue...| Better Tomorrow with Computer Science
This post explains the overheads of virtio, and introduces vhost, which increases the performance of virtio. Virtio Review & Vhost Introduction # virtio-net example diagram. [Source] VirtI/O is implemented with virtqueues, shared by the guest and QEMU process. When the guest rings a doorbell after inserting requests into the virtqueue, the context is forwarded to host KVM handler (VM-exit), and again to QEMU process via ioeventfd. QEMU process reads the request from the shared virtqueue, and handles it.| Better Tomorrow with Computer Science
This post explains virtio and vhost, device virtualization techniques used in Linux kernel virtual machine (KVM). Here, I focus only on block device and network device virtualization. Virtio (Virtual I/O) # Virtio is one of I/O (block, NIC, etc) virtualization techniques that are used in virtualization. It is a paravirtualized I/O solution that implements a set of communication framework for I/O interaction between guest applications and hypervisor 1 2, which means a device driver that is awa...| Better Tomorrow with Computer Science
This post explains how we can use a Ceph RBD as a QEMU storage. We can attach a Ceph RBD to a QEMU VM through either virtio-blk or vhost-user-blk QEMU device (vhost requires SPDK). Assume that a Ceph cluster is ready following the manual. Setting a Ceph client Configuration 1 # For a node to access a Ceph cluster, it requires some configuration: Config file setup User authentication setup This step requires an access permission to a host; a host refers to a node that is already configured as ...| Better Tomorrow with Computer Science
This post explains how Ceph daemons and clients communicate with each other, with Ceph network architecture. The official Ceph document provides a very high-level diagram that depicts the Ceph architecture: High level Ceph architecture. [src] However, I could not find detailed documents explaining how clients with librados actually communicate with daemons, except a few blog posts 1 2 3 4 5 6 7. Even after reading those, it was not clear to me how they work.| Better Tomorrow with Computer Science
This post explains how we build a container image inside a container, isolating all dependent packages into the container. The introduction below clearly shows why it is required. Lots of people would like to build OCI/container images within a system like Kubernetes. Imagine you have a CI/CD system that is constantly building container images, a tool like Red Hat OpenShift/Kubernetes would be useful for distributing the load of builds. Until recently, most people were leaking the Docker sock...| Better Tomorrow with Computer Science
This post explains how we can deploy a Ceph development cluster from Ceph source code. I tested it in Windows 10 + Docker for Windows with WSL2 engine + WSL2 Ubuntu 20.04. 1. Prerequisites # Two Github repositories are necessary: Ceph 1 and Ceph-dev-docker 2. Ceph dev docker is a kind of wrapper that automates all required steps for deploying a Ceph development cluster. It uses a Docker container to deploy the local development of Ceph.| Better Tomorrow with Computer Science
Kubelet, at launch time, loads configuration files from pre-specified files. Changed configurations are not applied to the running Kubelet process at runtime, hence Kubelet must be restarted manually after modification. Dynamic Kubelet configuration eliminates this burden, making Kubelet monitor its configuration changes and restart when it is updated 1. It uses a Kubernetes ConfigMap object. Kubelet Flags for Dynamic Configuration # Dynamic kubelet configuration is not enabled by d...| Better Tomorrow with Computer Science
Fedora Silverblue # Fedora Silverblue 1 is an immutable desktop operating system based on Fedora Linux distribution. Immutable means that most directories including rootfs (/) are mounted as read-only, and user applications run in an isolated execution environment. It is a part of Atomic Host project, and shares the same underlying system with Fedora CoreOS (FCOS). For this purpose, Fedora Silverblue adopted two technologies:| Better Tomorrow with Computer Science
In [previous post], I briefly introduced a custom resource definition and how to create it through CLI. In this post, I introduce how to implement Go code that programmatically specifies a CRD and a custom controller that handles CRD events. Many tutorials exist, but they are not perfect 1 2 3 4 [^tutorial4]. I implemented a new custom controller myself to fully understand how it works, and introduce almost full details here.| Better Tomorrow with Computer Science
One of the main advantages of the Kubernetes API is flexibility; users can add a custom resource to the Kubernetes cluster, and Kubernetes apiserver manages defined custom resources like standard resources (e.g. ReplicaSet, etc). Main introduction in Kubernetes document is in [here]. A major point of confusion comes from the ambiguous distinction between Custom Resource Definition (CRD) and Aggregated APIserver (AA). Even though the document explains some differences between the two types of implementation, it is not clear...| Better Tomorrow with Computer Science
This post explains the basics of RDMA programming. There are many examples and posts regarding this; however, I personally could not find enough explanations for the examples. It was hard to understand how it works, and here I summarize what I learned. Backgrounds # Channel Adapter (CA) # Channel adapter refers to an end node in the InfiniBand network. It is the equivalent of an Ethernet network interface card (NIC), but with more features regarding InfiniBand and RDMA 1.| Better Tomorrow with Computer Science
Mellanox is a manufacturer of networking products based on InfiniBand, which these days is used for Remote DMA (RDMA). Though their documents are well explained and well managed on their [website], I could not find how to build an InfiniBand device driver from the source code they provide. Building Mellanox OFED source code: inside install script # Source code can be downloaded in [here]. Currently the latest version of MLNX_OFED is 4.7-3.2.9.0.| Better Tomorrow with Computer Science
Public Key Management and X.509 Certificates A self-signed certificate is a certificate that is not signed by a certificate authority (CA) 1 2. (There is no parent-like CA when creating a CA; the CA itself is a self-signed certificate.) When using Kubernetes, kubeadm automatically generates a self-signed Kubernetes CA before generating other certificates. Steps to create a certificate 3 # Follow the steps to create a self-signed certificate: Generate a private key Generate a Certificate Signi...| Better Tomorrow with Computer Science
1 For access control, Kubernetes goes through the procedures above for each API operation: authentication (who can access), authorization (what can be accessed), and admission control. This post is about Kubernetes authentication. All API accesses are handled by the Kubernetes API server. All accesses have to be authenticated by the API server for Kubernetes operations. The Kubernetes API server serves on 2 ports: one for testing, and the other for all other cases.| Better Tomorrow with Computer Science
In the previous post, I implemented a Go shim layer that enables C++ code to use Go functionalities. This post dives a little bit deeper into the CMake build system for this interaction. The following CMakeLists.txt compiles a binary together with a Go based static library. cmake_minimum_required(VERSION 3.0)project(test)set(TARGET_OUT test.out)set(TARGET_LIB test.lib)# Go configurations set(GO_SRCS test.go)set(GO_LIBNAME libtest.a)# Custom command for 'go build -buildmode=...| Better Tomorrow with Computer Science
Linking Go and C # Since Go 1.5, Go supports packaging Go codes into a shared or static library, which can be linked in C programs 1. package main // buildmode=[c-archive|c-shared] requires exactly one main package import "C" import "fmt" //export hello func hello(name string) { fmt.Printf("Hello from Go, %s!\n", name); } func main() {} ## as c-shared library go build -buildmode=c-shared -o libtest.so test.go ## as c-archive(static) library go build -buildmode=c-archive -o libtest.| Better Tomorrow with Computer Science
This post summarizes how to install cri-o container runtime and initialize a Kubernetes master node in Debian machine. Tested with Debian 10 running on a VirtualBox VM. root@kubernetesdebian:/etc# cat os-release PRETTY_NAME="Debian GNU/Linux 10 (buster)" NAME="Debian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)" VERSION_CODENAME=buster ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/" Installing cri-o # 0. Pr...| Better Tomorrow with Computer Science
cri-o # cri-o is a lightweight container runtime framework for Kubernetes. After the introduction of the Open Container Initiative (OCI) container standard, Red Hat implemented cri-o to support the OCI standard and optimize performance by getting rid of Docker features that are not needed for Kubernetes; hence it is lightweight and for Kubernetes. cri-o Architecture. It manages containers under the supervision of Kubelet, a node agent of Kubernetes.| Better Tomorrow with Computer Science
vscode running as a standalone app (lower right), and vscode frontend UI running in Safari web browser (upper left). Their looks are nearly identical except for menu, etc. Visual Studio Code # Visual Studio Code, implemented and managed by Microsoft, is one of the best open-source code editors in the world. I am using it for almost all of my work: programming, writing Markdown, writing LaTeX, etc. With a tremendous number of plugins, its functionality is nearly limitless.| Better Tomorrow with Computer Science
From Docker to Kubernetes, these days container solutions are emerging. Why Docker? source: https://www.docker.com. They clearly state that Docker is a Container Runtime. In the era of Docker the term “Container runtime” was quite clear: the software that runs and manages containers. But as the internal architecture gets more complicated and various container frameworks are introduced, this definition becomes unclear. Here are very clear explanations what...| Better Tomorrow with Computer Science
The Open Container Initiative (OCI) standard is an open standard for Linux containers. Born in 2013, Docker has been the de facto standard Linux container framework, but the OCI standard was created out of the need for an open standard, based on the Docker manifest. As the standard is based on the Docker manifest, its specifications and structures are very similar to Docker's, which provides compatibility between Docker and OCI-based container frameworks.| Better Tomorrow with Computer Science
I have barely managed my blog since graduation. Switching my favorite code editor from Atom to VScode also made it hard, since there is no plugin like Jekyll Atom, which provides integrated Jekyll commands and several tools for post management. There is a plugin named Jekyll Snippets in VScode marketplace, but I don’t think it is comparable to Atom’s plugin. As I do not want to use several editors, I kept looking for a way to manage my blog with VScode, and I...| Better Tomorrow with Computer Science
Netlink Protocol # Netlink is a communication protocol between kernel and userspace. Unlike ioctl(), netlink is based on sockets, which enables notifications from the kernel to userspace. With ioctl(), the kernel can only send a response to a user request. With a netlink socket, however, user processes can block on functions such as recv() to receive any messages from the kernel. #include <asm/types.h> #include <sys/socket.h> #include <linux/netlink.h> netlink_socket = socket (AF_NETLINK, socket_type, netlink_family)...| Better Tomorrow with Computer Science
Identifying the device # [source] When a USB device is inserted into the system, the very first initialization function to be started is drivers/usb/core/usb.c:usb_init(), written in [here]. The USB root hub driver (i.e. hcd) initiates the USB device initialization, then the USB core takes control and initializes an actual device structure struct usb_device. linux/include/linux/usb.h struct usb_device { int devnum; char devpath[16]; ... struct usb_device *parent; struct usb_bus *bu...| Better Tomorrow with Computer Science
What is udev? # udev (userspace /dev) is a device manager for the Linux kernel. As the successor of devfsd and hotplug, udev primarily manages device nodes in the /dev directory. At the same time, udev also handles all user space events raised when hardware devices are added into the system or removed from it, including firmware loading as required by certain devices. https://en.wikipedia.org/wiki/Udev udev first appeared at Linux kernel version 2.| Better Tomorrow with Computer Science
NVMeDirect Overview # NVMeDirect is a software framework that is used to directly access an NVMe SSD from a user space application. In the existing system, applications have to access storage through several software I/O stacks. As the storage media is getting faster, the overhead of the software stack takes a larger portion of I/O time. Hence, this software I/O stack is being reduced as shown in the above figure.| Better Tomorrow with Computer Science
I tried to use AMD’s GPU for research, with the software stack called AMD ROCm. AMD ROCm is a Heterogeneous System Architecture (HSA) framework compatible with AMD discrete GPUs. HSA has a unique property, different from OpenCL or CUDA: user mode queuing. Instead of the traditional driver-centric model, in HSA, user applications are responsible for creating a request packet in the command queue heading to the GPU directly.| Better Tomorrow with Computer Science
This post is a study of the paper Operating Systems Challenges for GPU Resource Management (International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, 2011), and Implementing Open-Source CUDA Runtime (Programming Symposium, 2013). The GPU channel is an interface that bridges across the CPU and the GPU contexts, especially when sending GPU commands from the CPU to the GPU. GPU channel is the only way to send GPU commands to th...| Better Tomorrow with Computer Science
In the previous post, I showed how to link a trusted function that can be called inside the enclave. However, Intel SGX provides a way to import EDL to make a library have an ECALL. The post from Intel is [here]. 1. Implementing a trusted SGX library # As we did in the previous post, make a trusted library. So our simple trusted SGX library has a function named ecall_testlib_sample.| Better Tomorrow with Computer Science
Intel SGX Trusted Library # Trusted libraries are libraries that are linked to an SGX program, and used inside an enclave. Hence, they should follow SGX enclave restrictions to be used. According to the Intel SGX SDK document, the restrictions are as follows. Trusted libraries are static libraries that are linked with the enclave binary. These functions/objects can only be used from within the enclave. (=ECALL cannot be implemented in a library) We should not link the enclave with any trusted library includin...| Better Tomorrow with Computer Science
I made two kernel modules, one of which calls a function of the other module, but the build kept warning me that WARNING: "" undefined!. Even though I exported the function, there actually is another step that I should follow. References: http://stackoverflow.com/a/9499893 What I did before finding the reference was to export the target function. /kernel1/functions.h void function1(void); /kernel1/functions.c #include void function1(void){}; EXPORT_SYMBOL(function1); However, the other kernel module ...| Better Tomorrow with Computer Science
KVM is an acronym of “Kernel based Virtual Machine”, and is a virtualization infrastructure for the Linux kernel that turns it into a hypervisor. It is used with QEMU to emulate some peripherals, called QEMU-KVM. The basic architecture for KVM is as follows. KVM Architecture. QEMU process runs as a userspace process on top of the Linux kernel with KVM module, and a guest kernel runs on top of the emulated hardware in QEMU.| Better Tomorrow with Computer Science
1. Initialize a Git Repository # $ git init This will create .git directory to store all information for version control. 2. Cloning a Remote Repository # $ git clone https://github.com/username/abc.git $ git clone -b branch_name https://github.com/username/abc.git The last .git can be omitted. This will copy files in the remote repository. You can directly check out a branch by passing -b branch_name to the command. To checkout a remote branch into the existing git local repos...| Better Tomorrow with Computer Science
Virtual Function I/O (VFIO) # Introduced to replace the old-fashioned KVM PCI device assignment (pci-assign). Userspace driver interface Use IOMMU (AMD IOMMU, Intel VT-d, etc) Full PCI interrupt, MMIO and I/O port access, PCI configuration space access support Take an abstract view of a device: to support anything! VFIO Device File descriptor # located in /dev/vfio Each divided into regions Each region maps to a device resource (MMIO BAR, IO BAR, PCI configuration space) Region count and informa...| Better Tomorrow with Computer Science
GPU Model # It explains several important designs that recent GPUs have adopted. MMIO. The CPU communicates with the GPU via MMIO. Hardware engines for DMA are supported for transferring large amounts of data, however, commands should be written via MMIO. The I/O ports can be used to indirectly access the MMIO regions, but rarely used. An open source device driver Nouveau currently never uses it.| Better Tomorrow with Computer Science
Environment # Host: Ubuntu 14.04.5 LTS, Linux kernel 4.6.0, Intel Core-i7 6700 Skylake processor Guest: Ubuntu 14.04.4 LTS, Linux kernel 3.16.5, QEMU-KVM based virtual machine (using Intel VT-x) 1. ENCLS # SGX Programming Reference, Section 5.2.1 ENCLS instruction is used to execute an enclave system function (privileged) of specified leaf number. Software specifies the leaf function by setting the appropriate value in the register EAX as input.| Better Tomorrow with Computer Science
Implementing a SGX SDK Function for the Instruction # 1. Function Declaration # Declare a Function into /linux-sgx/common/inc/sgx_urts.h. ... sgx_status_t SGXAPI sgx_create_abc(void); 2. Function Definition # Define a Function into any file in /linux-sgx/psw/urts/. I defined it in /linux-sgx/psw/urts/linux/urts.cpp, as follows. extern "C" sgx_status_t sgx_create_abc() { printf("Hello from %s!\n", __func__); return SGX_SUCCESS; } Also, you should define a function with the same name in /linux-...| Better Tomorrow with Computer Science
Many research papers have dealt with how SGX works internally; however, none have covered how the SGX SDK works. This post explains how Intel Linux SGX SDK calls Intel SGX CPU instructions, to create an enclave. As we all know, there is an SGX instruction we use to create an enclave, EADD. This is an Intel CPU microcode instruction. However, a user program does not directly call this instruction, but calls sgx_create_enclave() SDK function.| Better Tomorrow with Computer Science
Memory mapping is one of the most important features to protect the memory system in Linux. Linux provides several functions to map a physical address into a virtual address. 1. mmap # Linux/fs/sysfs/bin.c:337 static int mmap(struct file* file, struct vm_area_struct* vma) {...} mmap() maps the file to a virtual memory. And this can be used with the special file /dev/mem (system memory) or /dev/kmem (kernel memory). 2. vm_insert_pfn # This function is called by sgx_enclave_add_page().| Better Tomorrow with Computer Science
1. ECREATE # [Intel SGX Explained p63] Section 5.3.1. Creation [Programming References p21] Section 5.3. ECREATE An enclave is born when the system software issues the ECREATE instruction, which turns a free EPC page into the SECS for the new enclave. ECREATE copies an SECS structure outside the EPC into an SECS page inside the EPC. The internal structure of SECS is not accessible to software. Software sets the following fields in the source structure: SECS:BASEADDR, SECS:SIZE, and ATTRIBUTES.| Better Tomorrow with Computer Science
All figure numbers are the same as those in the paper. Glossary # PMH: Page Miss Handler. MMU: Memory Management Unit. TLB: Translation Look-aside Buffer. FSM: Finite State Machine. EPC: Enclave Page Cache. EPCM: Enclave Page Cache Map. PRM: Processor Reserved Memory. ELRANGE: Enclave Linear Address Range. Address Translation # Concepts # Section 2.5.1 Address Translation Concepts System software relies on the CPU’s address translation mechanism for implementing isolation among less privilege...| Better Tomorrow with Computer Science
Atom is a modern, approachable text editor. Until last year, I had used Sublime Text as my text editor. However, after I saw some post examining Atom, I switched to it. It is as modern as Sublime Text, and plugins are much more powerful, I think. I am using Markdown (especially Markdown Preview), LaTeX, Git, and Jekyll packages on Atom.| Better Tomorrow with Computer Science
After unexpectedly erasing all data in my previous blog, I did not make another blog for a long time. Now I have restarted a blog, this time with Jekyll and Github Pages. Compared to Wordpress, it has the following advantages: The format of posts, Markdown, is easy to edit anywhere. I need to get accustomed to markup languages, e.g., LaTeX or Markdown. Use Git system. A chance to become more familiar with Git? Not as many plugins as in Wordpress, but all the necessary ones are provided and fast!| Better Tomorrow with Computer Science
This post introduces Go modules, introduced in Go version 1.11. Go Modules? # Go 1.11 introduces a new dependency management system, Go modules (That’s why Go uses the environment variable name GO111MODULE: indicating to use Go 1.11 module). Google introduced Go module as an alternative to GOPATH for versioning and package distribution. At first I did not understand what it means specifically. Here is my explanation. Importing Packages without Go Modules # Go programmers can import third-part...| Better Tomorrow with Computer Science
Oobleck has been accepted to SOSP'23! Oobleck is a distributed training framework that supports fast fault tolerance without full restart from checkpoint.| Better Tomorrow with Computer Science
This post explains how we can accelerate building a Ceph RPM package. Knowledge in the post can be generally applied to packaging all other applications, not only Ceph. Ceph source code is managed by Github 1, and it contains several shell scripts for packaging as well. Before illustrating how these scripts work, we have to figure out how RPM packaging works. 1. RPM Packaging 101 # RPM (originally stands for Red Hat Package Manager) is a package management system developed by Red Hat 2.| Better Tomorrow with Computer Science
Years before, I posted how to use libibverbs for RDMA communication. When initializing queue pair connection, we need some destination information: bool changeQueuePairStateToRTR(struct ibv_qp* queue_pair, int ib_port, uint32_t destination_qp_number, uint16_t destination_local_id) { struct ibv_qp_attr rtr_attr; memset(&rtr_attr, 0, sizeof(rtr_attr)); rtr_attr.qp_state = ibv_qp_state::IBV_QPS_RTR; rtr_attr.path_mtu = ibv_mtu::IBV_MTU_1024; rtr_attr.rq_psn = 0; rtr_attr.max_dest_rd_atomic = 1; ...| Better Tomorrow with Computer Science
I/OAT (I/O Acceleration Technology) 1 # Intel I/OAT is a set of technologies for improving I/O performance. This post specifically illustrates how to use Intel QuickData Technology, which enables data copy by the chipset instead of the CPU, to move data more efficiently and provide fast throughput. Using Linux DMA Engine # I/OAT (specifically QuickData Technology) is implemented as ioatdma kernel module in Linux, and integrated into the Linux DMA subsystem.| Better Tomorrow with Computer Science
Configuring Ceph # Ceph daemons use /etc/ceph/ceph.conf by default for configuration. However, modern Ceph clusters are initialized with cephadm, which deploys each daemon in an individual container; how, then, can we apply configuration changes to Ceph daemons? 1. Dynamic Configuration Injection 1 # Warning: it is not reliable; make sure that the changed parameter is active. Otherwise, use method 2. Use injectargs to inject configuration values into the existing values.| Better Tomorrow with Computer Science
Ceph is an open-source distributed software platform 1 2. It mainly focuses on providing a scale-out file system, including storage distribution and availability. Ceph Cluster Overview. [Source] A Ceph storage cluster roughly consists of three components: Ceph Storage Nodes: nodes equipped with physical storage media, which are managed by Ceph Object Storage Daemons (OSDs, or ceph-osd), Ceph Monitors (ceph-mon), and Ceph Managers (ceph-mgr).| Better Tomorrow with Computer Science
Flatpak is one of the app sandboxing frameworks, along with AppImage and Snap 1. Although Snap is the most famous one, I think the internal architecture of Flatpak is more reliable. Fedora Silverblue and EndlessOS provide software installation primarily via Flathub, a central repository of Flatpak-based applications 2 3. This post briefly summarizes how to use Flatpak by implementing a sample application. Installing Flatpak # In Ubuntu distributions, there is no Flatpak preinstalled, wh...| Better Tomorrow with Computer Science
Kubernetes # Kubernetes is a container-based cluster orchestration tool, originally implemented by Google. It manages containerized workloads and services in clusters. Is Kubernetes really an orchestration tool? Kubernetes does not call itself an orchestration system, because its behavior differs from the technical definition of “orchestration”. Orchestration (from [Wikipedia]) Orchestration is the automated configuration, coordination, and management of computer systems and soft...| Better Tomorrow with Computer Science
What is systemd? # systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as PID 1 and starts the rest of the system. https://www.freedesktop.org/wiki/Software/systemd systemd is now the init process running as PID 1, as indicated above. /sbin/init was the original init process of Linux (also known as the System V init boot system); it is now replaced with /usr/lib/systemd in many Linux distributions.| Better Tomorrow with Computer Science
There is little information available about sealing, and not even the paper Intel SGX Explained covers it in detail. Everything in this post comes from Intel, with a few thoughts of my own. Sealing # Sealing is a service that Intel provides with Intel SGX technology for saving data securely. Intel SGX protects data only while it is in the enclave, a part of main memory. Therefore, when the enclave process exits, the enclave is destroyed and any data secured within the enclave is lost.| Better Tomorrow with Computer Science
We already know that fork() and exec() are system calls for creating a new process from user space. However, system calls cannot be called in kernel space. How, then, can we execute a process from kernel space? The Usermode Helper API is for creating a user mode process from kernel space. The data structure used by the API is struct subprocess_info. /linux/include/kmod.h struct subprocess_info { struct work_struct work; struct completion* complete; const char* path; char** argv; char** envp; int wai...| Better Tomorrow with Computer Science
CPU affinity, also called CPU pinning, binds a process or a thread to a specific CPU core or set of CPU cores. The following functions are provided by the standard library to set and get affinity. [reference] #include <sched.h> int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask); int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask); On success, sched_setaffinity() and sched_getaffinity() return 0. On error, -1 is returned, and errno is set appropriately. cpu_set_t can ...| Better Tomorrow with Computer Science
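As a minimal sketch of the two calls above, assuming Linux with glibc (which needs _GNU_SOURCE for the CPU_* macros): the helper names first_allowed_cpu, pin_to_cpu, and is_pinned_only_to are hypothetical and not from the original post. Passing 0 as pid targets the calling thread.

```c
#define _GNU_SOURCE /* glibc exposes cpu_set_t helpers (CPU_ZERO, CPU_SET, ...) only with this */
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: return the lowest-numbered CPU the calling thread
 * is currently allowed to run on, or -1 on error. */
int first_allowed_cpu(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask)) {
            return cpu;
        }
    }
    return -1;
}

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success and -1 on error, mirroring sched_setaffinity(). */
int pin_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return -1;
    }
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: return 1 if the calling thread's affinity mask
 * contains exactly the given CPU, 0 if not, and -1 on error. */
int is_pinned_only_to(int cpu) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        return -1;
    }
    return CPU_COUNT(&mask) == 1 && CPU_ISSET(cpu, &mask);
}
```

A caller would typically query first_allowed_cpu() rather than hard-coding CPU 0, because containers and cpusets may exclude some cores from the allowed set.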
Gdev # Gdev is open-source CUDA software, containing device drivers, CUDA runtimes, CUDA/PTX compilers, and so on. You can download it from [here]. The detailed implementation is described in another paper by the same author, Implementing Open-Source CUDA Runtime. (link) Internal Implementation # Implementation of Gdev Gdev uses the existing open-source NVIDIA device driver, Nouveau. It also supports NVIDIA proprietary drivers and pscnv; pscnv is not maintained a...| Better Tomorrow with Computer Science
I/O Hardware Overview # The basic I/O hardware elements, such as ports, buses, and device controllers, accommodate a wide variety of I/O devices. To encapsulate the details and oddities of different devices, the kernel of an operating system is structured to use device-driver modules. A device communicates with a computer system by sending signals over a cable or through the air. The device communicates with the machine via a connection point, or port.| Better Tomorrow with Computer Science