ICS EX 习题课

The Missing Course of Your CS Education

https://missing.csail.mit.edu/

UNIX 哲学

Keep it simple, stupid. (KISS)

Everything is a file and pipeline programs to work together

拥抱开源社区

C 语言拾遗：机制

预编译

预编译的实质——复制粘贴：

vgalaxy@vgalaxy-VirtualBox:~/Templates/include$ ll
total 36
drwxrwxr-x 2 vgalaxy vgalaxy  4096  9月 30 11:05 ./
drwxr-xr-x 4 vgalaxy vgalaxy  4096  9月 30 11:03 ../
-rw-rw-r-- 1 vgalaxy vgalaxy    67  9月 30 11:05 a.c
-rw-rw-r-- 1 vgalaxy vgalaxy    13  9月 30 11:04 a.inc
-rwxrwxr-x 1 vgalaxy vgalaxy 16096  9月 30 11:05 a.out*
-rw-rw-r-- 1 vgalaxy vgalaxy    18  9月 30 11:05 b
vgalaxy@vgalaxy-VirtualBox:~/Templates/include$ cat a.c
#include<stdio.h>

int main(){
  printf(
    #include "a.inc"
      );
}
vgalaxy@vgalaxy-VirtualBox:~/Templates/include$ cat a.inc
#include "b"
vgalaxy@vgalaxy-VirtualBox:~/Templates/include$ cat b
"Hello, world!\n"
vgalaxy@vgalaxy-VirtualBox:~/Templates/include$ ./a.out
Hello, world!

注意 #include "a.inc" 需要另起一行。

以下代码的区别：

#include <stdio.h>
#include "stdio.h"

观察查找路径，使用 gcc --verbose a.c 命令：

#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/10/include
 /usr/local/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

添加当前目录到头文件查找路径中，使用 gcc --verbose a.c -I. 命令：

#include "..." search starts here:
#include <...> search starts here:
 .
 /usr/lib/gcc/x86_64-linux-gnu/10/include
 /usr/local/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

另有 -L<dir> 选项，添加 dir 到库文件查找路径。关于头文件和库文件的区别，STFW。

找到 printf 的声明，便不需要添加 #include<stdio.h>

extern int printf (const char *__restrict __format, ...);

int main(){
    printf("Hello, world!\n");
}

在预编译中，变量未经定义便可使用，所以以下代码会输出 Yes

#include<stdio.h>

int main(){
#if aa == bb
    printf("Yes\n");
#else
    printf("No\n");
#endif
}

使用 gcc -E a.c 观察：

# 3 "a.c"
int main(){

 printf("Yes\n");



}

宏定义与展开：

#define A "aaaaaaaaaa"
#define TEN(x) x x x x x x x x x x
#define B TEN(A)
#define C TEN(B)
#define D TEN(C)
#define E TEN(D)
#define F TEN(E)
#define G TEN(F)


int main(){
    puts(G);
}

连接操作，躲避关键词过滤：

#define A sys ## tem

int main(){
    A("echo Hello\n");
}

X-Macros，元编程：

#define NAMES(X) X(A) X(B) X(C) X(D)

int main(){
#define PRINT(x) puts("Hello, "#x"!");
    NAMES(PRINT);
}

编译和链接

a.c → a.s → a.o

编译略。

链接，可以在 C++ 中链接 C，注意这里使用的是 g++：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat a.cc
extern "C" {
  int foo() { return 0; }
}

int bar() {
  return 0;
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ g++ -c a.cc
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ls
a.cc  a.o
vgalaxy@vgalaxy-VirtualBox:~/Templates$ objdump -d a.o

a.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  55                     push   %rbp
   5:  48 89 e5               mov    %rsp,%rbp
   8:  b8 00 00 00 00         mov    $0x0,%eax
   d:  5d                     pop    %rbp
   e:  c3                     ret

000000000000000f <_Z3barv>:
   f:  f3 0f 1e fa            endbr64
  13:  55                     push   %rbp
  14:  48 89 e5               mov    %rsp,%rbp
  17:  b8 00 00 00 00         mov    $0x0,%eax
  1c:  5d                     pop    %rbp
  1d:  c3                     ret

注意这里函数名 bar 变成了 _Z3barv，这被称为 C++ Name Mangling，是实现函数重载的底层机制，详见 https://zhuanlan.zhihu.com/p/359466948。

加载：进入 C 语言的世界

C 程序执行的两个视角：

静态：C 代码的连续一段总能对应到一段连续的机器指令
动态：C 代码执行的状态总能对应到机器的状态

两个视角的共同之处——内存：

代码、变量 (源代码视角)
地址、长度 (机器指令视角)

int main (int argc, char *argv[]) {
    int *p = (void *)1; // OK
    *p = 1; // Segmentation fault
}

第一行为指针 p 的值为 1，第二行为指针 p 所指的值为 1（触发段错误）。

内存模型：

地址：指针
类型：对指向的内存的解读

一切皆可取地址：

#include <stdio.h>

void printptr(void *p) {
    printf("p = %p; *p = %016lx\n", p, *(long*)p);
}

int x;

int main(int argc, char *argv[]) {
    printptr(main); // 代码
    printptr(&main);
    printptr(&x); // 数据
    printptr(&argc); // 堆栈
    printptr(argv);
    printptr(&argv);
    printptr(argv[0]);
}

可能的输出：

$ ./a.out
p = 0x562ea2e9017b; *p = e5894855fa1e0ff3
p = 0x562ea2e9017b; *p = e5894855fa1e0ff3
p = 0x562ea2e93014; *p = 0000000000000000
p = 0x7ffe9e89c20c; *p = 0000000000000001
p = 0x7ffe9e89c308; *p = 00007ffe9e89d2e8
p = 0x7ffe9e89c200; *p = 00007ffe9e89c308
p = 0x7ffe9e89d2e8; *p = 0074756f2e612f2e

另一个例子：

#include<stdio.h>
#include<assert.h>

int main (int argc, char *argv[]) {
    int (*f)(int, char*[]) = main;
    if(argc!= 0) {
        char***a = &argv, *first = argv [0], ch = argv[0][0];
        printf("arg = \"%s\"; ch = '%c'\n", first, ch);
        assert(***a == ch);
        f(argc- 1, argv+ 1);
    }
}

打印 main 函数的参数：

$ ./a.out 1 2 3 hello
arg = "./a.out"; ch = '.'
arg = "1"; ch = '1'
arg = "2"; ch = '2'
arg = "3"; ch = '3'
arg = "hello"; ch = 'h'

从 main 函数开始执行：

谁调用的 main
进程执行的第一条指令是什么

参考 https://xuxeu.gitee.io/main/。

C 语言拾遗：编程实践

怎样写代码才能从一个大型项目中存活下来？

核心准则：编写可读代码
两个例子：计数器，YEMU

The Art of Readable Code

原书见：软件工程师的自我修养

一个反例

$ wget https://www.ioccc.org/2011/hou/hou.c
$ gcc hou.c -lm -o hou

此处的 -lm 选项：

-nolibc

Do not use the C library or system libraries tightly coupled with it when linking.  Still link with the startup files, libgcc or toolchain provided language support libraries such as libgnat, libgfortran or libstdc++ unless options preventing their inclusion are used as well.  This typically removes -lc from the link command line, as well as system libraries that normally go with it and become meaningless when absence of a C library is assumed, for example -lpthread or -lm in some configurations.  This is intended for bare-board targets when there is indeed no C library available.

一个可能会碰到的例子

void (*signal (int sig, void (*func)(int)))(int);

使用 http://c-faq.com/decl/spiral.anderson.html 中的技巧，即 Clockwise/Spiral Rule。

signal 是一个函数，参数是一个 int 和一个函数（参数是 int，返回值是 void），返回值是一个函数（参数是 int，返回值是 void）。

我们可以改写上面的声明，通过 man signal 可得：

SYNOPSIS
       #include <signal.h>

       typedef void (*sighandler_t)(int);

       sighandler_t signal(int signum, sighandler_t handler);

编写代码的准则——降低维护成本：

宏观：做好分解和解耦（现实世界也是这样管理复杂系统）
微观：不言自明，不言自证

计数器（数码管显示）

FPGA

门层面

思路：状态（存储模拟）和计算模拟
技巧：使用预编译

$ wget http://jyywiki.cn/pages/ICS/2020/demos/logisim.c
$ wget http://jyywiki.cn/pages/ICS/2020/demos/display.py

logisim.c

#include <stdio.h>
#include <unistd.h>

#define FORALL_REGS(_)  _(X) _(Y)
#define OUTPUTS(_)      _(A) _(B) _(C) _(D) _(E) _(F) _(G)
#define LOGIC           X1 = !X && Y; \
                        Y1 = !X && !Y; \
                        A  = (!X && !Y) || (X && !Y); \
                        B  = 1; \
                        C  = (!X && !Y) || (!X && Y); \
                        D  = (!X && !Y) || (X && !Y); \
                        E  = (!X && !Y) || (X && !Y); \
                        F  = (!X && !Y); \
                        G  = (X && !Y);

#define DEFINE(X)   static int X, X##1;
#define UPDATE(X)   X = X##1;
#define PRINT(X)    printf(#X " = %d; ", X);

int main() {
  FORALL_REGS(DEFINE);
  OUTPUTS(DEFINE);
  while (1) { // clock
    LOGIC;
    OUTPUTS(PRINT);
    putchar('\n');
    fflush(stdout);
    sleep(1);
    FORALL_REGS(UPDATE);
  }

编译后输出：

$ ./a.out
A = 1; B = 1; C = 1; D = 1; E = 1; F = 1; G = 0;
A = 0; B = 1; C = 1; D = 0; E = 0; F = 0; G = 0;
A = 1; B = 1; C = 0; D = 1; E = 1; F = 0; G = 1;
A = 1; B = 1; C = 1; D = 1; E = 1; F = 1; G = 0;
A = 0; B = 1; C = 1; D = 0; E = 0; F = 0; G = 0;
A = 1; B = 1; C = 0; D = 1; E = 1; F = 0; G = 1;
A = 1; B = 1; C = 1; D = 1; E = 1; F = 1; G = 0;
A = 0; B = 1; C = 1; D = 0; E = 0; F = 0; G = 0;
A = 1; B = 1; C = 0; D = 1; E = 1; F = 0; G = 1;
A = 1; B = 1; C = 1; D = 1; E = 1; F = 1; G = 0;
A = 0; B = 1; C = 1; D = 0; E = 0; F = 0; G = 0;
A = 1; B = 1; C = 0; D = 1; E = 1; F = 0; G = 1;
A = 1; B = 1; C = 1; D = 1; E = 1; F = 1; G = 0;
^C

display.py

使用 turtle 库显示

#!/usr/bin/env python3

import turtle
import fileinput

A, B, C, D, E, F, G = 0, 0, 0, 0, 0, 0, 0

turtle.setup(width=240, height=320)
turtle.pen(shown=False, pendown=False, pencolor='#aa0000', pensize=5)
turtle.tracer(0)

stdin = fileinput.input()
while True:
  turtle.clear()
  turtle.home()
  turtle.setpos(-20, 0)
  for i, draw in enumerate([G, C, D, E, F, A, B]):
    turtle.penup(); turtle.forward(5)
    turtle.pendown() if draw else turtle.penup()
    turtle.forward(40)
    turtle.penup(); turtle.forward(5)
    turtle.right(90)
    if i == 3: turtle.left(90)
  turtle.update()
  exec(stdin.readline())

需要预先安装 python3-tk：

$ sudo apt-get install python3-tk

否则会报错 ModuleNotFoundError: No module named 'tkinter'

使用管道连接即可：

$ ./a.out | ./display.py

YEMU

简化的计算机系统，详见《计算机系统基础》第一章：

寄存器：PC, R0, R1, R2, R3 (8-bit)
内存：16 字节，按字节访问
指令：load, store, mov, add

存储模型：内存和寄存器（状态机视角）

模拟存储：全局变量，类型为 uint8_t

类型的碎碎念：

思考为什么使用无符号数

TODO

把指针转换成整数，应该用什么类型？

intptr_t，考虑 32 位和 64 位

软件设计——以寄存器为例

#include<stdint.h>

#define NREG 4
#define NMEM 16

typedef uint8_t u8;

u8 pc = 0, R[NREG], M[NMEM] ={ ... };

给寄存器名字：

enum
{ RA, R1, ..., PC };
u8 R[] =
{ [RA] = 0, [R1] = 0, ... [PC] = init_pc, };

#define pc (R[PC])
#define NREG (sizeof(R) / sizeof(u8))

减少 dependencies → 降低代码耦合程度：

// breaks when adding a register
#define NREG 5

// breaks when changing register size
#define NREG (sizeof(R) / sizeof(u8))

// never breaks
#define NREG (sizeof(R) / sizeof(R[0])) // 但需要R的定义

// even better (why?)
enum { RA, ... , PC, NREG }

软件设计——以指令实现为例

#include <assert.h>
#include <stdio.h>
#include "yemu.h"

typedef union inst {
  struct { u8 rs  : 2, rt: 2, op: 4; } rtype;
  struct { u8 addr: 4,        op: 4; } mtype;
} inst_t;

#define RTYPE(i) u8 rt = (i)->rtype.rt, rs = (i)->rtype.rs;
#define MTYPE(i) u8 addr = (i)->mtype.addr;

void idex() {
  inst_t *cur = (inst_t *)&M[pc];
  switch (cur->rtype.op) {
    case 0b0000: { RTYPE(cur); R[rt]   = R[rs];   pc++; break; }
    case 0b0001: { RTYPE(cur); R[rt]  += R[rs];   pc++; break; }
    case 0b1110: { MTYPE(cur); R[RA]   = M[addr]; pc++; break; }
    case 0b1111: { MTYPE(cur); M[addr] = R[RA];   pc++; break; }
    default: assert(0);
  }
}

union 和 bit fields，配合 struct，一段内存空间多种解读方式。

struct 的碎碎念：

在 C 中：
typedef struct {

} A;

A a;
在 C++ 中：
struct A {

};

A a;

一览 Makefile

.PHONY: run clean test

CFLAGS = -Wall -Werror -std=c11 -O2
CC = gcc
LD = gcc

yemu: yemu.o idex.o
    $(LD) $(LDFLAGS) -o yemu yemu.o idex.o

yemu.o: yemu.c yemu.h
    $(CC) $(CFLAGS) -c -o yemu.o yemu.c

idex.o: idex.c yemu.h
    $(CC) $(CFLAGS) -c -o idex.o idex.c

run: yemu
    @./yemu

clean:
    rm -f test yemu *.o

test: yemu
    $(CC) $(CFLAGS) -o test idex.o test.c && ./test

如何阅读全都是宏的程序

$ gcc -E test.c | indent - | vim -

此处的 indent 需要预先安装，用于排版 C 程序的显示。

NEMU 框架选讲 (1): 编译运行

The UNIX-Hater’s Handbook

介绍了 UNIX 的一些缺陷，如手册冗长，还有一些命令行工具的缺陷。

例如删除一个文件名为 -rf 的文件：

$ rm -rf

Fail!

-- 将后面的选项解析为文件名：

$ rm -- -rf

Git

.gitignore

即白名单。基本原则是，一切生成的文件都不放在 Git 仓库中。

观察 ~/ics2021/.gitignore：

*.*
*
!*/
!/nemu/*
!/nexus-am/*
!/nanos-lite/*
!/navy-apps/*
!Makefile
!README.md
!.gitignore
!init.sh
/fceux-am
/am-kernels

再观察 ~/ics2021/nemu/.gitignore：

*.*
*
!*/
!Makefile
!*.mk
!*.[cSh]
!.gitignore
!README.md
!Kconfig
include/config
include/generated

前两行大概是忽略所有文件，之后以 ! 开头表示除了……，例如除了 .c 文件、.S 文件和 .h 文件。

模板库：https://github.com/github/gitignore
手册：https://git-scm.com/docs/gitignore

.git

是一个隐藏目录，藏在 ~/ics2021 中，记录了所有快照信息。

下面解释 make submit 发生了什么。总体来说，提交脚本仅上传 .git，在服务器执行 git reset。

观察 ~/ics2021/Makefile：

include nemu/scripts/git.mk

default:
    @echo "Please run 'make' under any subprojects to compile."

submit:
    git gc
    STUID=$(STUID) STUNAME=$(STUNAME) bash -c "$$(curl -s http://jyywiki.cn/static/submit.sh)"

.PHONY: default submit

可以看到一个链接 http://jyywiki.cn/static/submit.sh，下载后打开：

COURSE=ICS2021
MODULE=$(git rev-parse --abbrev-ref HEAD | tr '[a-z]' '[A-Z]')
FILE=/tmp/upload.tar.bz2
tar caf "$FILE" ".git" $(find . -maxdepth 2 -name "*.pdf") && \
  curl -F "token=$TOKEN" \
       -F "stuid=$STUID" \
       -F "stuname=$STUNAME" \
       -F "course=$COURSE" \
       -F "module=$MODULE" \
       -F "file=@$FILE" \
       http://jyywiki.cn/upload

大概工作流程先定位阶段，存储在 MODULE 中：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ git branch
  master
  pa0
  pa1
* pa2
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ git rev-parse --abbrev-ref HEAD
pa2

然后将最近的 .git 快照和 pdf 文件打包为 /tmp/upload.tar.bz2，最后上传到服务器。

git trace

注意到之前的文件包含了 include nemu/scripts/git.mk 文件，我们观察一下：

STUID = XXX
STUNAME = XXX

# DO NOT modify the following code!!!

GITFLAGS = -q --author='tracer-ics2021 <tracer@njuics.org>' --no-verify --allow-empty

# prototype: git_commit(msg)
define git_commit
    -@git add $(NEMU_HOME)/.. -A --ignore-errors
    -@while (test -e .git/index.lock); do sleep 0.1; done
    -@(echo "> $(1)" && echo $(STUID) && hostnamectl && uptime) | git commit -F - $(GITFLAGS)
    -@sync
endef

这里解释了我们每一次 make 和 make run 时记录的 git 信息：

vgalaxy@vgalaxy-VirtualBox:~$ hostnamectl
   Static hostname: vgalaxy-VirtualBox
         Icon name: computer-vm
           Chassis: vm
        Machine ID: beeabc33a4e84c64ba86a73bb19c624a
           Boot ID: 293d87d068e74a8ea858bbff7c29835d
    Virtualization: oracle
  Operating System: Ubuntu 21.04
            Kernel: Linux 5.11.0-16-generic
      Architecture: x86-64
vgalaxy@vgalaxy-VirtualBox:~$ uptime
 20:12:26 up 11:08,  1 user,  load average: 0.62, 0.59, 0.44

git add $(NEMU_HOME)，配合前面的白名单理解。

应用：如何抓抄袭

make

yemu

一个简单的例子

.PHONY: run clean test

CFLAGS = -Wall -Werror -std=c11 -O2
CC = gcc
LD = gcc

yemu: yemu.o idex.o
    $(LD) $(LDFLAGS) -o yemu yemu.o idex.o

yemu.o: yemu.c yemu.h
    $(CC) $(CFLAGS) -c -o yemu.o yemu.c

idex.o: idex.c yemu.h
    $(CC) $(CFLAGS) -c -o idex.o idex.c

run: yemu
    @./yemu

clean:
    rm -f test yemu *.o

test: yemu
    $(CC) $(CFLAGS) -o test idex.o test.c && ./test

CFLAGS 为编译选项，CC 表示生成可重定位目标文件，LD 表示生成可执行目标文件。

Lab

Lab 的 Makefile 呈现出层次关系。

在 multimod 目录下的 Makefile 文件：

NAME := $(shell basename $(PWD))
all: $(NAME)-64 $(NAME)-32
export MODULE := Lab1
export STUID = XXX
export STUNAME = XXX
include ../Makefile

这里的 NAME 即为 multimod：

vgalaxy@vgalaxy-VirtualBox:~/ics-workbench/multimod$ basename $PWD
multimod

生成的可执行文件有两个，multimod-64 和 multimod-32。

再来观察 ics-workbench 目录下的 Makefile：

ifeq ($(NAME),)
$(error Should make in each lab's directory)
endif

SRCS   := $(shell find . -maxdepth 1 -name "*.c")
DEPS   := $(shell find . -maxdepth 1 -name "*.h") $(SRCS)
CFLAGS += -O1 -std=gnu11 -ggdb -Wall -Werror -Wno-unused-result -Wno-unused-value -Wno-unused-variable

.PHONY: all git test clean commit-and-make

.DEFAULT_GOAL := commit-and-make
commit-and-make: git all

$(NAME)-64: $(DEPS) # 64bit binary
    gcc -m64 $(CFLAGS) $(SRCS) -o $@ $(LDFLAGS)

$(NAME)-32: $(DEPS) # 32bit binary
    gcc -m32 $(CFLAGS) $(SRCS) -o $@ $(LDFLAGS)

$(NAME)-64.so: $(DEPS) # 64bit shared library
    gcc -fPIC -shared -m64 $(CFLAGS) $(SRCS) -o $@ $(LDFLAGS)

$(NAME)-32.so: $(DEPS) # 32bit shared library
    gcc -fPIC -shared -m32 $(CFLAGS) $(SRCS) -o $@ $(LDFLAGS)

clean:
    rm -f $(NAME)-64 $(NAME)-32 $(NAME)-64.so $(NAME)-32.so

include ../Makefile.lab

其中包含了 Makefile.lab：

# **DO NOT MODIFY**

export COURSE := ICS2021
URL := 'http://jyywiki.cn/static/submit-icslab.sh'

submit:
    @cd $(dir $(abspath $(lastword $(MAKEFILE_LIST)))) && \
      curl -sSL '$(URL)' | bash

git:
    @git add $(shell find . -name "*.c") $(shell find . -name "*.h") -A --ignore-errors
    @while (test -e .git/index.lock); do sleep 0.1; done
    @(hostnamectl && uptime) | git commit -F - -q --author='tracer-nju <tracer@nju.edu.cn>' --no-verify --allow-empty
    @sync

Makefile.lab 里面是提交脚本。

我们观察 make 将会执行的指令：

vgalaxy@vgalaxy-VirtualBox:~/ics-workbench/multimod$ make -nB
git add ./main.c ./multimod.c  -A --ignore-errors
while (test -e .git/index.lock); do sleep 0.1; done
(hostnamectl && uptime) | git commit -F - -q --author='tracer-nju <tracer@nju.edu.cn>' --no-verify --allow-empty
sync
gcc -m64 -O1 -std=gnu11 -ggdb -Wall -Werror -Wno-unused-result -Wno-unused-value -Wno-unused-variable ./main.c ./multimod.c -o multimod-64
gcc -m32 -O1 -std=gnu11 -ggdb -Wall -Werror -Wno-unused-result -Wno-unused-value -Wno-unused-variable ./main.c ./multimod.c -o multimod-32

简单来说，就是先 git trace，然后编译出两个可执行文件。

默认 make 的行为是 commit-and-make，也就是先 git，后 all，而 all 依赖两个可执行文件，这就可以说通了。

NEMU

终于来到了重头戏。

static

首先是 ics2021 主目录下的 Makefile：

include nemu/scripts/git.mk

default:
    @echo "Please run 'make' under any subprojects to compile."

submit:
    git gc
    STUID=$(STUID) STUNAME=$(STUNAME) bash -c "$$(curl -s http://jyywiki.cn/static/submit.sh)"

.PHONY: default submit

这里是 make submit 的地方，之前已经提过了。

下面是 ~/ics2021/nemu 下的 Makefile，有一点复杂，我们简单理一下包含关系：

# Include variables and rules generated by menuconfig
-include $(NEMU_HOME)/include/config/auto.conf
-include $(NEMU_HOME)/include/config/auto.conf.cmd

...

# Include rules for menuconfig
include $(NEMU_HOME)/scripts/config.mk

...

# Include rules to build NEMU
include $(NEMU_HOME)/scripts/native.mk

auto.conf 里面就是 menuconfig 生成的各种宏。

config.mk 则提供了 make menuconfig。

关键在 native.mk 中：

include $(NEMU_HOME)/scripts/git.mk
include $(NEMU_HOME)/scripts/build.mk

include $(NEMU_HOME)/tools/difftest.mk

compile_git:
    $(call git_commit, "compile")
$(BINARY): compile_git

# Some convenient rules

override ARGS ?= --log=$(BUILD_DIR)/nemu-log.txt
override ARGS += $(ARGS_DIFF)

# Command to execute NEMU
IMG ?=
NEMU_EXEC := $(BINARY) $(ARGS) $(IMG)

run-env: $(BINARY) $(DIFF_REF_SO)

run: run-env
    $(call git_commit, "run")
    $(NEMU_EXEC)

gdb: run-env
    $(call git_commit, "gdb")
    gdb -s $(BINARY) --args $(NEMU_EXEC)

clean-tools = $(dir $(shell find ./tools -name "Makefile"))
$(clean-tools):
    -@$(MAKE) -s -C $@ clean
clean-tools: $(clean-tools)
clean-all: clean distclean clean-tools

.PHONY: run gdb run-env clean-tools clean-all $(clean-tools)

build.mk 是真正编译代码的地方，即 CC 和 LD，而 git.mk 上面已经提到过了。

每一次 make / make run / make gdb，都会进行 commit：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ grep -nr git_commit
scripts/git.mk:8:# prototype: git_commit(msg)
scripts/git.mk:9:define git_commit
scripts/native.mk:7:  $(call git_commit, "compile")
scripts/native.mk:22:  $(call git_commit, "run")
scripts/native.mk:26:  $(call git_commit, "gdb")

下面代换 make run 中的 NEMU_EXEC：

NEMU_EXEC := $(BINARY) $(ARGS) $(IMG)

其中 BINARY 在 build.mk 中：

BINARY   = $(BUILD_DIR)/$(NAME)$(SO)

其中 NAME 在 ~/ics2021/nemu 下的 Makefile 中：

remove_quote = $(patsubst "%",%,$(1))

# Extract variabls from menuconfig
GUEST_ISA ?= $(call remove_quote,$(CONFIG_ISA))
ENGINE ?= $(call remove_quote,$(CONFIG_ENGINE))
NAME    = $(GUEST_ISA)-nemu-$(ENGINE)

在 ~/ics2021/nemu/.config 中：

CONFIG_ISA="riscv32"
...
CONFIG_ENGINE="interpreter"

也就是运行 riscv32-nemu-interpreter 这个可执行文件。

另一方面，在 build.mk 中：

.PHONY: app clean

app: $(BINARY)

$(BINARY): $(OBJS) $(ARCHIVES)
    @echo + LD $@
    @$(LD) -o $@ $(OBJS) $(LDFLAGS) $(ARCHIVES) $(LIBS)

clean:
    -rm -rf $(BUILD_DIR)

LD 出现，OBJS 中又有 OBJ_DIR：

OBJS = $(SRCS:%.c=$(OBJ_DIR)/%.o) $(CXXSRC:%.cc=$(OBJ_DIR)/%.o)

# Compilation patterns
$(OBJ_DIR)/%.o: %.c
    @echo + CC $<
    @mkdir -p $(dir $@)
    @$(CC) $(CFLAGS) -c -o $@ $<
    $(call call_fixdep, $(@:.o=.d), $@)

$(OBJ_DIR)/%.o: %.cc
    @echo + CXX $<
    @mkdir -p $(dir $@)
    @$(CXX) $(CFLAGS) $(CXXFLAGS) -c -o $@ $<
    $(call call_fixdep, $(@:.o=.d), $@)

CC 出现。

dynamic

静态分析完后，我们进行动态分析：

make -nB | grep -ve '^\(\#\|echo\|mkdir\)' | vim -

在 vim 中进一步筛选：

:g/fixdep/d
:g/mv/d

观察第一行：

gcc -O2 -MMD -Wall -Werror -I/home/vgalaxy/ics2021/nemu/include -I/home/vgalaxy/ics2021/nemu/src/engine/interpreter -I/home/vgalaxy/ics2021/nemu/src/isa/riscv32/include -O2  -ggdb3  -DITRACE_COND=true -D__GUEST_ISA__=riscv32 -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/nemu-main.o src/nemu-main.c

-D__GUEST_ISA__=riscv32 表示传入宏。

键入 Ctrl+v 进入 VISUAL BLOCK 模式，删去编译选项。

gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/nemu-main.o src/nemu-main.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/cpu/difftest/dut.o src/cpu/difftest/dut.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/cpu/cpu-exec.o src/cpu/cpu-exec.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/monitor.o src/monitor/monitor.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/watchpoint.o src/monitor/sdb/watchpoint.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/expr.o src/monitor/sdb/expr.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/sdb.o src/monitor/sdb/sdb.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/log.o src/utils/log.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/timer.o src/utils/timer.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/state.o src/utils/state.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/rand.o src/utils/rand.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/memory/paddr.o src/memory/paddr.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/memory/vaddr.o src/memory/vaddr.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/port-io.o src/device/io/port-io.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/map.o src/device/io/map.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/mmio.o src/device/io/mmio.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/engine/interpreter/hostcall.o src/engine/interpreter/hostcall.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/engine/interpreter/init.o src/engine/interpreter/init.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/system/intr.o src/isa/riscv32/system/intr.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/system/mmu.o src/isa/riscv32/system/mmu.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/reg.o src/isa/riscv32/reg.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/difftest/dut.o src/isa/riscv32/difftest/dut.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/instr/decode.o src/isa/riscv32/instr/decode.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/logo.o src/isa/riscv32/logo.c
gcc -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/init.o src/isa/riscv32/init.c
g++ -O2 -MMD -Wall -Werror -I/home/vgalaxy/ics2021/nemu/include -I/home/vgalaxy/ics2021/nemu/src/engine/interpreter -I/home/vgalaxy/ics2021/nemu/src/isa/riscv32/include -O2  -ggdb3  -DITRACE_COND=true -D__GUEST_ISA__=riscv32 -I/usr/lib/llvm-11/include -std=c++14   -fno-exceptions -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -fPIE -c -o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/disasm.o src/utils/disasm.cc
git add /home/vgalaxy/ics2021/nemu/.. -A --ignore-errors
while (test -e .git/index.lock); do sleep 0.1; done
(echo ">  "compile"" && echo 201250012 && hostnamectl && uptime) | git commit -F - -q --author='tracer-ics2021 <tracer@njuics.org>' --no-verify --allow-empty
sync

前面是一堆 gcc 和一个 g++，接着是 git trace。

最后一个是比较长的 LD：

g++ -o /home/vgalaxy/ics2021/nemu/build/riscv32-nemu-interpreter /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/nemu-main.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/cpu/difftest/dut.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/cpu/cpu-exec.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/monitor.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/watchpoint.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/expr.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/monitor/sdb/sdb.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/log.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/timer.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/state.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/rand.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/memory/paddr.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/memory/vaddr.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/port-io.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/map.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/device/io/mmio.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/engine/interpreter/hostcall.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/engine/interpreter/init.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/system/intr.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/system/mmu.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/reg.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/difftest/dut.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/instr/decode.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/logo.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/isa/riscv32/init.o /home/vgalaxy/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/utils/disasm.o -O2 -O2  -ggdb3   -lreadline -ldl -pie -lLLVM-11

可以看到最终生成的可执行文件 riscv32-nemu-interpreter。

AbstractMachine

其 Makefile 文件中有：

### *Get a more readable version of this Makefile* by `make html` (requires python-markdown)
html:
    cat Makefile | sed 's/^\([^#]\)/    \1/g' | markdown_py > Makefile.html
.PHONY: html

docs as code

实际上需要安装的是 python3-markdown

Summary

编程 ≠ 闷头写代码
使用工具也是编程的一部分
- version-control systems: git, svn, …
- build systems: make, cmake (C++), maven (Java), …
- shell: bash, zsh, …

NEMU 框架选讲 (2): 代码导读

项目规模

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ find . -name "*.c" -o -name "*.h" | xargs cat | wc -l
23415
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ cd tools/
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu/tools$ find . -name "*.c" -o -name "*.h" | xargs cat | wc -l
18927

实际部分只有 5000+ 行。

main

首先要找到 main 函数：

$ grep -n main $(find . -name "*.c")
$ find . | xargs grep --color -nse '\<main\>'

Vim 当然也支持：

:vimgrep /\<main\>/ **/*.c

使用 :cn, :cp 切换。

或者使用 fzf 寻找。

找到之后，研究其中的代码，预编译后的结果为：

int main(int argc, char *argv[]) {
  init_monitor(argc, argv);
  engine_start();
  return is_exit_status_bad();
}

通过 ctags，从 main 转到 init_monitor 再转到 parse_args，参数 argc 和 argv 一直在被传递。

这里要区分 parse_args 中的参数和 sdb_mainloop 中解析的参数，后者是简易调试器中传入的参数，而前者是真正运行可执行程序时传入的参数。

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ make run
+ LD /home/vgalaxy/ics2021/nemu/build/riscv32-nemu-interpreter
/home/vgalaxy/ics2021/nemu/build/riscv32-nemu-interpreter --log=/home/vgalaxy/ics2021/nemu/build/nemu-log.txt
[src/utils/log.c:13 init_log] Log is written to /home/vgalaxy/ics2021/nemu/build/nemu-log.txt
[src/memory/paddr.c:36 init_mem] physical memory area [0x80000000, 0x88000000]
[src/monitor/monitor.c:34 load_img] No image is given. Use the default build-in image.
[src/monitor/monitor.c:13 welcome] Trace: ON
[src/monitor/monitor.c:14 welcome] If trace is enabled, a log file will be generated to record the trace. This may lead to a large log file. If it is not necessary, you can disable it in menuconfig
[src/monitor/monitor.c:17 welcome] Build time: 17:23:04, Oct  2 2021
Welcome to riscv32-NEMU!
For help, type "help"
(nemu)

通过 make run 可知，传入可执行程序的参数为 --log=/home/vgalaxy/ics2021/nemu/build/nemu-log.txt 。

所以我们可以不通过 make run 执行程序：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ./build/riscv32-nemu-interpreter --help
Usage: ./build/riscv32-nemu-interpreter [OPTION...] IMAGE [args]

  -b,--batch              run with batch mode
  -l,--log=FILE           output log to FILE
  -d,--diff=REF_SO        run DiffTest with reference REF_SO
  -p,--port=PORT          run DiffTest with port PORT

使用 -b 进入 batch mode，注意带上 —log 选项，否则会输出两次：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ./build/riscv32-nemu-interpreter -b --log=/home/vgalaxy/ics2021/nemu/build/nemu-log.txt
[src/utils/log.c:13 init_log] Log is written to /home/vgalaxy/ics2021/nemu/build/nemu-log.txt
[src/memory/paddr.c:36 init_mem] physical memory area [0x80000000, 0x88000000]
[src/monitor/monitor.c:34 load_img] No image is given. Use the default build-in image.
[src/monitor/monitor.c:13 welcome] Trace: ON
[src/monitor/monitor.c:14 welcome] If trace is enabled, a log file will be generated to record the trace. This may lead to a large log file. If it is not necessary, you can disable it in menuconfig
[src/monitor/monitor.c:17 welcome] Build time: 17:23:04, Oct  2 2021
Welcome to riscv32-NEMU!
For help, type "help"
[src/cpu/cpu-exec.c:122 cpu_exec] nemu: HIT GOOD TRAP at pc = 0x8000000c
[src/cpu/cpu-exec.c:55 statistic] host time spent = 18 us
[src/cpu/cpu-exec.c:56 statistic] total guest instructions = 4
[src/cpu/cpu-exec.c:57 statistic] simulation frequency = 222,222 instr/s

于是自动化测试成为了可能：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ cat input
si
q
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ./build/riscv32-nemu-interpreter --log=/home/vgalaxy/ics2021/nemu/build/nemu-log.txt < input
[src/utils/log.c:13 init_log] Log is written to /home/vgalaxy/ics2021/nemu/build/nemu-log.txt
[src/memory/paddr.c:36 init_mem] physical memory area [0x80000000, 0x88000000]
[src/monitor/monitor.c:34 load_img] No image is given. Use the default build-in image.
[src/monitor/monitor.c:13 welcome] Trace: ON
[src/monitor/monitor.c:14 welcome] If trace is enabled, a log file will be generated to record the trace. This may lead to a large log file. If it is not necessary, you can disable it in menuconfig
[src/monitor/monitor.c:17 welcome] Build time: 17:23:04, Oct  2 2021
Welcome to riscv32-NEMU!
For help, type "help"
(nemu) si
0x80000000: b7 02 00 80 lui  t0, 524288
(nemu) q

更好的方法是将测试写入 Makefile 中，修改 native.mk 文件：

test: run-env
    $(NEMU_EXEC) < input

甚至可以躲过 git trace

static & inline

static

static 声明告诉编译器符号不要泄露到编译单元 (translation unit) 之外。

来看一个例子：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat a.c
#include "a.h"

void foo(){
  f();
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat b.c
#include "a.h"

void bar(){
  f();
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat main.c
int main(){
  return 0;
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat a.h
int f() {return 0;}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc a.c b.c main.c
/usr/bin/ld: /tmp/ccjWoe4C.o: in function `f':
b.c:(.text+0x0): multiple definition of `f'; /tmp/ccpMMypI.o:a.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

将 a.h 中的 f 改为 static int f() {return 0;} 即可成功编译。

前面的 parse_args 函数就有 static 声明。

这也是为什么不在头文件里定义函数的原因，两个 translation unit 同时引用，就导致 multiple definition。

思考题：为什么 C++ 能把 class 都定义到头文件里，像 vector 的实现就是直接粘贴进去的 (TODO)

反汇编成功编译后的可执行文件：

0000000000001129 <f>:
    1129:  f3 0f 1e fa            endbr64
    112d:  55                     push   %rbp
    112e:  48 89 e5               mov    %rsp,%rbp
    1131:  b8 00 00 00 00         mov    $0x0,%eax
    1136:  5d                     pop    %rbp
    1137:  c3                     ret

0000000000001138 <foo>:
    1138:  f3 0f 1e fa            endbr64
    113c:  55                     push   %rbp
    113d:  48 89 e5               mov    %rsp,%rbp
    1140:  b8 00 00 00 00         mov    $0x0,%eax
    1145:  e8 df ff ff ff         call   1129 <f>
    114a:  90                     nop
    114b:  5d                     pop    %rbp
    114c:  c3                     ret

000000000000114d <f>:
    114d:  f3 0f 1e fa            endbr64
    1151:  55                     push   %rbp
    1152:  48 89 e5               mov    %rsp,%rbp
    1155:  b8 00 00 00 00         mov    $0x0,%eax
    115a:  5d                     pop    %rbp
    115b:  c3                     ret

000000000000115c <bar>:
    115c:  f3 0f 1e fa            endbr64
    1160:  55                     push   %rbp
    1161:  48 89 e5               mov    %rsp,%rbp
    1164:  b8 00 00 00 00         mov    $0x0,%eax
    1169:  e8 df ff ff ff         call   114d <f>
    116e:  90                     nop
    116f:  5d                     pop    %rbp
    1170:  c3                     ret

可以发现生成了两个 f。

static inline

我们添加一个文件 x.c，包含了 f 但是并没有使用，就会报 warning：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat x.c
#include "a.h"
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -Wall -Werror a.c b.c x.c main.c
In file included from x.c:1:
a.h:1:12: error: ‘f’ defined but not used [-Werror=unused-function]
    1 | static int f() {return 0;}
      |            ^
cc1: all warnings being treated as errors

这时将 a.h 中的 f 改为 static inline int f() {return 0;} 即可成功编译。

例如 src/isa/riscv32/local-include/reg.h 中的 check_reg_idx 就有 static inline 声明。

使用 inline 可以在调用 check_reg_index 时编译优化成零开销。

反汇编成功编译后的可执行文件，仅 a.c 和 b.c：

0000000000001129 <f>:
    1129:  55                     push   %rbp
    112a:  48 89 e5               mov    %rsp,%rbp
    112d:  b8 00 00 00 00         mov    $0x0,%eax
    1132:  5d                     pop    %rbp
    1133:  c3                     ret

0000000000001134 <foo>:
    1134:  f3 0f 1e fa            endbr64
    1138:  55                     push   %rbp
    1139:  48 89 e5               mov    %rsp,%rbp
    113c:  b8 00 00 00 00         mov    $0x0,%eax
    1141:  e8 e3 ff ff ff         call   1129 <f>
    1146:  90                     nop
    1147:  5d                     pop    %rbp
    1148:  c3                     ret

0000000000001149 <f>:
    1149:  55                     push   %rbp
    114a:  48 89 e5               mov    %rsp,%rbp
    114d:  b8 00 00 00 00         mov    $0x0,%eax
    1152:  5d                     pop    %rbp
    1153:  c3                     ret

0000000000001154 <bar>:
    1154:  f3 0f 1e fa            endbr64
    1158:  55                     push   %rbp
    1159:  48 89 e5               mov    %rsp,%rbp
    115c:  b8 00 00 00 00         mov    $0x0,%eax
    1161:  e8 e3 ff ff ff         call   1149 <f>
    1166:  90                     nop
    1167:  5d                     pop    %rbp
    1168:  c3                     ret

似乎还是有略微不同的。

如果用更高的编译优化级别，两种情况下 f 都会被优化掉。

assert

#define assert(cond) if (!(cond)) panic(...);

注意特殊情况：

if (...) assert(0);
else ...

if -> if -> true: panic
if -> if -> false: else

所以应该用花括号括起来：

#define assert(cond) \
  do { \
    if (!(cond)) { \
      fprintf(stderr, "Fail @ %s:%d", __FILE__, __LINE__); \
      exit(1); \
    } \
  } while (0)

#define assert(cond) ({ ... })

格式控制字符串

static void welcome() {
  Log("Trace: %s", MUXDEF(CONFIG_TRACE, ASNI_FMT("ON", ASNI_FG_GREEN), ASNI_FMT("OFF", ASNI_FG_RED)));
  IFDEF(CONFIG_TRACE, Log("If trace is enabled, a log file will be generated "
        "to record the trace. This may lead to a large log file. "
        "If it is not necessary, you can disable it in menuconfig"));
  Log("Build time: %s, %s", __TIME__, __DATE__);
  printf("Welcome to %s-NEMU!\n", ASNI_FMT(str(__GUEST_ISA__), ASNI_FG_YELLOW ASNI_BG_RED));
  printf("For help, type \"help\"\n");
}

注意以 ASNI_ 开头的宏，定位 include/utils.h：

#define ASNI_FG_BLACK   "\33[1;30m"
#define ASNI_FG_RED     "\33[1;31m"
#define ASNI_FG_GREEN   "\33[1;32m"
#define ASNI_FG_YELLOW  "\33[1;33m"
#define ASNI_FG_BLUE    "\33[1;34m"
#define ASNI_FG_MAGENTA "\33[1;35m"
#define ASNI_FG_CYAN    "\33[1;36m"
#define ASNI_FG_WHITE   "\33[1;37m"
#define ASNI_BG_BLACK   "\33[1;40m"
#define ASNI_BG_RED     "\33[1;41m"
#define ASNI_BG_GREEN   "\33[1;42m"
#define ASNI_BG_YELLOW  "\33[1;43m"
#define ASNI_BG_BLUE    "\33[1;44m"
#define ASNI_BG_MAGENTA "\33[1;35m"
#define ASNI_BG_CYAN    "\33[1;46m"
#define ASNI_BG_WHITE   "\33[1;47m"
#define ASNI_NONE       "\33[0m"

#define ASNI_FMT(str, fmt) fmt str ASNI_NONE

我们观察原始的欢迎信息：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ./build/riscv32-nemu-interpreter --log=/home/vgalaxy/ics2021/nemu/build/nemu-log.txt | less

可以看到：

ESC[1;34m[src/utils/log.c:13 init_log] Log is written to /home/vgalaxy/ics2021/nemu/build/nemu-log.txtESC[0m
ESC[1;34m[src/memory/paddr.c:36 init_mem] physical memory area [0x80000000, 0x88000000]ESC[0m
ESC[1;34m[src/monitor/monitor.c:34 load_img] No image is given. Use the default build-in image.ESC[0m
ESC[1;34m[src/monitor/monitor.c:13 welcome] Trace: ESC[1;32mONESC[0mESC[0m
ESC[1;34m[src/monitor/monitor.c:14 welcome] If trace is enabled, a log file will be generated to record the trace. This may lead to a large log file. If it is not necessary, you can disable it in menuconfigESC[0m
ESC[1;34m[src/monitor/monitor.c:17 welcome] Build time: 17:23:04, Oct  2 2021ESC[0m
Welcome to ESC[1;33mESC[1;41mriscv32ESC[0m-NEMU!
For help, type "help"

说明这里对 less 的自适应没有做好。

另一个例子是 ls 命令：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ls | less

看到的内容和原本一致，只不过没有了颜色。

我们只有这样：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ ls --color | less

才能看到格式控制字符串：

ESC[0mESC[01;34mbuildESC[0m
ESC[01;34mconfigsESC[0m
ESC[01;34mincludeESC[0m
input
Kconfig
Makefile
README.md
ESC[01;34mresourceESC[0m
ESC[01;34mscriptsESC[0m
ESC[01;34msrcESC[0m
tags
ESC[01;34mtoolsESC[0m

也可以直接输入玩一玩：

vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ echo -e "\033[34m HAHA \033[0m"
 HAHA
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ echo -e "\033[43;34m HAHA \033[0m"
 HAHA
vgalaxy@vgalaxy-VirtualBox:~/ics2021/nemu$ echo -e "\033[44;37m HAHA \033[0m"
 HAHA

VS Code

本质上是一个前端
现代的设计理念
各种选项的配置
两个快捷键：Ctrl + P 和 Ctrl + Shift + P

Segmentation Fault

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat demo.c
void bar(){
  *((int *)0)=0;
}

void foo(){
  bar();
}

int main(){
  foo();
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -g demo.c
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gdb ./a.out

此处使用 -g 生成调试信息。

一些有用的 gdb 命令：

bt: 函数调用轨迹
layout asm
layout src

Reading symbols from ./a.out...
(gdb) run
Starting program: /home/vgalaxy/Templates/a.out

Program received signal SIGSEGV, Segmentation fault.
bar () at demo.c:2
2    *((int *)0)=0;
(gdb) bt
#0  bar () at demo.c:2
#1  0x0000555555555151 in foo () at demo.c:6
#2  0x0000555555555166 in main () at demo.c:10

再谈如何阅读全都是宏的程序

元编程~~害死人~~

在 build.mk 里添加 gcc -E：

# Compilation patterns
$(OBJ_DIR)/%.o: %.c
    @echo + CC $<
    @mkdir -p $(dir $@)
    @$(CC) $(CFLAGS) -c -o $@ $<
    @$(CC) $(CFLAGS) -E -MF /dev/null $< | \
        grep -ve '^#' | \
        clang-format - > $(basename $@).i
    $(call call_fixdep, $(@:.o=.d), $@)

代价是编译时间会略微变长。

注意需要安装 clang-format，另外注意这里的缩进都是 Tab 而非 Space，所产生的 .i 等文件在 ~/ics2021/nemu/build/obj-riscv32-nemu-interpreter/src/ 中，不会被 git 所记录

数据的机器级表示

位运算

与或非，单一门电路
异或，两级或三级门电路
移位，移位器

加法比上述运算在电路上实现 fundamentally 更困难：

与或非：常数级门电路深度
加法：对数级门电路深度（进位）

Circuit Complexity

例子：一代传奇处理器 8-bit MOS 6502

3510 晶体管；56 条指令，算数指令仅有加减法和位运算

单指令多数据

如果我们操作的对象刚好每一个 bit 是独立的

我们在一条指令里就实现了多个操作
SIMD (Single Instruction, Multiple Data)

Bit Set

C++ bitset

以 32-bit 为例：

x \implies S \in \{1,2,...,31\}

如 0101 对应 {0,2}。

基本操作

元素属于集合
集合并元素
集合并集合
集合交集合
集合减集合

集合的大小

朴素的想法：元素属于集合，遍历相加

另：

int bitset_size1(uint32_t S) { // SIMD
  S = (S & 0x55555555) + ((S >> 1) & 0x55555555);
  S = (S & 0x33333333) + ((S >> 2) & 0x33333333);
  S = (S & 0x0F0F0F0F) + ((S >> 4) & 0x0F0F0F0F);
  S = (S & 0x00FF00FF) + ((S >> 8) & 0x00FF00FF);
  S = (S & 0x0000FFFF) + ((S >> 16) & 0x0000FFFF);
  return S;
}

以 8-bit 为例，如 01101001

step1: 01010101

一次观察 2 位，得到 01000001+00010100=01010101，即每 2 位中有 1 个 1

step2: 00110011

一次观察 4 位，得到 00010001+00010001=00100010，即每 4 位中有 2 个 1

step3: 00001111

一次观察 8 位，得到 00000010+00000010=00000100，即每 8 位中有 4 个 1，得到结果

结果间有依赖关系，对现代处理器不友好，通常处理是分成几个部分分别查表

lowbit

树状数组

lowbit(x) = x & -x

-x 即 ~x+1
如 lowbit(01001000) = 00001000

联系补码的求法立得。

LOG

朴素的想法：找到 bit 串最左边的 1

int clz(uint32_t x) {
    int n = 0;
    if (x <= 0x0000ffff) n += 16, x <<= 16;
    if (x <= 0x00ffffff) n +=  8, x <<= 8;
    if (x <= 0x0fffffff) n +=  4, x <<= 4;
    if (x <= 0x3fffffff) n +=  2, x <<= 2;
    if (x <= 0x7fffffff) n ++;
    return n;
}

void LOG(int i) {
    int res = clz(i);
    printf("%d %d\n", i, 31 - res);
}

另：查表

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat log2.c
#include <stdio.h>
#include <stdint.h>
#include "log2.inc"

int main() {
  for (int i = 0; i < 64; i++) {
    uint64_t x = 1ULL << i;
    printf("%016lx %d\n", x, LOG2(x));
  }
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat log2.inc
#define LOG2(x) ("-01W2?XG3<@kYCHf4-=:AnlLZNDcI\\g_5P-V>F;jBeo9mKMb[^OUEid8Ja]Th7`S6RQ"[(x) % 67] - '0')

注意这里 LOG2 的参数必须为 2 的幂才有效。

而得到这样的表则需要一点点元编程：

#!/usr/bin/python3

import json

n, base = 64, '0'
for m in range(n, 10000):
  if len({ (2**i) % m for i in range(n) }) == n:
    M = { j: chr(ord(base) + i)
      for j in range(0, m)
        for i in range(0, n)
          if (2**i) % m == j }
    break

magic = json.dumps(''.join(
  [ M.get(j, '-') for j in range(0, m) ]
  )).strip('"')

print(f'#define LOG2(x) ("{magic}"[(x) % {m}] - \'{base}\')')

TODO

Hacker’s Delight

各种奇技淫巧

Undefined Behavior

常见的 UB

非法内存访问 (空指针解引用、数组越界、写只读内存等)、被零除、有符号整数溢出、函数没有返回值……

为什么 C/C++ 会有 UB

为了尽可能高效 (zero-overhead)
为了兼容多种硬件体系结构

编译器优化

参考：

警惕整数溢出

-fwrapv 可以强制有符号整数溢出为 wraparound

一个例子：https://godbolt.org/z/XWMrOR

参考：www.cs.utah.edu/~regehr/papers/overflow12.pdf

浮点数 & IEEE 754

一个有关浮点数大小/密度的实验 (float.c)

越大的数字，距离下一个实数的距离就越大

可能会带来相当的绝对误差
因此很多数学库都会频繁做归一化

一个例子：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat sum_float.c
#define SUM(T, st, ed, d) ({ \
  T s = 0; \
  for (int i = st; i != ed + d; i += d) \
    s += (T)1 / i; \
  s; \
})

#include <stdio.h>
#define N 1000000

int main() {
  printf("%.16f\n", SUM(float, 1, N, 1));
  printf("%.16f\n", SUM(float, N, 1, -1));
  printf("%.16lf\n", SUM(double, 1, N, 1));
  printf("%.16lf\n", SUM(double, N, 1, -1));
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
14.3573579788208008
14.3926515579223633
14.3927267228649889
14.3927267228657723

不满足结合律。

小知识：

比较：a == b 需要谨慎判断
零：+0.0, -0.0 的 bit 是不一样的，但 +0.0 == -0.0
Inf: 浮点数溢出很常见，不应该作为 undefined behavior
NaN (0.0/0.0)：能够满足 x != x 表达式的值

另一个例子：一元二次方程求根公式

参考：https://docs.oracle.com/cd/E19957-01/800-7895/800-7895.pdf

综合案例

平方倒数

float Q_rsqrt( float number ) {
  union { float f; uint32_t i; } conv;
  float x2 = number * 0.5F;
  conv.f = number;
  conv.i = 0x5f3759df - ( conv.i >> 1 ); // ???
  conv.f = conv.f * ( 1.5F - ( x2 * conv.f * conv.f ) );
  return conv.f;
}

https://cs.uwaterloo.ca/~m32rober/rsqrt.pdf

https://www.bilibili.com/video/BV1v64y1i7KH

Lab 1

int64_t multimod_fast(int64_t a, int64_t b, int64_t m) {
  int64_t x = (int64_t)((double)a * b / m) * m;
  int64_t t = (a * b - x) % m;
  return t < 0 ? t + m : t;
}

超过某个值后就会失效。

TODO

x86-64 与内联汇编

机器字长

字长 (Word Size)

能直接进行整数/位运算的大小
指针的大小 (索引内存的范围)

ABI

Application Binary Interface (ABI)

区别于 API (Application Programming Interface)

程序源代码中的规范

ABI：约定 binary 的行为

二进制文件的格式
函数调用、系统调用……
- C 语言规范只定义了运行时内存和内存上的计算
- printf 都无法实现，必须借助外部库函数
链接、加载的规范

calling convention

Argument Passing and Naming Conventions:

https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions?view=msvc-160

caller stack frame:

所有参数以数组的形式保存在堆栈上（反序压栈）
然后是返回地址（以指令后的地址相对跳转：考虑电路设计、指令前缀……）

x86-64：寄存器与函数调用

x86-64 扩充了很多寄存器，于是再也不用像 IA32 那样，用堆栈传递参数了：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat a.c
int f(int a, int b) {
  return a + b;
}
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -c -m32 -O2 -fno-pic a.c
vgalaxy@vgalaxy-VirtualBox:~/Templates$ objdump -d a.o

a.o:     file format elf32-i386


Disassembly of section .text:

00000000 <f>:
   0:  f3 0f 1e fb            endbr32
   4:  8b 44 24 08            mov    0x8(%esp),%eax
   8:  03 44 24 04            add    0x4(%esp),%eax
   c:  c3                     ret
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -c -m64 -O2 a.c
vgalaxy@vgalaxy-VirtualBox:~/Templates$ objdump -d a.o

a.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <f>:
   0:  f3 0f 1e fa            endbr64
   4:  8d 04 37               lea    (%rdi,%rsi,1),%eax
   7:  c3                     ret

支持 6 个参数的传递：rdi, rsi, rdx, rcx, r8, r9

callee 可以随意修改这 6 个寄存器的值
编译器有了更大的调度空间（大幅减少堆栈内存访问）

另一个例子：

void plus(int a, int b) {
  fprintf(stdout, "%d + %d = %d\n", a, b, a + b);
}

-O2 -m64 得到可执行目标文件后，反汇编：

0000000000001170 <plus>:
    1170:  f3 0f 1e fa            endbr64
    1174:  44 8d 0c 37            lea    (%rdi,%rsi,1),%r9d
    1178:  89 f9                  mov    %edi,%ecx
    117a:  48 8b 3d 8f 2e 00 00   mov    0x2e8f(%rip),%rdi        # 4010 <stdout@@GLIBC_2.2.5>
    1181:  41 89 f0               mov    %esi,%r8d
    1184:  48 8d 15 79 0e 00 00   lea    0xe79(%rip),%rdx        # 2004 <_IO_stdin_used+0x4>
    118b:  be 01 00 00 00         mov    $0x1,%esi
    1190:  31 c0                  xor    %eax,%eax
    1192:  e9 b9 fe ff ff         jmp    1050 <__fprintf_chk@plt>
    1197:  66 0f 1f 84 00 00 00   nopw   0x0(%rax,%rax,1)
    119e:  00 00

注意到 plus 中的 fprintf 被优化为了 jmp，因为 plus 没有返回值。

并不是调用的 printf，而是调用的有堆栈检查的版本：

准备参数时有 mov $0x1, %esi

直接 jmp 是因为函数末尾默认有一条 ret 指令：

借用了 __fprintf_chk@plt 的 ret 指令返回到 plus 的调用者
如果有返回值，就会生成 call 指令；如果 plus 返回 printf 的结果，依然是 jmp
省的不止是一条指令
- 连续的 ret 对分支预测是很大的挑战

nopw 也许是为了内存对齐。

总体来说，x86-64 是更现代的体系结构，更精简的指令序列。

the thinking of risc

补：

repz retq：https://repzret.org/p/repzret/

CSAPP：x86-64 Machine-Level Programming

Symbolic Execution

Inline Assembly

How to Use Inline Assembly Language in C Code

https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

DSL，领域特定语言

下面的例子编译级别均为 -O2

一个简单的例子

a.c

int foo(int x, int y) {
  x++; y++;
  asm ("endbr64;"
       ".byte 0xf3, 0x0f, 0x1e, 0xfa"
  );
  return x + y;
}

注意是一个字符串，发生了字符串串联。

endbr64 是为了维护控制流的完整性，参见 https://stackoverflow.com/questions/56905811/what-does-the-endbr64-instruction-actually-do。

a.s

    endbr64
#APP
# 3 "a.c" 1
    endbr64;.byte 0xf3, 0x0f, 0x1e, 0xfa
# 0 "" 2
#NO_APP
    leal    2(%rdi,%rsi), %eax
    ret

编译器就做个复制粘贴，不会检查指令合法性。

a.o

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  f3 0f 1e fa            endbr64
   8:  f3 0f 1e fa            endbr64
   c:  8d 44 37 02            lea    0x2(%rdi,%rsi,1),%eax
  10:  c3                     ret

汇编器会检查指令合法性。

与 C 世界交互

a.c

int foo(int x, int y) {
  int z;
  x++; y++;
  asm (
    "addl %1, %2; "
    "movl %2, %0; "
    : "=r"(z) // output
    : "r"(x), "r"(y) // input
  );
  return z;
}

a.s

    endbr64
    addl    $1, %edi
    leal    1(%rsi), %eax
#APP
# 4 "a.c" 1
    addl %edi, %eax; movl %eax, %eax;
# 0 "" 2
#NO_APP
    ret

a.o

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  83 c7 01               add    $0x1,%edi
   7:  8d 46 01               lea    0x1(%rsi),%eax
   a:  01 f8                  add    %edi,%eax
   c:  89 c0                  mov    %eax,%eax
   e:  c3                     ret

下面来解释一下：

  asm (
    "addl %1, %2; "
    "movl %2, %0; "
    : "=r"(z) // output
    : "r"(x), "r"(y) // input
  );

变量 z 对应 %0，变量 x 对应 %1，变量 y 对应 %2
r 表示 register operand constraint，编译器会分配一个通用寄存器存放该变量的值

When the “r” constraint is specified, gcc may keep the variable in any of the available GPRs (General Purpose Registers).

= 作为 constraint modifier，代表只写，不写的话会报错

output operand constraint lacks ‘=’

This operand is write-only for this instruction; the previous value is discarded and replaced by output data.

对于 constraint，还可以匹配已有的寄存器：

  asm (
    "addl %1, %0; "
    : "=r"(z) // output
    : "r"(x), "0"(y) // input
  );

matching constraint

matching constraint not valid in output operand

对应的反汇编：

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  83 c7 01               add    $0x1,%edi
   7:  8d 46 01               lea    0x1(%rsi),%eax
   a:  01 f8                  add    %edi,%eax
   c:  c3                     ret

也可以指定寄存器：

  asm (
    "addl %%ecx, %%edx; "
    : "=c"(z) // output
    : "d"(x), "c"(y) // input
  );

对应的反汇编：

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  8d 57 01               lea    0x1(%rdi),%edx
   7:  8d 4e 01               lea    0x1(%rsi),%ecx
   a:  01 ca                  add    %ecx,%edx
   c:  89 c8                  mov    %ecx,%eax
   e:  c3                     ret

进一步的话题

volatile

假如我们显式的返回另外一个值：

int foo(int x, int y) {
  int z;
  x++; y++;
  asm (
    "addl %1, %2; "
    "movl %2, %0; "
    : "=r"(z) // output
    : "r"(x), "r"(y) // input
  );
  return 0;
}

在较高的编译优化级别下，就会变成：

0000000000000000 <foo>:
   0:  f3 0f 1e fa            endbr64
   4:  31 c0                  xor    %eax,%eax
   6:  c3                     ret

指令序列没有副作用，被优化了

为此，我们可以使用 volatile 关键字：

  asm volatile (
    "addl %1, %2; "
    "movl %2, %0; "
    : "=r"(z) // output
    : "r"(x), "r"(y) // input
  );

clobber list

While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction.

In order to inform the compiler of these changes, list them in the clobber list.

考虑这样一个例子：

#include <stdio.h>

void foo(int x, int y) {
    asm volatile (
            "decl %%edi; "
            "movl %%edi, %%ecx; "
            :
            :
            //: "ecx"
            );
    printf("%d %d\n", x, y);
}

int main() {
    foo(2021, 2);
}

若不使用 clobber list，反汇编结果为：

0000000000001170 <foo>:
    1170:  f3 0f 1e fa            endbr64
    1174:  89 fa                  mov    %edi,%edx
    1176:  89 f1                  mov    %esi,%ecx
    1178:  ff cf                  dec    %edi
    117a:  89 f9                  mov    %edi,%ecx
    117c:  48 8d 35 81 0e 00 00   lea    0xe81(%rip),%rsi        # 2004 <_IO_stdin_used+0x4>
    1183:  bf 01 00 00 00         mov    $0x1,%edi
    1188:  31 c0                  xor    %eax,%eax
    118a:  e9 c1 fe ff ff         jmp    1050 <__printf_chk@plt>
    118f:  90                     nop

联系之前的参数传递，__printf_chk@plt 的四个参数分别为：

flag
format
edi: x -> edx
esi: y -> ecx

然而内联汇编指令等价于将 x-1 的值赋给了 y，所以结果为：

2021 2020

我们使用 clobber list，告知编译器寄存器 ecx 可能会被修改。

因为编译器只会简单的复制粘贴

于是反汇编结果为：

0000000000001170 <foo>:
    1170:  f3 0f 1e fa            endbr64
    1174:  89 fa                  mov    %edi,%edx
    1176:  ff cf                  dec    %edi
    1178:  89 f9                  mov    %edi,%ecx
    117a:  bf 01 00 00 00         mov    $0x1,%edi
    117f:  89 f1                  mov    %esi,%ecx
    1181:  48 8d 35 7c 0e 00 00   lea    0xe7c(%rip),%rsi        # 2004 <_IO_stdin_used+0x4>
    1188:  31 c0                  xor    %eax,%eax
    118a:  e9 c1 fe ff ff         jmp    1050 <__printf_chk@plt>
    118f:  90                     nop

可见 mov %esi,%ecx 指令是在内联汇编指令之后才执行的，所以结果为：

2021 2

constraint modifier &

TODO

调试：理论与实践

金句

机器永远是对的
未测代码永远是错的

`软件工程 & bug`

需求 → 设计 → 代码 → Fault (bug) → Error (程序状态错) → Failure (可观测)

调试理论

如果我们能判定任意程序状态的正确性，那么给定一个 failure，我们可以通过二分查找定位到第一个 error 的状态，此时的代码就是 fault (bug)。

基础工具

gdb
printf

本质上都是 state 的 trace

实践

输入/配置可以看成是程序的一部分。把所有问题当程序来调试。

来看一个例子：

vgalaxy@vgalaxy-VirtualBox:~$ curl baidu.com
<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>
vgalaxy@vgalaxy-VirtualBox:~$ export PATH=
vgalaxy@vgalaxy-VirtualBox:~$ curl baidu.com
bash: curl: No such file or directory

其中的 error message 是由 perror 打印的。

strace

思考 curl 指令的执行与 PATH 环境变量有关。我们使用 strace 工具：

vgalaxy@vgalaxy-VirtualBox:~$ cat a.sh
curl baidu.com
vgalaxy@vgalaxy-VirtualBox:~$ strace -f bash a.sh 2>&1 | less

其中 2>&1 表示将标准错误重定向到标准输出。

可以从中发现：

newfstatat(AT_FDCWD, "/usr/lib/ccache/bash", 0x7ffc68a50a00, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/bash", 0x7ffc68a50a00, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/bash", 0x7ffc68a50a00, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/bash", 0x7ffc68a50a00, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/bash", {st_mode=S_IFREG|0755, st_size=1404744, ...}, 0) = 0

对应了 PATH 中的查找路径：

vgalaxy@vgalaxy-VirtualBox:~$ echo $PATH
/usr/lib/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin

option

gcc -v a.c
make -nB

NEMU

两个层次的状态机：

NEMU 本身 -> gdb
模拟的状态机 -> monitor

再谈`软件工程 & bug`

需求 → 设计 → 代码

不言自明：风格
不言自证：逻辑流

例：AVL 的旋转

flowchart TD t ---> y y ---> x y ---> C x ---> A x ---> B

列表：

left right parent
t
x
y
a
b
c

一目了然。

Fault → Error

测试

例：Lab 1

正确的实现

Error → Failure

断言（即需求）

例：二叉查找树

// 结构约束
assert(u->parent == u ||
       u->parent->left  == u ||
       u->parent->right == u);
assert(!u->left  || u->left->parent  == u);
assert(!u->right || u->right->parent == u);

// 数值约束
assert(!u->left  || u->left->val  < u->val);
assert(!u->right || u->right->val > u->val);

更多的断言：

Address Sanitizer（动态程序分析）

链接与加载选讲

https://www.gnu.org/software/binutils/

https://www.gnu.org/software/coreutils/

静态链接与加载

int foo(int a, int b) {
  return a + b;
}

// b.c
int x = 100, y = 200;

// main.c
extern int x, y;
int foo(int a, int b); // 可以试试 extern int foo;
int main() {
  printf("%d + %d = %d\n", x, y, foo(x, y));
}

-fno-pic 编译：不生成位置无关代码
-static 链接：静态链接

CFLAGS  := -O2 -fno-pic
LDFLAGS := -static

gcc: a.o b.o main.o
    gcc $(LDFLAGS) a.o b.o main.o

ld: a.o b.o main.o
    ld a.o b.o main.o -e main

a.o: a.c
    gcc $(CFLAGS) -c a.c

b.o: b.c
    gcc $(CFLAGS) -c b.c

main.o: main.c
    gcc $(CFLAGS) -c main.c

clean:
    rm -f *.o a.out

gcc

间接调用链接器 ld，详见 ld 部分

我们使用 objdump -d 查看 main.o 对应的反汇编代码，可以发现 mov 指令中通过寄存器 rip 寻址的立即数为 0，也就是为 foo 函数准备的参数 x 和 y：

   8:  8b 35 00 00 00 00      mov    0x0(%rip),%esi        # e <main+0xe>
   e:  8b 3d 00 00 00 00      mov    0x0(%rip),%edi        # 14 <main+0x14>

我们使用 objdump -D 查看 b.o 的 .data 段：

Disassembly of section .data:

0000000000000000 <y>:
   0:  c8 00 00 00            enter  $0x0,$0x0

0000000000000004 <x>:
   4:  64 00 00               add    %al,%fs:(%rax)

可以发现 y 的值确实为 200，x 的值为 100。

那么 main.o 是如何做到重定位的呢？我们使用 readelf -a 观察 main.o：

Relocation section '.rela.text.startup' at offset 0x208 contains 7 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000500000002 R_X86_64_PC32     0000000000000000 y - 4
000000000010  000600000002 R_X86_64_PC32     0000000000000000 x - 4
000000000015  000700000004 R_X86_64_PLT32    0000000000000000 foo - 4
00000000001b  000500000002 R_X86_64_PC32     0000000000000000 y - 4
000000000021  000600000002 R_X86_64_PC32     0000000000000000 x - 4
000000000026  00020000000a R_X86_64_32       0000000000000000 .rodata.str1.1 + 0
000000000035  000800000004 R_X86_64_PLT32    0000000000000000 __printf_chk - 4

可以看到在 main 函数偏移 0x00000000000a 处应填入偏移 y - 4 处的内容。

注意这里的 y 是一个符号，代表一个地址，在这个符号的地址中存放着数据（代码）

由于 Type 表明该条目的大小为 4 字节，即一个 int，并且此时 PC 指向下一条指令，故偏移为 y - 4：

  ↓0xa
-----------------------------------
  |   4B   |               | y |
-----------------------------------
           |←    y - 4    →|
  |←          y           →|

可以反汇编最终的可执行文件：

  401798:       8b 35 52 89 0b 00       mov    0xb8952(%rip),%esi        # 4ba0f0 <y>
  40179e:       8b 3d 50 89 0b 00       mov    0xb8950(%rip),%edi        # 4ba0f4 <x>

已经被填上了。

ld

功能：只对给出的可重定位目标文件，重新建立符号表，合并段，并计算段长度建立映射关系，并进行符号解析和重定位

需要修改 main.c 如下：

#include <stdio.h>

extern int x, y;
int foo(int a, int b);

int main() {
  foo(x, y);
}

否则会报错：

ld: main.o: in function `main':
main.c:(.text.startup+0x35): undefined reference to `__printf_chk'

然而此处得到的可执行文件会报段错误。

使用 readelf -a 观察 a.out，得到程序入口 0x401000，即 main 函数。

通过 gdb 中的 starti 验证

而 main 函数 ret 时 (%rsp) 为不可访问的内存地址：

(gdb) p $rsp
$2 = (void *) 0x7fffffffdf00
(gdb) x 0x7fffffffdf00
0x7fffffffdf00: 0x00000001

所以触发了段错误。

实际上我们可以写一个最小的汇编代码，使用 ld 直接链接并且能正确返回：

rax = 231

rdi = 返回值

syscall 指令执行系统调用

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ cat as.S
mov $231, %rax
mov $99, %rdi
syscall
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ gcc -c as.S
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ objdump -d as.o

as.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:  48 c7 c0 e7 00 00 00   mov    $0xe7,%rax
   7:  48 c7 c7 63 00 00 00   mov    $0x63,%rdi
   e:  0f 05                  syscall
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ ld as.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ ./a.out
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ echo $?
99

TODO：解释

另一方面，上面 gcc 得到的可执行程序的入口函数为 _start，调用链如下：

_start -> __libc_start_main -> ...

我们可以通过添加编译选项来观察 gcc 所做的额外操作：

LDFLAGS := -static -Wl,--verbose

其中 -Wl,--verbose 的作用为：

-Wl,option
  Pass option as an option to the linker.

可以观察到其中使用了一个链接脚本：

using internal linker script:
==================================================
...
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }
...
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
...
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
...

其中提供了一些段，猜测形成了 _start -> __libc_start_main -> ...

还提供了一个 end 变量，指示 .data 段的末尾：

参考 http://csapp.cs.cmu.edu/3e/ics3/code/data/show-bytes.c

#include <stdio.h>

extern int _end;
extern int end;

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    size_t i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_pointer(void *x) {
    show_bytes((byte_pointer) &x, sizeof(void *));
}

int main() {
    show_pointer(&_end);
    show_pointer(&end);
    printf("%p 0x%lx\n", &_end, &_end);
}

链接脚本的下面便是具体的链接过程：

attempt to open a.o succeeded
a.o
attempt to open b.o succeeded
b.o
attempt to open main.o succeeded
main.o
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/libgcc.a succeeded
/usr/lib/gcc/x86_64-linux-gnu/10/libgcc.a
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/libgcc_eh.a succeeded
/usr/lib/gcc/x86_64-linux-gnu/10/libgcc_eh.a
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/libc.a failed
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libc.a succeeded
/usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libc.a
(/usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libc.a)libc-start.o
...
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/crtend.o succeeded
/usr/lib/gcc/x86_64-linux-gnu/10/crtend.o
attempt to open /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/crtn.o succeeded
/usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/crtn.o

上面注释掉的 printf 的实现也在其中。

这里还链接了一些 C RunTime Library，其中包含了一系列 crt 对象：

__attribute__((constructor))
__attribute__((destructor))

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

来看一个例子：

#include <stdio.h>
#include <stdlib.h>

__attribute__((constructor)) void f() {
    puts("start");
}

__attribute__((destructor)) void g() {
    puts("end");
}

void h() {
    puts("exit");
}

int main() {
    atexit(h);
}

运行结果为：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
start
exit
end

strace

操作系统加载并执行可执行文件的全过程：

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7fff12355620 /* 60 vars */) = 0
arch_prctl(0x3001 /* ARCH_??? */, 0x7fffbfc1d830) = -1 EINVAL (Invalid argument)
brk(NULL)                               = 0x1207000
brk(0x1207d80)                          = 0x1207d80
arch_prctl(ARCH_SET_FS, 0x1207380)      = 0
uname({sysname="Linux", nodename="vgalaxy-VirtualBox", ...}) = 0
readlink("/proc/self/exe", "/home/vgalaxy/Downloads/link-tes"..., 4096) = 39
brk(0x1228d80)                          = 0x1228d80
brk(0x1229000)                          = 0x1229000
mprotect(0x4b6000, 16384, PROT_READ)    = 0
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}, AT_EMPTY_PATH) = 0
write(1, "100 + 200 = 300\n", 16100 + 200 = 300
)       = 16
exit_group(0)                           = ?
+++ exited with 0 +++

可以看到其中的 write 系统调用。

动态链接与加载

去掉 -fno-pic 和 -static

这是默认的 gcc 编译链接选项

文件相比静态链接大幅瘦身！

a.o, b.o 没有变化
main.o 里依然有 00 00 00 00
- 但是相对于 rip 的 offset
- relocation 依然是 x - 4, y - 4, foo - 4
a.out 里库的代码都不见了

之前 objdump -d a.out 会有非常长的输出，其中便有我们老朋友 ___printf_chk 的反汇编：

0000000000445610 <___printf_chk>:
  445610:       f3 0f 1e fa             endbr64
  445614:       48 81 ec d8 00 00 00    sub    $0xd8,%rsp
  44561b:       48 89 54 24 30          mov    %rdx,0x30(%rsp)
  445620:       48 89 4c 24 38          mov    %rcx,0x38(%rsp)
  445625:       4c 89 44 24 40          mov    %r8,0x40(%rsp)
  44562a:       4c 89 4c 24 48          mov    %r9,0x48(%rsp)
  44562f:       84 c0                   test   %al,%al
  445631:       74 37                   je     44566a <___printf_chk+0x5a>
  445633:       0f 29 44 24 50          movaps %xmm0,0x50(%rsp)
  445638:       0f 29 4c 24 60          movaps %xmm1,0x60(%rsp)
  44563d:       0f 29 54 24 70          movaps %xmm2,0x70(%rsp)
  445642:       0f 29 9c 24 80 00 00    movaps %xmm3,0x80(%rsp)
  445649:       00
  44564a:       0f 29 a4 24 90 00 00    movaps %xmm4,0x90(%rsp)
  445651:       00
  445652:       0f 29 ac 24 a0 00 00    movaps %xmm5,0xa0(%rsp)
  445659:       00
  44565a:       0f 29 b4 24 b0 00 00    movaps %xmm6,0xb0(%rsp)
  445661:       00
  445662:       0f 29 bc 24 c0 00 00    movaps %xmm7,0xc0(%rsp)
  445669:       00
  44566a:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  445671:       00 00
  445673:       48 89 44 24 18          mov    %rax,0x18(%rsp)
  445678:       31 c0                   xor    %eax,%eax
  44567a:       31 c9                   xor    %ecx,%ecx
  44567c:       85 ff                   test   %edi,%edi
  44567e:       48 8d 84 24 e0 00 00    lea    0xe0(%rsp),%rax
  445685:       00
  445686:       0f 9f c1                setg   %cl
  445689:       48 8b 3d 80 50 07 00    mov    0x75080(%rip),%rdi        # 4ba710 <stdout>
  445690:       48 89 44 24 08          mov    %rax,0x8(%rsp)
  445695:       48 89 e2                mov    %rsp,%rdx
  445698:       01 c9                   add    %ecx,%ecx
  44569a:       48 8d 44 24 20          lea    0x20(%rsp),%rax
  44569f:       c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
  4456a6:       c7 44 24 04 30 00 00    movl   $0x30,0x4(%rsp)
  4456ad:       00
  4456ae:       48 89 44 24 10          mov    %rax,0x10(%rsp)
  4456b3:       e8 a8 60 01 00          call   45b760 <__vfprintf_internal>
  4456b8:       48 8b 54 24 18          mov    0x18(%rsp),%rdx
  4456bd:       64 48 2b 14 25 28 00    sub    %fs:0x28,%rdx
  4456c4:       00 00
  4456c6:       75 08                   jne    4456d0 <___printf_chk+0xc0>
  4456c8:       48 81 c4 d8 00 00 00    add    $0xd8,%rsp
  4456cf:       c3                      ret
  4456d0:       e8 2b 00 00 00          call   445700 <__stack_chk_fail>
  4456d5:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  4456dc:       00 00 00
  4456df:       90                      nop

PIC

从

movl $100, %rdi

到

movl 1234(%rip), %rdi

共享库不必事先决定加载的位置（库函数）
应用程序自己也是（函数，外部变量）
- 新版本 gcc 无论 32/64-bit 均默认 PIC
- 重现课本行为可使用 -no-pie 选项编译

-no-pie
  Don't produce a dynamically linked position independent executable.

然而 i386 并不支持以下代码：

movl $1, 1234(%eip)

于是便有了 __i686.get_pc_thunk.bx，即获取 next PC 到寄存器 ebx，简单的实现如下：

movl (%esp), %ebx
ret

https://stackoverflow.com/questions/6679846/what-is-i686-get-pc-thunk-bx-why-do-we-need-this-call

ldd

print shared object dependencies

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ ldd ./a.out
  linux-vdso.so.1 (0x00007ffd45b9f000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbab3fdf000)
  /lib64/ld-linux-x86-64.so.2 (0x00007fbab41e2000)

linux-vdso.so.1: Virtual Dynamic Shared Object
libc.so.6: A symbolic link that points to the location of the glibc library
/lib64/ld-linux-x86-64.so.2: Linker and Loader

进一步观察 /lib64/ld-linux-x86-64.so.2：

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ file /lib64/ld-linux-x86-64.so.2
/lib64/ld-linux-x86-64.so.2: symbolic link to /lib/x86_64-linux-gnu/ld-2.33.so

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ file /lib/x86_64-linux-gnu/ld-2.33.so
/lib/x86_64-linux-gnu/ld-2.33.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=2238045eb12263c1ea1dd2bb6042dff799d0d127, stripped

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ ldd /lib/x86_64-linux-gnu/ld-2.33.so
  statically linked

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ readelf -d /lib/x86_64-linux-gnu/ld-2.33.so

Dynamic section at offset 0x32e70 contains 19 entries:
  Tag        Type                         Name/Value
 0x000000000000000e (SONAME)             Library soname: [ld-linux-x86-64.so.2]
 0x0000000000000004 (HASH)               0x2f0
 0x000000006ffffef5 (GNU_HASH)           0x3b8
 0x0000000000000005 (STRTAB)             0x790
 0x0000000000000006 (SYMTAB)             0x4a8
 0x000000000000000a (STRSZ)              549 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x34000
 0x0000000000000002 (PLTRELSZ)           96 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0xb90
 0x0000000000000007 (RELA)               0xaa0
 0x0000000000000008 (RELASZ)             240 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffc (VERDEF)             0x9f8
 0x000000006ffffffd (VERDEFNUM)          5
 0x000000006ffffff0 (VERSYM)             0x9b6
 0x000000006ffffff9 (RELACOUNT)          8
 0x0000000000000000 (NULL)               0x0

静态链接，位置无关

ldd 实际上是一个 script：

cat $(which ldd) | less

挨个尝试调用若干 ld-linux.so 候选，并加上一系列环境变量。

我们可以用 --list 选项达到类似效果：

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ /lib64/ld-linux-x86-64.so.2 --list ./a.out
  linux-vdso.so.1 (0x00007ffc337cc000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76aa390000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f76aa593000)

/lib/x86_64-linux-gnu/libc.so.6 存在的证据：

program header 中有一个 INTERP

/lib64/ld-linux-x86-64.so.2
这个字符串可以直接在二进制文件中看到

另一种程序执行：

vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ /lib64/ld-linux-x86-64.so.2 ./a.out
100 + 200 = 300
vgalaxy@vgalaxy-VirtualBox:~/Downloads/link-test$ ./a.out
100 + 200 = 300

这才是 ./a.out 真正在操作系统里发生的事情，与 sha-bang (#!) 实现机制完全相同：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ cat a.py
#! /usr/bin/env python3
print("HELLO")
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.py
HELLO
vgalaxy@vgalaxy-VirtualBox:~/Templates$ python3 ./a.py
HELLO

动态链接：实现

main.o 在链接时，对 printf 的调用地址就确定了

但 libc 每次加载的位置都不一样啊！

应用程序使用怎样的指令序列调用库函数？

可以在库加载的时候重新进行一次静态链接，但这个方案有一些明显的缺陷
- 各个进程的代码不能共享一个内存副本
- 没有使用过的符号也需要重定位
基本款：使用 -fno-plt 选项可以开启此款

call *table[PRINTF]

在链接时，填入运行时的 table

豪华款：默认选项，使用 PLT

printf@plt:
  jmp *table[PRINTF]
  push $PRINTF
  call resolve

https://stackoverflow.com/questions/5469274/what-does-plt-mean-here

http://dustin.schultz.io/how-is-glibc-loaded-at-runtime.html

Abstract Machine 选讲

复杂系统的构造与解释

复杂系统的演化通常是 evolutionary 的而不是 revolutionary 的。

三要素

代码
指令集
电路

从指令集的层面理解代码，就有了非递归汉诺塔：

#include <stdio.h>
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

typedef struct {
  u32 pc, n;
  char from, to, via;
} Frame;

#define call(...) ({ *(++top) = (Frame) { .pc = 0, __VA_ARGS__ }; })
#define ret()     ({ top--; })

void hanoi(int n, char from, char to, char via) {
  Frame stk[64], *top = stk - 1;
  call(n, from, to, via);
  for (Frame *f; (f = top) >= stk; f->pc++) {
    switch (f->pc) {
      case 0: if (f->n == 1) printf("%c -> %c\n", f->from, f->to), f->pc = 3; break;
      case 1: call(f->n - 1, f->from, f->via, f->to);   break;
      case 2: call(       1, f->from, f->to,  f->via);  break;
      case 3: call(f->n - 1, f->via,  f->to,  f->from); break;
      case 4: ret();                                    break;
      default: assert(0);
    }
  }
}

int main() {
    hanoi(10, 'A', 'B', 'C');
}

模拟栈帧、call 指令和 ret 指令。

从电路的层面理解指令集，就有了 NEMU。

计算机系统的构造与解释

图灵机 + 存储程序 = 通用计算
I/O + 中断 = 丰富的应用
中断 + 更大的内存 = 分时多线程
分时多线程 + 虚拟存储 = 进程
多处理器、big.LITTLE、异构处理器 (GPU, NPU, …)
单指令多数据 (MMX, SSE, AVX, …)、虚拟化 (VT; EL0/1/2/3)、安全执行环境 (TrustZone; SGX), …

AbstractMachine

设计

提供最少机制以实现现代计算机系统软件

TRM（程序所需最小的运行环境）

一个简单的例子 baremetal.h 和 hello.c

从 main 开始，一段自由内存 heap，随时可以终止 halt，随时可以输出 putch。

输入便是 mainargs 的参数

IOE（仅提供一些系统无关的设备抽象）

计算机在演化的过程中多了 in 和 out 指令。我们也配上 read 和 write 不就好了。

CTE（简化、统一、相对低效的中断处理）

一个简单的例子 thread-os.c，实现分时多线程

相当于在每条指令之后都插入了异常/中断的检查，我们的代码会自动保存异常/中断返回的处理器状态（上下文）：

// after each instruction
if (has_exception) {
  exception_handler();
}
if (has_interrupt && int_enabled) {
  interrupt_handler();
}

VME（基于页的映射，忽略硬件实现）
MPE（假设 race-freedom、简化的系统模型）

取舍

得到：统一简洁的接口
- 适合教学；跨体系结构存活，甚至有 native
失去：实际系统的特性支持
- 不连续的内存、热插拔内存、Access/Dirty Bit……

演化观点的系统抽象

抽象带来的好处：

写操作系统再也不用汇编了
在正确的抽象层上写代码能减少细节纠结
软件正确性可以互相验证
- 软件在 native 调试，调试好了再上体系结构运行

在抽象层上工作带来的额外好处：

trace
model checking
formal verification

计算机系统的完整故事

本学期 (ICS)

实现 x86/riscv/mips 上的 AbstractMachine API

下学期 (OS)

基于 AbstractMachine 实现多处理器操作系统

PA 的后续

RISC-V 恰如其时地出现了，是时候革了《计算机系统基础》的命了……

代码 - 指令 - 硬件

I/O 设备选讲

处理器 - 设备接口

设备 = 一组寄存器，每次可以交换一定数量的数据。

每个寄存器有特定的含义和格式：

主要功能：读取数据/写入数据/配置设备

CPU 可以直接通过指令读写这些寄存器：

Port-mapped I/O (PMIO)
- I/O 地址空间 (port)
- CPU 直连 I/O 总线

-------       ----------
| CPU |-------| Memory |
-------       ----------
   |
   |
--------------------------------- BUS
  |             |           |
 Reg: status   Reg: cmd    Reg: data

Memory-mapped I/O (MMIO)
- 直观，使用普通内存读写指令就能访问
- 带来一些设计和实现的麻烦：编译器优化、缓存、乱序执行……

使用 volatile 关键字

-------       ----------
| CPU |-------| Memory |
-------   |   ----------
          |
          |
--------------------------------- BUS
  |             |           |
 Reg: status   Reg: cmd    Reg: data

联系 PA2，这些指令便是 inb 和 outb：

static inline uint8_t  inb(uintptr_t addr) { return *(volatile uint8_t  *)addr; }
static inline uint16_t inw(uintptr_t addr) { return *(volatile uint16_t *)addr; }
static inline uint32_t inl(uintptr_t addr) { return *(volatile uint32_t *)addr; }

static inline void outb(uintptr_t addr, uint8_t  data) { *(volatile uint8_t  *)addr = data; }
static inline void outw(uintptr_t addr, uint16_t data) { *(volatile uint16_t *)addr = data; }
static inline void outl(uintptr_t addr, uint32_t data) { *(volatile uint32_t *)addr = data; }

经典回顾，AM 程序通过 io_read 和 io_write 对设备进行读写：

io_read -> ioe_read -> 回调函数 -> inb
io_write -> ioe_write -> 回调函数 -> outb

更复杂的设备

打印机：将字节流描述的文字/图形打印到纸张上

PostScript：一种描述页面布局的 domain-specific language，PDF 是它的 superset

-------       ----------
| CPU |-------| Memory |
-------   |   ----------
          |
          |
--------------------------------- BUS
  |             |           |
 Reg: status   Reg: cmd    Reg: data (.ps)
  |             |           |
  ↓             ↓           ↓
  ---------------------------
  |                         |
  |         Printer         |
  |                         |
  |-------------------------|

可以视打印机为一个微型的计算机，解析 .ps 文件的内容，并完成实际打印的工作。

一个 .ps 文件，以纯文本格式打开：

%!
% http://www.cs.cmu.edu/afs/andrew/scs/cs/15-463/98/pub/www/assts/ps.html

72 72 scale % scale coordinate system so units are inches, not points
2 2 translate   % put origin 2 inches from lower left of page

/Symbol findfont 4 scalefont setfont
% current font is now Symbol about 4 inches high
gsave   % save graphics state (coordinate system & stuff)
.5 setgray
60 rotate
0 0 moveto (abcde) show
grestore    % restore previous graphics state

/Helvetica-Bold findfont .2 scalefont setfont
% current font is now slanted Helvetica about .2 inches high
0 4 moveto (Postscript is good at this stuff) show

/Times-Italic findfont 2 scalefont setfont
% current font is now italic Times about 2 inches high
1 0 0 setrgbcolor
0 6 moveto (yowza!) show

showpage

.pdf 类似

两个特殊的 I/O 设备

总线

实现即插即用……

系统里可能有很多 (甚至是可变的) I/O 设备

总线实现了设备的查找、映射、和命令/数据的转发

CPU 可以只直接连接到总线

总线可以连接其他总线

-------       ----------
| CPU |-------| Memory |
-------   |   ----------
          |
          |
--------------------------------- PCI BUS
  |           |         |
 USB BUS     GPU       SSD
  |
 USB

例子：lspci -t, lsusb -t

vgalaxy@vgalaxy-VirtualBox:~/Downloads$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:02.0 VGA compatible controller: VMware SVGA II Adapter
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:06.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:0d.0 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
vgalaxy@vgalaxy-VirtualBox:~/Downloads$ lsusb
Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

中断控制器

管理多个产生中断的设备

汇总成一个中断信号给 CPU
支持中断的屏蔽、优先级管理等

             ——— ———    -------        ----------
  Clock------| PIC |----| CPU |--------| Memory |
             ——— ———    -------    |   ----------
                |                  |
                |                  |
------------------------------------------- PCI BUS
  |           |         |
 USB BUS     GPU       SSD
  |
 USB

NES 实现

真的只是一些导线……

2D 图形绘制硬件

理论上一切皆可计算：

for (int i = 1; i <= H; i++) {
  for (int j = 1; j <= W; j++)
    putchar(j <= i ? '*' : ' ');
  putchar('\n');
}

实际上难办的是性能。

Picture Processing Unit

CPU 提供脚本和素材，PPU 负责具体的绘制。

联系之前的打印机

PPU：只执行一个程序的 CPU

for (int x = 0, pos = 0; x < HEIGHT; x++) { // 行扫描
  for (int y = 0; y < WIDTH; y++, pos++) {
    vbuf[pos] = draw(x, y); // 算出 (x,y) 的贴块 (和颜色)
  }
}

并行性极好

奇巧淫技

背景是大图的一部分，通过卷轴操作扩大显示区域，通过快速切换实现背景动画效果
每行的前景块不超过 8 个，脚本描述了这些贴块的摆放方法
通过编码为每个贴块提供额外的信息：

76543210
||||||||
||||||++- Palette
|||+++--- Unimplemented
||+------ Priority
|+------- Flip horizontally
+-------- Flip vertically

如：配色，重叠优先级，水平和垂直的翻转

目的：缩小素材大小，节省显存

卡带：专有的硬件，提高计算能力

更好的 2D 游戏引擎

NES PPU 的本质是和坐标轴平行的贴块块

实现上只需要加法和位运算

更强大的计算能力 = 更复杂的图形绘制

图片的裁剪、拼贴、旋转、缩放
线性变换 -> 浮点运算
材质映射、后处理……

支持 3D

三维空间中的多边形，在视平面上也是多边形
任何 n 边形都可以分解成 n-2 个三角形

3D 图形绘制硬件

正确渲染

需求：三维空间中的三角形需要正确渲染

基本原理：摄像机空间坐标，视平面方程，计算法向的第一个空间多边形，就是实际所看到的画面。

极简叙述……

基本流程

CPU 负责描述，GPU 负责渲染

GPU 收到代码 + 数据 → 运算 → 写回结果到内存 → 发送中断
GPU 中有数千个运算单元实现并行的计算

后处理

Shading Language

可以作用在各个渲染级别上：vertex, fragment, pixel shader
相当于一个 PS 程序，算出每个部分的光照变化
- 全局光照、反射、阴影、环境光遮罩……

系统编程与基础设施

面向浪费时间编程

vim map

对于经典的输入输出 OJ 题，如：

#include <stdio.h>

int main() {
    int a, b;
    scanf("%d %d", &a, &b);
    printf("%d + %d = %d", a, b, a + b);
}

可以将常用的很长的命令映射到一个新的功能键上，一键完成保存、构建、输入……

:map <F9> :w<CR> :!gcc a.c && ./a.out < in.txt<CR>

当然也可以直接 Makefile

export

不想每次都输入 ARCH=riscv32-nemu……

export ARCH=riscv32-nemu

make 并行加速

每次都 make -j8

或者 alias make='make -j8'

在 Makefile 里加一行 MAKEFLAGS += -j 8

基础设施的本质

通过适当的配置、脚本减少思维中断的时间，提高连贯性，保持短时记忆活跃

时间成本分析

non-geek

      -
     -
    -
   -
  -
 -
-

geek

   |------------
   |
   |
   |
   |
---|

造轮子

程序员们就是喜欢造轮子

(日常管理) 效率低 → 熟练使用命令行工具/Python
(项目管理) 你已经会 gcc a.c 了，但没法管理几十个文件 → make, 一键编译/运行/测试
(代码编辑) 在代码里跳来跳去很麻烦 → IDE/配置 Vim/装插件
(错误检查) 很容易犯低级错误 → -Wall, -Werror, fsanitize=address
(代码调试) Segmentation Fault 了 → gdb

Differential Testing

大腿调试法

基本思想：模块替换

原理：同一套接口 (API) 的两个实现应当行为完全一致

指令集的两套实现：大腿同学 vs. 小腿同学

编程语言的两套实现：gcc vs. clang

……

Delta Debugging

模块级 -> 函数级 -> 语句级 -> 指令级

当粒度足够小，就成了 PA 中提供的 Differential Testing

真正的大腿：QEMU

qemu-diff 实现：

启动 QEMU 并配置它进入与 PA 类似的模式
用 gdb 连接 QEMU
用 gdb_si() 在 QEMU 中执行一条指令
比较指令执行后寄存器是否有区别

一些细节：

通过 fork 建立子进程，让 QEMU 和 NEMU 之间可以通信

QEMU 实现了 gdb 的协议，可以使用 gdb 调试运行中的 QEMU

总结

终极梦想：让计算机自动帮我们写程序

告诉计算机需求 → 计算机输出正确的代码
计算机科学的 holy grail 之一

现在我们还在软件自动化的初级阶段

计算机只能提供有限的自动化 (基础设施)
- 集成开发环境 (IDE)
- 静态分析
- 动态分析
但这些基础设施已经从本质上改变了我们的开发效率
- 你绝对不会愿意用记事本写程序的

中断与分时多任务

机制 & 策略

在现代操作系统的结构设计中，经常利用“机制与策略分离”的原理来构造 OS 结构。所谓机制，是指实现某一功能的具体执行机构。而策略，则是在机制基础上，借助于某些参数和算法来实现该功能的优化，或达到不同的功能目标。通常，机制处于一个系统的基层，而策略则处于系统的高层。

Unix/Linux 的接口设计有一句通用的格言——提供机制而不是策略。

机制：中断
策略：分时多任务

中断机制

PIC -> 中断控制器

                 ——— ——— INT --------       ----------
Clock ----|------| PIC |-----| CPU |--------| Memory |
Keyboard -|      ——— ———     --------    |  ----------
Mouse ----|         |                    |
                    |                    |
------------------------------------------- PCI BUS
  |           |         |
 USB BUS     GPU       SSD
  |
 USB

异步中断：如时钟，每秒钟产生 1000 次 32 号中断，硬件实现为振荡器
同步中断（异常）：如 int $0x80，产生 128 号异常

x86-64 中断过程

自动保存 RIP, CS, RFLAGS, RSP, SS, (Error Code)

for Ring 0, kernel

保存在当前堆栈：

|    ...    |
|    xxx    | ---- old rsp
|    xxx    |
|    xxx    |
|    xxx    |
|    xxx    |
|           | ---- new rsp

关于顺序，RTFM

for Ring 3, user

堆栈切换，防止出现缺页故障，触发嵌套中断：

|------|
| SS0  |
| RSP0 |
|------|
    |
    |       |    ...    |
    |-------|    xxx    |
            |    xxx    |
            |    xxx    |
            |    xxx    |
            |    xxx    |
            |           |

根据中断/异常号跳转到处理程序

                      |--------------| --------- handler
cmp  %eax, %ebx ----- |    call      |
                      |  save regs   |
                      |  ..........  |
                      |  ..........  |
                      |  ..........  |
                      | restore regs |
             jz ----- |     ret      |
                      |--------------|

why save and restore -> context

试图关闭中断

CLI 指令 -> Clear Interrupt Flag

int main() {
    asm volatile ("cli");
    while (1);
}

反汇编可执行文件：

0000000000001129 <main>:
    1129:  f3 0f 1e fa            endbr64
    112d:  55                     push   %rbp
    112e:  48 89 e5               mov    %rsp,%rbp
    1131:  fa                     cli
    1132:  eb fe                  jmp    1132 <main+0x9>
    1134:  66 2e 0f 1f 84 00 00   cs nopw 0x0(%rax,%rax,1)
    113b:  00 00 00
    113e:  66 90                  xchg   %ax,%ax

运行，触发段错误（执行越权指令）……

这是因为 Ring 3 关闭中断将导致 Exception(GP(0))

实际上，这里的异常机制由操作系统代码执行，操作系统发送 SIGSEGV 信号给用户程序，看用户程序能否处理：

#define _GNU_SOURCE

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
#include <stdint.h>

void on_sigsegv(int sig, siginfo_t *info, void *ucontext) {
  ucontext_t *ctx = (ucontext_t *)ucontext;
  uint8_t *pc = (void *)ctx->uc_mcontext.gregs[REG_RIP];
  printf("PC = %p  ", pc);
  for (int i = -8; i < 8; i++) {
    printf(i == 0 ? "[%02x] " : "%02x ", pc[i]);
  }
  printf("\n");
  exit(1);
}

int main() {
  struct sigaction act;
  act.sa_sigaction = on_sigsegv;
  sigemptyset(&act.sa_mask);
  act.sa_flags = SA_SIGINFO;
  sigaction(SIGSEGV, &act, NULL);
  asm volatile("cli");
}

这里使用了 sigaction 设置信号处理函数，该函数打印 PC 的值和当前执行指令的周围 16 个字节。可能的输出如下：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
PC = 0x556b07f3d2ed  00 00 00 e8 c3 fd ff ff [fa] b8 00 00 00 00 48 8b
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
PC = 0x561bd85f52ed  00 00 00 e8 c3 fd ff ff [fa] b8 00 00 00 00 48 8b
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
PC = 0x55e47ba5c2ed  00 00 00 e8 c3 fd ff ff [fa] b8 00 00 00 00 48 8b
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
PC = 0x55f2865fa2ed  00 00 00 e8 c3 fd ff ff [fa] b8 00 00 00 00 48 8b
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
PC = 0x55c5075192ed  00 00 00 e8 c3 fd ff ff [fa] b8 00 00 00 00 48 8b

其中 fa 即为 cli 指令编码，PC 的值不同是因为默认生成位置无关代码，最后三位相同是因为代码距离页的偏移固定。

实现分时多任务

中断发生后，操作系统代码将会切换到另一个进程执行

通过调度，让进程公平地共享 CPU，就可以实现分时多任务

代码选讲

Abstract Machine

trap64.S
cte.c

使用 gdb 远程调试迷你操作系统 -> thread-os.c

没有 QEMU，指令集也不同，只能先鸽了……

gdbnotes for CSAPP

优化程序性能选讲

why 首秀

CSAPP Chapter 5

性能优化的两个方面：

选择适合的算法和数据结构
编写能够被编译器有效优化并转化为高效的可执行代码的源码

常见的有效优化

减少计算操作的次数频率
强度降低：移位 < 加减 < 乘除
子表达式的复用

GCC 的优化选项

https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Optimize-Options.html

Optimization Blockers

编译器的优化前提是安全

在语言标准下，优化前后的程序行为一致

Memory aliasing

不同的内存引用可能指向同一块内存：

void f1(int *xp, int *yp) {
    *xp += *yp;
    *xp += *yp;
}

void f2(int *xp, int *yp) {
    *xp += 2 * *yp;
}

若 xp 和 yp 指向同一块内存：

f1 中 xp 的所指向的数会变为原来的 4 倍
f2 中 xp 的所指向的数会变为原来的 3 倍

另一个例子：

https://www.geeksforgeeks.org/strict-aliasing-rule-in-c-with-examples/

#include <stdio.h>

int foo(int* ptr1, long* ptr2) {
    *ptr1 = 10;
    *ptr2 = 11;
    return *ptr1;
}

int main() {
    long a = 11.0;
    int result = foo((int*)&a, &a);
    printf("%d\n", result);
    return 0;
}

下面是不同的优化级别下编译运行的结果：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -O2 b.c
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
10
vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -O1 b.c
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
11

在 O2 下，默认开启选项 -fstrict-aliasing，即假定不同类型的指针不会指向同一块内存。

若我们在 O2 下指定 -fno-strict-aliasing，编译器就不会如此假定了：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -O2 b.c -fno-strict-aliasing
vgalaxy@vgalaxy-VirtualBox:~/Templates$ ./a.out
11

Function calls

函数调用可能有副作用

例如编译器不会将 strlen 自动优化出循环：

void lower1(char *s)
{
    long i;

    for (i = 0; i < strlen(s); i++)
    if (s[i] >= 'A' && s[i] <= 'Z')
        s[i] -= ('A' - 'a');
}

void lower2(char *s)
{
    long i;
    long len = strlen(s);

    for (i = 0; i < len; i++)
    if (s[i] >= 'A' && s[i] <= 'Z')
        s[i] -= ('A' - 'a');
}

对于函数，编译器一般会将其作为黑盒处理。

另一个例子：

#include <stdio.h>

long counter = 0;
long f() {
    return counter ++;
}

long func1() {
    return f() + f() + f() + f();
}

long func2() {
    return 4 * f();
}

int main() {
    counter = 0;
    printf("%ld\n", func1());

    counter = 0;
    printf("%ld\n", func2());
}

编译器不会将 func1 优化为 func2，因为其中的函数调用 f 有副作用。

下面是默认的优化级别下 func1 的反汇编代码：

0000000000001165 <func1>:
    1165:  f3 0f 1e fa            endbr64
    1169:  55                     push   %rbp
    116a:  48 89 e5               mov    %rsp,%rbp
    116d:  53                     push   %rbx
    116e:  b8 00 00 00 00         mov    $0x0,%eax
    1173:  e8 d1 ff ff ff         call   1149 <f>
    1178:  48 89 c3               mov    %rax,%rbx
    117b:  b8 00 00 00 00         mov    $0x0,%eax
    1180:  e8 c4 ff ff ff         call   1149 <f>
    1185:  48 01 c3               add    %rax,%rbx
    1188:  b8 00 00 00 00         mov    $0x0,%eax
    118d:  e8 b7 ff ff ff         call   1149 <f>
    1192:  48 01 c3               add    %rax,%rbx
    1195:  b8 00 00 00 00         mov    $0x0,%eax
    119a:  e8 aa ff ff ff         call   1149 <f>
    119f:  48 01 d8               add    %rbx,%rax
    11a2:  48 8b 5d f8            mov    -0x8(%rbp),%rbx
    11a6:  c9                     leave
    11a7:  c3                     ret

然而，在 O2 下，编译器会尝试使用内联函数替换的方法对 func1 进行优化：

vgalaxy@vgalaxy-VirtualBox:~/Templates$ gcc -Q -O2 --help=opti
The following options control optimizations:
  ...
  -findirect-inlining                   [enabled]
  ...

下面为反汇编代码：

00000000000011d0 <func1>:
    11d0:  f3 0f 1e fa            endbr64
    11d4:  48 8b 05 3d 2e 00 00   mov    0x2e3d(%rip),%rax        # 4018 <counter>
    11db:  48 8d 50 04            lea    0x4(%rax),%rdx
    11df:  48 8d 04 85 06 00 00   lea    0x6(,%rax,4),%rax
    11e6:  00
    11e7:  48 89 15 2a 2e 00 00   mov    %rdx,0x2e2a(%rip)        # 4018 <counter>
    11ee:  c3                     ret
    11ef:  90                     nop

其等价的代码由下面的内联函数替换：

long func1in() {
    long t = counter++; /* +0 */
    t += counter++;     /* +1 */
    t += counter++;     /* +2 */
    t += counter++;     /* +3 */
    return t;
}

转换得到：

long func1opt() {
    long t = 4 * counter + 6;
    counter += 4;
    return t;
}

我们也可以使用 -fno-inline 关闭该选项。

另一方面，我们也可以使用 static inline 关键字建议编译器进行优化。

在 O1 下编译器就会做出优化。

上面是 O2

这里建议 inline 关键字应配合 static 关键字使用，否则若编译器没有进行内联优化，就会报错：

/usr/bin/ld: /tmp/ccRZahyM.o: in function `func1':
a.c:(.text+0x13): undefined reference to `f'
/usr/bin/ld: a.c:(.text+0x20): undefined reference to `f'
/usr/bin/ld: a.c:(.text+0x2d): undefined reference to `f'
/usr/bin/ld: a.c:(.text+0x3a): undefined reference to `f'
/usr/bin/ld: /tmp/ccRZahyM.o: in function `func2':
a.c:(.text+0x55): undefined reference to `f'
collect2: error: ld returned 1 exit status

启示

局部变量的使用：

减少直接访存
存储函数返回值

现代处理器

特性

流水线：程序在执行时多条指令重叠进行操作
多发射：在每个时钟周期取出多条指令，同时发射到不同的译码器或者后续处理流水线中
超标量：每个时钟周期内可以完成一条以上的指令，指令级并行
SIMD：单指令多数据
分支预测：预测分支的跳转，这里有一个例子
……

循环展开

利用指令级并行，尝试通过控制参数，接近吞吐量的最优化

模仿着写了一个测试：

#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

// typedef double data_t;
// #define OP *
// #define IDENT 1

typedef int data_t;
#define OP +
#define IDENT 0

#define LENGTH 10000000
data_t data[LENGTH];

void combine1(data_t *dest) {
  long i = 0;
  data_t acc = IDENT;
  for (; i < LENGTH; ++i)
      acc = acc OP data[i];
  *dest = acc;
}

void combine2(data_t *dest) {
  long i = 0;
  data_t acc = IDENT;
  long limit = LENGTH - 1;
  for (; i < limit; i += 2)
      acc = acc OP data[i] OP data[i + 1];
  for (; i < LENGTH; ++i)
      acc = acc OP data[i];
  *dest = acc;
}

void combine3(data_t *dest) {
  long i = 0;
  data_t acc = IDENT;
  long limit = LENGTH - 1;
  for (; i < limit; i += 2)
      acc = acc OP (data[i] OP data[i + 1]);
  for (; i < LENGTH; ++i)
      acc = acc OP data[i];
  *dest = acc;
}

void combine4(data_t *dest) {
  long i = 0;
  data_t acc0 = IDENT;
  data_t acc1 = IDENT;
  long limit = LENGTH - 1;
  for (; i < limit; i += 2) {
      acc0 = acc0 OP data[i];
      acc1 = acc1 OP data[i + 1];
  }
  for (; i < LENGTH; ++i)
      acc0 = acc0 OP data[i];
  *dest = acc0 OP acc1;
}

int main() {
  data_t res;
  uint64_t use;
  long i = 0;
  for (; i < LENGTH; ++i) data[i] = (data_t) (i + 1);
  struct timeval t1, t2;

  gettimeofday(&t1, NULL);
  combine1(&res);
  gettimeofday(&t2, NULL);
  use = (t2.tv_sec - t1.tv_sec) * 1000000 + (t2.tv_usec - t1.tv_usec);
  printf("combine1: %lu\n", use);

  gettimeofday(&t1, NULL);
  combine2(&res);
  gettimeofday(&t2, NULL);
  use = (t2.tv_sec - t1.tv_sec) * 1000000 + (t2.tv_usec - t1.tv_usec);
  printf("combine2: %lu\n", use);

  gettimeofday(&t1, NULL);
  combine3(&res);
  gettimeofday(&t2, NULL);
  use = (t2.tv_sec - t1.tv_sec) * 1000000 + (t2.tv_usec - t1.tv_usec);
  printf("combine3: %lu\n", use);

  gettimeofday(&t1, NULL);
  combine4(&res);
  gettimeofday(&t2, NULL);
  use = (t2.tv_sec - t1.tv_sec) * 1000000 + (t2.tv_usec - t1.tv_usec);
  printf("combine4: %lu\n", use);
}

发现 combine4 在乘法运算才有优势……

或许除了运行时间，还需要检查每周期指令数

并行算法

https://zhuanlan.zhihu.com/p/89863627

虚拟存储选讲

虚拟存储：机制

独立地址空间

操作系统中的每个进程都有独立的地址空间，下面的例子可以证明：

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main() {
  for (int i = 0; i < 3; i++) {
    fork(); // create some processes
  }

  volatile int *ptr = mmap(
    (void *)0x20000000,          // mapping address (hint)
    1 << 20,                     // mapping size
    PROT_WRITE | PROT_READ,      // read/write mapping
    MAP_PRIVATE | MAP_ANONYMOUS, // mapping flags
    -1,                          // file descriptor
    0                            // offset
  );

  *ptr = getpid();
  printf("#%d sets %p = %d\n", getpid(), ptr, getpid());

  sleep(1);
  printf("#%d *%p = %d\n", getpid(), ptr, *ptr);
}

使用 mmap 申请一段地址空间，让每个进程修改地址空间 0x20000000 的值，可以发现每个进程在 0x20000000 处的值是不同的：

#3930 sets 0x20000000 = 3930
#3933 sets 0x20000000 = 3933
#3932 sets 0x20000000 = 3932
#3931 sets 0x20000000 = 3931
#3934 sets 0x20000000 = 3934
#3936 sets 0x20000000 = 3936
#3935 sets 0x20000000 = 3935
#3937 sets 0x20000000 = 3937
#3930 *0x20000000 = 3930
#3933 *0x20000000 = 3933
#3932 *0x20000000 = 3932
#3931 *0x20000000 = 3931
#3934 *0x20000000 = 3934
#3936 *0x20000000 = 3936
#3935 *0x20000000 = 3935
#3937 *0x20000000 = 3937

对于进程数量的分析：

        Fork         Fork       Fork
─────────┬────────────┬───────────┬─────────►
         │            │           │
         │            │           │
         │            │           └─────────►
         │            │
         │            │
         │            │         Fork
         │            └───────────┬─────────►
         │                        │
         │                        │
         │                        └─────────►
         │
         │
         │           Fork       Fork
         └────────────┬───────────┬─────────►
                      │           │
                      │           │
                      │           └─────────►
                      │
                      │
                      │         Fork
                      └───────────┬─────────►
                                  │
                                  │
                                  └─────────►

https://asciiflow.com/

注意每个子进程会在剩余的循环中继续创建子进程，所以一共是 8 个。

MMU

硬件维护如下的数据结构（映射）：

x \to M(x)

所有内存访问都将执行这个转换
对于不同的进程，该数据结构是不同的
低权限程序无权修改该数据结构
访问未映射的内存产生异常
能够控制内存访问的权限属性 read/write/execute
- 代码：read/execute
- 数据：read/write
- 堆栈：read/write
  - 堆栈不可执行，可以考虑缓冲区代码注入攻击
  - Attack Lab

VME API

bool     vme_init  (void *(*alloc)(int), void (*free)(void *));
void     protect   (AddrSpace *a);
void     unprotect (AddrSpace *a);
void     map       (AddrSpace *a, void *va, void *pa, int prot);
Context *ucontext  (AddrSpace *a, Area kstack, void *entry);

vme_init 会调用 map 建立内核地址空间恒等映射
context_uload 会调用 protect 建立用户态地址空间，并使用 ucontext 为该地址空间创建初始的上下文

虚拟存储：硬件实现

本质上是一个 map<uintptr_t, uintptr_t>

考虑 4GB 的内存，若使用平衡二叉树实现，平均高度在 30 层以上，对于硬件而言，每次地址转换的所需的时钟周期会远大于指令本身执行所需的时钟周期。

分段

内存是分为页面的 → 局部性

我们考虑页面的大小为 4KB，于是我们只要维护基址和偏移量就可以了：

                   PA                                VA
           ┌─────────────────┐              ┌─────────────────┐
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              ├─────────────────┤  0xc0021000
           │                 │              │                 │
           │                 │              │                 │
           │                 │       ┌─────►│                 │
           │                 │       │      ├─────────────────┤  0xc0020000
           │                 │       │      │                 │
0x21000    ├─────────────────┤       │      │                 │
           │                 │       │      │                 │
           │                 ├───────┘      │                 │
           │                 │              │                 │
0x20000    ├─────────────────┤              │                 │
           │                 │              │                 │
           │                 │              │                 │
           │                 │              │                 │
           └─────────────────┘              └─────────────────┘

如果固定偏移量，存放在一些寄存器中，就是分段机制的雏形。

分页

我们考虑让树的分支数增加到 1KB，于是我们需要一个 2 层的 1024 叉树。

在这样的编码下，32 位的虚拟地址空间可以编码为：

┌────────────────┬────────────────┬──────────────────────────┐
│     10-bits    │     10-bits    │         12-bits          │
└────────────────┴────────────────┴──────────────────────────┘

高 10 位为第 1 层树的基数
中间 10 位为第 2 层树的基数
低 12 位为页内偏移，在地址转换表中，我们可以让页面以 4KB 对齐，于是低 12 位全为 0，可以添加一些权限信息

这种数据结构也被称为 Radix Tree

数据结构实现的技巧：

未映射的地址空间不需要分配（按需分配）
根节点可以控制子节点的属性

我们考虑 64 位，4 KB 页面的情形，此时指针的大小为 8 字节，分支节点的低 12 位只能容纳 512 个 8 字节指针。也就是我们需要 512 叉树，目前 64 位处理器的虚拟地址空间大多为 48 位，可以这样分配：

┌───────────┬──────────┬──────────┬──────────┬─────────────────────┐
│   9-bits  │  9-bits  │  9-bits  │  9-bits  │      12-bits        │
└───────────┴──────────┴──────────┴──────────┴─────────────────────┘

于是我们需要 4 层的 512 叉树。

操作系统支持

MIPS

维护一张查找表，相当于 TLB：

基地址
尾地址
偏移

在每次内存访问时由硬件负责查表，若查找失败，则触发异常，让操作系统重填，实际上真正的页表在操作系统中

这是可编程 MMU (MIPS)
非常灵活，可以实现任意页表
但 TLB miss 重填相对较慢

哈希表

维护如下的数据结构：

(x,asid) \to M(x)

也就是说所有进程共用一张表：

只要系统里所有进程都 ASLR，hash 冲突就会很少
J Huck, J Hays. Architectural support for translation table management in large address space machines. In Proc. of ISCA, 1993.

存储器体系结构

寄存器

寄存器其实是地址空间非常小的内存，从反汇编代码中可以看出一些编码的连续性：

0000000000000000 <.text>:
   0:  48 c7 c0 00 00 00 00   mov    $0x0,%rax
   7:  48 c7 c1 00 00 00 00   mov    $0x0,%rcx
   e:  48 c7 c2 00 00 00 00   mov    $0x0,%rdx
  15:  48 c7 c3 00 00 00 00   mov    $0x0,%rbx
  1c:  48 c7 c4 00 00 00 00   mov    $0x0,%rsp
  23:  48 c7 c5 00 00 00 00   mov    $0x0,%rbp
  2a:  48 c7 c6 00 00 00 00   mov    $0x0,%rsi
  31:  48 c7 c7 00 00 00 00   mov    $0x0,%rdi
  38:  49 c7 c0 00 00 00 00   mov    $0x0,%r8
  3f:  49 c7 c1 00 00 00 00   mov    $0x0,%r9
  46:  49 c7 c2 00 00 00 00   mov    $0x0,%r10
  4d:  49 c7 c3 00 00 00 00   mov    $0x0,%r11
  54:  49 c7 c4 00 00 00 00   mov    $0x0,%r12
  5b:  49 c7 c5 00 00 00 00   mov    $0x0,%r13
  62:  49 c7 c6 00 00 00 00   mov    $0x0,%r14
  69:  49 c7 c7 00 00 00 00   mov    $0x0,%r15

实际上，会有更大的 Physical Register File (PRF)

我们执行的指令会被重命名，即分配 PRF 的地址
这促成了：乱序执行、投机执行（预测执行）、异常处理……

考虑我们有 256 个寄存器（程序员可见的通用寄存器远小于这个数字），编码从 0 开始

假设目前 rax 寄存器对应 0，rcx 寄存器对应 1

考虑如下的代码：

if (x == 1)
  %rax++;
else
  %rcx++;

CPU 会在判断出分支之前将 rax 寄存器和 rcx 寄存器加 1 的结果置入两个空闲的寄存器，如 rax’ 对应 100，如 rcx’ 对应 101，若分支结果为 rax 寄存器加 1，则 CPU 置 rax 寄存器对应 100，并丢弃 rcx’ 对应的 101。

当然若判断分支需要访存，且出现访存错误，那么 rax’ 和 rcx’ 都会被丢弃

MMU

https://developer.arm.com/documentation/101811/0102/The-Memory-Management-Unit-MMU

缓存 (Cache)

内存访问和服局部性原理，于是 CPU 和内存之间可以有一个缓存，是一个从地址到数据的映射

问题是我们应该缓存物理地址还是虚拟地址，也就是缓存在 MMU 前，还是在 MMU 后：

缓存虚拟地址
- 每次切换进程都要清空缓存
- 进程的别名问题
缓存物理地址
- 实现简单
- 每次访问缓存都要等地址翻译（虚拟地址转物理地址）

目前似乎都是缓存物理地址

一方面，根据内存访问的历史，通常能较为准确地预测未来可能访问的内存，并且访问临近内存居多

另一方面，缓存也会对编程带来一些影响，如通过 MMIO 方式写入设备时，写入的数据可能会先保存在缓存中，在不确定的时刻才会被写入内存（设备）中，即写回。

发射 1000 枚导弹，*ptr = xxx 循环 1000 次，首先要注意对 ptr 指针使用 volatile 修饰，其次，由于缓存的使用，最后可能只会发射 1 枚导弹……

也就是说，缓存对寄存器或处理器而言并不是完全透明的。

内存

http://www.qdpma.com/ServerSystems/DRAM.html

内存是一个三维数组

新的存储技术：非易失性

NVRAM

Phase-change memory (PCM)

外部存储器

SSD Flash; 磁盘

更大的延迟
更大的容量
更低的成本

Meltdown

引子

下面的代码可以帮助理解操作系统和用户程序之间的交互：

char buf[SIZE];
cin >> buf;

当执行 cin >> buf; 时，用户态会执行指令 int $0x80，其中 int 代表中断，0x80 代表中断号。在 Linux 中，该中断号对应的中断处理程序为内核，于是程序流会转移到内核代码中，并且处理器自动切换运行级别为 Ring 0。内核代码会接受外界的输入，并将数据写回到 buf 中。

这里的关键是内核和用户进程会映射到同一地址空间，但访问权限不同。也就是说，用户进程是无法直接访问内核的代码和数据，否则触发 segmentation fault。

参考：《计算机系统基础》异常控制流

中断 + 虚存 + 缓存 + 投机执行 = Meltdown

https://meltdownattack.com/

// %rcx: 无权限访问的地址; %rbx: 未载入缓存的数组
        xorq   %rax, %rax
retry:  movzbq (%rcx), %rax   // 非法访问; Page Fault
        shlq   $0xc, %rax
        jz     retry
        movq   (%rbx, %rax), %rbx // (%rbx + (%rcx) * 4096)

假设寄存器 rcx 中存放了某个无权限访问的地址，使用指针 ptr 表示，指向一个 byte。

于是等价的程序为：

do {
  rax = *ptr;
  rax << 12;
} while (rax == 0);

rbx = rbx[rax];

由于投机执行，处理器会在不检查权限的前提下读取指针 ptr 表示的地址处的值，并将数组的相应位置进行缓存。注意这里的单位是一个页面的大小。

当操作系统捕获了 segmentation fault 后，缓存并没有被清除。

于是只要遍历数组，看访问哪个页面比较快，根据页面的索引就可以读取到*ptr。

如何修复：减少内核代码和数据的在地址空间中的映射

“造轮子”的方法与乐趣

a.c ─► a.out

这个过程究竟经历了什么？

preliminary analysis

我们观测到，键入 gcc a.c 后，生成了一个可执行文件 a.out

然而，由于程序是个状态机，而状态机的 TRM 部分只能进行简单的计算

所以可以断言，gcc 和 OS 之间是存在着某种交互的

strace

为此，我们使用 strace 工具进行观察：

PS:

我们可能尝试使用 less 将输出重定向，然而内容一滚动就消失了（或者 vim 里面什么都没有）

可以判断 strace 会将结果写入标准错误

$ strace 2>&1 gcc a.c | vim -

其中 - 代表从标准输入读取内容，标准错误的内容通过管道显示在 vim 中

PS:

vim 中一些基本的内容整理，如：

取消自动换行 :set nowrap

删去一些无关的内容 :g/^newfstatat/norm dd

其范式为 :{range}norm[al][!] {commands}

其中 range 为全文中以 newfstatat 开头的行

或者使用 :g/^newfstatat/d

区分在于

d 为 editor command

dd 为 normal mode command

然后我们观察到最后unlink：

unlink("/tmp/ccPSr2ZR.res")             = 0
unlink("/tmp/ccocFDMt.o")               = 0
unlink("/tmp/ccezma3p.s")               = 0
exit_group(0)                           = ?
+++ exited with 0 +++

其描述如下：

unlink - call the unlink function to remove the specified file

也就是说我们在其中生成了三个临时文件，最后删除了

我们寻找创建它们的地方：

openat(AT_FDCWD, "/tmp/ccezma3p.s", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
close(4)                                = 0
access("/usr/lib/gcc/x86_64-linux-gnu/10/cc1", X_OK) = 0
pipe2([4, 5], O_CLOEXEC)                = 0
vfork()                                 = 4694
close(5)                                = 0
read(4, "", 16)                         = 0
close(4)                                = 0
wait4(4694, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 4694

这里看起来很怪，父进程创建了文件之后就关闭了，然后建立了一个管道，并创建了一个子进程

这里的细节不表，大概意思是父进程创建子进程并阻塞，子进程将一些内容写入文件，子进程结束后，控制再转移到父进程

为此，我们需要让 strace 一并追踪这些子进程的系统调用

RTFM - process，可以找到 -f 选项

child processes

同样定位临时文件：

openat(AT_FDCWD, "/tmp/ccfzJbeR.s", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
close(4)                                = 0
newfstatat(AT_FDCWD, "/usr/lib/gcc/x86_64-linux-gnu/10/cc1", {st_mode=S_IFREG|0755, st_size=25697696, ...}, 0) = 0
access("/usr/lib/gcc/x86_64-linux-gnu/10/cc1", X_OK) = 0
pipe2([4, 5], O_CLOEXEC)                = 0
vfork( <unfinished ...>
[pid  5870] close(4)                    = 0
[pid  5870] execve("/usr/lib/gcc/x86_64-linux-gnu/10/cc1", ["/usr/lib/gcc/x86_64-linux-gnu/10"..., "-quiet", "-imultiarch", "x86_64-linux-gnu", "a.c", "-quiet", "-dumpbase", "a.c", "-mtune=generic", "-march=x86-64", "-auxbase", "a", "-fasynchronous-unwind-tables", "-fstack-protector-strong", "-Wformat", "-Wformat-security", "-fstack-clash-protection", "-fcf-protection", "-o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */ <unfinished ...>
...

可以发现子进程通过系统调用 execve 执行了 cc1

为什么父进程不直接 execve 呢？

并行优化吗？

cc1 还有一堆参数，可以断言这便是编译器，使 a.c ─► a.s

下面可以观察到子进程打开了临时文件：

[pid  5870] openat(AT_FDCWD, "/tmp/ccfzJbeR.s", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
...
[pid  5870] write(4, "\t.file\t\"a.c\"\n\t.text\n\t.globl\tmain"..., 552) = 552

并写入了一些内容

还可以观察到 cc1 打开了一些文件参与编译：

[pid  5870] openat(AT_FDCWD, "a.c", O_RDONLY|O_NOCTTY) = 4
...
[pid  5870] openat(AT_FDCWD, "/usr/include/stdc-predef.h", O_RDONLY|O_NOCTTY) = 5
...

下面是汇编器 as 的部分：

[pid  5871] execve("/usr/lib/ccache/as", ["as", "--64", "-o", "/tmp/ccbXLqzq.o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */) = -1 ENOENT (No such file or directory)
[pid  5871] execve("/usr/local/sbin/as", ["as", "--64", "-o", "/tmp/ccbXLqzq.o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */) = -1 ENOENT (No such file or directory)
[pid  5871] execve("/usr/local/bin/as", ["as", "--64", "-o", "/tmp/ccbXLqzq.o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */) = -1 ENOENT (No such file or directory)
[pid  5871] execve("/usr/sbin/as", ["as", "--64", "-o", "/tmp/ccbXLqzq.o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */) = -1 ENOENT (No such file or directory)
[pid  5871] execve("/usr/bin/as", ["as", "--64", "-o", "/tmp/ccbXLqzq.o", "/tmp/ccfzJbeR.s"], 0xe6c030 /* 72 vars */ <unfinished ...>

可以观察到在子进程中，根据环境变量尝试执行 as 不同的可执行文件

最后是链接的部分，调用了如下两个二进制工具：

collect2
ld

对于 collect2 的作用，参阅：https://gcc.gnu.org/onlinedocs/gccint/Collect2.html

gdb

我们可以使用 gdb 进一步观察临时文件的创建和内容写入

为了调试这种会创建子进程的程序，需要设置一些参数，于是 jyy 写了一个调试脚本：

set detach-on-fork off
set follow-fork-mode child
set breakpoint pending on

file gcc

break execve
commands
  silent
  echo \n\n\n---------- EXECVE ----------\n
  printf "program to execute: %s\n", $rdi
  set $i = 0
  set $argv = (char **)$rsi
  while ($argv[$i])
    printf "argv[%d] = %s\n", $i, $argv[$i]
    set $i = $i + 1
  end
  echo \n
end

run a.c

该脚本在 execve 系统调用处打上断点，并打印出系统调用的参数

可以这样使用：

$ gdb -x gdb-internals.txt

首先可以发现每个程序的开始都是 execve 系统调用，包括 gcc 这个胶水程序：

---------- EXECVE ----------
program to execute: /usr/bin/gcc
argv[0] = /usr/bin/gcc
argv[1] = a.c

continue 后，触发断点：

---------- EXECVE ----------
program to execute: /usr/lib/gcc/x86_64-linux-gnu/10/cc1
argv[0] = /usr/lib/gcc/x86_64-linux-gnu/10/cc1
argv[1] = -quiet
argv[2] = -imultiarch
argv[3] = x86_64-linux-gnu
argv[4] = a.c
argv[5] = -quiet
argv[6] = -dumpbase
argv[7] = a.c
argv[8] = -mtune=generic
argv[9] = -march=x86-64
argv[10] = -auxbase
argv[11] = a
argv[12] = -fasynchronous-unwind-tables
argv[13] = -fstack-protector-strong
argv[14] = -Wformat
argv[15] = -Wformat-security
argv[16] = -fstack-clash-protection
argv[17] = -fcf-protection
argv[18] = -o
argv[19] = /tmp/ccv9Jkb6.s

可以观察目前在子进程：

(gdb) info inferiors
  Num  Description       Connection           Executable
  1    process 8243      1 (native)           /usr/bin/x86_64-linux-gnu-gcc-10
  is vfork parent of inferior 2
* 2    process 8556      1 (native)           /usr/bin/x86_64-linux-gnu-gcc-10
  is vfork child of inferior 1

可以观察到临时文件的创建和内容写入：

(gdb) !ls -l /tmp/ccv9Jkb6.s
-rw------- 1 vgalaxy vgalaxy 0 12月 19 22:07 /tmp/ccv9Jkb6.s
(gdb) c
Continuing.
process 8556 is executing new program: /usr/lib/gcc/x86_64-linux-gnu/10/cc1
[Inferior 2 (process 8556) exited normally]
(gdb) !ls -l /tmp/ccv9Jkb6.s
-rw------- 1 vgalaxy vgalaxy 552 12月 19 22:11 /tmp/ccv9Jkb6.s

此时子进程已经寄了，需要显式切换到父进程：

(gdb) info inferiors
  Num  Description       Connection           Executable
  1    process 8243      1 (native)           /usr/bin/x86_64-linux-gnu-gcc-10
* 2    <null>                                 /usr/lib/gcc/x86_64-linux-gnu/10/cc1
(gdb) inferior 1
[Switching to inferior 1 [process 8243] (/usr/bin/x86_64-linux-gnu-gcc-10)]
[Switching to thread 1.1 (process 8243)]
#0  vfork () at ../sysdeps/unix/sysv/linux/x86_64/vfork.S:41
41  ../sysdeps/unix/sysv/linux/x86_64/vfork.S: No such file or directory.

summary

默认情况下，gcc 没有经历预编译过程：

cc1: a.c ─► a.s
as: a.s ─► a.o
collect2 / ld: a.o ─► a.out

不过我有点在意剩下一个临时文件 .res …

Reinvent Wheels

编译器

在 PA1 中已经实现了其基本思想——表达式求值问题

输入一个字符串，输出计算它的汇编代码

(a + b) * (c + d)

push $a; push $b; add; push $c; push $d; add; mul

stack machine

汇编器

本质上就是 NEMU 译码的逆过程：

根据指令集手册规定翻译
生成符合 ELF 规范的二进制目标文件

链接器

按照 ld script 指示完成二进制文件构造、符号解析和重定位

可以通过如下的命令观察：

$ gcc a.c -Wl,--verbose

其基本结构为：

SECTIONS {
  . = 0x100000;
  .text   : { *(entry) *(.text*) }
  etext = .; _etext = .;
  .rodata : { *(.rodata*) } .data : { *(.data*) }
  edata = .; _edata = .;
  .bss    : { *(.bss*) }
  _start_start = ALIGN(4096);
  . = _start_start + 0x8000;
  _stack_pointer = .;
  end = .; _end = .;
  _heap_start = ALIGN(4096); _heap_end = 0x8000000;
}

. 代表当前位置
基本功能
- 粘贴
- 定义符号

加载器

PA 3 / PA 4

Loader

单元测试框架

YEMU 中的神奇代码：

#define TESTCASES(_) \
  _(1, 0b11100111, random_rm, ASSERT_EQ(newR[0], oldM[7]) ) \
  _(2, 0b00000100, random_rm, ASSERT_EQ(newR[1], oldR[0]) ) \
  _(3, 0b11100101, random_rm, ASSERT_EQ(newR[0], oldM[5]) ) \
  _(4, 0b00010001, random_rm, ASSERT_EQ(newR[0], oldR[0] + oldR[1]) ) \
  _(5, 0b11110111, random_rm, ASSERT_EQ(newM[7], oldR[0]) )

当项目规模扩大时（比如编译运行要几个小时），单元测试就非常重要了

所以 PA 其实就是一个很小的项目……

Differential Testing

设置好状态
各走一步

调试器 GDB

考虑核心功能：在任意 PC 的断点

我们考虑其实现：

NEMU -> monitor

在 while 的大循环（单步执行）里添加亿点代码就可以了……

真实的 CPU
- debug mode, debug registers, exception
- 让 gdb 拥有改变段权限的权利，在目标 PC 处替换为 int3 指令（单字节，防止在页边界出问题），触发异常后 gdb 再将原来的指令填回去……

int $0x3 -> int3

Trace / Profiler

strace 如何实现：操作系统提供了一个系统调用 ptrace

静态分析工具 Lint

如 gcc -Wall -Werror

For C, Cert C Coding Standard provides checker.

动态分析工具 Sanitizer

本质：运行时额外增加的 assert()

调试理论：fault -> failure

jyy 的想法：

对程序中的变量进行检查，判断出变量可能的取值范围

如指针指向的地址应该在用户区的范围内，某个像素点的取值范围为 0 ~ 255
assert(IS_GUESTPTR(ptr));
    (PMEM_MAP_START <= (ptr) && (ptr) < PMEM_MAP_END)
assert(IS_SMALLINT(x));
    (0 <= (x) && (x) <= 100)
assert(IS_BOOL(ptr));
    ((x) == 0 || (x) == 1)

计算机系统基础 习题课

TOC

ICS EX 习题课

The Missing Course of Your CS Education

UNIX 哲学

拥抱开源社区

C 语言拾遗：机制

预编译

编译和链接

加载：进入 C 语言的世界

C 语言拾遗：编程实践

The Art of Readable Code

一个反例

一个可能会碰到的例子

计数器（数码管显示）

logisim.c

display.py

YEMU

软件设计——以寄存器为例

软件设计——以指令实现为例

一览 Makefile

如何阅读全都是宏的程序

更多的计算机系统模拟器

NEMU 框架选讲 (1): 编译运行

The UNIX-Hater’s Handbook

Git

.gitignore

.git

git trace

make

yemu

Lab

NEMU

static

dynamic

AbstractMachine

Summary

NEMU 框架选讲 (2): 代码导读

项目规模

main

static & inline

static

static inline

assert

格式控制字符串

VS Code

Segmentation Fault

再谈如何阅读全都是宏的程序

数据的机器级表示

位运算

单指令多数据

Bit Set

基本操作

集合的大小

lowbit

LOG

Hacker’s Delight

Undefined Behavior

常见的 UB

为什么 C/C++ 会有 UB

编译器优化

警惕整数溢出

浮点数 & IEEE 754

综合案例

平方倒数

Lab 1

x86-64 与内联汇编

机器字长

ABI

calling convention

x86-64：寄存器与函数调用

Inline Assembly

一个简单的例子

与 C 世界交互

进一步的话题

调试：理论与实践

金句

软件工程 & bug

调试理论

基础工具

计算机系统基础习题课

`软件工程 & bug`

再谈`软件工程 & bug`