Lab2: User Programs

在 Lab1 Alarm Clock 的基础上进行

File System

创建虚拟磁盘

pintos-mkdisk filesys.dsk --filesys-size=2

格式化

pintos -- -f -q

拷贝文件并重命名

pintos -p ../../examples/echo -a echo -- -q

运行拷贝的文件

pintos -- -q run 'echo PKUOS'

将上面四步合并为一步，创建临时的虚拟磁盘，并运行拷贝的文件

pintos --filesys-size=2 -p ../../examples/echo -a echo -- -f -q run 'echo PKUOS'

其他一些命令

pintos -- -q ls
pintos -- -q cat echo
pintos -- -q rm echo

Memory Allocation

freed -> all of its bytes are cleared to 0xcc

Page allocator

分配一个新的物理页

区分内核内存池和用户内存池

Block allocator

以 Page allocator 为基础

其 API 与 malloc 等一致

Virtual Addresses

layout 如下

               31               12 11        0
              +-------------------+-----------+
              |    Page Number    |   Offset  |
              +-------------------+-----------+
                       Virtual Address

映射关系示意图如下

对于 uproc 而言，load 函数会通过 pagedir_create 创建用户地址空间，包含上面对应的内核恒等映射

并通过 load_segment 在用户地址空间添加 elf 文件中 PT_LOAD 段描述的地址空间，通常为代码段和数据段

框架代码给出的加载器只能用于加载静态链接的程序

最后为用户栈映射一个物理页

目前不支持 stack growth

基于此，就可以实现 Task 3 的内容

核心是在 syscall_handler 中获得 uaddr 后，需要调用 pagedir_get_page 获取对应的物理地址

需要注意对 uaddr 的解引用可能超过一个 byte，也可能跨页

所以需要检查这一段虚拟地址的开头和结尾是否都在用户地址空间内

Argument Passing

DATA STRUCTURES

A1: Copy here the declaration of each new or changed struct or struct member, global or static variable, typedef, or enumeration. Identify the purpose of each in 25 words or less.

没有使用额外的数据结构

ALGORITHMS

A2: Briefly describe how you implemented argument parsing. How do you arrange for the elements of argv[] to be in the right order? How do you avoid overflowing the stack page?

在 start_process 中实现

其参数包含了整个 arguments

使用 strtok_r 解析 tokens

成功 load 之后初始化用户栈即可，可以参考 https://alfredthiel.gitbook.io/pintosbook/project-description/lab2-user-programs/background#argument-passing

使用宏简化代码

#define push_int(stack, val)                                                  \
  do                                                                          \
    {                                                                         \
      stack -= 4;                                                             \
      *(uint32_t *)stack = (uint32_t)val;                                     \
    }                                                                         \
  while (0)

RATIONALE

A3: Why does Pintos implement strtok_r() but not strtok()?

参考 https://alfredthiel.gitbook.io/pintosbook/appendix/coding-standards#unsafe-string-functions，可知 strtok 使用了全局变量，这在内核多线程的环境中是危险的

strtok_r 通过引入局部变量标识解析位置避免了这一点

A4: In Pintos, the kernel separates commands into a executable name and arguments. In Unix-like systems, the shell does this separation. Identify at least two advantages of the Unix approach.

KISS，简洁的模型
stringify，通用的字符串处理

最基本的内容实现后，可以通过如下测试

pass tests/userprog/args-none
pass tests/userprog/args-single
pass tests/userprog/args-multiple
pass tests/userprog/args-many
pass tests/userprog/args-dbl-space

pass tests/userprog/sc-bad-sp
pass tests/userprog/sc-bad-arg
pass tests/userprog/sc-boundary
pass tests/userprog/sc-boundary-2
pass tests/userprog/sc-boundary-3

pass tests/userprog/halt
pass tests/userprog/exit

pass tests/userprog/bad-read
pass tests/userprog/bad-write
pass tests/userprog/bad-read2
pass tests/userprog/bad-write2
pass tests/userprog/bad-jump
pass tests/userprog/bad-jump2

System Calls

下面实现 exec 和 wait 系统调用

首先考虑 exec 系统调用的参数

需要解引用 argv 的前 4 个字节得到 command 的虚拟地址，然后再将这个虚拟地址转换为物理地址

这里的难点在于得到虚拟地址后，由于不清楚 command 的长度，在转换为物理地址时就无法知道虚拟地址的结尾

目前的实现是只考虑前 2 个字节，从而能够通过 exec-bound-3

然后是 wait 信息的传递问题

目前的实现是构造一个链表记录每次 exec 父子进程的 tid 以及子进程是否 exit 和 status

使用一个全局 lock 保护该数据结构

父进程 wait 时去链表中查找，然后不断 thread_yield 直到子进程 exit

在 start_process 中子进程开始时向链表中添加信息

子进程 exit 时更新信息，为此需要替换 thread_exit 为 process_exit_helper，尤其是 exception.c 中 kill user exception 时

在 process_exit_helper 中还打印了 Process Termination Messages

需要注意信息应存放在堆区，若存放的子进程栈中，可能 wait 时栈已经被回收了

exec 信息的的传递问题

很 tricky 的一个地方

在 process_execute 中调用 thread_create 创建子进程后，若未调度到子进程执行，父进程就无法得知子进程在 start_process 中的 load 是否成功，因为 load 失败时，process_execute 应返回 TID_ERROR

所以应该使用信号量进行同步，父进程在 process_execute 返回前 P，子进程在 load 之后 V

所有信息通过 aux 参数传递

实现之后，可以通过如下测试用例

pass tests/userprog/exec-once
pass tests/userprog/exec-arg
pass tests/userprog/exec-bound
pass tests/userprog/exec-bound-2
pass tests/userprog/exec-bound-3
pass tests/userprog/exec-multiple
pass tests/userprog/exec-missing
pass tests/userprog/exec-bad-ptr

pass tests/userprog/wait-simple
pass tests/userprog/wait-twice
pass tests/userprog/wait-killed
pass tests/userprog/wait-bad-pid

pass tests/userprog/multi-recurse

下面考虑实现文件系统相关的系统调用

由于文件系统的实现，在访问文件系统代码时，必须要获取一把全局的锁，注意 start_process 中的 load 也访问了文件系统

关键问题在于如何维护文件描述符这一层抽象

实验指南中指出，子进程不会继承父进程的文件描述符，并且多次打开同一个文件应返回不同的文件描述符，它们之间不会共享 offset，可以判断文件描述符和进程之间并非强耦合关系

于是类似上文，使用一个链表维护 tid / fd / file，信息同样存放在堆区

在 open 系统调用中，通过 filesys_open 获取到 file 后，向链表中添加信息

可以通过遍历的方式由 fd 获取到 file

在 close 系统调用中，向链表中删除信息，并调用 file_close 即可

注意每个进程 exit 时需要关闭其对应的全部文件描述符

另外在 read 和 write 系统调用中有一个缓冲区参数，由于给出了 length，所以将虚拟地址转换为物理地址就没有问题了，注意转换时不必考虑是否可读可写，因为有 page fault handler 兜底

实现之后，只有三个测试用例无法通过

tests/userprog/rox-simple
tests/userprog/rox-child
tests/userprog/rox-multichild

最后的任务点是，拒绝写入正在运行的可执行程序源文件

首先需要细究一下文件系统的实现，可以发现在 inode 之上，还有 file 这个数据结构做了进一步的抽象

struct file
  {
    struct inode *inode;        /**< File's inode. */
    off_t pos;                  /**< Current position. */
    bool deny_write;            /**< Has file_deny_write() been called? */
  };

这个数据结构主要是维护文件 offset 和 deny_write 两方面的属性

所以多次打开同一个文件，其内部的 inode 是一致的

而且注意到这里的 deny_write 并不是来直接控制是否拒绝写入的，实际上是由 inode 的 deny_write_cnt 来控制的，对于同一个文件的多个文件描述符，只要其中有一个 file_deny_write，对这些文件描述符都会拒绝写入

理解了原理后，实现就很简单了，只要在 load 成功后不关闭文件，调用 file_deny_write 并向链表中添加信息即可

DATA STRUCTURES

B1: Copy here the declaration of each new or changed struct or struct member, global or static variable, typedef, or enumeration. Identify the purpose of each in 25 words or less.

除了上述引入的数据结构外，syscall handler 本身的实现并未引入额外的数据结构

B2: Describe how file descriptors are associated with open files. Are file descriptors unique within the entire OS or just within a single process?

一一对应关系，在 int 范围内，文件描述符在整个内核内是唯一的

ALGORITHMS

B3: Describe your code for reading and writing user data from the kernel.

直接贴代码

static void *
uaddr_to_paddr (const void *uaddr, int length)
// ignore writable checking
// ignore stack growth
{
  if (!uaddr)
    goto done;
  if (!is_user_vaddr (uaddr))
    goto done;
  struct thread *t = thread_current ();
  void *kaddr_st = pagedir_get_page (t->pagedir, uaddr);
  if (!kaddr_st)
    goto done;
  void *kaddr_ed = pagedir_get_page (t->pagedir, uaddr + length - 1);
  if (!kaddr_ed)
    goto done;

  return kaddr_st;
done:
  process_exit_helper (-1);
}

这里忽略了对可读可写的检查和潜在的用户栈跨页的情形，也就是说目前用户栈的大小只有 4 KB

再次强调，之所以需要进行这一步转换，是因为 syscall handler 是运行在内核态的，其 CR3 寄存器中只包含了内核恒等映射，而没有用户地址空间的映射

B4: Suppose a system call causes a full page (4,096 bytes) of data to be copied from user space into the kernel. What is the least and the greatest possible number of inspections of the page table (e.g. calls to pagedir_get_page()) that might result? What about for a system call that only copies 2 bytes of data? Is there room for improvement in these numbers, and how much?

考虑到潜在的跨页情形，最多需要调用 2 次 pagedir_get_page

实际上可以根据 kaddr_st 的 offset 来判断是否需要多一次调用，但上述算法并未实现

B5: Briefly describe your implementation of the “wait” system call and how it interacts with process termination.

上文已经解释过了

B6: Any access to user program memory at a user-specified address can fail due to a bad pointer value. Such accesses must cause the process to be terminated. System calls are fraught with such accesses, e.g. a “write” system call requires reading the system call number from the user stack, then each of the call’s three arguments, then an arbitrary amount of user memory, and any of these can fail at any point. This poses a design and error-handling problem: how do you best avoid obscuring the primary function of code in a morass of error-handling? Furthermore, when an error is detected, how do you ensure that all temporarily allocated resources (locks, buffers, etc.) are freed? In a few paragraphs, describe the strategy or strategies you adopted for managing these issues. Give an example.

在内核态访问用户态的内存前应充分的检查，如检查整个缓冲区是否有权访问

使用宏来简化检查部分的代码，例如获取系统调用参数时

#define deref_argv(argv) *(uint32_t *)uaddr_to_paddr (argv, 4)

#define deref_argv_then_to_paddr(uaddr, length)                               \
  uaddr_to_paddr ((const char *)deref_argv (uaddr), length)

大多数情形下，process_exit_helper 这个兜底函数可以释放掉用户进程的资源，但有一些例外

uaddr_to_paddr 中忽略的情形若被 page fault handler 兜底，一些锁可能无法被释放
- 考虑检查锁持有者是否已经 exit
父进程若不 wait 就 exit 了，子进程的 wait 信息将会内存泄漏
- 可能的解决方案是 reparent
……

SYNCHRONIZATION

B7: The “exec” system call returns -1 if loading the new executable fails, so it cannot return before the new executable has completed loading. How does your code ensure this? How is the load success/failure status passed back to the thread that calls “exec”?

上文已经解释过了

B8: Consider parent process P with child process C. How do you ensure proper synchronization and avoid race conditions when P calls wait(C) before C exits? After C exits? How do you ensure that all resources are freed in each case? How about when P terminates without waiting, before C exits? After C exits? Are there any special cases?

通过维护的数据结构，在堆区进行同步

除了上述的额外情形，其余情形均不会出现资源泄露

RATIONALE

B9: Why did you choose to implement access to user memory from the kernel in the way that you did?

简单，并且不太会汇编

B10: What advantages or disadvantages can you see to your design for file descriptors?

若要实现在父子进程之间共享文件描述符，可能比较麻烦

B11: The default tid_t to pid_t mapping is the identity mapping. If you changed it, what advantages are there to your approach?

实现中 tid_t 和 pid_t 一一对应

若并非如此，也许可以实现线程的复用

Test

全部测试用例通过

Pintos Lab 用户进程

TOC

Lab2: User Programs

File System

Memory Allocation

Virtual Addresses

Argument Passing

DATA STRUCTURES

ALGORITHMS

RATIONALE

System Calls

DATA STRUCTURES

ALGORITHMS

SYNCHRONIZATION

RATIONALE

Test