Building a PE from scratch

2021-03-11

A little while ago I got the idea that it might be fun to try and create a PE from scratch using Python. I guess my idea of fun is pain, but it is also useful to be able to create a PE from scratch when you want to debug some shellcode, another one of those fun things.

This post documents my thought process. It might end up being just for my own reference, but it might also help people who want to have the same kind of fun.

PE Format

While I'll try not to bore you with explaining the full PE format, it's still important to go into the different components that make up the PE format to get an overview of the stuff we need to build it, and to end up with something that a debugger like x64dbg understands.

The PE format can be divided into the following components:

DOS Header
COFF File Header
Optional Header
Section Table

Not all fields need a value for the PE to be valid (which TinyPE shows pretty nicely; your browser will mark this as a harmful website btw), but I am going to try and be as complete as I can while building this. Fields that can be left null will mostly be skipped to keep this post at a reasonable length. Please refer to this Microsoft page, which will answer all of your PE questions.

DOS Header

The DOS header part is actually fairly easy since only a couple of things are really important if you want your PE to be valid:

Magic number
Location of the COFF File Header

The magic number is nothing more than MZ, or 4D 5A in hex. There's not much reason for this to be MZ other than that these are the initials of Mark Zbikowski, the godfather of the format. For a small introduction to the MZ header you can refer to this blog. The magic number makes up the first two bytes of the DOS header.

The other important field to include is the location of the file header, also known as the PE header. This value is the file offset at which the PE\0\0 signature can be found in the binary. To keep everything simple and stupid I set the offset to 0x100.

I will be using the Python struct module to pack everything while building the final binary. Please note that all values are packed little-endian.

dos_header = b'MZ'                       # Magic number
dos_header += struct.pack('<H', 0x0)     # Bytes on last page of file
dos_header += struct.pack('<H', 0x0)     # Pages in file
dos_header += struct.pack('<H', 0x0)     # Relocations
dos_header += struct.pack('<H', 0x0)     # Size of headers in paragraphs
dos_header += struct.pack('<H', 0x0)     # Minimum extra paragraphs needed
dos_header += struct.pack('<H', 0x0)     # Maximum extra paragraphs needed
dos_header += struct.pack('<H', 0x0)     # Initial SS value
dos_header += struct.pack('<H', 0x0)     # Initial SP value
dos_header += struct.pack('<H', 0x0)     # Checksum
dos_header += struct.pack('<H', 0x0)     # Initial IP value
dos_header += struct.pack('<H', 0x0)     # Initial CS value
dos_header += struct.pack('<H', 0x0)     # File address of relocation table
dos_header += struct.pack('<H', 0x0)     # Overlay number
dos_header += b'\x00' * 8                # Reserved words
dos_header += struct.pack('<H', 0x0)     # OEM Identifier
dos_header += struct.pack('<H', 0x0)     # OEM Information
dos_header += b'\x00' * 20               # Reserved words
dos_header += struct.pack('<I', 0x100)   # location of file header (PE\0\0)

dos_header += DOS_STUB

At the end I append the DOS stub, a legacy leftover that tells the user this executable will not run in DOS mode (click for more information).
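As a sanity check on the layout, here is a minimal sketch that packs the same 64-byte DOS header in a single struct.pack call and reads the pointer back from offset 0x3c. The names E_LFANEW and DOS_HEADER_FORMAT are my own for this example and the DOS stub is left out.

```python
import struct

E_LFANEW = 0x100  # offset of the PE signature, same value as used in this post

# 'MZ', 13 zeroed 16-bit fields, 8 reserved bytes, OEM id and info,
# 20 reserved bytes, then the 32-bit pointer to the PE header.
DOS_HEADER_FORMAT = '<2s13H8s2H20sI'

dos_header = struct.pack(
    DOS_HEADER_FORMAT,
    b'MZ', *([0] * 13), b'', 0, 0, b'', E_LFANEW,
)

# The header is 0x40 bytes and the pointer sits in its last 4 bytes.
assert len(dos_header) == 64
(pe_offset,) = struct.unpack_from('<I', dos_header, 0x3c)
assert pe_offset == E_LFANEW
```

The empty byte strings rely on struct padding 's' fields with null bytes, which keeps the reserved areas zeroed without spelling them out.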

COFF File Header

This header is a little more interesting, but only a little.

We don't need to provide a value for all of these fields either; the following fields are necessary:

Signature
Machine
NumberOfSections
SizeOfOptionalHeader
Characteristics

Signature

The Signature is a simple field that just contains PE\0\0. This value needs to be at the offset we provided when we created the DOS header.

Machine

Since we're creating an executable for a 32-bit machine we will use the 0x14c value to specify this in the Machine field.

NumberOfSections

The NumberOfSections field specifies how many sections we are including in this executable. Sections are used for different purposes. The most widely known are probably the .text section, which holds the executable code, and the .idata section, which holds the imports that are being used. Since we're only interested in building a debuggable executable with the shellcode of our choice, we will only include one section, the .text section. If you do want to learn more about sections while keeping a semi-high-level overview, please refer to this article.

SizeOfOptionalHeader

Next up is the size of the optional header. This is a bit of a chicken-and-egg problem because, as you can see in this nice PE format overview, the optional header comes after the COFF file header... I cheated a bit and let my Python code build the optional header first, since I build each part of the header individually. If you do want to play it safe you can just use 224 bytes as the size for a 32-bit executable and make sure to include every field, including all 16 data directories.

We'll get to what makes it this size in a little bit.
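A quick way to see where the 224 comes from is to let struct count the bytes. This is only a sketch: the format string is my own transcription of the 32-bit (PE32) optional header fields in file order, and the constant names are made up for the example.

```python
import struct

# Fixed PE32 optional header fields, in file order:
#   Magic (H), linker major/minor (BB), 9 dwords up to FileAlignment,
#   6 version words, 4 dwords up to CheckSum, Subsystem and
#   DllCharacteristics (2H), 6 dwords up to NumberOfRvaAndSizes.
PE32_FIXED_FORMAT = '<HBB9I6H4I2H6I'

DATA_DIRECTORY_ENTRY_SIZE = 8      # 4-byte RVA + 4-byte size
NUMBER_OF_DATA_DIRECTORIES = 16

fixed_size = struct.calcsize(PE32_FIXED_FORMAT)
optional_header_size = (
    fixed_size + NUMBER_OF_DATA_DIRECTORIES * DATA_DIRECTORY_ENTRY_SIZE
)

print(fixed_size, optional_header_size)  # 96 224
```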

Characteristics

Then it's time to specify a value for the Characteristics field, a field that determines your executable's fate. There are a lot of different values that you can specify, and even combine. For the sake of this post I will just spoil that the actual value I used is 0x02, which tells the machine this is running on that it's an executable image. Please refer to this table to see a list with possible values and their definitions.

file_header += b'\x00' * (0x100 - len(dos_header))        # Align PE to 0x100

optional_header_size = len(optional_header) + len(data_directories)

file_header += b'PE\x00\x00'                              # Signature
file_header += struct.pack('<H', 0x14c)                   # 0x8664 / 0x14c -> x64 / x86
file_header += struct.pack('<H', 0x1)                     # NumberOfSections
file_header += struct.pack('<I', 0x0)                     # TimeDateStamp
file_header += struct.pack('<I', 0x0)                     # PointerToSymbolTable
file_header += struct.pack('<I', 0x0)                     # NumberOfSymbols
file_header += struct.pack('<H', optional_header_size)    # SizeOfOptionalHeader
file_header += struct.pack(
    '<H',
    CHARACTERISTICS.IMAGE_FILE_EXECUTABLE_IMAGE.value
)                                                         # Characteristics
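To show how combining Characteristics values works and to double-check the field order, here is a small sketch that packs the signature plus COFF header in one go and unpacks it again. FileCharacteristics and build_file_header are hypothetical names for this example (they are not the CHARACTERISTICS enum from my script), and the extra 32-bit flag is only there to demonstrate combining; the value used in this post is just 0x02.

```python
import struct
from enum import IntFlag

class FileCharacteristics(IntFlag):
    """A few Characteristics flags; values taken from the PE specification."""
    RELOCS_STRIPPED = 0x0001
    EXECUTABLE_IMAGE = 0x0002
    MACHINE_32BIT = 0x0100

FILE_HEADER_FORMAT = '<4sHHIIIHH'  # signature + 20-byte COFF header

def build_file_header(optional_header_size, characteristics,
                      machine=0x14c, number_of_sections=1):
    return struct.pack(
        FILE_HEADER_FORMAT,
        b'PE\x00\x00',         # Signature
        machine,               # Machine
        number_of_sections,    # NumberOfSections
        0,                     # TimeDateStamp
        0,                     # PointerToSymbolTable
        0,                     # NumberOfSymbols
        optional_header_size,  # SizeOfOptionalHeader
        int(characteristics),  # Characteristics
    )

flags = FileCharacteristics.EXECUTABLE_IMAGE | FileCharacteristics.MACHINE_32BIT
header = build_file_header(224, flags)
fields = struct.unpack(FILE_HEADER_FORMAT, header)

print(len(header), hex(fields[1]), hex(fields[-1]))  # 24 0x14c 0x102
```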

Optional Header

The optional header is without a doubt the most interesting header in my opinion. I had a lot of fun (pain) getting this right, and I probably made a mistake describing certain things (please let me know).

Magic

Now this seems to be a trend at this point, but this header too has a magic value. We specify 0x10b for 32-bit executables, and 0x20b for 64-bit executables.

Linker Major/Minor version

To be completely honest, at the time of writing I'm unsure what these values entail other than the version of the linker. They don't seem to affect anything, as leaving them at 0 is fine as well. Do note that these are single-byte values.

SizeOfCode

The size of code is... The size of the code.

AddressOfEntrypoint and BaseOfCode

I was very confused by these fields until I looked at other executables with PE-bear (great tool btw, s/o to @hasherezade). In our case we can set both to the same value, but in some cases the entry point of the code is not at the start of the code section.

BaseOfData

I used a fake value for the BaseOfData field since we don't use a .data section. If you were to use one, use the value that corresponds with that section's RVA.

ImageBase

ImageBase is kind of a funny field since it only specifies the preferred starting address of the executable when loaded into memory. This means, it happens only when your machine feels like doing so. Jokes aside though, in my testing it has always loaded the executable at this address in memory.
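To make the relation between ImageBase and the relative addresses concrete, here is a tiny worked example using the values from this post. It assumes the loader honours the preferred base; rva_to_va is just a helper name I made up.

```python
IMAGE_BASE = 0x400000             # preferred load address (ImageBase)
ADDRESS_OF_ENTRY_POINT = 0x1000   # relative virtual address (RVA)

def rva_to_va(rva, image_base=IMAGE_BASE):
    """Convert an RVA to the absolute address it gets once loaded."""
    return image_base + rva

print(hex(rva_to_va(ADDRESS_OF_ENTRY_POINT)))  # 0x401000
```

So with these values the debugger should break on the first shellcode instruction at 0x401000.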

SectionAlignment and FileAlignment

These two fields are kind of important since they dictate how you should align the different parts of your executable, meaning a size or offset needs to be rounded up to the next multiple of this value if it's not already aligned to it. The values chosen in my scenario are kind of generic:

Section alignment of 4096 bytes
File alignment of 512 bytes
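Rounding up to an alignment is easy to get subtly wrong, so here is a minimal helper sketch. align_up is my own name for it, and the bitmask trick assumes the alignment is a power of two, which holds for both values above.

```python
SECTION_ALIGNMENT = 0x1000  # 4096 bytes
FILE_ALIGNMENT = 0x200      # 512 bytes

def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

print(hex(align_up(121, FILE_ALIGNMENT)))        # 0x200
print(hex(align_up(0x200, FILE_ALIGNMENT)))      # 0x200
print(hex(align_up(0x1001, SECTION_ALIGNMENT)))  # 0x2000
```

Note that a value that is already aligned stays the same instead of jumping to the next multiple.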

OS Version Major/Minor

These fields just specify the required operating system version. Refer to this table for a nice overview of the versions out there.

Major/MinorSubsystemVersion

Indicates the required version of the Windows subsystem. This uses the same values as depicted in the table that was linked for the OS major/minor versions.

SizeOfImage

The complete size of the executable when it is loaded into memory. This value must be a multiple of the SectionAlignment value. To calculate it we take the VirtualAddress + VirtualSize of the last section in the section table and round the result up. Because I'm only going to use one section (the .text section), the only value that changes with the shellcode is the VirtualSize:

size_of_image = 0x1000 * ((va_base + virtual_size + 0x1000 - 1) // 0x1000)  # round up to SectionAlignment

If we were to build everything in the order it appears in the file we would not know the VirtualSize value yet, so in the final script the .text section is built beforehand and its size is used to calculate the SizeOfImage value. The VirtualAddress field can be set to anything really, as this just marks the relative address when loaded into memory, as long as the same value is used for the VirtualAddress field in the .text section.

SizeOfHeaders

The combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of the FileAlignment value.
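Here is a rough worked example of both size calculations. It assumes the layout described in this post (PE header at 0x100, a 224-byte optional header, one section) and a VirtualSize of 0x200; the constant names and round_up are my own.

```python
def round_up(value, multiple):
    """Round value up to the next multiple (ceiling division)."""
    return -(-value // multiple) * multiple

SECTION_ALIGNMENT = 0x1000
FILE_ALIGNMENT = 0x200

PE_OFFSET = 0x100            # DOS header + stub + padding
FILE_HEADER_SIZE = 24        # PE signature + COFF file header
OPTIONAL_HEADER_SIZE = 224   # including 16 data directories
SECTION_HEADER_SIZE = 40
number_of_sections = 1

raw_headers = (PE_OFFSET + FILE_HEADER_SIZE + OPTIONAL_HEADER_SIZE
               + SECTION_HEADER_SIZE * number_of_sections)
size_of_headers = round_up(raw_headers, FILE_ALIGNMENT)

va_base = 0x1000       # VirtualAddress of the .text section
virtual_size = 0x200   # assumed VirtualSize for this example
size_of_image = round_up(va_base + virtual_size, SECTION_ALIGNMENT)

print(raw_headers, hex(size_of_headers), hex(size_of_image))  # 544 0x400 0x2000
```

Under these assumptions the headers end at 0x400, which is the same file offset I use for PointerToRawData further down.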

SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve and SizeOfHeapCommit

To be completely honest, I copied these values from other executables. I am not sure if an executable can be valid without these set to a value; in my test cases leaving them out would end up in a non-debuggable executable.

NumberOfRvaAndSizes

This just indicates the number of data directories that are included in this executable. You can set this to 0 if you don't plan to include any data directories like the ImportDirectory etc.

In my case I set this to 0x10, which means I account for all 16 data directories. This doesn't mean that I actually use them; I only need to add enough zeroes after NumberOfRvaAndSizes to make up the space these data directories would normally take.

At the time of writing I don't plan to include a way to add imports. I might add this in another post, as I'm currently still struggling to wrap my head around how exactly this process would work when building it from scratch (I'm sure there are people reading this shaking their heads because it will turn out to be simple).

optional_header += struct.pack('<H', 0x10b)     # Magic 0x10b / 0x20b -> PE32 / PE32+
optional_header += b'\x00'                      # Linker version major
optional_header += b'\x00'                      # Linker version minor
optional_header += struct.pack('<I', len(text_section))    # SizeOfCode
optional_header += struct.pack('<I', 0x0)       # SizeOfInitializedData
optional_header += struct.pack('<I', 0x0)       # SizeOfUninitializedData
optional_header += struct.pack('<I', 0x1000)    # AddressOfEntryPoint
optional_header += struct.pack('<I', 0x1000)    # BaseOfCode
optional_header += struct.pack('<I', 0x4000)    # BaseOfData
optional_header += struct.pack('<I', 0x400000)  # Imagebase
optional_header += struct.pack('<I', 0x1000)    # SectionAlignment
optional_header += struct.pack('<I', 0x200)     # FileAlignment
optional_header += struct.pack('<H', 0x5)       # OS version major
optional_header += struct.pack('<H', 0x0)       # OS version minor
optional_header += struct.pack('<H', 0x0)       # Image version major
optional_header += struct.pack('<H', 0x0)       # Image version minor
optional_header += struct.pack('<H', 0x5)       # MajorSubsystemVersion
optional_header += struct.pack('<H', 0x0)       # MinorSubsystemVersion
optional_header += struct.pack('<I', 0x0)       # Win32 version
optional_header += struct.pack('<I', 0x4000)    # SizeOfImage
optional_header += struct.pack('<I', 0x200)     # SizeOfHeaders
optional_header += struct.pack('<I', 0x0)       # Checksum
optional_header += struct.pack('<H', 0x2)       # Subsystem
optional_header += struct.pack('<H', 0x0)       # Dllcharacteristics
optional_header += struct.pack('<I', 0x100000)  # SizeOfStackReserve
optional_header += struct.pack('<I', 0x1000)    # SizeOfStackCommit
optional_header += struct.pack('<I', 0x100000)  # SizeOfHeapReserve
optional_header += struct.pack('<I', 0x1000)    # SizeOfHeapCommit
optional_header += struct.pack('<I', 0x0)       # LoaderFlags
optional_header += struct.pack('<I', 0x10)      # NumberOfRvaAndSizes

Data Directories

I will not go over each of the data directories to explain what they do exactly, as they're currently not necessary for a working executable. I will just show you the layout.

optional_header += b'\x00' * 8   # ExportDirectory
optional_header += b'\x00' * 8   # ImportDirectory
optional_header += b'\x00' * 8   # ResourceDirectory
optional_header += b'\x00' * 8   # ExceptionDirectory
optional_header += b'\x00' * 8   # SecurityDirectory
optional_header += b'\x00' * 8   # BaseRelocationTable
optional_header += b'\x00' * 8   # DebugDirectory
optional_header += b'\x00' * 8   # ArchitectureData
optional_header += b'\x00' * 8   # RVAOfGlobalPointer
optional_header += b'\x00' * 8   # TLS Directory
optional_header += b'\x00' * 8   # LoadConfigurationDirectory
optional_header += b'\x00' * 8   # BoundImportDirectoryHeaders
optional_header += b'\x00' * 8   # ImportAddressTable
optional_header += b'\x00' * 8   # DelayLoadImportDescriptors
optional_header += b'\x00' * 8   # .NETHeader
optional_header += b'\x00' * 8   # Reserved

The zero-filled layout above only reserves the space and is not how a used directory would look. In a normal executable that uses the ImportDirectory, for example, the entry would look like this:

data_directories += struct.pack('<I', import_descriptor_rva)   # ImportDirectory RVA
data_directories += struct.pack('<I', import_directory_size)   # ImportDirectory size

Each data directory entry consists of the RVA, which is the offset relative to the start of the executable when it is loaded into memory, and the size of the data directory.
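As a sketch of how the 16 entries could be generated instead of typed out, the helper below fills every slot with zeroes unless an (RVA, size) pair is given for its index. build_data_directories is a hypothetical helper, and the RVA and size in the usage line are made-up placeholder values; index 1 is the import directory slot.

```python
import struct

NUMBER_OF_DATA_DIRECTORIES = 16
IMAGE_DIRECTORY_ENTRY_IMPORT = 1  # second slot, right after the export directory

def build_data_directories(entries=None):
    """Pack 16 (RVA, size) pairs; entries maps a slot index to (rva, size)."""
    entries = entries or {}
    blob = b''
    for index in range(NUMBER_OF_DATA_DIRECTORIES):
        rva, size = entries.get(index, (0, 0))
        blob += struct.pack('<II', rva, size)
    return blob

# All zeroes, like the listing in this post
empty_directories = build_data_directories()

# Placeholder values, only to show where an import directory would go
with_imports = build_data_directories({IMAGE_DIRECTORY_ENTRY_IMPORT: (0x2000, 0x28)})

print(len(empty_directories), with_imports[8:16].hex())  # 128 0020000028000000
```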

Section Table

We're almost done building our executable from scratch. We just need to add one very important entry to the section table: the .text section, which will hold our own shellcode.

The way the section table is built up is easiest to understand when looking at the C struct, which I stole from here:

struct IMAGE_SECTION_HEADER 
{
// short is 2 bytes
// long is 4 bytes
  char  Name[IMAGE_SIZEOF_SHORT_NAME]; // IMAGE_SIZEOF_SHORT_NAME is 8 bytes
  union {
    long PhysicalAddress;
    long VirtualSize;
  } Misc;
  long  VirtualAddress;
  long  SizeOfRawData;
  long  PointerToRawData;
  long  PointerToRelocations;
  long  PointerToLinenumbers;
  short NumberOfRelocations;
  short NumberOfLinenumbers;
  long  Characteristics;
};

So in our case we would create our .text section something like this:

# Round VirtualSize up to FileAlignment (0x200)
virtual_size = 0x200 * (len(text_section) // 0x200 + 1)

# Size of the .text section fields is 40 bytes
text_section_size = shellcode_size + 40

# .text
text_section = b'.text\x00\x00\x00'                    # Name
text_section += struct.pack('<I', virtual_size)        # VirtualSize
text_section += struct.pack('<I', self.va_base)        # VirtualAddress
text_section += struct.pack('<I', text_section_size)   # SizeOfRawData
text_section += struct.pack('<I', raw_data_ptr)        # PointerToRawData
text_section += struct.pack('<I', 0x0)                 # PointerToRelocations
text_section += struct.pack('<I', 0x0)                 # PointerToLinenumbers 
text_section += struct.pack('<H', 0x0)                 # NumberOfRelocations 
text_section += struct.pack('<H', 0x0)                 # NumberOfLinenumbers 
# Set memory characteristics to RWX for executable code section
text_section_flags = (
    SECTION_FLAGS.IMAGE_SCN_CNT_CODE.value 
    + SECTION_FLAGS.IMAGE_SCN_MEM_EXECUTE.value 
    + SECTION_FLAGS.IMAGE_SCN_MEM_READ.value 
    + SECTION_FLAGS.IMAGE_SCN_MEM_WRITE.value
)
text_section += struct.pack('<I', text_section_flags)

char Name -> this is set to .text\x00\x00\x00 because the field needs to be exactly 8 bytes in size;
PhysicalAddress -> this is something that still confuses me, but as the union in the struct shows it shares its 4 bytes with VirtualSize, so it is not written separately in the PE;
VirtualSize -> this corresponds to the size that will be needed in memory to store the shellcode. I round this up to whatever value is set for FileAlignment, but be aware that according to the Microsoft PE specification it is SizeOfRawData that has to be a multiple of FileAlignment;
VirtualAddress -> the relative offset from the start of the executable in memory. I have set this to 0x1000, but you can set this to whatever suits your needs;
SizeOfRawData -> the size of your shellcode on disk;
PointerToRawData -> the absolute offset of your shellcode in the binary itself. I have set this to 0x400 and made sure to let my shellcode start at this offset;
Characteristics -> the memory attributes to apply to the memory segment when loading the shellcode into memory. I have set this to RWX, which stands for Read, Write, and Execute. For a full list of possible values you can refer to this table.

If you want to add more sections you can do so by repeating the above structure and specifying the right offsets and the values needed as Characteristics.
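Since adding sections means repeating the same 40 bytes, a small helper makes that less error-prone. This is only a sketch: SectionFlags and build_section_header are names I made up for the example (not the SECTION_FLAGS enum from my script), and the sizes in the usage line are assumed values with SizeOfRawData padded to FileAlignment.

```python
import struct
from enum import IntFlag

class SectionFlags(IntFlag):
    """The four section flags used in this post; values from the PE spec."""
    CNT_CODE = 0x00000020
    MEM_EXECUTE = 0x20000000
    MEM_READ = 0x40000000
    MEM_WRITE = 0x80000000

SECTION_HEADER_FORMAT = '<8s6I2HI'  # same field order as IMAGE_SECTION_HEADER

def build_section_header(name, virtual_size, virtual_address,
                         size_of_raw_data, pointer_to_raw_data, flags):
    return struct.pack(
        SECTION_HEADER_FORMAT,
        name,                 # Name, padded with null bytes to 8 bytes
        virtual_size,         # VirtualSize
        virtual_address,      # VirtualAddress
        size_of_raw_data,     # SizeOfRawData
        pointer_to_raw_data,  # PointerToRawData
        0,                    # PointerToRelocations
        0,                    # PointerToLinenumbers
        0,                    # NumberOfRelocations
        0,                    # NumberOfLinenumbers
        int(flags),           # Characteristics
    )

rwx_code = (SectionFlags.CNT_CODE | SectionFlags.MEM_EXECUTE
            | SectionFlags.MEM_READ | SectionFlags.MEM_WRITE)

text_header = build_section_header(b'.text', 0x200, 0x1000, 0x200, 0x400, rwx_code)

print(len(text_header), hex(int(rwx_code)))  # 40 0xe0000020
```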

Shellcode

Now we just need to add some shellcode and test the thingy right? Yes, yes that's exactly what we're going to do.

For my first testing scenario I've developed the sophisticated shellcode of a function prologue, some instructions, and a function epilogue. Click here if you're not familiar with the epilogue/prologue terms.

# Function prologue
text_section += b'\x55'        # push ebp
text_section += b'\x8b\xec'    # mov ebp, esp
# nation-state-like shellcode
text_section += b'\x41' * 16
text_section += b'\x90' * 100
# Function epilogue
text_section += b'\x5d'        # pop ebp
text_section += b'\xc3'        # ret
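To make sure the shellcode really ends up at the file offset named in PointerToRawData, the headers and the section data can both be padded to FileAlignment before being glued together. The sketch below uses a zero-filled placeholder instead of the real headers, and pad_to is a hypothetical helper name.

```python
FILE_ALIGNMENT = 0x200
RAW_DATA_POINTER = 0x400  # PointerToRawData used for the .text section

# Same test shellcode as above: prologue, filler, epilogue
stub = (
    b'\x55'          # push ebp
    b'\x8b\xec'      # mov ebp, esp
    + b'\x41' * 16   # inc ecx (x16)
    + b'\x90' * 100  # nop (x100)
    + b'\x5d'        # pop ebp
    + b'\xc3'        # ret
)

def pad_to(data, alignment):
    """Append null bytes until len(data) is a multiple of alignment."""
    remainder = len(data) % alignment
    if remainder == 0:
        return data
    return data + b'\x00' * (alignment - remainder)

# Placeholder standing in for the real headers (0x220 bytes in this layout)
headers = b'\x00' * 0x220

image = pad_to(headers, FILE_ALIGNMENT) + pad_to(stub, FILE_ALIGNMENT)

# The prologue should sit exactly at PointerToRawData
assert image[RAW_DATA_POINTER:RAW_DATA_POINTER + 3] == b'\x55\x8b\xec'
print(len(stub), hex(len(image)))  # 121 0x600
```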

After adding this amazing piece of shellcode we can verify with PE-bear that it is present at the offset we provided in the .text section:

Ok, but can you debug it?

Popping calc.exe

No proof-of-concept is complete without a calc.exe process. To demonstrate our life's achievements to our significant others, we show them a calculator.

To spawn a calc.exe process we simply need to replace our nation-state shellcode with some calculator shellcode; luckily packetstormsecurity has got us covered. Replace the old with the new:

shellcode = b"\x89\xe5\x83\xec\x20\x31\xdb\x64\x8b\x5b\x30\x8b\x5b\x0c\x8b\x5b"
shellcode += b"\x1c\x8b\x1b\x8b\x1b\x8b\x43\x08\x89\x45\xfc\x8b\x58\x3c\x01\xc3"
shellcode += b"\x8b\x5b\x78\x01\xc3\x8b\x7b\x20\x01\xc7\x89\x7d\xf8\x8b\x4b\x24"
shellcode += b"\x01\xc1\x89\x4d\xf4\x8b\x53\x1c\x01\xc2\x89\x55\xf0\x8b\x53\x14"
shellcode += b"\x89\x55\xec\xeb\x32\x31\xc0\x8b\x55\xec\x8b\x7d\xf8\x8b\x75\x18"
shellcode += b"\x31\xc9\xfc\x8b\x3c\x87\x03\x7d\xfc\x66\x83\xc1\x08\xf3\xa6\x74"
shellcode += b"\x05\x40\x39\xd0\x72\xe4\x8b\x4d\xf4\x8b\x55\xf0\x66\x8b\x04\x41"
shellcode += b"\x8b\x04\x82\x03\x45\xfc\xc3\xba\x78\x78\x65\x63\xc1\xea\x08\x52"
shellcode += b"\x68\x57\x69\x6e\x45\x89\x65\x18\xe8\xb8\xff\xff\xff\x31\xc9\x51"
shellcode += b"\x68\x2e\x65\x78\x65\x68\x63\x61\x6c\x63\x89\xe3\x41\x51\x53\xff"
shellcode += b"\xd0\x31\xc9\xb9\x01\x65\x73\x73\xc1\xe9\x08\x51\x68\x50\x72\x6f"
shellcode += b"\x63\x68\x45\x78\x69\x74\x89\x65\x18\xe8\x87\xff\xff\xff\x31\xd2"
shellcode += b"\x52\xff\xd0"

If we now run our Python script, step until we reach the instruction after call EAX which calls the WinExec function, a calc.exe process is started:

Conclusion

Researching this was quite fun and turned out to be actually useful for whenever I need to load some shellcode into x64dbg without having to attach it after using blobrunner (great tool btw).

I have published my code to my github repository, feel free to shoot me a message if something I described is wrong, or could be described better :)
