Tuesday, February 14, 2017

PHDays CTF Quals – BINARY 500 or Hiding Flag Six Feet Under MBR Bootkit Intel VT x

PHDays CTF Quals – BINARY 500 or Hiding Flag Six Feet Under MBR Bootkit Intel VT x


PHDays CTF Quals took place on December 15-17, 2012. More than 300 teams participated in this event and fought to become a part of PHDays III CTF, which is going to be held in May 2013. Our team had been developing the tasks for this competition for two months. And this article is devoted to the secrets of one of them – Binary 500. This task is very unusual and hard-to-solve, so nobody could find its flag.

This executable file is an MBR bootkit, which uses hardware virtualization (Intel VT-x). Due to the program’s specific features, we decided to warn users that this program should be executed on a virtual machine or an emulator only.


 Warning and license agreement

Dropper

Let’s start with the dropper overview. The main goal of this module is very simple. It is to write files extracted from a resource section into a self-made hidden file system and replace original MBR with a self-made one. It also saves original MBR in the file system. There are few things, which complicate the dropper analysis. First of all, it is written in C++ using STL, OOP, and virtual functions. That’s why all the calls are indirect.



 Virtual function calls in IDA Pro

Secondly, all the disk operations are carried out via the SCSI controller. Instead of the usual ReadFile/WriteFile functions, we use DeviceIoControl with the control code SCSI_PASS_THROUGH_DIRECT, which allows us to communicate with the hard drive on a lower level.

All the files from the resources are encrypted using RC4 and a 256-bit key.

The next thing is the hidden file system. Its structure is pretty simple. The system grows from the end and is written two sectors before the end of the hard drive. First DWORD is a number of files XORed with constant 0x8FC54ED2. Then a directory with information about the files goes:
struct MiniFsFileEntry
{
DWORD fileIndex;
DWORD fileOffset;
DWORD fileSize;
};
The file index is just a constant related to a specific file. Offset is counted in bytes relative to the file system start.

 MiniFs file system structure

MBR

After the dropper ends its operation, it becomes obvious that we have nothing left to do with the operating system and just need to reboot and start debugging the master boot record. There are several ways to debug MBR. There’s no doubt we can analyse it on a real machine using a hardware debugger, but it’s inconvenient and expensive. That is why we recommend to use the VMWare virtual machine (you need to configure an image configuration file at first) connecting to it with the help of the GDB debugger (this method has significant drawbacks, which will be described later) or the Bochs emulator. The main advantage of these methods is that you can use the IDA Pro debugger for analysis and it’s very convenient!

Having chosen our instruments, we are able to get started. The first part of MBR is really simple, and there shouldn’t be any problems with its analysis. It only reads the second part of our MBR (Extended MBR) from the hard drive and writes it to the memory at address 0x7e00 (right after the first part). This operation is important because BIOS maps just the first 512 bytes of MBR and our code exceeds this size.

Analyzing extended MBR, a good specialist will immediately understand that something is wrong, namely that the loader is obfuscated.


Comparison of MBR source code with the IDA Pro analysis

Obfuscation is complicated mainly by indirect function calls. At the very beginning AX registers the address of a function, which scans a specific table (containing function indexes and related offsets) to get the offset of a function to be called. After the function is fulfilled, the control is returned right after the function index constant (return address + 2).


Function table in MBR
MBR obfuscation algorithm

MBR code is pretty simple:

  1. Retrieves hard drive features.
  2. Reads original MBR from the hidden file system.
  3. Replaces our MBR with original MBR at the 0x7c00 address.
  4. Reads and decrypts a hypervisor loader from the file system.
  5. Reads and decrypts a hypervisor body from the file system.
  6. Prepares parameters and passes control to the hypervisor loader.

It should be mentioned that a set of bytes of Bochs BIOS was used for encryption of the hypervisor loader and body. It makes the program system-specific, because it runs correctly only on the Bochs emulator. We decided to use this method for several reasons. Firstly, debugging of Intel VT-x hardware virtualization is possible only on a real machine or using Bochs 2.4.5 or later (so we are already tied to this emulator). Secondly, we didnt want the participants to find encryption keys in the program and decrypt all the hypervisor parts using the static analysis without the debugger. Thirdly, this method prevents users from damaging systems on real machines.

To help the participants, we had published information that they would need Bochs emulator with a working OS image to solve one of the tasks in advance.

VMX Loader

Hardware virtualization is not a new term. It started to spread in 2006 – 2007 when the most well-known CPU developers (Intel and AMD) released processors, which could support related instruction sets. Details on the virtual machine monitor will be provided in the next section. This section will touch upon the methods how to prepare the system for the hardware hypervisor.

As it was mentioned above, it is possible to debug an application, which uses Intel VT-x virtualization, only on real machine or using Bochs 2.4.5 or above, but it is not the only problem. The default emulator build does not support hardware virtualization. That is why we had to compile our own build of Bochs and provide a link to it in the first hint to the task.

The main goal of the hypervisor loader is to move the hypervisor’s body above the first megabyte and transfer control to its entry point. However, it carries out some non-trivial operations, which will be covered below.

There are several input parameters including a base address, which is used as a code segment base. It is set by a far jump.

Then the CPUID instruction checks that code is executed on the Intel system (zero function) and that hardware virtualization is supported by the processor (first function). Let’s take a closer look. Firstly, we call CPUID with value 1 in the EAX register. After the execution, the fifth bit of the ECX register (VMX flag) should be checked. If it is set, then hardware virtualization is supported. To check if virtualization is blocked on the early boot stages (BIOS), we need to read 0x3A MSR register. If the first bit of the EAX register is set after RDMSR instruction execution and the second bit is clear then virtualization is blocked.

Then the loader calls a function, which reads the system memory map. This is achieved by calling interrupt 0x15 in the cycle with the 0xE820 value in the EAX register. That’s how the buffer is filled with records of memory regions. Then the memory map is checked for a free area suitable for the monitor body. If such a memory is found, it is marked as reserved.

To move monitor body above the first megabyte, we need to switch the processor from a real mode to a protected or long mode. We decided to switch directly to the long mode as the hypervisor body works in it. We need to satisfy several conditions: prepare paging structures (PML4, PDPT, a number of PDs for 2MB pages), set PAE bit in the CR4 register, load the PML4 address to the CR3 register, set up GDTR with the long-mode segment registers, set the LMA bit in the MSR EFER register and set the PG and PE bits in the CR0 register. After these operations, we should make a far jump to switch the processor to the long mode.

We noticed at this moment that the IDA Pro 6.1 debugger has a bug, which prevents it from calculating a correct far address, and it shows users some garbage data (this bug is fixed in IDA 6.3). It seems that IDA does not use register values from the Bochs debugger and makes wrong calculations by itself. That is why we recommended the participants to use the built-in Bochs debugger.

The last step is to copy the body to the destination address and transfer control to the entry point.

VMX Hypervisor

Specifically for this task we wrote a thin hypervisor, which:

  • Enters the VMX-root mode.
  • Sets the VMCS structure to start the guest system in the real mode starting from the 0x7c00 address.
  • Sets up guest exit handlers.
  • Starts a guest by executing the VMLAUNCH instruction.

The main goal of a participant is to find a guest system exit handler and analyze its code.

Flag

Obtaining the virtual machine exit handler, a participant came to the final stretch, and only a small task was needed to be solved.

It is obvious from handlers code that if  the CPUID instruction causes an exit and the EIP register contains a specific value then the handler creates an array (32 bytes) from the values of the registers EAX, ECX, EDX, EBX, ESI, EDI, ESP, EBP and then this array is checked for validity. The handler inserts vector (x_0,…,x_31 ) to the set of equations of the following type:
If the equality is satisfied then the vector is valid and used as a key for buffer decryption. Therefore, a participant needs to solve a set of 32 equations with 32 variables. The only thing that complicates the analysis is that the validation algorithm uses a floating point unit (FPU) instruction set.

There is one more (final) MBR in the encrypted buffer which contains a plaintext flag. This bootstrap substitutes the original MBR, and its goal is to display the flag on the screen.


Example of a displayed flag

Test application

Specifically for testing, we have developed an application, which allocates memory to a given address, writes CPUID and a few other instructions with regard to a specific offset (address + offset = the needed EIP value), sets up registers and passes control to the given address. Therefore, when the CPUID instruction is carried out, the hypervisor takes control over, checks the register values, and reboots the system displaying the flag on the screen.


Example of a test application

Conclusion

Developing this application, we wanted to create something unusual, a program which would be interesting for the whole team, because to solve this task, the participants needed to have skills in Win32 reverse engineering, analysis of MBR executed in the real mode, encryption and obfuscation algorithms analysis. This task required both static and dynamic analyses. The participants needed to have basic knowledge of hardware virtualization and assembler x86-64; to use their mathematical skills to obtain the flag.

We really hope that we managed to interest both the participants and the readers of this review!

From the authors

We decided to write this task three weeks before the start of the qualifications and were absolutely sure that would finish very soon, but our expectations were not met. We had finished the task just a few hours before PHDays CTF Quals started and did not have any time to test it or fix the bugs. We were only sure that it was possible to obtain the flag, but the operating system ran not so well in the virtual environment. It displayed blue screens of death from time to time and didnt want to boot after resetting the system. While writing this article, we had some time to fix the bugs and release a more stable task. Unfortunately, this time was not enough either to regulate the operating system. Follow the links to download the last version of the task and watch the video demonstrating the task and test application operation.

Thanks to everybody!

Task Archive

Max Grigoryev, Sergey Kovalev, Positive Research 

Available link for download