Abusing Forced Inline Part 2: Breakpoints

Abstract

In the last blog post we discussed Forced Inline, and how one could use it
to obfuscate binaries (without obfuscating the source code itself).

For a complete introduction to all of the basics referenced in this
article (as well as the first three methods of obfuscation), go ahead and
read the first part of this post.

In the next three sections we will discuss three methods which do not
obfuscate the binary, but instead make debugging it harder.

Finally, we present the source and binaries of the methods explained here.

In a few days to a week, a crackme which combines the six methods
discussed so far will be released, so make sure you understand everything.

Method IV: Ensuring Memory Integrity

This technique is not meant to obfuscate the code any further. Instead,
it is a simple but very effective method to detect modifications to our
machine code. When a Reverse Engineer analyzes your binary dynamically
(that is, using a debugger), they will usually set so-called Software
Breakpoints ^[1]. When a Software Breakpoint is set on a particular
address, the byte at this address is replaced by an interrupt three
instruction (“int 3”, or 0xcc in machine code). Note that it is possible
to use other machine code representations to achieve the same effect
(such as the two-byte encoding 0xcd 0x03), but 0xcc is by far the most
commonly used one.
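To make this concrete, here is a minimal, portable sketch of what a
debugger does behind the scenes when it sets and later removes a Software
Breakpoint. It works on an ordinary byte array standing in for machine
code (a real debugger writes into the debuggee’s memory instead, e.g. with
WriteProcessMemory); the `sw_breakpoint` struct and both function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC

/* Hypothetical bookkeeping a debugger keeps for one breakpoint. */
struct sw_breakpoint {
    unsigned char *address;   /* where the breakpoint lives */
    unsigned char saved_byte; /* original byte, restored later */
};

/* Replace the byte at `address' with int 3, remembering the original. */
static void set_sw_breakpoint(struct sw_breakpoint *bp, unsigned char *address)
{
    assert(bp != NULL && address != NULL);
    bp->address = address;
    bp->saved_byte = *address;
    *address = INT3_OPCODE;
}

/* Put the original byte back, as a debugger does when the breakpoint
 * is removed (or temporarily, to execute the original instruction). */
static void clear_sw_breakpoint(const struct sw_breakpoint *bp)
{
    assert(bp != NULL && bp->address != NULL);
    *bp->address = bp->saved_byte;
}
```

While the breakpoint is armed, anything that reads those bytes sees 0xcc
instead of the original instruction, which is exactly what the following
check exploits.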

Knowing this, we can implement a very simple check which hashes a part of
memory. If we, for example, hardcode the correct hash, then we can compare
the calculated hash against the hardcoded one. If they don’t match, the
memory was altered, e.g. because a Software Breakpoint was set. From
there, one would usually do something like terminating the process, or
crashing it by reading from an invalid address.
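A minimal, portable sketch of such a check, assuming a 32-bit FNV-1a hash
(any hash would do; this one is simply short) and a byte array standing in
for the code region. `fnv1a_32` and `memory_is_intact` are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over `len' bytes starting at `start'. */
static uint32_t fnv1a_32(const unsigned char *start, size_t len)
{
    uint32_t hash = 0x811C9DC5u;        /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= start[i];
        hash *= 0x01000193u;            /* FNV prime (odd) */
    }
    return hash;
}

/* Returns 1 if the region still hashes to `expected', 0 if it was altered. */
static int memory_is_intact(const unsigned char *start, size_t len,
                            uint32_t expected)
{
    assert(start != NULL);
    return fnv1a_32(start, len) == expected;
}
```

Because every FNV-1a step (an XOR with the next byte, then a
multiplication by an odd constant modulo 2^32) is invertible, changing any
single byte of the region, e.g. to 0xcc, is guaranteed to change the final
hash.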

As usual, a snippet says more than a thousand words, so here we go.

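#include <stdio.h>
#include <windows.h>

// assumed definition of FORCEDINLINE (MSVC)
#ifndef FORCEDINLINE
#define FORCEDINLINE __forceinline
#endif

// calculate_hash() is not shown in this post; as a stand-in, this
// 32-bit FNV-1a hash over `size' bytes starting at `addr' does the job
// (any hash works, as long as a changed byte changes the result).
unsigned long calculate_hash(void *addr, unsigned long size)
{
    const unsigned char *p = (const unsigned char *) addr;
    unsigned long h = 2166136261UL;

    while (size-- != 0) {
        h ^= *p++;
        h *= 16777619UL;
    }
    return h;
}

void calculate_data(int in, int *out)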
{
    // stores five times the input `in' into `out'
    *out = in * 5;
}

FORCEDINLINE void calculate_data_obf(int in, int *out)
{
    static unsigned long hash = 0;

    // calculate the hash of the calculate_data function
    // and whatever is behind it (e.g. main)
    unsigned long h = calculate_hash(&calculate_data, 0x200);

    // first time we check?
    if(hash == 0) {
        // then set the hash
        hash = h;
    }
    // hash doesn't match?
    else if(hash != h) {
        // print the address of `out' as a 32-bit hex value
        printf("memory has been altered (in: %d, out: 0x%08x)!\n",
            in, (unsigned int) (size_t) out);
        return;
    }
    // hash checks out, call the original function.
    calculate_data(in, out);
}

void overwrite_function()
{
    // windows only, doh.
    DWORD old;
    VirtualProtect(&calculate_data, 1, PAGE_EXECUTE_READWRITE, &old);

    // overwrite the first instruction of `calculate_data' with a
    // return instruction, this way the `out' variable will not
    // be set.
    *(unsigned char *) &calculate_data = 0xc3;
}

#define calculate_data calculate_data_obf

int main(void)
{
    for (int i = 0, out; i < 25; i++) {
        calculate_data(i, &out);
        printf("%d -> %d\n", i, out);

        // the 10th time we will patch the function
        if(i == 10) {
            overwrite_function();
        }
    }
}

As you can see, the interesting part happens in the calculate_data_obf
function. It calculates a hash over the machine code of the calculate_data
function (0x200 bytes, starting at its first instruction). The first time
the hash is calculated, it is stored. If the function is called at a later
time and the hash doesn’t match, a printf call is executed and the call to
the original calculate_data function is skipped.

Before we proceed, let’s take a look at the output of this program.

$ ./memory_integrity
0 -> 0
1 -> 5
2 -> 10
3 -> 15

... snip ...

9 -> 45
10 -> 50
memory has been altered (in: 11, out: 0x003afcf8)!
11 -> 50
memory has been altered (in: 12, out: 0x003afcf8)!
12 -> 50
memory has been altered (in: 13, out: 0x003afcf8)!

... snip ...

23 -> 50
memory has been altered (in: 24, out: 0x003afcf8)!
24 -> 50

What happened here is the following. During the first eleven iterations
(i = 0 up to and including 10) the calculated hash matches. At the end of
the iteration with i = 10, however, we call the overwrite_function
function, which overwrites the first byte of the calculate_data function
with a ret instruction. From then on the calculate_hash function returns a
different hash, so we land at the printf call, and out keeps its last
value, 50.

Although the snippet above uses a hash calculated the first time it runs,
one would normally use a hardcoded value, or, even better, do a
mathematical calculation with it and use the result as an address to read
from, etc.
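A minimal sketch of the “use it in a calculation” idea, assuming the
actual and expected hashes are already available as 32-bit values. Rather
than branching on a comparison (a single conditional jump that a Reverse
Engineer can patch), the XOR of both hashes is folded into the result: it
is zero while the code is untouched, so the output is correct, and after
tampering the function silently returns wrong data instead of taking a
visible error path. The function name is hypothetical, and this variant
corrupts a value rather than reading from a computed address.

```c
#include <assert.h>
#include <stdint.h>

/* Computes in * 5, but only while actual_hash == expected_hash.
 * Any mismatch adds the (nonzero) XOR difference to the result. */
static uint32_t calculate_data_entangled(uint32_t in, uint32_t actual_hash,
                                         uint32_t expected_hash)
{
    /* zero for untouched code, nonzero once it has been patched */
    uint32_t diff = actual_hash ^ expected_hash;

    return in * 5u + diff;
}
```

Because the difference is added modulo 2^32, any nonzero XOR changes the
result, regardless of the input.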

This technique can be really effective when used correctly. For example,
one could do memory checks on other memory checks, so a Reverse Engineer
cannot easily alter the hardcoded hashes. You can make it as complex as
you like.
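A portable sketch of checks on other checks, assuming each check is
described by a hypothetical `guard` struct holding a region and its
expected FNV-1a hash. The outer guard hashes the inner guard’s expected
value, so patching the code and then fixing up the inner hash is still
caught by the outer one. In a real binary the expected values would be
constants embedded in the code section rather than fields in a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over `len' bytes starting at `start'. */
static uint32_t fnv1a_32(const void *start, size_t len)
{
    const unsigned char *p = start;
    uint32_t hash = 0x811C9DC5u;

    while (len-- != 0) {
        hash ^= *p++;
        hash *= 0x01000193u;
    }
    return hash;
}

/* Hypothetical description of one integrity check: a region and the
 * hash it is expected to have. */
struct guard {
    const void *start;
    size_t len;
    uint32_t expected;
};

/* Returns 1 if the guarded region still matches its expected hash. */
static int guard_ok(const struct guard *g)
{
    assert(g != NULL && g->start != NULL);
    return fnv1a_32(g->start, g->len) == g->expected;
}
```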

Method V: Defeating Hardware Breakpoints

Yet another method to defeat breakpoints, this time targeting Hardware
Breakpoints ^[2]. Hardware Breakpoints work differently from Software
Breakpoints: where Software Breakpoints alter the machine code, Hardware
Breakpoints are configured through the Debug Registers. Besides that,
Debug Registers are specific to one thread (they’re registers, after all),
there is only room for four breakpoints per thread, and from usermode you
normally read or set them using an API (GetThreadContext and
SetThreadContext).
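Which of the four slots are in use can be read from Dr7. Following the x86
Debug Register layout (bit 2n is the local enable Ln and bit 2n+1 the
global enable Gn of breakpoint n; bits 16+4n and 18+4n hold its R/W and
LEN fields), the portable sketch below decodes a Dr7 value. Both helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a 4-bit mask: bit n is set when breakpoint n (Dr0..Dr3)
 * is enabled in `dr7', either locally (Ln) or globally (Gn). */
static unsigned enabled_hw_breakpoints(uint32_t dr7)
{
    unsigned mask = 0;

    for (unsigned n = 0; n < 4; n++) {
        if (dr7 & (3u << (2 * n)))   /* Ln is bit 2n, Gn is bit 2n+1 */
            mask |= 1u << n;
    }
    return mask;
}

/* R/W field of breakpoint n:
 * 0 = execute, 1 = write, 2 = I/O, 3 = read/write. */
static unsigned hw_breakpoint_type(uint32_t dr7, unsigned n)
{
    assert(n < 4);
    return (dr7 >> (16 + 4 * n)) & 3u;
}
```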

Unfortunately, relying on an API brings some limitations. For example,
because we need an API to read the Debug Registers, their values can be
spoofed (by hooking that particular API). In a future post we will discuss
techniques to bypass these kinds of hooks (as far as usermode goes), but
for now we will stick to the normal API.

Now that we have learned the basics about Hardware Breakpoints, it’s time
for some example code.

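#include <stdio.h>
#include <windows.h>

// assumed definition of FORCEDINLINE (MSVC)
#ifndef FORCEDINLINE
#define FORCEDINLINE __forceinline
#endif

void calculate_data(int in, int *out)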
{
    // stores five times the input `in' into `out'
    *out = in * 5;
}

FORCEDINLINE void calculate_data_obf(int in, int *out)
{
    // check if any hardware breakpoints were set; ContextFlags is set
    // explicitly because it is not the first member of CONTEXT on x64
    CONTEXT ctx = {0};
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;

    // Debug Register 7 holds the enable bits as well as the type and
    // length of each breakpoint, so checking it against nonzero
    // provides enough information for us
    if(GetThreadContext(GetCurrentThread(), &ctx) == FALSE ||
            ctx.Dr7 != 0) {
        printf("No Hardware Breakpoints please.\n");
        return;
    }

    // no hardware breakpoints were found, let's
    // continue with the real function.
    calculate_data(in, out);
}

#define calculate_data calculate_data_obf

int main(void)
{
    for (int i = 0, out; i < 25; i++) {
        calculate_data(i, &out);
        printf("%d -> %d\n", i, out);

        // the 5th time we set a hardware breakpoint
        if(i == 5) {
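            CONTEXT ctx = {0};
            ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
            // note: due to the #define above, this is the address of
            // calculate_data_obf; the check only looks at Dr7 anyway
            ctx.Dr0 = (DWORD_PTR) &calculate_data;
            ctx.Dr7 = 0x00000001;
            SetThreadContext(GetCurrentThread(), &ctx);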
        }
    }
}

This example has the same structure as the example snippet from
Method IV, so most of the code should be fairly obvious. The only
differences are that this example checks the Debug Registers in the
inlined obfuscation handler, and that it enables a Hardware Breakpoint at
the end of the iteration with i = 5. If you’d like to know more about the
magic value that initializes Dr7, please check out these few functions.
The Dr0 Debug Register contains the address that we want a Hardware
Breakpoint on.
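The magic value can be derived from the x86 Dr7 layout. A portable sketch
of a hypothetical helper that enables breakpoint n in a Dr7 value, using
the R/W encodings 00 (execute), 01 (write) and 11 (read/write) and the LEN
encodings 00, 01 and 11 for one, two and four bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Values for the R/W field of a breakpoint. */
enum hw_bp_type { HW_BP_EXECUTE = 0, HW_BP_WRITE = 1, HW_BP_READWRITE = 3 };

/* Returns `dr7' with breakpoint n (0..3) locally enabled, of the given
 * type, covering len bytes (1, 2 or 4; execute breakpoints use 1). */
static uint32_t dr7_enable(uint32_t dr7, unsigned n, enum hw_bp_type type,
                           unsigned len)
{
    uint32_t len_bits;

    assert(n < 4);
    assert(len == 1 || len == 2 || len == 4);
    assert(type != HW_BP_EXECUTE || len == 1);
    len_bits = (len == 1) ? 0u : (len == 2) ? 1u : 3u;  /* LEN encoding */

    dr7 &= ~(0xFu << (16 + 4 * n));          /* clear old R/W and LEN */
    dr7 |= 1u << (2 * n);                    /* Ln: local enable */
    dr7 |= (uint32_t) type << (16 + 4 * n);  /* R/W */
    dr7 |= len_bits << (18 + 4 * n);         /* LEN */
    return dr7;
}
```

For example, dr7_enable(0, 0, HW_BP_EXECUTE, 1) yields 0x00000001: only L0
is set, meaning a local, one-byte execute breakpoint on the address in
Dr0.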

We will omit the example output, as it is very similar to that of
Method IV.

It should be noted as well that the example snippet above will not work
out of the box: in order to actually use the Hardware Breakpoint, an
Exception Handler should be installed. To do this one could use
SetUnhandledExceptionFilter or AddVectoredExceptionHandler. From there,
code to process the raised exceptions (a Hardware Breakpoint raises an
exception when it is hit) can be implemented.
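A hit Hardware Breakpoint is reported to such a handler as
EXCEPTION_SINGLE_STEP (0x80000004), and bits B0 to B3 of Dr6 in the
exception’s ContextRecord tell which of the four slots fired. The portable
sketch below shows only that decoding step, plus one detail a handler for
execute breakpoints usually needs: setting the Resume Flag (bit 16 of
EFlags) before returning EXCEPTION_CONTINUE_EXECUTION, as otherwise the
same breakpoint fires again immediately. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_HIT_BITS       0xFu     /* B0..B3: which breakpoint(s) fired */
#define EFLAGS_RESUME_FLAG 0x10000u /* RF, bit 16 */

/* Index (0..3) of the lowest breakpoint reported in `dr6', or -1. */
static int triggered_hw_breakpoint(uint32_t dr6)
{
    uint32_t hits = dr6 & DR6_HIT_BITS;

    for (int n = 0; n < 4; n++) {
        if (hits & (1u << n))
            return n;
    }
    return -1;
}

/* EFlags value to store back into the context, so the faulting
 * instruction executes once without re-triggering the breakpoint. */
static uint32_t eflags_with_resume(uint32_t eflags)
{
    return eflags | EFLAGS_RESUME_FLAG;
}
```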

Method VI: Beating Page Protection

Besides Software and Hardware Breakpoints there is yet another method to
trap memory access (that is, execution as well as reading and writing).
This technique utilizes the Access Protection ^[3] of a page. On Windows,
pages usually have a size of 4KB, and one can specify a different access
protection for each of these pages. It is, for example, possible to set a
page to read-only, execute-only, exception on access, etc.
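Because protection is set per page, any address can be mapped to the page
(and thus the protection setting) it belongs to by rounding it down to a
multiple of the page size. A small portable sketch, assuming the common
4KB page size (on Windows the actual value can be queried with
GetSystemInfo, field dwPageSize); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K ((uintptr_t) 0x1000)

/* Start address of the page containing `addr';
 * page_size must be a power of two. */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Nonzero when both addresses lie on the same page, i.e. they share
 * one access protection setting. */
static int same_page(uintptr_t a, uintptr_t b, uintptr_t page_size)
{
    return page_base(a, page_size) == page_base(b, page_size);
}
```

Two functions for which same_page() holds share one protection setting, so
changing the protection for one of them affects the other as well.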

A debugger can use these protections to trap accesses to our code, so we
have to check them. As the code is, yet again, fairly easy, we go straight
to the example snippet.

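#include <stdio.h>
#include <windows.h>

// assumed definition of FORCEDINLINE (MSVC)
#ifndef FORCEDINLINE
#define FORCEDINLINE __forceinline
#endif

void calculate_data(int in, int *out)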
{
    // stores five times the input `in' into `out'
    *out = in * 5;
}

FORCEDINLINE void calculate_data_obf(int in, int *out)
{
    MEMORY_BASIC_INFORMATION mbi;

    // by default the code section is read+execute
    // VirtualQuery returns the number of bytes written, 0 on failure
    if(VirtualQuery(&calculate_data, &mbi, sizeof(mbi)) == 0 ||
            mbi.Protect != PAGE_EXECUTE_READ) {
        printf("Oh boy, you're doing it again!\n");
        return;
    }

    // continue with the real function.
    calculate_data(in, out);
}

#define calculate_data calculate_data_obf

int main(void)
{
    for (int i = 0, out; i < 25; i++) {
        calculate_data(i, &out);
        printf("%d -> %d\n", i, out);

        // the 12th time we change the memory protection
        if(i == 12) {
            // set the page to read + write + executable
            DWORD old;
            VirtualProtect(&calculate_data, 0x200,
                PAGE_EXECUTE_READWRITE, &old);
        }
    }
}

As expected, printf is called from the iteration after i = 12 onwards,
because the page protection is incorrect. It is interesting to note that,
when changing the access protection to e.g. PAGE_NOACCESS, the process
will actually throw an exception and crash (because we didn’t install an
exception handler). This is because the main function is located on the
same page as the calculate_data function (or at least it is in the binary
we built), so the no-access page is triggered as soon as execution returns
from VirtualProtect, resulting in an unhandled exception. (Note that this
is exactly what a Reverse Engineer would want, because with an exception
handler in place this would result in something similar to single-stepping
^[4].)
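The exact comparison against PAGE_EXECUTE_READ in the snippet also catches
PAGE_GUARD, a modifier bit OR’d into the protection value that some
debuggers use for memory breakpoints. A portable sketch of a hypothetical
helper that reports why a protection value looks suspicious; the numeric
values mirror the constants from <winnt.h> and are only defined here when
<windows.h> has not been included:

```c
#include <assert.h>
#include <stdint.h>

/* Values as in <winnt.h>; defined here only if <windows.h> is absent. */
#ifndef PAGE_NOACCESS
#define PAGE_NOACCESS          0x01
#define PAGE_READWRITE         0x04
#define PAGE_WRITECOPY         0x08
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80
#define PAGE_GUARD             0x100
#endif

enum code_page_state {
    CODE_PAGE_OK, CODE_PAGE_GUARD, CODE_PAGE_NO_ACCESS,
    CODE_PAGE_WRITABLE, CODE_PAGE_UNEXPECTED
};

/* Classifies the Protect value (as returned by VirtualQuery) of a page
 * that should contain unmodified code. */
static enum code_page_state protection_problem(uint32_t protect)
{
    uint32_t base = protect & 0xFFu;  /* drop modifiers such as PAGE_GUARD */

    if (protect & PAGE_GUARD)
        return CODE_PAGE_GUARD;       /* e.g. a memory breakpoint */
    if (base == PAGE_NOACCESS)
        return CODE_PAGE_NO_ACCESS;
    if (base & (PAGE_READWRITE | PAGE_WRITECOPY |
                PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY))
        return CODE_PAGE_WRITABLE;
    if (protect != PAGE_EXECUTE_READ)
        return CODE_PAGE_UNEXPECTED;  /* any other value or modifier */
    return CODE_PAGE_OK;
}
```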

Unfortunately, this method uses an API as well, which means a Reverse
Engineer could spoof the access protection of the particular page.
However, it is a very nice method, assuming it is difficult for an
attacker to spoof the access protection (e.g. if that is only possible
from the kernel, because we execute the system call directly instead of
just calling the API).

Source and Binaries

Source and Binaries for all Forced Inline posts can be found
here.

Development & Security

By Jurriaan Bremer @skier_t
