Abusing Forced Inline in C

Abstract

This post presents ways to abuse forced inlining, which is supported by
both GCC and the Microsoft Visual C/C++ compiler (MSVC).

Throughout the next paragraphs we will introduce the reader to inlining,
its purpose, several ways to abuse forced inlining, and some additional
notes.

Finally, we provide the source and binaries of the methods described.

Introduction

Before we get any further, here is a quick introduction to inlining.
From Wikipedia[1]:


In various versions of the C and C++ programming languages, an
inline function is a function upon which the compiler has been
requested to perform inline expansion. In other words, the programmer
has requested that the compiler insert the complete body of the
function in every place that the function is called, rather than
generating code to call the function in the one place it is defined.

For those of us who are still unsure about inlining, here is a simple
example.

#include <stdio.h>
static inline int MAX(int a, int b)
{
    return a < b ? b : a;
}
int main(void)
{
    printf("max(2, 3): %d\n", MAX(2, 3));
}

Although this might be a bad example, because the compiler might have
inlined it anyway (even without the inline keyword), it’s a very easy
one. Assuming the compiler inlines the MAX function, what basically
happens is the following.

#include <stdio.h>
int main(void)
{
    printf("max(2, 3): %d\n", 2 < 3 ? 3 : 2);
}

As you can see, instead of calling the MAX function, the compiler
has rewritten the main function in such a way that it inlines the
MAX function. Nowadays compilers are smart enough to see that
the expression evaluates to three, so if you analyze the binary, you will
probably find the constant three instead of the comparison, but the point
should be clear by now.

So, now that we know what inlining does, forced inlining is quite
self-explanatory: it forces the compiler to inline a certain function (or
at least pushes it as hard as the compiler allows.)

Normally one uses inlining to gain performance. In performance-critical
code sections, for example, it’s useful to inline a function such as max
(which returns the larger of two given numbers), because the CPU then does
not need to perform all the work involved in calling a function (storing
and loading the return address and possibly the base pointer), and the
code of the called function does not have to be fetched into the
instruction cache from a different location. Besides these caching
benefits, the compiler might also be able to optimize further when
inlining. This happens, for example, when one (or more) of the arguments
to the function are known at compile time.

However, it should be noted that inlining relatively big functions
(those with more than a few lines of code) can be fairly expensive. This
is because the CPU caches the code of functions that are commonly called
(or, actually, the cache lines that the functions are located on). But if
such a function is inlined multiple times, every call site gets its own
copy of the code, and the CPU has no way to recognize that the copies are
identical, so each copy takes up cache space of its own (e.g. if max is
inlined several times, the CPU cannot tell one copy from another.)
So, as usual, the techniques discussed here will cost some performance.

As mentioned earlier, both GCC and MSVC support forced inlining. In GCC
we give a function the __attribute__((always_inline)) attribute,
whereas MSVC requires us to specify __forceinline. In order to
solve this without worrying too much about the compiler, we create a
simple macro which expands to the correct attribute depending on the
compiler. This macro looks like the following.

// MSVC defines _MSC_VER; GCC wants the attribute on an inline function
#ifdef _MSC_VER
#define FORCEDINLINE __forceinline
#else
#define FORCEDINLINE inline __attribute__((always_inline))
#endif

Now that we’ve seen what inlining does, and how we force the compiler to
inline a function, we’re ready for some more tricks, which we need in order
to apply our obfuscations.

Function Overloading through Macros

Although we will discuss methods to obfuscate binaries, we do not want to
obfuscate the source, as obfuscating the source makes it unreadable.

So, what we do is overload functions, and we do this through macros.
By overloading functions we have the ability to create transparent
obfuscations, that is, obfuscations that barely require adjusting the
original source.

So, instead of asking someone to rename all occurrences of MessageBoxA() to
MessageBoxA_Obfuscated(), or similar, we do this transparently. This also
means that the obfuscations can be disabled simply by not including a
header file (assuming the obfuscations are defined in a header file.)

Function overloading does, however, bring some small problems, but they are
quite easily overcome and, fortunately for us, when things do go wrong, it’s
easy to spot what’s going wrong (you’ll get compile-time errors.)

What we do is the following. We make a macro for each function that we
wish to obfuscate. This is the tricky part, because the macro should be
defined after the function has been declared (or implemented, for
that matter.) There are two things we can do. We can either make a header
file containing each obfuscation (which is then included into our C file
of choice), or define the obfuscations after the function has been
declared/implemented (e.g. put the function at the top of a source file,
followed by the obfuscation code.)

The latter method is a bit ugly, because you essentially mix the
real code with obfuscation code. The first method is
reasonable and doesn’t make you puke while developing further (because all
obfuscations can be developed in a separate file.) The first method does,
however, have one requirement: if you have all obfuscations in a
single header file, then you have to put a #undef funcname before
the definition of each function that has an obfuscation. Otherwise you get
one of those funky compiler errors.

Note that, after undefining a macro (using
#undef), the obfuscation will not be applied to function calls to
this particular function which are made after the function
definition in the same source file.

So, let’s get to some example code to illustrate this method (we use the
second method; defining obfuscation after the declaration/implementation
of the function.)

#include <stdio.h>
#include <windows.h> // Sleep()
int foo(int a, int b)
{
    return a + b;
}
FORCEDINLINE int foo_obfuscated(int a, int b)
{
    Sleep(1);
    int ret = foo(a, b);
    Sleep(1);
    return ret;
}
#define foo foo_obfuscated
int main(void)
{
    printf("foo(2, 3): %d\n", foo(2, 3));
}

As you can see, we have redirected execution flow from the foo
function to foo_obfuscated. From here the foo_obfuscated
function, which has the forced inline attribute, can do anything before
and after calling the original function. In the example it sleeps for a
millisecond before and after calling the original function.

So, because we’ve specified the forced inline attribute on the
foo_obfuscated function, the compiler will interpret this code as
something like the following.

int foo(int a, int b)
{
    return a + b;
}
int main(void)
{
    Sleep(1);
    int ret = foo(2, 3);
    Sleep(1);
    printf("foo(2, 3): %d\n", ret);
}

As you can see, the foo_obfuscated code is entirely inlined into
the main function. Now let’s see an example in which the
obfuscation is defined, but not applied (see the note about
#undef a few lines up.)

// declaration of `foo' (so the obfuscation can call it)
int foo(int a, int b);
// implementation of the obfuscation for `foo'
FORCEDINLINE int foo_obfuscated(int a, int b)
{
    Sleep(1);
    int ret = foo(a, b);
    Sleep(1);
    return ret;
}
#define foo foo_obfuscated
... snip ...
// implementation of the `foo' function
// we have to undefine the foo macro,
// otherwise we'd get compiler errors
#undef foo
int foo(int a, int b)
{
    return a + b;
}
... snip ...
int main(void)
{
    // this function call to `foo' is NOT obfuscated
    printf("foo(2, 3): %d\n", foo(2, 3));
}

This is really all there is to it. Now that we’re done with the
introduction, it’s time to abuse forced inline.

Abusing Forced Inline

In the following paragraphs we will see how one abuses forced inline.
These methods mainly represent simple obfuscation techniques, but when
combined, they can be very, very annoying for reverse engineers and/or
static code analysis tools.

If you’re unsure about a certain method, or if you’d like to see what kind
of horror the compiler makes using it, then load the binary (which
can be found in the Source and Binaries
section) in a tool such as IDA Pro [2].

Note that in the examples below, we will be using MSVC. This means that,
when porting the examples to GCC, any inline assembly will have to be
rewritten.

Method I: Inserting Garbage

The first method which we will investigate is the following. Instead of
calling the original function directly, we insert some garbage code around
the function call. This is a well-known technique used in so-called
polymorphic viruses. Let’s take a look at the following example.

#include <stdio.h>
int foo(int a, int b)
{
    return a + b;
}
FORCEDINLINE int foo_obfuscated(int a, int b)
{
    __asm {
        jmp short lbl
        _emit 0xb8
        lbl:
    }
    int ret = foo(a, b);
    __asm {
        jmp short lbl2
        _emit 0xb8
        lbl2:
    }
    return ret;
}
#define foo foo_obfuscated
int main(void)
{
    printf("foo(2, 3): %d\n", foo(2, 3));
}

In this example we surrounded the call to foo with some garbage
code. The short jump tells the CPU to step over the emitted byte (an
emitted byte is placed directly into the binary.) The emitted byte, 0xb8,
is the opcode of the “mov eax, imm32” instruction. In other words, if a
disassembler decodes the 0xb8 byte as an instruction, it will consume
the following four bytes as well (as they are required as the immediate
operand.)

From there the rest of the disassembly will be corrupted,
because the disassembler processed five bytes for the 0xb8 byte, while we
inserted only one byte. The first four bytes of the following
instruction(s) have been processed as part of the mov instruction,
instead of as the original instructions they were meant to be.
This is an easy trick and therefore it’s also easy to ignore when
disassembling. In combination with other tricks, however, it can be quite
annoying.

Method II: Parameter Shuffling

In this technique we will redirect execution to an obfuscation helper
function, to which we pass a few extra parameters (these are generated
in the inlined function.) The inlined function calls the obfuscation
helper function, which is not inlined. Together the obfuscation and
obfuscation helper functions do some work that is not even remotely
useful, such as allocating and freeing heap objects. The helper
function then calls the original function. Without further ado, let’s
dive into some example code.

#include <stdio.h>
#include <stdlib.h> // malloc(), free()
int foo(int a, int b)
{
    return a + b;
}
int foo_obfuscated_helper(void *mem, int b, int size, int a)
{
    free(mem);
    return foo(a, b);
}
FORCEDINLINE int foo_obfuscated(int a, int b)
{
    return foo_obfuscated_helper(malloc(42), b, 42, a);
}
#define foo foo_obfuscated
int main(void)
{
    printf("foo(2, 3): %d\n", foo(2, 3));
}

The example code presented above does nothing more (useful) than the
original function. However, we’ve managed to add useless heap routines
and an unused parameter, and we’ve shuffled the a and b
parameters, which is really annoying.

Method III: Encrypting Object Context

It is not uncommon to implement a simple stand-alone library in C
which is only accessible through a simple API. Take for example a
simple memory manager, a wrapper around reading and/or writing of files,
etc. In the following example we’ll demonstrate encryption of the
context variable of a memory manager, that is, the context is decrypted
before and encrypted after every function call of this particular API.

Although it’s far from “real” encryption, it’s still useful because the
state of the memory manager context will be semi-encrypted whenever
it’s not in use. For the entire code, please refer to the source code; here
we’ll only see useful snippets of the code.

//
// memory manager api header include
//
typedef struct _memory_t {
    // memory size used
    int used;
    // memory size left
    int left;
} memory_t;
// yes, there is no `free' memory API..
void memory_init(memory_t *mem, int maxsize);
void *memory_get(memory_t *mem, int size);
void memory_status(memory_t *mem, int *used, int *left);
void *memory_destroy(memory_t *mem);
... snip ...
//
// obfuscation header include
//
// simple encryption/decryption function which
// xor's the input with 0x42.
FORCEDINLINE void _object_crypt(void *obj, int size)
{
    for (int i = 0; i < size; i++) {
        ((unsigned char *) obj)[i] ^= 0x42;
    }
}
FORCEDINLINE void memory_init_obf(memory_t *mem, int maxsize)
{
    memory_init(mem, maxsize);
    // encrypt memory context
    _object_crypt(mem, sizeof(memory_t));
}
FORCEDINLINE void *memory_get_obf(memory_t *mem, int size)
{
    // decrypt memory context
    _object_crypt(mem, sizeof(memory_t));
    // call original function
    void *ret = memory_get(mem, size);
    // encrypt memory context
    _object_crypt(mem, sizeof(memory_t));
    // return the.. return value..
    return ret;
}
FORCEDINLINE void memory_status_obf(memory_t *mem, int *used, int *left)
{
    // decrypt memory context
    _object_crypt(mem, sizeof(memory_t));
    // call original function
    memory_status(mem, used, left);
    // encrypt memory context
    _object_crypt(mem, sizeof(memory_t));
}
FORCEDINLINE void memory_destroy_obf(memory_t *mem)
{
    // decrypt memory context
    _object_crypt(mem, sizeof(memory_t));
    // call original function
    memory_destroy(mem);
    // no need to encrypt, context is no longer valid anyway
}
#define memory_init memory_init_obf
#define memory_get memory_get_obf
#define memory_status memory_status_obf
#define memory_destroy memory_destroy_obf
... snip ...
//
// real code here
//
int main(void)
{
    memory_t mem;
    memory_init(&mem, 1000);
    void *a = memory_get(&mem, 10);
    void *b = memory_get(&mem, 20);
    // do something with `a' and `b'
    // get the status (and compare it with
    // the "encrypted" status)
    int used, left;
    memory_status(&mem, &used, &left);
    printf("Real status: %d %d\n", used, left);
    printf("Encrypted status: %d, %d\n",
        mem.used, mem.left);
    memory_destroy(&mem);
}

Executing this will result in the following.

$ ./encrypted_context
Real status: 30 970
Encrypted status: 1111638620, 1111638408

It should be clear by now that this is a pretty interesting technique. You
can “encrypt” virtually anything: sockets, handles, or even
strings (although you won’t be able to do stuff like ptr[0] in that case.)

As an additional note, compare the original Graph Overview (from IDA)
with the one after applying the obfuscations.

[Figure: IDA Graph Overview, before and after applying the obfuscations.]

Additional Notes

Although the methods proposed here are interesting and perhaps useful,
please do make sure you apply them correctly and, in the case of an API
library, to the entire library. Not providing context
encryption/decryption for some function of the library results in
undefined behaviour for the functions that have not been obfuscated, since
they would operate on an encrypted context. The
same goes for raw access to an encrypted data structure (as can be seen in
the example above, where the Encrypted status prints garbage.)

Also keep in mind that undefined behaviour will occur when using an
encrypted object in a multi-threaded manner. (E.g. two threads
might decrypt the object at the same time.)

Even though the examples provided are really, really basic, they show what
one could do using forced inline and, when the methods are combined, they
can become quite a pain in the arse.

Source and Binaries

Source and Binaries for all Forced Inline posts can be found
here.

References

  1. Inline Function - Wikipedia
  2. IDA Pro

4 thoughts on “Abusing Forced Inline in C”

  1. an alternative way to force inline is “extern inline”, e.g.:

    extern inline int add(int a, int b) { return a+b; }

    • Cool, I did not know about this trick.
      However, I’ve tested it using MSVC and (with my current settings) it does not seem to work (e.g. memory_get_obf is not inlined etc.)
      Maybe I’m missing something, or maybe it does work for GCC? (The first two examples did in fact work.)

  2. I found the _emit 0xb8 trick nice. However, it’s highly dependent on the compiler supporting inline assembly or the will to write assembler stubs for every platform you support.

    In other words, is there such a thing as a cross platform obfuscation bag of tricks?

    • You are right, there is not really a cross-platform solution for this particular trick.
      However, you should be able to add support for some platforms without too much hassle;

      MSVC x86: _emit 0xb8
      gcc x86: __asm__(".byte 0xb8")
      etc.

      The other tricks should work cross-platform (except for the windows specific things which are discussed in part two of this article, but you might be able to find similar techniques for other platforms.)
