Cross-Referencing stand-alone Dalvik Bytecode

A few days ago Patrick Schulz from BlueBox Security
posted an Android Challenge on the BlueBox blog. In this blogpost we will
not go through the entire challenge, but rather focus on the patched bytecode.

Shameless self-promotion: tweet, reddit (I didn’t even have to make
the reddit post, hah.)

Introduction

After reading the blogpost, including the spoiler, it’s evident that the
native library patches the bytecode of a particular function that was
originally implemented in classes.dex (the container that holds all
Dalvik bytecode along with its metadata).

As part of the research I’m doing for my presentation at AthCon, I found
the patching process particularly interesting. This is actually a technique
I’ve thought about earlier, but then again, I’m sure many people have.

Just-in-Time Bytecode

In order to speed up the execution of Dalvik bytecode, Android has
a Just-in-Time (JIT) compiler, which may compile certain functions into native
ARMv7 instructions. This lets the virtual machine run that code faster than
it could by naively interpreting the bytecode.

I do not know the following for sure, as it depends on the internals of the
Dalvik JIT, but the bytecode may have to be patched before it is first
executed. If we were to patch the bytecode after it has been compiled by the
JIT, the virtual machine may simply keep running the already compiled native
code, and then who’s going to execute the patched bytecode? (This is just a
side-note for anyone looking to do the same in the future.)

Locating the Bytecode

Opening up the native library that can be found inside the apk, we find
various functions dealing with the Dex file format.
After looking through the functions for a minute or two, we get to an
mprotect() call followed by a memcpy() call. This is where the function is
being patched, as described in the spoiler in Patrick’s blogpost.

Extracting the Bytecode

I loaded the native library in IDA Pro. It appears that the symbols were not
stripped, which makes things easier for us. Once the relevant memcpy call is
found, we see an obvious inject_ptr, which is a pointer to the target
bytecode. We extract the few hundred bytes of bytecode directly from the
binary, as it’s not encrypted or anything, and put the hexdump in a
file. (Use the Hex View in IDA Pro.)

We then translate the hex dump into a binary file using the following command.

$ xxd -r -p hexfile binfile
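
If xxd is not at hand, the same conversion can be sketched in a few lines of Python. This is only an illustration: the function names hex_to_bin and convert_file are made up for this example, and it assumes the file contains nothing but hex digits and whitespace (no offsets or ASCII column).

```python
import binascii


def hex_to_bin(hex_text):
    """Turn a plain hexdump (hex digits separated by any whitespace) into bytes."""
    # Like xxd -r -p, ignore spaces and line breaks between the hex digits.
    digits = "".join(hex_text.split())
    return binascii.unhexlify(digits)


def convert_file(hex_path, bin_path):
    """Read a plain hexdump from hex_path and write the raw bytes to bin_path."""
    with open(hex_path) as src, open(bin_path, "wb") as dst:
        dst.write(hex_to_bin(src.read()))
```

Calling convert_file("hexfile", "binfile") would then mirror the command above.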

Analyzing the Bytecode

Now we’ve got raw bytecode. It appears that dexdump doesn’t really know what
to do with it. We can do two things now. The first option would be to create
a new .dex file with the patched function (i.e., patching the original
.dex file with the new bytecode), but I don’t feel like doing that.
The second option would be to disassemble the raw bytecode and to fix all
cross-references to other methods, strings, etc. So that’s what I did.

For simplicity’s sake I wrote some Python bindings for my Dalvik disassembler,
which I made a few months ago and which is unfortunately still not public.
Disassembling the raw bytecode then results in the following output.

$ python dalvik.py binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103
8 move-result v4
10 new-instance v5, type@475
14 invoke-direct {v5}, meth@3149
[...]
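
To give an idea of what decoding these instructions involves, here is a minimal sketch of a decoder. It is not the disassembler used above: it only knows the five opcodes from the listing, the names OPCODES, decode, disassemble and SAMPLE are made up for this example, and the sample bytes were assembled by hand from the listing (following the instruction formats in the Dalvik bytecode documentation) rather than copied from the challenge binary.

```python
import struct

# opcode -> (mnemonic, instruction format), limited to the listing above.
OPCODES = {
    0x0A: ("move-result", "11x"),
    0x12: ("const/4", "11n"),
    0x22: ("new-instance", "21c"),
    0x6E: ("invoke-virtual", "35c"),
    0x70: ("invoke-direct", "35c"),
}


def decode(code, offset):
    """Return (length_in_bytes, text) for the instruction at a byte offset."""
    op, b1 = code[offset], code[offset + 1]
    name, fmt = OPCODES[op]
    if fmt == "11x":  # AA|op: one 8-bit register
        return 2, "%s v%d" % (name, b1)
    if fmt == "11n":  # B|A|op: 4-bit register A, signed 4-bit literal B
        reg, lit = b1 & 0xF, b1 >> 4
        if lit >= 8:
            lit -= 16  # sign-extend the nibble
        return 2, "%s v%d, #%+d" % (name, reg, lit)
    # Both remaining formats carry a 16-bit little-endian table index.
    (index,) = struct.unpack_from("<H", code, offset + 2)
    if fmt == "21c":  # AA|op BBBB: register plus type index
        return 4, "%s v%d, type@%d" % (name, b1, index)
    # 35c: A|G|op BBBB F|E|D|C, where A is the argument count.
    count = b1 >> 4
    dc, fe = code[offset + 4], code[offset + 5]
    regs = [dc & 0xF, dc >> 4, fe & 0xF, fe >> 4, b1 & 0xF][:count]
    args = ", ".join("v%d" % r for r in regs)
    return 6, "%s {%s}, meth@%d" % (name, args, index)


def disassemble(code):
    """Decode instructions back to back, prefixing each with its byte offset."""
    lines, offset = [], 0
    while offset < len(code):
        length, text = decode(code, offset)
        lines.append("%d %s" % (offset, text))
        offset += length
    return lines


# Hand-assembled encoding of the five instructions in the listing.
SAMPLE = bytes.fromhex("1202" "6e101f0c0c00" "0a04" "2205db01" "70104d0c0500")

if __name__ == "__main__":
    print("\n".join(disassemble(SAMPLE)))
```

A real disassembler needs a table covering every opcode and format; the point here is only that the meth@ and type@ numbers are plain 16-bit indices stored in the instruction.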

However, as you may notice, we’re missing some information here. So I also
wrote some Python bindings around my Dex file parser, which is still private,
just like the Dalvik disassembler. The references in the bytecode, e.g.
meth@3103, are indices into tables of the original .dex file, so I dumped all
the relevant tables from the original .dex file into a simple database file
(actually just a pickle’d dictionary, to make life easy).
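
A sketch of what such a database could look like is shown below. The layout is an assumption: the table names "meth" and "type" are chosen to match the prefixes in the listing, save_tables and load_tables are hypothetical helpers, and the three entries are the ones that appear in the annotated output later in this post. A real database would be filled by a Dex parser.

```python
import io
import pickle

# Assumed layout: one dictionary per table, keyed by the index used in the bytecode.
EXAMPLE_TABLES = {
    "meth": {
        3103: "()I Ljava/lang/String; length",
        3149: "()V Ljava/util/HashMap; <init>",
    },
    "type": {
        475: "Ljava/util/HashMap;",
    },
}


def save_tables(fileobj, tables):
    """Pickle the lookup tables into a binary file object."""
    pickle.dump(tables, fileobj, protocol=2)


def load_tables(fileobj):
    """Load lookup tables previously written by save_tables."""
    return pickle.load(fileobj)


if __name__ == "__main__":
    # Round trip through an in-memory buffer instead of a file on disk.
    buf = io.BytesIO()
    save_tables(buf, EXAMPLE_TABLES)
    buf.seek(0)
    print(load_tables(buf)["type"][475])
```

With a real file, open("bindb", "wb") and open("bindb", "rb") would take the place of the in-memory buffer.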

Having a database with all lookup tables, we can now move on to the
disassembling part. When disassembling a Dalvik instruction, the disassembler
also returns whether the instruction references a table entry and, if so,
which table and which index. Printing the correct information next to the
instruction is therefore as easy as reading from the correct table with the
correct index. This looks like the following in the disassembler code.

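# d.kind names the lookup table, or is None if nothing is referenced;
# c holds the lookup tables from the database file.
length, d = disasm(...)
if d.kind is None:
    print(offset, d.string)
else:
    print(offset, d.string, ';', c[d.kind][d.index])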

Disassembling again, with the database file as parameter, we get the following
output.

$ python dalvik.py -c bindb binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103 ; ()I Ljava/lang/String; length
8 move-result v4
10 new-instance v5, type@475 ; Ljava/util/HashMap;
14 invoke-direct {v5}, meth@3149 ; ()V Ljava/util/HashMap; <init>
[...]
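
If a disassembler only gives you text and does not report the table and index separately, the same annotation can be approximated by post-processing its output. The sketch below is an alternative to the approach described above, not what dalvik.py does: REF, annotate_line and TABLES are hypothetical names, the table layout is assumed, and the "string" and "field" prefixes are included on the assumption that other instructions reference those tables in the same kind@index style.

```python
import re

# Matches references such as meth@3103 or type@475 in a disassembly line.
REF = re.compile(r"\b(meth|type|string|field)@(\d+)")

# Assumed table layout, filled with the entries shown in the output above.
TABLES = {
    "meth": {
        3103: "()I Ljava/lang/String; length",
        3149: "()V Ljava/util/HashMap; <init>",
    },
    "type": {475: "Ljava/util/HashMap;"},
}


def annotate_line(line, tables):
    """Append the table entry for the first reference found in the line."""
    match = REF.search(line)
    if match is None:
        return line  # nothing to look up
    kind, index = match.group(1), int(match.group(2))
    entry = tables.get(kind, {}).get(index)
    if entry is None:
        return line  # unknown index: leave the line untouched
    return "%s ; %s" % (line, entry)


if __name__ == "__main__":
    print(annotate_line("10 new-instance v5, type@475", TABLES))
```

Leaving unknown indices untouched keeps the output usable even when the database was built from a different .dex file than the one the bytecode refers to.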

Now I thought that was pretty cool, so..

Challenge Spoiler

For those of you who would like to do the challenge without having to write
several hundred, if not thousands, of lines of code and/or without directly
patching the binary, the complete output of the bytecode can be found
here.

Small note: it appears my disassembler doesn’t really understand signed
shorts at the moment, but that’ll be fixed another time.
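
For reference, 16-bit operands such as the literal of a const/16 or the branch offset of a goto/16 are signed little-endian values. Below is a minimal sketch of decoding them correctly; the function names are made up for this example and this is not the actual fix to the disassembler.

```python
import struct


def read_s16(code, offset):
    """Read a signed 16-bit little-endian value at a byte offset."""
    (value,) = struct.unpack_from("<h", code, offset)
    return value


def sign_extend16(value):
    """Reinterpret an unsigned 16-bit integer as a signed one."""
    return value - 0x10000 if value & 0x8000 else value


if __name__ == "__main__":
    # The bytes fe ff read as unsigned give 65534, but as a signed short -2.
    print(read_s16(b"\xfe\xff", 0), sign_extend16(0xFFFE))
```

Reading such an operand as unsigned would turn a short backwards jump into a huge forward one, which is the kind of output to watch out for in the dump.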

Toolz

I will release all the tools after my AthCon presentation. In the meantime,
I’ll be working on extending the code to do lots of other cool stuff with it
\o/

Development & Security

By Jurriaan Bremer @skier_t
