Darm Updates - More ARMv7, More Thumb
Darm is an ARMv7 disassembler in C. This blogpost is just
a small update about the new stuff in darm from over the past couple of
months, as there were some delays due to conferences and other stuff
Thumb support
Most notably, recently Darm has gained support for the Thumb instruction
set. Those of you familiar with ARMv7 know ARMv7 has two modes, namely, ARMv7
and Thumb. ARMv7 contains pretty much all the instructions you’d ever need,
but Thumb is a small subset of the most used ARMv7 instructions and are only
16 bits in size, whereas ARMv7 instructions are 32 bits in size. Needless to
say, Thumb allows for more compact code.
The API to disassemble Thumb instructions is as straightforward as the
equivalent function for disassembling ARMv7 instructions. Furthermore, the two
instruction set modes share the same data structure, darm_t, hence it is
easily possible to write generic analysis routines without having to worry
whether you’re analyzing ARMv7 or Thumb.
Currently, the C API looks roughly like the following. (Including the Thumb2
function, for more information on that, read further.)
typedef struct _darm_t { [...] } darm_t; // disassemble an armv7 instruction int darm_armv7_disasm(darm_t *d, uint32_t w); // disassemble a thumb instruction int darm_thumb_disasm(darm_t *d, uint16_t w); // disassemble a thumb2 instruction int darm_thumb2_disasm(darm_t *d, uint16_t w, uint16_t w2);
ARMv7 Improvements and Bug Fixes
ARMv7 has mostly had some bug fixes and a couple of new instructions. Nothing
too spectacular, but it’s still improving as I find bugs and stumble upon new
instructions.
Coming up: Thumb2 support
Currently I’m working on getting support for the Thumb2 instruction set as
well. As the Thumb instruction set is fairly limited with regards to the
instruction that it can handle, as it’s only 16 bits in size, rather than 32
bits, there’s also the Thumb2 extension. Thumb2 features almost all (except
for maybe a handful of instructions) of the instructions which are also
available in the ARMv7 instruction set, hence allowing the optimized Thumb
instructions to be mixed with Thumb2 instructions, which are, as ARMv7, 32
bits in size.
Having said that, if there are requests for instructions which you’d like to
see sooner rather than later, please do contact me. At first I aim to support,
let’s say, 90% of the binaries while keeping the amount of implemented
instructions to a “minimum.” That is, I’ll focus on the most used Thumb2
instructions at first, and go for the complete instruction set later.
Difference between ARMv7 and Thumb/Thumb2
A small explanation on ARMv7 vs Thumb/Thumb2.
When executing ARM instructions, the instruction will be executed as ARMv7
instruction whenever the address is 4-byte aligned, and executed as either
Thumb or Thumb2 instruction, depending on its encoding, when the lowest
significant bit is set. That is, when the address is not 4-byte aligned, but
instead either addr+1 or addr+3 (with addr being a 4-byte aligned pointer),
then the instruction is decoded as being either Thumb or Thumb2.
The instruction is either decoded as Thumb or Thumb2 depending on a couple of
the most significant bits. When decoded as Thumb, one 16 bit word is fetched
and executed. When decoded as Thumb2, a second 16 bit word is fetched and the
instruction is decoded as if it were a 32 bit word.
At the following lines of code we can see the comparison of the
upper 5 bits of the first 16 bit word. When the upper five bits equal either
b11101 (binary 11101, or 29 in decimal), b11110, or b11111, then it is a
Thumb2 instruction. Otherwise it’s a Thumb instruction.
Also note that at the moment there are two seperate functions to disassemble
Thumb and Thumb2 instructions, but don’t worry, in the future there’ll be a
nice wrapper around them