Python Binary Extensions for Compilers

Or, pre-processing the pre-processor

As many have stumbled upon before, the C standard does not support binary
constants. In this post we’ll see how we can add support for binary constants
ourselves, in a hacky, hacky way, of course!

Introduction

While working a bit on a potential ARMv7 disassembler (which will likely never
be finished), I decided, inspired by the Python syntax where 0b111 equals 7,
that C should support binary constants just like Python.

It turns out there is actually a GCC extension that does exactly this, but
relying on it means you can’t compile your favourite project with the
compiler provided by Microsoft.

So, in order to keep cross-compiler support, I’ve decided to hack up some
experimental wrappers around gcc and cl.

Pybinext to the rescue

Pybinext is the utility which wraps around gcc and cl and basically
pre-processes the pre-processor. Using a simple regular expression, we detect
the pattern which matches binary constants, namely 0b[0-1]+. (The actual
regex is a bit more complex, as it tries to avoid false positives as much as
possible.)
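To give an idea of what such a regex can look like, here is a minimal sketch. It is not the pattern pybinext actually uses: the names TOKEN and replace_binary_constants are made up for illustration, and emitting decimal numbers is an assumption. The trick is to let the regex consume string literals, character literals and comments first, so a 0b... inside them is never treated as a constant.

```python
import re

# Hypothetical pattern, not the one shipped with pybinext. Alternatives are
# tried left to right, so literals and comments are consumed before the
# binary-constant branch gets a chance to match inside them.
TOKEN = re.compile(r'''
    "(?:\\.|[^"\\\n])*"        # string literal, honouring escapes
  | '(?:\\.|[^'\\\n])*'        # character literal
  | //[^\n]*                   # line comment
  | /\*.*?\*/                  # block comment
  | (?<![0-9A-Za-z_.])0[bB](?P<bits>[01]+)(?![0-9A-Za-z_.])
''', re.VERBOSE | re.DOTALL)


def replace_binary_constants(source):
    """Return source with every binary constant rewritten as decimal."""
    def fix(match):
        bits = match.group("bits")
        if bits is None:
            # A literal or comment matched: hand it back unchanged.
            return match.group(0)
        return str(int(bits, 2))
    return TOKEN.sub(fix, source)


if __name__ == "__main__":
    print(replace_binary_constants('printf("0b11 -> %d\\n", 0b1110);'))
```

The lookbehind and lookahead keep the pattern from firing in the middle of another token, such as the hex constant 0x0b1 or an identifier like foo0b1. Constants with an integer suffix (0b1u, for instance) are deliberately not handled in this sketch.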

From there on, the script temporarily replaces the source file with an updated
source file which has been pre-processed to contain normal numbers rather than
the binary constants. The script then leaves the compiling up to the real
compiler, i.e., gcc or cl, and restores the original source file after
compilation.
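The swap-compile-restore dance could be sketched roughly as follows. This is an illustration under assumptions, not pybinext's actual code: the function name compile_with_preprocessing, the .orig backup suffix and the transform parameter (any function that maps source text to source text) are all hypothetical.

```python
import os
import shutil
import subprocess


def compile_with_preprocessing(compiler, source_path, transform, extra_args=()):
    """Rewrite source_path in place, run the compiler, then restore the file.

    Returns the exit code of the compiler.
    """
    backup_path = source_path + ".orig"  # hypothetical backup name
    shutil.copy2(source_path, backup_path)
    try:
        with open(source_path) as handle:
            original = handle.read()
        with open(source_path, "w") as handle:
            handle.write(transform(original))
        # The real compiler (e.g. gcc or cl) sees only the rewritten file.
        return subprocess.call([compiler, *extra_args, source_path])
    finally:
        # Put the original back even if the compiler fails or is missing.
        os.replace(backup_path, source_path)
```

The try/finally is the important part: without it, a failed compilation would leave the rewritten file on disk in place of your original source. A more careful version could write the rewritten source to a separate temporary file instead, at the price of compiler errors mentioning a different file name.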

That’s pretty much it. There might still be some unhandled edge cases, but the
script correctly handles cases where a binary constant appears in a string
literal, on a new line, etc.

Extending Pybinext

It’s fairly easy to extend pybinext to support more exotic features, but I’ve
yet to come up with one, so that’s for another blog post perhaps.

Proof of Concept

Assume we’ve got the following source file.

#include <stdio.h>
int main()
{
    printf("-> %d\n", 0b1110);
}
    

Compiling it with the Microsoft compiler will normally give a series of
errors, as can be seen here:

$ cl main.c
main.c(7) : error C2059: syntax error : 'bad suffix on number'
main.c(7) : error C2146: syntax error : missing ')' before identifier 'b1110'
main.c(7) : error C2059: syntax error : ')'
    

However, compiling it with our pycl.py script results in a successful
compilation, as can be seen here:

$ ./pycl.py main.c
    

The same applies to the gcc script, namely pygcc.py. Now go and use binary
constants in your C code!

Source

As always, the source of this simple utility can be found on GitHub.
