Python Source Obfuscation using ASTs
For one of the challenges of the Hack in the Box Capture the Flag
game last week, I decided to release an obfuscated and compiled Python class.
After doing some research on the internet about this particular topic, it
appeared there is no real up-to-date tool for this. I mostly found paid
software and/or software that has been outdated for several years, or at least
looks like it is.
Well, that’s great. This allows me to make something new-ish.
The Actual Challenge
The challenge itself was rather easy: given a team name and a flag on the
command line, the Python script verifies whether the flag is correct or not.
By placing a few print statements here and there, the challenge can be solved
within minutes. Which is why we obfuscate it! After all, we obviously want
the teams playing our CTF to get some headaches.
Abstract Syntax Trees
According to Wikipedia, an Abstract Syntax Tree is a tree representation of
the abstract syntactic structure of source code written in a programming
language. In other words, an AST represents the original source code as a tree.
Fortunately for us, Python provides a built-in ast module which is able to
parse Python source into an AST (actually using the built-in compile()
function.) Besides that, the ast module gives us access to all available AST
node types (e.g., Call, BinOp, etc.)
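To make the parsing step concrete, here is a minimal sketch, assuming a modern Python 3 interpreter (where string literals show up as Constant nodes rather than the Str nodes of Python 2). The sample source and variable names are made up for illustration.

```python
import ast

source = "print('Hello AST')"

# ast.parse() is equivalent to compile(source, filename, "exec", ast.PyCF_ONLY_AST).
tree = ast.parse(source)

# Walk the tree and collect the class name of every node we encounter.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# ast.dump() gives a textual view of the whole tree, handy for exploring node fields.
print(ast.dump(tree))
```

Dumping the tree of a small snippet is usually the quickest way to find out which node types and fields a rewrite has to deal with.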
Finally, after rewriting the AST, we can do two things. We can generate a
compiled Python object directly from the AST. Unfortunately, I did not find a
way to do this (or a library, for that matter); if you know of one, feel free
to point me to it. The other option is to generate Python code from the AST
again and to compile it from there. For this step, one would use the
codegen.py module. (Note that I submitted a pull request,
as the current version gave me an error with regards to the omission of
parentheses for binary operations.)
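Regarding the first option: the built-in compile() accepts an AST object as well as a source string, so a direct route can be sketched with the standard library alone. This is a minimal sketch, assuming Python 3; the sample source and the names used are illustrative only. (On Python 3.9 and later, ast.unparse() can also regenerate source from a tree, which is an alternative to codegen.py rather than what was used for the challenge.)

```python
import ast

tree = ast.parse("result = 'Hello' + ' AST'")

# Nodes that were added or replaced by hand lack line/column info;
# fix_missing_locations() fills that in so compile() accepts the tree.
ast.fix_missing_locations(tree)

# compile() takes an ast.Module directly when the mode is "exec".
code = compile(tree, "<ast>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["result"])
```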
We’re now at the point where we can parse Python code into an AST, rewrite it,
and write a new Python source from the rewritten AST. The final step for my
challenge was to compile the created Python source into an object, which can
be done by executing the following command on the command line. (I'm sure it's
also possible using a Python function, but this works just fine.)
$ python -mcompileall .
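The same step can be done from Python through the compileall module, which is what the command above runs. The following is a minimal sketch, assuming Python 3; the temporary directory and the file name challenge.py are hypothetical and only there to keep the example self-contained.

```python
import compileall
import os
import tempfile

# Create a throwaway directory with one tiny module.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "challenge.py"), "w") as handle:
    handle.write("print('Hello AST')\n")

# compile_dir() is the function behind `python -m compileall <dir>`.
# legacy=True writes challenge.pyc next to the source instead of into __pycache__.
ok = compileall.compile_dir(workdir, quiet=1, legacy=True)

compiled_path = os.path.join(workdir, "challenge.pyc")
print(bool(ok), os.path.exists(compiled_path))
```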
Obfuscation through ASTs
I only did a few simple obfuscations, which already proved to be painful
enough, but it's a nice start for anyone who's looking into doing something
similar.
The ast.NodeVisitor class allows one to visit each AST node in the tree; its
subclass ast.NodeTransformer adds the possibility to modify or delete them.
We can do this by implementing visit_ functions. For example, in order to
analyze/modify/delete certain Name nodes, which are used for variable lookups
etc., we implement a visit_Name function in our obfuscation class (which, btw,
extends ast.NodeVisitor, by way of NodeTransformer.)
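As an illustration of the visit_ naming scheme, here is a minimal read-only sketch, assuming Python 3. The NameCollector class is hypothetical and only records identifiers; it does not modify anything.

```python
import ast

class NameCollector(ast.NodeVisitor):
    """Hypothetical visitor that records every identifier it sees."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # node.id holds the identifier; node.ctx says whether it is loaded or stored.
        self.names.append(node.id)
        # Keep descending so that nested nodes are visited as well.
        self.generic_visit(node)

collector = NameCollector()
collector.visit(ast.parse("total = price * quantity"))
print(collector.names)
```

The visitor dispatches on the node's class name, so a method called visit_Name is invoked for every Name node and nothing else.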
Modifying AST nodes using the NodeTransformer
The NodeTransformer can modify an existing AST in a fairly simple way. By
returning the original node, the AST remains untouched, as is shown in the
following example code.
from ast import NodeTransformer

class Example01(NodeTransformer):
    def visit_Str(self, node):
        # Returning the node unchanged leaves the AST as-is.
        return node
One can modify an AST node by returning a new node. For example, to replace all
strings with an empty string, see the following snippet.
from ast import NodeTransformer, Str

class Example02(NodeTransformer):
    def visit_Str(self, node):
        # Returning a new node replaces the original one in the tree.
        return Str(s='')
And, finally, to delete a node, simply return None in the visit_ function,
although this can lead to weird situations in which the new AST is not valid.
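Deleting nodes can be sketched as follows, assuming Python 3.9 or later (for ast.unparse). The AssertRemover class and the sample source are hypothetical; they are not part of the challenge obfuscator.

```python
import ast

class AssertRemover(ast.NodeTransformer):
    """Hypothetical transformer that drops every assert statement."""

    def visit_Assert(self, node):
        # Returning None removes the node from its parent's statement list.
        return None

source = "x = 1\nassert x == 1\ny = x + 1\n"
tree = AssertRemover().visit(ast.parse(source))
cleaned = ast.unparse(tree)
print(cleaned)
```

This works here because other statements remain. If the assert were the only statement in a function body, the body would end up empty, which is the kind of invalid tree that compile() is likely to reject.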
Example Obfuscation – Strings
In the AST, constant String nodes are represented with Str nodes. These
Str nodes have one interesting field, namely the s field, which contains
the actual string. For example, in the AST of the following Python source,
there will be exactly one Str node with the s field set to “Hello AST”.
print 'Hello AST'
For the challenge, I implemented a handful of simple string obfuscations. Take
for example the following code (rewritten a bit from what I actually used).
from ast import NodeTransformer, BinOp, Str, Add

class StringObfuscator(NodeTransformer):
    def visit_Str(self, node):
        # Split the string into two halves and join them with an addition.
        half = len(node.s) // 2
        return BinOp(left=Str(s=node.s[:half]),
                     op=Add(),
                     right=Str(s=node.s[half:]))
Noteworthy in this example code is that BinOp is a node representing a
binary operation, in this case addition (because of the Add node.) A binary
operation takes a left operand and a right one. On the left we put the first
half of the actual string, and on the right we put the second half of the
string. When running this “obfuscator” on our example once, we get the
following code. (Note that you can run such an obfuscator multiple times to
achieve extra painful code. This is what I did for the challenge :p)
print ('Hell' + 'o AST')
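For readers on a current interpreter, the same idea can be sketched as follows, assuming Python 3.9 or later, where string literals are Constant nodes and ast.unparse() stands in for codegen.py. The names StringSplitter and obfuscate are hypothetical, and this is not the code used for the challenge.

```python
import ast

class StringSplitter(ast.NodeTransformer):
    """Sketch of the split-in-half obfuscation for Python 3 Constant nodes."""

    def visit_Constant(self, node):
        # Only split str constants that have at least two characters.
        if not isinstance(node.value, str) or len(node.value) < 2:
            return node
        middle = len(node.value) // 2
        return ast.BinOp(
            left=ast.Constant(value=node.value[:middle]),
            op=ast.Add(),
            right=ast.Constant(value=node.value[middle:]),
        )

def obfuscate(source, passes=1):
    """Apply StringSplitter `passes` times and return the regenerated source."""
    tree = ast.parse(source)
    for _ in range(passes):
        tree = StringSplitter().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

once = obfuscate("print('Hello AST')", passes=1)
twice = obfuscate("print('Hello AST')", passes=2)
print(once)
print(twice)
```

Each pass splits every remaining multi-character string once more, since the transformer does not revisit the nodes it has just created. Note that this sketch ignores f-strings and docstrings, which a real obfuscator would need to handle separately.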
Other string obfuscations included reversing a string, i.e., “abc” ->
“cba”[::-1], and converting single-character strings (which you’ll get soon
enough when recursively running the obfuscator a few times) into a chr()
call (i.e., “a” -> chr(0x61).)
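These two rewrites can be sketched in one transformer, assuming Python 3.9 or later. The class name ReverseAndChr is hypothetical, and the regenerated source prints the character code in decimal (chr(97)) rather than hexadecimal.

```python
import ast

class ReverseAndChr(ast.NodeTransformer):
    """Sketch: 'abc' -> 'cba'[::-1] and 'a' -> chr(97)."""

    def visit_Constant(self, node):
        if not isinstance(node.value, str) or not node.value:
            return node
        if len(node.value) == 1:
            # Single character: replace it with a chr(<code point>) call.
            return ast.Call(
                func=ast.Name(id="chr", ctx=ast.Load()),
                args=[ast.Constant(value=ord(node.value))],
                keywords=[],
            )
        # Longer string: store it reversed and undo that with a [::-1] slice.
        minus_one = ast.UnaryOp(op=ast.USub(), operand=ast.Constant(value=1))
        return ast.Subscript(
            value=ast.Constant(value=node.value[::-1]),
            slice=ast.Slice(lower=None, upper=None, step=minus_one),
            ctx=ast.Load(),
        )

tree = ReverseAndChr().visit(ast.parse("x = 'abc' + 'a'"))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)
print(rewritten)

# Check that the rewritten tree still evaluates to the original string.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["x"])
```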
The Obfuscated Challenge
After running the original challenge a few times through
the obfuscator, which, in addition to obfuscating strings, also
obfuscates integers, import statements, and global variable names, we get
our actual challenge.
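The post does not show how the integers were obfuscated. Purely as an assumption, one simple approach is to replace every integer n with an equivalent addition; the sketch below (Python 3.9 or later) uses a hypothetical IntegerSplitter class and an arbitrary OFFSET, neither of which comes from the challenge code.

```python
import ast

class IntegerSplitter(ast.NodeTransformer):
    """Hypothetical integer obfuscation: n -> (n - OFFSET) + OFFSET."""

    OFFSET = 1337  # arbitrary constant chosen for this sketch

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(node.value, bool) or not isinstance(node.value, int):
            return node
        return ast.BinOp(
            left=ast.Constant(value=node.value - self.OFFSET),
            op=ast.Add(),
            right=ast.Constant(value=self.OFFSET),
        )

tree = IntegerSplitter().visit(ast.parse("answer = 42"))
ast.fix_missing_locations(tree)
obfuscated = ast.unparse(tree)
print(obfuscated)

# The rewritten assignment still produces the original value.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["answer"])
```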
And, yes, running the obfuscator several times does indeed look like the
following.
$ python hitbctfobf.py hitbctforig.py|python hitbctfobf.py -|...
Having pasted the original challenge in the blogpost, there’s not much left of
the challenge itself. However, I found the methods behind the obfuscation
fairly interesting, and perhaps so does somebody else.