Why Optimising Python is Hard (2): Messing with Namespaces

We are on a quest to optimise Python programs. More specifically, we want to replace instances where builtin functions are called with known arguments, such as len('abc'), by the respective result (which would be 3 in this case). In order to succeed, we must make sure that the name len really refers to the built-in function we are thinking of, and has not been redefined by the programmer.

It has become apparent that any use of the exec function makes it hard, or even impossible, for the compiler or analyser to predict which names are defined in a Python program, and what they stand for. Not all is lost, though. Many programs do not use exec, and then, it seems, we can easily analyse the Python code, and make sure len, say, refers to the correct function.

But wait, there are a few more possibilities to redefine len than just using def len(): ..., or exec. Today, we are looking at namespaces, and how we can define new global, or even builtin, functions. The goal is, of course, to make life really hard for the compiler.


Namespaces in Python

Python uses dictionaries to implement namespaces; that is, it uses a dictionary to hold a value for each currently defined name. When a function is being executed, it has a dictionary of all local variables of that function. A module, on the other hand, has a dictionary of all global variables. You can access both of these dictionaries in Python through the functions locals() and globals(), respectively. If we throw in a function dump_dict() that nicely prints a dictionary’s contents to the screen, and pick a random function (norm(x, y) in our case) with local variables, we might end up with a small program such as this:

def dump_dict(name, dictionary):
    print(name)
    for key in dictionary:
        if not key.startswith('_'):
            print(f" - {key + ':':<6} {dictionary[key]}")

def norm(x, y):
    from math import sqrt
    z = sqrt(x**2 + y**2)
    dump_dict("Locals", locals())
    dump_dict("Globals", globals())
    print("\nNorm:", z)

x = 3.5
norm(x-0.5, x+0.5)

The output neatly presents us with the local and global variables of our toy program. Note how the local and the global namespace each have their own version of a variable called x, and how the import statement loads the name sqrt into the local namespace.

Locals
 - z:     5.0
 - sqrt:  <built-in function sqrt>
 - y:     4.0
 - x:     3.0
Globals
 - dump_dict: <function dump_dict at 0x7fd701>
 - norm:  <function norm at 0x7fd702>
 - x:     3.5

Norm: 5.0

While these two dictionaries account for all the names we have defined or loaded somewhere in the program, this is not the whole story. There is one more “global” name we have used: print. In fact, print lives in a “super-global” dictionary called the builtins. Of course, we can access and print that dictionary as well (isn’t Python just great?). So let us modify the above norm-function to complete the picture:

def norm(x, y):
    from math import sqrt
    z = sqrt(x**2 + y**2)
    dump_dict("Locals", locals())
    dump_dict("Globals", globals())
    dump_dict("Builtins", __builtins__.__dict__)
    print("\nNorm:", z)
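The split between globals and builtins can also be checked by hand. The following is only a rough sketch of the idea, not a description of how the interpreter performs the lookup; where_is is a hypothetical helper introduced here for illustration.

```python
import builtins

def where_is(name, global_namespace):
    """Report which of the two outer namespaces holds the given name."""
    if name in global_namespace:
        return "globals"
    if hasattr(builtins, name):
        return "builtins"
    return "undefined"

x = 3.5
# When run as a script, the three calls below print:
# globals, builtins, undefined
print(where_is("x", globals()))
print(where_is("print", globals()))
print(where_is("no_such_name", globals()))
```

The globals are consulted first, which is why a global definition of a name hides a builtin of the same name.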

What about math? Shouldn’t that be a global name as well? Actually no. Even though it looks like a name, it is more akin to a string literal. Even if you define a variable called math somewhere in your program, it will have no effect on the import statement.
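A quick way to convince yourself of this is to define a global variable called math and then import from the module anyway. This is a minimal sketch; the string value is arbitrary.

```python
math = "not a module"   # a global variable that merely happens to be called math

def norm(x, y):
    # The import statement looks up the module by its name,
    # not by whatever the variable math currently holds.
    from math import sqrt
    return sqrt(x**2 + y**2)

result = norm(3.0, 4.0)
print(result)   # 5.0
print(math)     # not a module
```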

Modifying the Globals

In Python, dictionaries are mutable data structures, which is to say that you can modify them. Hence, since we have access to the global namespace of a Python module in form of the globals() dictionary, we should be able to add variables on our own, or change the value of other variables. This is indeed possible:

x = 5

globals()['sqr'] = lambda x: x**2
globals()['x'] += 7

print(sqr(x))

In line 3, we define a new global function sqr (which, surprisingly, returns the square of a number), whereas in line 4, we modify the value of the global variable x. Your IDE will probably not be happy about this and complain that the name sqr in line 6 is undefined. But Python will run the program happily, and without any issues.

You can do the same from within a function, of course, and we choose to allow a caller to provide a name for the square-function:

x = 5

def define(name):
    globals()[name] = lambda x: x**2
    globals()['x'] += 1

define('sqr')
print(sqr(x))

Well, after our IDE has already been complaining about the missing function sqr in the first version, it will most definitely choke on the second one, where it has virtually no chance of figuring out which names have been defined. And there is nothing stopping us from doing something silly like define('len'), thereby, once again, messing up the compiler’s assumption of knowing what len() means.
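To see what define('len') does to an innocent call, consider the following sketch. It reuses the define function from above in a slightly reduced form (without the change to x), and removes the global again at the end to show that the builtin was only hidden.

```python
def define(name):
    # Bind the given name to a squaring function in the global namespace.
    globals()[name] = lambda x: x**2

before = len('abc')      # the builtin: number of characters
define('len')            # a global named len now shadows the builtin
shadowed = len(3)        # calls the squaring function instead
del globals()['len']     # remove the global; the builtin is visible again
after = len('abc')

print(before, shadowed, after)   # 3 9 3
```

Nothing in the source text of the two len('abc') calls hints at the fact that a different function was bound to the name in between.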

And, just for the fun of it, here is yet another variant:

globals().update({ 'sqr': lambda x: x**2 })

The same technique, by the way, does not work for the locals. We will have to come back to the details of that another time. For now, let it suffice to say that, in contrast to globals(), the locals() function, when called inside a function, returns a copy of the local variable dictionary. You can modify it without problems, but it has no effect on the original.
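A small experiment illustrates the difference. This is a minimal sketch; the function name is made up for the occasion.

```python
def try_to_modify_locals():
    x = 5
    # This changes the dictionary returned by locals(),
    # but not the actual local variable x.
    locals()['x'] = 99
    return x

result = try_to_modify_locals()
print(result)   # 5
```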

Modifying the Builtins

Modifying the builtins namespace is usually a very bad idea, and an evil thing to do (which, of course, makes it all the more fun to tinker with). That being said, we probably should not use the private name __builtins__ to do our magic: it is an implementation detail of CPython, and it is not guaranteed to be accessible, or even to exist in the first place. A better way is to properly import the module builtins. In the main module of a CPython program, this is, incidentally, exactly the same object as __builtins__. Hence, when run as a script, the following prints True:

import builtins
print(builtins is __builtins__)
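One caveat concerns CPython: outside of the main module (in an imported module, say, or in code run through exec), __builtins__ is typically the dictionary of the builtins module rather than the module itself. The following sketch, which assumes CPython and uses the illustrative name builtins_namespace, handles both cases and shows that they lead to the same namespace.

```python
import builtins

# __builtins__ may be the builtins module or that module's dictionary.
if isinstance(__builtins__, dict):
    builtins_namespace = __builtins__
else:
    builtins_namespace = __builtins__.__dict__

same = builtins_namespace is vars(builtins)
print(same)
```

Importing builtins avoids this case distinction altogether, which is one more reason to prefer it.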

The same thing we have said about the globals dictionary above equally holds for the builtins. In other words, it is equally simple to redefine something like the len() function here:

import builtins
builtins.__dict__['len'] = lambda x: 1
print(len('abc'))

Beware, though: while putting a new len function into the global namespace merely hides the builtin version (which is still around), tinkering with the builtin namespace replaces the len function for the entire interpreter, in every module. Unless you have kept a reference to the old version somewhere, it is gone (until you restart your Python interpreter, that is).

Since builtins itself is not a dictionary but a module, you can modify the len function with an even simpler syntax:

import builtins
builtins.len = lambda x: 1
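Since such an assignment affects every module in the running interpreter, it is worth keeping a reference to the original function if you want to experiment safely. A minimal sketch, where original_len is simply a name chosen for this example:

```python
import builtins

original_len = builtins.len          # keep the real function around
builtins.len = lambda x: 1           # every call to len() now returns 1
try:
    patched = len('abc')
finally:
    builtins.len = original_len      # put the real function back

restored = len('abc')
print(patched, restored)   # 1 3
```

From the compiler’s point of view, this is the worst case: the meaning of len('abc') changes twice without a single definition of len appearing in the program.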

In the end, if we want our compiler to be able to figure out what a name like len really refers to, we have to do a lot more than just scan the program for ordinary function definitions and calls to exec. By accessing the globals() dictionary, or even the builtins directly, a Python program can mess with any name inside the Python interpreter, and make our compiler’s job almost impossible. Just think of all the possibilities the techniques here open up: you can pass the globals and builtins dictionaries around, delete, modify, or add any name at will, and even get the names of the modified functions from external sources not accessible to the compiler.
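As a first, very conservative step, an analyser could at least flag programs that use any of these constructs and refuse to optimise them. The following is a rough sketch of such a scan based on the standard ast module; the function name find_namespace_tricks and the set of flagged names are assumptions made for this example, and the check is easily fooled (for instance by g = globals followed by g()).

```python
import ast

# Calls that give a program direct access to its namespaces (illustrative choice).
SUSPICIOUS_CALLS = {"exec", "eval", "globals", "locals", "vars"}

def find_namespace_tricks(source):
    """Return sorted (line, reason) pairs for constructs that may rebind names."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
        elif isinstance(node, ast.Name) and node.id == "__builtins__":
            findings.append((node.lineno, "use of __builtins__"))
        elif isinstance(node, ast.Import):
            if any(alias.name == "builtins" for alias in node.names):
                findings.append((node.lineno, "import of builtins"))
        elif isinstance(node, ast.ImportFrom) and node.module == "builtins":
            findings.append((node.lineno, "import of builtins"))
    return sorted(findings)

program = (
    "import builtins\n"
    "builtins.len = lambda x: 1\n"
    "globals()['sqr'] = lambda x: x**2\n"
    "print(len('abc'))\n"
)
for line, reason in find_namespace_tricks(program):
    print(line, reason)
```

For the sample program, the scan reports the import of builtins in line 1 and the call to globals() in line 3. It cannot tell which names are actually affected, which is precisely the problem described above.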