Why Optimising Python is Hard (1)

Among proposed optimisations for Python, we find, for instance, the notion of replacing a call like len('abc') by its value (3 in this case). On the one hand, implementing such optimisations makes sense only if the respective instances are executed very often in a program. Otherwise, the gain is just way too small to bother. This typically means that the len('abc') has to be inside a loop. On the other hand, we can only replace len('abc') if we are absolutely certain, that the result is always going to be 3. Otherwise, the transformation would not be correct, and you wouldn’t want a compiler to introduce bugs into your program.

So, before computing the result of len('abc') ahead of time, and replacing the call in the program code by its result, we must make sure that the programmer has not tinkered with len, and we are indeed calling the correct function. How hard can that be, anyway?

In this article, I will explore this question: how hard can it be to statically decide what a given name in your Python program code stands for. With statically I mean that I want to decide it without having to run the program, just by looking at the code.

Before we do any optimisation on calls to len() in our compiler, what possibilities are there to replace the built-in len() function by something else? (And, for the sake of argument, never mind the question if it makes any sense to do so).

The obvious way to replace len() by a custom function is, of course, to just define a function called len. A compiler can detect that easily enough. But defining a function is, in Python, a statement like any other. After all, from Python’s point of view, it is not all that different from any other assignment, just differently written. Hence, functions can be defined anywhere in your code, even in a loop, as in the example below. This means that one and the same len('abc')-expression might return different results, since the function len itself might change.

while len('abc') == 3:
    def len(x):
        return 1

Using modern control- and data flow analysis, it is pretty easy for a compiler to detect that the len in the first line refers to different functions. But because Python allows us to override even basic builtin functions at any time, there is no simple optimisation possible here (you can, of course, try to unroll the loop, or perform even more elaborate techniques. But whatever you do, it will be more complicated that just replacing the expression len('abc') by the result of the call).

It might quickly get worse, though (at least as seen from the perspective of our compiler). Because of Python’s exec function, we can redefine the len()-function in a way that is much more difficult, nay, often even outright impossible, to detect for the compiler. As programmers, all we have to do is:

exec("def len(x): return 1")

Sure, that does not look so bad. So let us slightly step up the game.

exec("def %s(x): return 1" % "len")

Instead of using this old “printf”-style formatting, there are other options available to do the same, e. g., "def {}(x): return 1".format("len").

But why would you want to define a function with exec, anyway? There are good reasons why you might want to use exec for defining a function. Let us consider a “real-world” example, where exec is used to make the methods of an object available as functions.

The built-in function dir() lets you inspect an object, and returns a list of all the object’s fields. Throwing away private and protected fields (those that start with an underscore), and taking only callable fields (i. e. methods in this context), we define a new function for each method in our object. All these functions do is calling the respective method in the object. For our example, we have chosen the object to be a, and the two methods that get exposed are inc, and dec.

# We start by defining a class and crating an instance `a`.
class A:
    delta = 1

    def dec(self, x):
        return x - self.delta

    def inc(self, x):
        return x + self.delta

a = A()

# Here, we go through all fields of the instance/object `a`, and
# if it is a public method, we define a function of the same name.
for name in dir(a):
    if not name.startswith('_') and callable(getattr(a, name)):
        exec(f"def {name}(*args): return a.{name}(*args)")

# Afterwards, both `inc` and `dec` exist as simple function
print(inc(9), dec(9))

You will find that this technique is indeed used by the turtle module in Python’s standard library (although it is slightly more involved, as it also copies the paramters and doc-strings from the methods to the functions). When you call a function like forward(100), say, of the turtle module, this function will actually call the forward() method on the default turtle object.

The primary advantage of creating the required functions dynamically with exec, as opposed to writing a function for each method, is that any change to the interface of the class is immediately carried over to the functions, keeping both always in perfect sync.

The disadvantage of using such a dynamic feature is that compilers, and other code analysers have a hard time figuring out that functions like inc() and dec() really do exist. Indeed, when I import the turtle module, PyCharm tells me that there is no function named forward(), even though the program then runs just fine.

Wrapping up, optimising a Python program by executing function calls (with known/constant arguments) ahead of time is not quite so simple, because the compiler might have a very hard time figuring out exactly which functions there are, and what they really stand for. Since, obviously, exec seems to be the problem here, the compiler could, of course, just scan the Python code for any occurence of exec, and only do a proper analysis and optimisations, if no exec is used. Compilers that translate Python to machine code usually impose such restriction on their code (leading to Restricted Python, or RPython in the case of PyPy, for instance).

Unfortunately, this is merely the beginning of the story, and there are many more subtle ways to redefine len() in a program.

If you are interested in how exactly we might want to implement an optimisation such as replacing len('abc') by 3 in your programs, and also check for any exec, check out my article on Implementing Code Transformations in Python.

Published by Tobias Kohn