Don’t optimize yet—look for the bigger picture

The Problem

This will sound familiar, even if you don’t write code in Python: at some point, you need a simple vehicle for multiple values. In Python, your go-to solution is of course the tuple:

values = (altitude, velocity, mass)

And in many cases that’s sufficient, although it leads to ugly downstream effects:

# calculate kinetic energy
kinetic = .5 * values[1] * values[1] * values[2]

Surely not the most readable code. Of course, we can store values in a dictionary:

values = dict(altitude=20000, velocity=500, mass=3000)
...
kinetic = .5 * values['mass'] * values['velocity']**2

And that is surely better, but… a dict is not exactly a lightweight object, and a bit of overkill for this kind of case. The Pythonic solution is, of course, to store these values as attributes on an object and access them that way. Depending on how the class is defined (for example with __slots__), that can be faster and use less memory than a dict!

But I have tons of these cases and I don’t want to clutter my code with all kinds of ad-hoc class definitions to carry these values!

That in itself is a legit thought. And what follows is a classic case of “jumping into solution mode”:

How do I create an empty object on the fly, to assign attributes to?

The first attempt is to simply use the—literal—mother of all objects, object (and let’s try this out in interactive mode):

>>> values = object()
>>> values.altitude = 20000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    values.altitude = 20000
AttributeError: 'object' object has no attribute 'altitude'

What happened there? Well, object is indeed the mother of all objects. Remember, in Python everything is an object, even simple values like integers. Instances of plain object have no __dict__ to hold arbitrary attributes, which keeps them small. Since every class is a subclass of object, giving object instances that ability would force it on every value in the language, integers included, and we certainly don’t want that.
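A quick check makes the difference visible. This is only a minimal sketch; the class name Empty and the variable names are made up for illustration.

```python
# Instances of bare object (and of built-ins like int) carry no per-instance
# attribute dictionary, so there is nowhere to store a new attribute.
plain = object()
print(hasattr(plain, "__dict__"))   # False
print(hasattr(42, "__dict__"))      # False

# Empty is a hypothetical name. Any class defined with a class statement
# (and without __slots__) gives its instances a __dict__.
class Empty:
    pass

holder = Empty()
print(hasattr(holder, "__dict__"))  # True
holder.altitude = 20000
print(holder.__dict__)              # {'altitude': 20000}
```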

But functions can have attributes, so that leads to a popular hack that does work:

>>> values = lambda:0
>>> values.altitude = 20000
>>> values.velocity = 500
>>> values.mass = 300
>>> print(values.altitude)
20000

Wonderful! Of course, nearly no one who hasn’t seen this hack will understand your code, because it’s an obscure trick. Use lambda to create a function on the fly (remember, CreateFunction would have been a better name for lambda), and what the function does is irrelevant, so it is made to return zero. Tadaa! And then we add attributes to our value object, which is really a function that just returns 0. But hey, it works.
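To see why this reads so oddly, it helps to inspect what values really is. The sketch below only pokes at the object created by the hack; nothing in it is needed for the hack to work.

```python
values = lambda: 0        # the hack: a throwaway function used as a container
values.altitude = 20000

print(type(values).__name__)  # function
print(values.__name__)        # <lambda>
print(values())               # 0, since it is still callable
print(values.__dict__)        # {'altitude': 20000}
```

Our “data object” is callable and introduces itself as <lambda>, which is exactly the kind of surprise a reader of your code does not need.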

Time to step back for a moment

The whole point was to create an object to store attributes in. We ended up with, let’s be honest here, a hack that gives us such an object, at the price of the readability of your code. And that is a very high price!

Why not just create a class for such an object? In other languages such a structure is called a user-defined type (Visual Basic) or simply a struct (C and C++). At the expense of one line of code, we can define a Struct class that does exactly the same as the lambda hack, but better, because the code reveals our intentions:

class Struct: pass
values = Struct()
values.altitude = 20000
values.velocity = 500
values.mass = 300
print(values.altitude)  # prints 20000

Obviously, the class definition of Struct can go at the top of your code. But there you have it; at the expense of one extra line, we now have a reusable vehicle for ad-hoc data storage. A price well worth paying, one might say.

But wait, there’s more!

But let’s think this through for just one second. The use case for this is when we have more than one value to transfer; otherwise we’d just stick that value into a variable, right? So, with a minimum of two values to transfer (and since packing and unpacking two values in a tuple is trivial, three or more is more likely), we need at least three lines of code to prepare our data: one line to create the object and two or more lines to assign the attributes.

Hmm.

What if, instead, we invested two more lines of code in our Struct?

class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

While this may look like a hack, it’s fairly reasonable to expect a seasoned Python coder to understand this:

  • The __init__ method collects its keyword arguments into a dictionary (** = keyword arguments packed in a dictionary)
  • It then updates the instance’s attribute dictionary (__dict__) with the name/value pairs inside the kwargs dict.
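The two steps above can be made visible with vars(), which returns an instance’s __dict__. The following is a sketch for illustration; the variable manual and the explicit setattr loop are not part of the Struct definition, they only spell out what the update call does.

```python
class Struct:
    def __init__(self, **kwargs):
        # kwargs is an ordinary dict, e.g. {'altitude': 20000, 'velocity': 500}
        self.__dict__.update(kwargs)

values = Struct(altitude=20000, velocity=500)
print(vars(values))  # {'altitude': 20000, 'velocity': 500}

# Same effect, spelled out one attribute at a time.
manual = Struct()
for name, value in {"altitude": 20000, "velocity": 500}.items():
    setattr(manual, name, value)
print(vars(manual) == vars(values))  # True
```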

In other words, for two extra lines of code we can now initialize our Struct with keyword arguments and get an immediate return on investment from those two extra lines in the definition of Struct:

values = Struct(altitude=20000, velocity=500, mass=300)
print(values.altitude)  # prints 20000

Only one line of code is required to build our “object” instead of four; the investment of an extra two lines in the definition of Struct pays off immediately, and it’s a gift that keeps on giving! Less clutter, more clarity, and an efficient vehicle to transfer your data to another part of your code, just as Guido intended it to be. Truly Pythonic!
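To close the loop with the kinetic energy example from the start, here is a minimal sketch of Struct in use. The function name kinetic_energy is made up for illustration. The standard library’s types.SimpleNamespace (Python 3.3 and later) accepts keyword arguments in the same way, so it is shown alongside for comparison.

```python
from types import SimpleNamespace

class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def kinetic_energy(body):
    # Hypothetical helper: 1/2 * m * v^2, reading named attributes
    # instead of tuple indices.
    return 0.5 * body.mass * body.velocity ** 2

values = Struct(altitude=20000, velocity=500, mass=300)
print(kinetic_energy(values))  # 37500000.0

# The standard library equivalent behaves the same way here.
same = SimpleNamespace(altitude=20000, velocity=500, mass=300)
print(kinetic_energy(same))    # 37500000.0
```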
