Another challenge

Another dreaded reverse challenge! I don’t like them but this one was quite easy. The challenge was to write a function called imagine that takes 4 arguments (a, b, c, d) and returns two values.

Input:
a: 5
b: 0
c: 3
d: 0
Expected Output:
[15, 0]
As always there were ten test cases, and the way the answers came out, together with the name of the function, led me to think this was about complex numbers. Would it be as simple as interpreting the input as two complex numbers and returning the product as output? It couldn’t be that simple, could it?
def imagine(a, b, c, d):
    x = complex(a, b)
    y = complex(c, d)
    r = x * y
    return r.real, r.imag
Why yes! That’s it. Problem solved! Except, of course, for the #### challenge in CodeFights: the code should use the fewest (non-whitespace) characters possible. This solution, with 72 characters, was nearly twice as long as the 39 characters for the shortest Python solution. My first thought was to introduce some abbreviations and trim things down:
def imagine(a, b, c, d):
    f = complex
    r = f(a, b) * f(c, d)
    return r.real, r.imag
Well, that saves nine characters (63 instead of 72). Then I realized: complex products are basically matrix multiplication, and not exactly rocket science. Why bother calling the complex function if you can operate straight on the real and imaginary components?
def imagine(a, b, c, d):
    return a*c - b*d, a*d + b*c
This does exactly the same, and is only two characters longer than the best entry (39). The observant student will notice that the result is calculated in a single line though. And that means that we can use a lambda abomination to get even shorter (and harder to read) code:
imagine = lambda a,b,c,d:(a*c - b*d, a*d + b*c)
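The parens are required because a bare comma would end the lambda expression: without them, Python would build a tuple holding the lambda and a*d + b*c, instead of a function that returns both values. So, at the price of destroying the little piece of readability that was left, we arrive at the shortest entry for Python code: 39 characters.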
I still think my first attempt was much, much better code.
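
As a quick sanity check, here is a minimal sketch that runs the three variants side by side. The names imagine_complex, imagine_direct and imagine_lambda, and all test inputs except the first, are made up for this illustration and are not part of the challenge. The last lines show what happens when the parentheses around the lambda's result are left out.

```python
def imagine_complex(a, b, c, d):
    # Multiply the inputs as the complex numbers a+bi and c+di.
    r = complex(a, b) * complex(c, d)
    return r.real, r.imag


def imagine_direct(a, b, c, d):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return a*c - b*d, a*d + b*c


imagine_lambda = lambda a, b, c, d: (a*c - b*d, a*d + b*c)

# Hypothetical inputs; only the first row comes from the challenge text.
cases = [(5, 0, 3, 0), (1, 2, 3, 4), (0, 1, 0, 1), (-2, 3, 4, -5)]
for case in cases:
    # The complex version returns floats, but 15.0 == 15 in Python.
    assert imagine_complex(*case) == imagine_direct(*case) == imagine_lambda(*case)

# Without parentheses the comma ends the lambda, so this is a 2-tuple
# (function, 0) rather than a function that returns a pair.
not_a_function = lambda x: x, 0
print(type(not_a_function))   # <class 'tuple'>
```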

Map, Filter, Reduce…

Functions famous from functional programming are map, filter and reduce. In Python they’re not actively promoted because there are better alternatives, but I ran into them recently in a coding challenge. The challenge was basically: write a function that returns True if the sum of the ASCII (or Unicode) values of the input string is odd.

To get the ASCII value of a character, we use the function ord (for “ordinal”):

>>> print(ord('A'))
65

So my initial solution looked like this; it’s not exactly rocket science:

def that_is_odd(text):
    return sum(ord(c) for c in text) % 2 == 1

Pretty simple; take each character in the text in a generator expression (a list comprehension without the brackets), calculate the ordinal, and take the sum. Take the remainder of dividing by two, and if that’s 1, the sum is odd.

The one “problem” with this solution is that in these contests a shorter solution is better. Luckily, white space and comments are not counted, but that still doesn’t reward clean code. So, I was off a considerable amount of characters from the “optimal” solution. Where to squeeze?

Well, first off, your input parameter doesn’t have to be text; it can simply be t. And the modulo operator returns either 0 or 1 here, which compares equal to False or True (0 == False and 1 == True both hold in Python, so if the challenge engine checks the result with ==, both values will pass). So we can have a shorter result:

def that_is_odd(t):
    return sum(ord(c) for c in t) % 2

But that wasn’t good enough! Still too long? What’s next? Well, I learned in a previous challenge that if your function can be written as a one-liner, you can replace the function definition with a lambda abomination:

that_is_odd = lambda t:sum(ord(c) for c in t) % 2

That’s a big step forward (in reducing character count; not in making a readable function!) and we’re almost there. I couldn’t figure out the last step and had to peek at the code of one of the top 25 entries (which all had the same length):

that_is_odd = lambda t:sum(map(ord, t)) % 2

And that is as short as it gets.
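
To see that the long and the short version agree, here is a small sketch. The names odd_verbose and odd_short and the sample strings are made up for this comparison. The only difference in behaviour is the return type: True/False versus 1/0, which compare equal.

```python
def odd_verbose(text):
    # Returns a real boolean.
    return sum(ord(c) for c in text) % 2 == 1


# Returns the remainder itself: 1 for odd, 0 for even.
odd_short = lambda t: sum(map(ord, t)) % 2

for sample in ["A", "AB", "", "hello"]:
    # True == 1 and False == 0, so the two results compare equal.
    assert odd_verbose(sample) == odd_short(sample)

print(odd_verbose("A"), odd_short("A"))   # True 1
```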

What IS map?

Well, map does much the same as the list comprehension: it takes all values from an iterable and applies a function to them. Why use it? Well, it’s there! Mainly because it was there in the early days of Python when list comprehensions didn’t exist.

In Python 2, map built a complete list; the lazy variant was itertools.imap. In Python 3, map itself returns a lazy iterator, so running it over an iterable with a million elements uses very little memory. But a generator expression, like the one inside sum() above, is just as lazy, so map has no memory advantage over it.

For that reason, map and its cousins, filter and reduce, are no longer preferred. reduce is no longer even a built-in in Python 3 (it moved to functools.reduce), and while map and filter are still there, both of them can be perfectly replaced by list comprehensions and generator expressions.
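
A short sketch of how these pieces behave in Python 3 (the sample string is arbitrary): map hands out values one at a time, a generator expression does the same, and reduce has to be imported from functools.

```python
from functools import reduce

text = "Python"

lazy = map(ord, text)     # an iterator; nothing has been computed yet
first = next(lazy)        # 80, the ordinal of 'P'
rest = list(lazy)         # the remaining five ordinals

as_list = [ord(c) for c in text]   # list comprehension: builds the whole list
as_gen = (ord(c) for c in text)    # generator expression: lazy, like map

# reduce folds the list into one value; here it does the same job as sum().
total = reduce(lambda x, y: x + y, as_list)
assert total == sum(as_gen) == 642
print(first, rest)   # 80 [121, 116, 104, 111, 110]
```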

Never done a filter in a list comprehension? If we only want the ordinals for letters and nothing else, the calculation would look like this:

sum(ord(c) for c in text if c.isalpha())

Which is easily understandable even if you’ve never seen it; unlike

sum(map(ord, filter(str.isalpha, text)))

Which is only understandable to a trained eye. And that’s why list comprehensions are preferred.
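
For what it’s worth, both spellings give the same answer. A tiny check, using a made-up sample string:

```python
text = "Hi, 42!"

with_genexp = sum(ord(c) for c in text if c.isalpha())
with_filter = sum(map(ord, filter(str.isalpha, text)))

# Only 'H' and 'i' are letters, so both sums are ord('H') + ord('i').
assert with_genexp == with_filter == ord("H") + ord("i")
print(with_genexp)   # 177
```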

Another way of doing structs

Instead of using an “on-the-fly” lambda abomination, I already showed why I think creating a class is easier. But there’s another, even simpler way for C-style structs: the named tuple.

The downsides of the named tuple are that, unlike an “empty” object, you can’t add attributes to it later; and it requires an import. On the upside: a much shorter definition.

Here’s how it works:

# named tuple demo
from collections import namedtuple

# instead of a class definition: tadaa!
Geo = namedtuple('Geo', 'lat, lon')

# using the named tuple
my_town = Geo(47.6, -45.8)
print(my_town.lat, my_town.lon)
print(my_town)

You get the best of both worlds; only one line to define your “struct” and only one line to create it. Even better, it will behave like a tuple where needed.

Downsides: an extra line to import something (but it’s from a standard library, so no big deal), and, it being a tuple, you cannot alter the attributes. But in those cases where it doesn’t matter, it’s a great solution!
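
A small sketch of both sides of that trade-off, reusing the Geo tuple from the demo (the replacement latitude is made up): the named tuple unpacks and indexes like any tuple, refuses attribute assignment, and offers _replace to build a modified copy.

```python
from collections import namedtuple

Geo = namedtuple('Geo', 'lat, lon')
my_town = Geo(47.6, -45.8)

# It behaves like a plain tuple where needed: unpacking and indexing work.
lat, lon = my_town
assert my_town[0] == my_town.lat == lat

# Attributes are read-only; assignment raises AttributeError.
try:
    my_town.lat = 48.0
    assigned = True
except AttributeError:
    assigned = False
print("assignment allowed:", assigned)   # assignment allowed: False

# _replace returns a new Geo and leaves the original untouched.
moved = my_town._replace(lat=48.0)
print(moved)     # Geo(lat=48.0, lon=-45.8)
print(my_town)   # Geo(lat=47.6, lon=-45.8)
```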

 

Easy Iterations

“I want to put rice grains on each square of a chess board. But not doubling them every square as in the legend of Sissa, but rather just a random amount between 1 and 100 on each square. We will identify the square by an (r, c) index”

Well, that sounds easy, right?

import random
from pprint import pprint

board = {}
for row in range(8):
    for col in range(8):
        board[(row, col)] = random.randint(1, 100)
pprint(board)

Aaaand we’re done for the night. Bye now!

Still here? Worried about the double-indent? Me too! Obviously it’s not a big deal in a case like this, but when you’re iterating over four levels and then you get two nested if-statements… too much!

First, let’s take a look at the home-brewed solution, which in many cases will be just fine, because it’s highly readable: rolling your own iterator!

import random
from pprint import pprint

def chess_board_squares():
    for row in range(8):
        for col in range(8):
            yield row, col

board = {}
for row, col in chess_board_squares():
    board[(row, col)] = random.randint(1, 100)
pprint(board)

And that’s how iterators work! Instead of a return statement, you use a yield statement, which doesn’t exit the function but rather temporarily “jumps out,” only to continue where it left off when the “for” loop asks for the next value. (A function that uses yield is called a generator.)
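
To make the “jumping out” visible, here is a sketch that steps through the same generator by hand with next(), which is what a for loop does behind the scenes:

```python
def chess_board_squares():
    for row in range(8):
        for col in range(8):
            yield row, col


squares = chess_board_squares()   # nothing runs yet; this is a generator object

first = next(squares)     # runs until the first yield
second = next(squares)    # resumes right after that yield
print(first, second)      # (0, 0) (0, 1)

remaining = list(squares)   # drains the other 62 squares
print(len(remaining))       # 62
```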

This solution “works” because we have created a custom iterator function that precisely describes what it’s doing: giving you all the chess board squares. It’s easy to go overboard and write a generic multi-iterator function, but that would be counterproductive. Because that’s why there’s an itertools library!

import random
import itertools
from pprint import pprint

board = {}
for row, col in itertools.product(range(8), range(8)):
    board[(row, col)] = random.randint(1, 100)

pprint(board)

The product function returns the Cartesian product of the listed iterables. If our chess board was ten rows deep and six columns wide, we’d call it like this:

for row, col in itertools.product(range(10), range(6)):

The “homegrown” solution is fine, because it’s slightly less cryptic than “product.” Just don’t build your own generic iterator for an arbitrary number of iterable ranges, because someone already did that for you!
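
A quick sketch to confirm that product yields the squares in the same order as the nested loops. It also uses the repeat argument, which saves typing range(8) twice, and builds the board with a dict comprehension as one possible shorter form:

```python
import itertools
import random

# The order of the nested loops: row varies slowest, col fastest.
nested = [(row, col) for row in range(8) for col in range(8)]
flat = list(itertools.product(range(8), range(8)))
assert nested == flat

# repeat=2 means "the same iterable, twice".
assert flat == list(itertools.product(range(8), repeat=2))

# The whole board in one expression.
board = {square: random.randint(1, 100)
         for square in itertools.product(range(8), repeat=2)}
print(len(board))   # 64
```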

Python 2 or Python 3?

It shouldn’t even be a question at this point! I’ve been using Python 3 for quite a while now, and I’m very happy with it. For me, Python 3 fixes the little quirks, like integer division, and exposes iterators instead of lists when dealing with dictionaries and the range function (without having to use “ugly” code).
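
A few lines that illustrate those quirks as they behave in Python 3. The dictionary contents are made up for the example.

```python
# Division: / is always true division, // is floor division.
half = 7 / 2      # 3.5
floored = 7 // 2  # 3

# Dictionary keys() is a live view, not a copied list.
ages = {"ann": 31, "bob": 27}
keys = ages.keys()
ages["cy"] = 45              # the view sees the new key
names = sorted(keys)         # ['ann', 'bob', 'cy']

# range is lazy: this does not build a trillion-element list.
big = range(10**12)
last = big[-1]               # 999999999999

print(half, floored, names, last)
```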

Most of the talks promoting Python 3 at the PyCon being held this weekend lauded how much better it performs. Raymond Hettinger even boasted in one of his talks that 3.6 is probably the first Python 3 release that is better than Python 2.7!

But what does that mean, better?

Nearly everyone will tell you how much smaller the memory footprint is, and how much better the performance is. And that’s great when you manage enterprise applications or web services; I don’t. I just use Python to make my daily life easier.

I’ve tried switching to Python 3 a couple of times before “it stuck.” Every time I switched I felt that it was important to be prepared for the future, and not be left having to switch over with tons of code waiting to be upgraded.

The first time was around Python 3.1. Bad mistake! By the time Python 3.4 was out things were much better though, and I’m a happy Python 3 camper by now. But it wasn’t performance that convinced me; it was libraries.

Because that’s the one thing that most Python 3 evangelists are not telling you. Practically nothing worked the first time I tried. The standard library did, of course. But external libraries were quite a different story. Reportlab? No go. WxPython? No go. And so on.

Today is quite different as practically everything seems to be working with Python 3. If it’s not made for Python 3, it’s at least compatible with it.

If you’re still on 2.7, this might be a good moment to consider switching!

Don’t optimize yet—look for the bigger picture

The Problem

This will sound familiar, even when you don’t write code in Python: at one point, you have a need for a simple vehicle for multiple values. In Python, your go-to solution is of course the tuple:

values = (altitude, velocity, mass)

And in many cases that’s sufficient, although it leads to ugly downstream effects:

# calculate kinetic energy
kinetic = .5 * values[1] * values[1] * values[2]

Surely not the most readable code. Of course, we can store values in a dictionary:

values = dict(altitude=20000, velocity=500, mass=3000)
...
kinetic = .5 * values['mass'] * values['velocity']**2

And that is surely better, but… a dict is not exactly a lightweight object, and a bit of overkill for these kinds of cases. The Pythonic solution is, of course, to store these values as attributes in an object, and access them that way. Faster, and less memory use than a dict!

But I have tons of these cases and I don’t want to clutter my code with all kinds of ad-hoc class definitions to carry these values!

That in itself is a legit thought. And what follows is a classic case of “jumping into solution mode”:

How do I create an empty object on the fly, to assign attributes to?

The first attempt is to simply use the—literal—mother of all objects, object (and let’s try this out in interactive mode):

>>> values = object()
>>> values.altitude = 20000
Traceback (most recent call last):
File "", line 1, in
values.altitude = 20000
AttributeError: 'object' object has no attribute 'altitude'

What happened there? Well, object is indeed the mother of all objects. Remember, in Python everything is an object, even primitives like integers. Since everything is a subclass of object, giving it the ability to have attributes would give attributes to primitive values, and we certainly don’t want that.
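
One way to see the difference, as a small sketch (the class name Empty is made up): a bare object() instance has no per-instance attribute dictionary, while an instance of even the emptiest user-defined class does.

```python
class Empty:
    pass


plain = object()
thing = Empty()

# object() instances carry no __dict__, so there is nowhere to store attributes.
print(hasattr(plain, "__dict__"))   # False
print(hasattr(thing, "__dict__"))   # True

thing.altitude = 20000
print(vars(thing))                  # {'altitude': 20000}
```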

But functions can have attributes, so that leads to a popular hack that does work:

>>> values = lambda:0
>>> values.altitude = 20000
>>> values.velocity = 500
>>> values.mass = 300
>>> print(values.altitude)
20000

Wonderful! Of course, nearly no one who hasn’t seen this hack will understand your code, because it’s an obscure trick. Use lambda to create a function on the fly (remember, CreateFunction would have been a better name for lambda), and what the function does is irrelevant, so it is made to return zero. Tadaa! And then we add attributes to our value object, which is really a function that just returns 0. But hey, it works.

Time to step back for a moment

The whole point was to create an object to store attributes in. We ended up with, let’s be honest here, a hack, that gives us such an object. At the price of readability of your code. And that is a very high price!

Why not just create a class for such an object? In other languages such a structure is called a Custom Type (Visual Basic) or simply Struct (C and C++). At the expense of one line of code, we can define a Struct class that does exactly the same as the lambda hack, but better, because the code reveals our intentions:

class Struct: pass
values = Struct()
values.altitude = 20000
values.velocity = 500
values.mass = 300
print(values.altitude)  # prints 20000

Obviously, the class definition of Struct can go at the top of your code. But there you have it; at the expense of one extra line, we now have a reusable vehicle for ad-hoc data storage. A price well worth paying, one might say.

But wait, there’s more!

But let’s think this through for just one second. The use case for this is when we have more than one value to transfer; otherwise we’d just stick that value into a variable, right? So, with the minimum of two values to transfer (and packing and unpacking that into a tuple is trivial, so it’s more likely we have at least three values) we need at least three lines of code to prepare our data: one line to create the object and two lines (or more) to assign the attributes.

Hmm.

What if, instead, we invested two more lines of code in our Struct?

class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

While this may look like a hack, it’s fairly reasonable to expect a seasoned Python coder to understand this:

  • The init function takes a dictionary of keyworded arguments (** = keyword arguments packed in a dictionary)
  • It then updates the attribute list (represented by __dict__) with the keyword/value pairs inside the kwargs dict.
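
The two steps can be watched separately in a short sketch. The helper function show is made up for this illustration; it only returns the dictionary that ** packs the keyword arguments into.

```python
class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def show(**kwargs):
    # Hypothetical helper: hands back the packed keyword arguments.
    return kwargs


packed = show(altitude=20000, mass=300)
print(packed)          # {'altitude': 20000, 'mass': 300}

s = Struct(altitude=20000, mass=300)
print(vars(s))         # the same pairs, now stored as attributes of s
print(s.mass)          # 300
```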

In other words, for two extra lines of code we can now initialize our Struct with keyworded values and get an immediate return on investment on those two extra lines in the definition of Struct:

values = Struct(altitude=20000, velocity=500, mass=300)
print(values.altitude)  # prints 20000

Only one line of code is required to build our “object” instead of four; the investment of an extra two lines in the definition of Struct pays off immediately, and it’s a gift that keeps on giving! Less clutter, more clarity, and an efficient vehicle to transfer your data to another part of your code, just as Guido intended it to be. Truly Pythonic!
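
To close the loop with the kinetic energy example, here is a sketch of the Struct in use. The function name kinetic_energy is made up for the illustration. As an aside, the standard library’s types.SimpleNamespace (available since Python 3.3) accepts keyword arguments in the same way, so it is shown next to the hand-written class for comparison.

```python
from types import SimpleNamespace


class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def kinetic_energy(values):
    # Works with anything that has .mass and .velocity attributes.
    return .5 * values.mass * values.velocity**2


mine = Struct(altitude=20000, velocity=500, mass=300)
stdlib = SimpleNamespace(altitude=20000, velocity=500, mass=300)

assert kinetic_energy(mine) == kinetic_energy(stdlib)
print(kinetic_energy(mine))   # 37500000.0
```

One practical difference: SimpleNamespace comes with a readable repr, while the hand-written Struct would need its own __repr__ for that.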