Map, Filter, Reduce…

Three functions famous from functional programming are map, filter and reduce. In Python they’re not actively promoted because there are better alternatives, but I ran into them recently in a coding challenge. The challenge was basically: write a function that returns True if the sum of the ASCII (or Unicode) values of the characters in the input string is odd.

To get the ASCII value of a character, we use the function ord (for “ordinal”):

>>> print(ord('A'))
65

So my initial solution looked like this; it’s not exactly rocket science:

def that_is_odd(text):
    return sum(ord(c) for c in text) % 2 == 1

Pretty simple; take each character in the text in a generator expression (the bracket-less sibling of a list comprehension), calculate the ordinal, and take the sum. Take the remainder of dividing by two, and if that’s 1, the sum is odd.

The one “problem” with this solution is that in these contests a shorter solution is better. Luckily, white space and comments are not counted, but that still doesn’t reward clean code. So, I was a considerable number of characters away from the “optimal” solution. Where to squeeze?

Well, first off, your input parameter doesn’t have to be text; it can simply be t. And the modulo operator here returns either 0 or 1, which compare equal to False and True respectively (so if the challenge engine checks the result with ==, both outcomes will pass). So we can have a shorter result:

def that_is_odd(t):
    return sum(ord(c) for c in t) % 2
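
Why does returning the bare remainder pass? In Python, bool is a subclass of int, so 0 and 1 compare equal to False and True. A quick sketch, assuming the challenge engine compares with == (I don't know how it actually checks):

```python
def that_is_odd(t):
    # Returns the int 0 or 1, not a real bool
    return sum(ord(c) for c in t) % 2

result = that_is_odd("A")          # ord('A') is 65, which is odd
print(result, result == True)      # 1 True
print(isinstance(result, bool))    # False: it is still a plain int
```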

But that wasn’t good enough! Still too long? What’s next? Well, I learned in a previous challenge that if your function can be written as a one-liner, you can replace the function definition with a lambda abomination:

that_is_odd = lambda t:sum(ord(c) for c in t) % 2

That’s a big step forward (in reducing character count; not in making a readable function!) and we’re almost there. I couldn’t figure out the last step and had to peek at the code of one of the top 25 entries (which all had the same length):

that_is_odd = lambda t:sum(map(ord, t)) % 2

And that is as short as it gets.

What IS map?

Well, map does exactly what the list comprehension does; take all values from an iterable, and apply a function to them. Why use it? Well, it’s there! Mainly because it was there in the early days of Python when list comprehensions didn’t exist.
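
To make that concrete, here is a small side-by-side sketch; the sample string and variable names are just for illustration:

```python
text = "Hi!"

# Comprehension style: apply ord to every character
with_comprehension = [ord(c) for c in text]

# map style: same function, same iterable, no loop variable needed
with_map = list(map(ord, text))

print(with_comprehension)               # [72, 105, 33]
print(with_map == with_comprehension)   # True
```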

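In Python 3 there is still one thing to say for map: it is executed lazily, so when running it over an iterable with a million elements, memory use is far less than when building a list (in Python 2, map returned a full list). But a comprehension written with parentheses instead of square brackets, a generator expression, is just as lazy, so that is no reason to prefer map.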
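
A small sketch of what laziness looks like in Python 3; the names lazy_map, lazy_gen and eager_list are made up for this illustration:

```python
import sys

numbers = range(1_000_000)

lazy_map = map(abs, numbers)              # nothing computed yet
lazy_gen = (abs(n) for n in numbers)      # generator expression, also lazy
eager_list = [abs(n) for n in numbers]    # list comprehension, builds everything

print(type(lazy_map).__name__)    # map
print(type(lazy_gen).__name__)    # generator
print(sys.getsizeof(lazy_gen) < sys.getsizeof(eager_list))   # True
print(next(lazy_map), next(lazy_gen))   # 0 0
```

Values are only produced when something asks for them, which is why sum() can consume either lazy form without a million-element list ever existing.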

For that reason, map and its cousins, filter and reduce, are no longer preferred. Reduce is no longer a built-in function in Python 3 (it moved to the functools module), and while map and filter are still there, both of them can be perfectly replaced by list comprehensions.
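
For completeness, a sketch of reduce in Python 3, where it is imported from functools. Here it rebuilds what sum already does, which is a good hint as to why it is rarely needed:

```python
from functools import reduce
from operator import add

text = "Hi!"

# reduce folds the ordinals into one value, pair by pair
total = reduce(add, map(ord, text))

print(total)                          # 210
print(total == sum(map(ord, text)))   # True
```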

Never done a filter in a list comprehension? If we only want the ordinals for letters and nothing else, the calculation would look like this:

sum(ord(c) for c in text if c.isalpha())

Which is easily understandable if you’ve never seen it; unlike

sum(map(ord, filter(str.isalpha, text)))

Which is only understandable to a trained eye. And that’s why list comprehensions are preferred.

Another way of doing structs

Instead of using an “on-the-fly” lambda abomination, I already showed why I think creating a class is easier. But there’s another, even simpler way for C-style structs: the named tuple.

The downsides of the named tuple are that, unlike an “empty” object, you can’t add attributes to it later; and it requires an import. On the upside: a much shorter definition.

Here’s how it works:

# named tuple demo
from collections import namedtuple

# instead of a class definition: tadaa!
Geo = namedtuple('Geo', 'lat, lon')

# using the named tuple
my_town = Geo(47.6, -45.8)
print(my_town.lat, my_town.lon)
print(my_town)

You get the best of both worlds; only one line to define your “struct” and only one line to create it. Even better, it will behave like a tuple where needed.

Downsides: an extra line to import something (but it’s from a standard library, so no big deal), and, it being a tuple, you cannot alter the attributes. But in those cases where it doesn’t matter, it’s a great solution!
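
A short sketch of both sides of that trade-off, reusing the Geo tuple from above. The _replace method is part of the namedtuple API; the variable moved is just an illustrative name:

```python
from collections import namedtuple

Geo = namedtuple('Geo', 'lat, lon')
my_town = Geo(47.6, -45.8)

# Behaves like a tuple: unpacking and indexing both work
lat, lon = my_town
print(lat, my_town[1])            # 47.6 -45.8

# Attributes are read-only
try:
    my_town.lat = 48.0
except AttributeError as err:
    print("cannot assign:", err)

# _replace builds a new tuple instead of changing the old one
moved = my_town._replace(lat=48.0)
print(moved)                      # Geo(lat=48.0, lon=-45.8)
print(my_town)                    # Geo(lat=47.6, lon=-45.8)
```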

 

A tale of two mouses

The Microsoft Style Guide recommends that the plural form of the pointing device is mouses, to distinguish them from the rodents. With that out of the way; when I bought my desktop computer to play Kerbal Space Program without blowing my laptop apart (I have one dead laptop, likely because of the heat stress caused by the game) I got myself a nice mouse.

Microsoft “Designer Bluetooth” mouse

And then I made the mistake of putting cheap batteries in it. Not of some fantasy brand, but Kodak branded. I didn’t think they’d be bad; one needs batteries in camera equipment, and Kodak is a reputable (albeit bankrupt) brand. And I was wrong, so wrong…

Because these batteries did something that surprised me: leaking while providing charge. Normally batteries leak when they’re empty and you keep pushing their internal chemical reaction, but not these. So when my mouse stopped working, I figured “empty battery” and popped out the battery door (a lovely magnetic construction on this mouse), only to be confronted with a stunning amount of white crust that obviously had taken days, if not weeks, to build up.

I cleaned my mouse as much as possible, but it never worked as well again and would require, from time to time, taking the batteries out and putting them back in (probably to remove oxidation from the contacts).

Finally, I had enough and replaced it with a cheap mouse, the Microsoft “Mobile 1850” that the local electronics store had on sale for $10.

Microsoft Mobile 1850 mouse

It’s not a bad mouse; it does the job, and it’s fairly comfortable. But it struggled with the woodgrain on my desk, something the Designer never had a problem with. Nothing a mousepad can’t fix, but I hate mousepads. Sometimes I want to have the mouse next to the keyboard, sometimes a little bit behind it, to stretch my arm; and now I have to move the mousepad with it.

So, after two days I gave in and got myself another Designer mouse (using Duracell batteries) and all is good now. And I have a spare wireless mouse!

Who knew?

I got my son a gaming PC for his birthday. I don’t have experience with PCs that are supposed to look like a spaceship.

 

The transparent side had protective plastic on both outside and inside. So I removed the four screws that hold it… only to be shocked by its weight. That’s not a piece of perspex that was bolted on the side. That’s a solid piece of glass! Who knew?

Easy Iterations

“I want to put rice grains on each square of a chess board. But not doubling them every square as in the legend of Sissa, but rather just a random amount between 1 and 100 on each square. We will identify the square by an (r, c) index”

Well, that sounds easy, right?

import random
from pprint import pprint

board = {}
for row in range(8):
    for col in range(8):
        board[(row, col)] = random.randint(1, 100)
pprint(board)

Aaaand we’re done for the night. Bye now!

Still here? Worried about the double-indent? Me too! Obviously it’s not a big deal in a case like this, but when you’re iterating over four levels and then you get two nested if-statements… too much!

First, let’s take a look at the home-brewed solution, which in many cases will be just fine, because it’s highly readable: rolling your own iterator!

import random
from pprint import pprint

def chess_board_squares():
    for row in range(8):
        for col in range(8):
            yield row, col

board = {}
for row, col in chess_board_squares():
    board[(row, col)] = random.randint(1, 100)
pprint(board)

And that’s how generators work! Instead of a return statement, you use a yield statement, which doesn’t exit the function but rather temporarily “jumps out,” only to continue when the for loop asks for the next value.
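
To see the “jumping out” happen, you can drive the generator by hand with next(). A minimal sketch, reusing chess_board_squares from above (the names squares and remaining are just for illustration):

```python
def chess_board_squares():
    for row in range(8):
        for col in range(8):
            yield row, col

squares = chess_board_squares()   # no code in the function has run yet
print(next(squares))              # (0, 0)
print(next(squares))              # (0, 1)

# A loop picks up where the manual next() calls left off
remaining = sum(1 for _ in squares)
print(remaining)                  # 62
```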

This solution “works” because we have created a custom iterator function that precisely describes what it’s doing: giving you all the chess board squares. It’s easy to go overboard and write a generic multi-iterator function, but that would be counterproductive. Because that’s why there’s an itertools library!

import random
import itertools
from pprint import pprint

board = {}
for row, col in itertools.product(range(8), range(8)):
    board[(row, col)] = random.randint(1, 100)

pprint(board)

The product function returns the Cartesian product of the listed iterables. If our chess board was ten rows deep and six columns wide, we’d call it like this:

for row, col in itertools.product(range(10), range(6)):

The “homegrown” solution is fine, because it’s slightly less cryptic than “product.” Just don’t build your own generic iterator for an arbitrary number of iterable ranges, because someone already did that for you!
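
As a sketch of that generic behaviour: product accepts any number of iterables, and its repeat keyword covers the case where the same range is used several times. The tiny sizes here are only to keep the output readable:

```python
import itertools

# Two ranges of different sizes: 2 rows by 3 columns
print(list(itertools.product(range(2), range(3))))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# repeat= reuses the same range, e.g. for a 2 x 2 x 2 cube
cube = list(itertools.product(range(2), repeat=3))
print(len(cube), cube[0], cube[-1])   # 8 (0, 0, 0) (1, 1, 1)
```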

Python 2 or Python 3?

It shouldn’t even be a question at this point! I’ve been using Python 3 for quite a while now, and I’m very happy with it. Personally, I like how Python 3 deals with the little quirks like integer division, and how it exposes iterators instead of lists when dealing with dictionaries and the range function (without having to use “ugly” code).
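
A quick sketch of those quirks as they behave in Python 3; the sample dictionary is made up:

```python
# True division: / always gives a float, // gives the floor
print(7 / 2, 7 // 2)           # 3.5 3

# range is a lazy sequence, so a huge one costs almost nothing
big = range(10**9)
print(len(big), big[-1])       # 1000000000 999999999

# dict methods return live views, not copied lists
ages = {'ann': 31, 'bob': 27}
keys = ages.keys()
ages['cy'] = 45
print(list(keys))              # ['ann', 'bob', 'cy']
```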

Most of the Python 3-advertising talks at PyCon, which is being held this weekend, lauded the fact that it performs so much better. Raymond Hettinger even boasted in one of his talks that 3.6 is probably the first Python 3 release that is better than Python 2.7!

But what does that mean, better?

Nearly everyone will tell you how much smaller the memory footprint is, and how much better the performance is. And that’s great when you manage enterprise applications or webservices; I don’t. I just use Python to make my daily life easier.

I’ve tried switching to Python 3 a couple of times before “it stuck.” Every time I switched I felt that it was important to be prepared for the future, and not be left having to switch over with tons of code waiting to be upgraded.

The first time was around Python 3.1. Bad mistake! By the time Python 3.4 was out things were much better though, and I’m a happy Python 3 camper by now. But it wasn’t performance that convinced me; it was libraries.

Because that’s the one thing that most Python 3 evangelists are not telling you. Practically nothing worked the first time I tried. The standard library did, of course. But external libraries were quite a different story. Reportlab? No go. WxPython? No go. And so on.

Today is quite different as practically everything seems to be working with Python 3. If it’s not made for Python 3, it’s at least compatible with it.

If you’re still on 2.7, this might be a good moment to consider switching!

Don’t optimize yet—look for the bigger picture

The Problem

This will sound familiar, even when you don’t write code in Python: at one point, you have a need for a simple vehicle for multiple values. In Python, your go-to solution is of course the tuple:

values = (altitude, velocity, mass)

And in many cases that’s sufficient, although it leads to ugly downstream effects:

# calculate kinetic energy
kinetic = .5 * values[1] * values[1] * values[2]

Surely not the most readable code. Of course, we can store values in a dictionary:

values = dict(altitude=20000, velocity=500, mass=3000)
...
kinetic = .5 * values['mass'] * values['velocity']**2

And that is surely better, but… a dict is not exactly a lightweight object, and a bit of an overkill for these kinds of cases. The Pythonic solution is, of course, to store these values as attributes in an object, and access them that way. Faster, and less memory use than a dict!

But I have tons of these cases and I don’t want to clutter my code with all kinds of ad-hoc class definitions to carry these values!

That in itself is a legit thought. And what follows is a classic case of “jumping into solution mode”:

How do I create an empty object on the fly, to assign attributes to?

The first attempt is to simply use the—literal—mother of all objects, object (and let’s try this out in interactive mode):

>>> values = object()
>>> values.altitude = 20000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    values.altitude = 20000
AttributeError: 'object' object has no attribute 'altitude'

What happened there? Well, object is indeed the mother of all objects. Remember, in Python everything is an object, even primitives like integers. Since everything is a subclass of object, giving it the ability to have attributes would give attributes to primitive values, and we certainly don’t want that.
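
A more hands-on way to see the difference, as a small sketch: a plain object() instance has no __dict__ to store attributes in, while an instance of even an empty class does. The class name Empty is made up for this illustration:

```python
class Empty:
    pass

plain = object()
custom = Empty()

print(hasattr(plain, '__dict__'))    # False
print(hasattr(custom, '__dict__'))   # True

custom.altitude = 20000              # stored in the instance's __dict__
print(custom.__dict__)               # {'altitude': 20000}
```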

But functions can have attributes, so that leads to a popular hack that does work:

>>> values = lambda:0
>>> values.altitude = 20000
>>> values.velocity = 500
>>> values.mass = 300
>>> print(values.altitude)
20000

Wonderful! Of course, nearly no one who hasn’t seen this hack will understand your code, because it’s an obscure trick. Use lambda to create a function on the fly (remember, CreateFunction would have been a better name for lambda), and what the function does is irrelevant, so it is made to return zero. Tadaa! And then we add attributes to our value object, which is really a function that just returns 0. But hey, it works.

Time to step back for a moment

The whole point was to create an object to store attributes in. We ended up with, let’s be honest here, a hack, that gives us such an object. At the price of readability of your code. And that is a very high price!

Why not just create a class for such an object? In other languages such a structure is called a Custom Type (Visual Basic) or simply Struct (C and C++). At the expense of one line of code, we can define a Struct class that does exactly the same as the lambda hack, but better, because the code reveals our intentions:

class Struct: pass
values = Struct()
values.altitude = 20000
values.velocity = 500
values.mass = 300
print(values.altitude)  # prints 20000

Obviously, the class definition of Struct can go at the top of your code. But there you have it; at the expense of one extra line, we now have a reusable vehicle for ad-hoc data storage. A price well worth paying, one might say.

But wait, there’s more!

But let’s think this through for just one second. The use case for this is when we have more than one value to transfer; otherwise we’d just stick that value into a variable, right? So, with the minimum of two values to transfer (and packing and unpacking that into a tuple is trivial, so it’s more likely to have at least three or more values) we need at least three lines of code to prepare our data: one line to create the object and two lines (or more) to assign the attributes.

Hmm.

What if, instead, we invested two more lines of code in our Struct?

class Struct:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

While this may look like a hack, it’s fairly reasonable to expect a seasoned Python coder to understand this:

  • The init function takes a dictionary of keyworded arguments (** = keyword arguments packed in a dictionary)
  • It then updates the attribute list (represented by __dict__) with the keyword/value pairs inside the kwargs dict.
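
The two steps above can be checked with vars(), which returns an instance’s __dict__. A minimal sketch; the drag attribute is invented here just to show that later assignment still works:

```python
class Struct:
    def __init__(self, **kwargs):
        # kwargs arrives as an ordinary dict of name -> value
        self.__dict__.update(kwargs)

values = Struct(altitude=20000, velocity=500, mass=300)

print(vars(values))
# {'altitude': 20000, 'velocity': 500, 'mass': 300}

# Attributes can still be added later, as with the empty class
values.drag = 0.3
kinetic = .5 * values.mass * values.velocity**2
print(kinetic)    # 37500000.0
```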

In other words, for two extra lines of code we can now initialize our Struct with keyworded values and get an immediate return on investment on those two extra lines in the definition of Struct:

values = Struct(altitude=20000, velocity=500, mass=300)
print(values.altitude)  # prints 20000

Only one line of code is required to build our “object” instead of four; the investment of an extra two lines in the definition of Struct pays off immediately, and it’s a gift that keeps on giving! Less clutter, more clarity, and an efficient vehicle to transfer your data to another part of your code, just as Guido intended it to be. Truly Pythonic!
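
Worth knowing, as an aside: since Python 3.3 the standard library ships types.SimpleNamespace, which behaves much like the Struct class above. A minimal sketch, reusing the same example values (drag is again an invented attribute):

```python
from types import SimpleNamespace

values = SimpleNamespace(altitude=20000, velocity=500, mass=300)
values.drag = 0.3                  # attributes can be added afterwards
print(values.altitude)             # 20000

# Two namespaces with the same attributes compare equal
print(SimpleNamespace(a=1) == SimpleNamespace(a=1))   # True
```

Whether the import is worth it over a three-line class of your own is a matter of taste; the class has the advantage that its name says what you mean.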