bor.borygmus

A programming weblog by Hao Lian. • A long walk through an angry forest. • A series of memory leaks brought on by senility.

It’s time now to write about two of my favorite pieces of software ever written: Georg Brandl’s Sphinx and Pylons’ Pylons. Normally, these get along like a house on concrete, which is to say very well. A Pylons web application is a well-behaved, standard Python package that pays all the taxes: a setup.py, no module magic like pre-modern Django, and imports as expected. However, if you have ever used the highly useful Sphinx autodoc extension and have pointed it at a module in your Pylons app, you may have run into a traceback that looks like this:

[snip]
File "[snip]/autodoc.py", line 1019, in can_document_member
  return (isdescriptor(member) and not
File "[snip]/autodoc.py", line 203, in isdescriptor
  if hasattr(getattr(x, item, None), '__call__'):
File "[snip]/paste/registry.py", line 137, in __getattr__
  return getattr(self._current_obj(), attr)
File "[snip]/paste/registry.py", line 194, in _current_obj
  'thread' % self.____name__)
TypeError: No object (name: url) has been registered for this thread

Two questions should emerge immediately: What is paste.registry? And why is it throwing a traceback?

The first question is easy enough to answer. We can read the documentation for paste.registry, but I feel story-ish, so let’s tell a story: If you’ve ever programmed a Web application in Python completely by yourself, either in CGI or raw WSGI (the latter being something I recommend everybody do at least once or nine times), you will run into The Temptation of Globals. Got a variable that you don’t want to send as a cookie? Store it as a global.

HTTP cookies are notoriously tricky anywhere, even in Python with the Cookie library. For one, the SimpleCookie interface provides a raw expiration field, and HTTP cookie expiration dates are boring and frustrating to figure out. The bigger problem is that SimpleCookie, with its Cookie/Morsel distinction, doesn’t act enough like a dictionary, and it’s bound to trip you up. No wonder HTTP people get that crazy gleam in their eyes when you say the word “stateless”. Fortunately, there’s the Beaker middleware these days.

With simple testing servers, you won’t run into any problems. Globals work as expected. You begin to redden at the cheeks. “This is fun,” you say. You tell all your friends about Python. You carve the benevolent dictator’s name into the “magic method” area of your body. But the problem is that these globals clash with threads. As soon as you start using mod_wsgi, you can count on bizarre errors and missing tufts of hair. Multiple requests will hit different threads, all of which share the very same globals, and they begin stepping on each other’s toes. You quickly realize that all state in a Python application must either be created when new requests are made or persisted out to a database, the filesystem, or cookies. Facing this, glumly, you begin implementing everything in classes, which are the perfect vehicles for transmitting common state, what with inheritance and all that hullabaloo. No importing globals conveniently into all your modules for you, mister.
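Here is a minimal sketch of that toe-stepping, using nothing but the standard library. The names current_user and handle_request are made up for illustration, and the barrier is only there to force the unlucky interleaving every time instead of once in a blue moon.

```python
import threading

# A module-level global, shared by every thread in the process.
current_user = None


def handle_request(user, barrier, seen):
    """Pretend request handler that stashes per-request state in a global."""
    global current_user
    current_user = user
    # Wait until both "requests" have written before either one reads.
    barrier.wait()
    # Record which user this request believes it is serving.
    seen[user] = current_user


barrier = threading.Barrier(2)
seen = {}
threads = [
    threading.Thread(target=handle_request, args=(user, barrier, seen))
    for user in ("alice", "bob")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Both requests read the same value, so one of them has the wrong user.
print(seen)
```

Whichever thread wrote last wins, and the other request quietly serves somebody else’s data.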

Paste Registry neatly solves this by implementing a stacked, thread-safe globals system: once a request finishes, he (or she) pops off his globals and restores everything back to normal. For this to work, you must replace your objects with proxies. These are hollow containers that must be filled with object goodness once a request hits a thread. You do this by registering an object with a global proxy, thus pushing it onto the stack. The middleware then takes care of popping it off when needed, shuffling it into the backroom, and sternly telling it to wait like a huffy matron. What’s ingenious about this is that it’s still the same old system where you create objects at request time. Except now you can import globals into modules and they’ll be there, always working, always smiling at you even if that smile is a little bit off. And cleverly sinister.
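To make the idea concrete, here is a toy stacked proxy. This is a sketch of the technique and not Paste’s actual code: the class StackedProxy, its push and pop methods, and the Request class are all invented for illustration, and the real thing overrides many more double-underscore methods.

```python
import threading


class StackedProxy:
    """Toy proxy that forwards attribute access to a per-thread object."""

    def __init__(self, name):
        self._name = name
        self._local = threading.local()  # each thread gets its own stack

    def _stack(self):
        if not hasattr(self._local, "objects"):
            self._local.objects = []
        return self._local.objects

    def push(self, obj):
        """Register obj as the current object for this thread."""
        self._stack().append(obj)

    def pop(self):
        """Unregister the most recently pushed object for this thread."""
        self._stack().pop()

    def __getattr__(self, attr):
        # Only reached for attributes the proxy itself does not have.
        stack = self._stack()
        if not stack:
            raise TypeError(
                "No object (name: %s) has been registered for this thread"
                % self._name
            )
        return getattr(stack[-1], attr)


class Request:
    def __init__(self, path):
        self.path = path


# An importable "global" that is really a hollow container.
request = StackedProxy("request")


def serve(path, results):
    request.push(Request(path))  # the middleware's job, done by hand here
    try:
        results[path] = request.path
    finally:
        request.pop()


results = {}
threads = [threading.Thread(target=serve, args=(p, results)) for p in ("/a", "/b")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(results)
```

Each thread sees only what it pushed, and touching request.path from a thread that never pushed anything raises the same kind of TypeError as in the traceback.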

Now then: the more interesting question. Why does the traceback occur? Reading closely, we can deduce that the getattr call in sphinx.ext.autodoc is raising an exception in Paste Registry’s proxy object. We can crack open that file, and we suddenly realize this: Proxy objects in Python aren’t all that easy to do. It’s still a lot of overriding methods, most of whose names are buffeted by fancy double underscores. And unless a real object has been registered with the proxy for the current request, these proxy objects are invalid when called upon because the overridden methods have nowhere real to point to. Naturally, they throw an exception. And then we quickly realize that Sphinx’s autodoc has no way of registering a global. It’s not there to call the WSGI application; it just wants the documentation strings. Here, all isdescriptor quietly wants is to know whether or not a certain attribute should have the descriptor metadata marked in the Sphinx output. Poor isdescriptor calls getattr with a default, assuming that a failed lookup can only ever mean an AttributeError, which is fine and dandy until you meet a proxy object like Paste Registry’s.
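The failure is easy to reproduce without Sphinx or Paste at all. The class UnregisteredProxy below is a made-up stand-in for a proxy with nothing registered; the point is only that the default argument to getattr swallows AttributeError and nothing else.

```python
class UnregisteredProxy:
    """Stand-in for a proxy that has no object behind it."""

    def __getattr__(self, attr):
        raise TypeError("No object (name: url) has been registered for this thread")


class Plain:
    """An ordinary object with no descriptor methods."""


# For a normal object, the default quietly covers the missing attribute.
plain_result = getattr(Plain(), "__get__", None)

# For the proxy, the TypeError sails straight past the default.
try:
    getattr(UnregisteredProxy(), "__get__", None)
except TypeError as error:
    outcome = "TypeError: %s" % error
else:
    outcome = "no error"

print(plain_result)
print(outcome)
```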

Now we put our minds together and wrack our brains. We pace up and down the floor, irritably answering the children’s demands for “lunch” and “dinner”. How do we change isdescriptor so that it recognizes Paste Registry? The answer, of course, is monkey-patching!

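def isdescriptor(x):
    """Check if the object is some kind of descriptor."""

    for item in '__get__', '__set__', '__delete__':
        try:
            # An unregistered proxy can raise something other than
            # AttributeError here, so getattr's default is not enough.
            if hasattr(getattr(x, item, None), '__call__'):
                return True
        except Exception:
            # If merely looking up the attribute blows up, treat the
            # object as a non-descriptor instead of killing the build.
            return False
    return False


# MONKEY-PATCH! Swap autodoc's isdescriptor for the tolerant one above.
import sphinx.ext.autodoc as autodoc
autodoc.isdescriptor = isdescriptor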

A nice try-except block not only fixes our problems but maintains the semantics of isdescriptor: If the mere act of getting an attribute raises an exception, it’s probably not a descriptor. Fortunately, this problem has been solved in development. (As of writing, the release I’m using is 0.6.1.) The getattr call has now been replaced with safe_getattr, which does indeed implement a better version of our try-except monkey-patch.

Hooray!
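For the curious, a helper in the spirit of safe_getattr might look something like the following. This is my own sketch, not Sphinx’s source; the real function may well differ in signature and behavior, and BrokenProxy is again a made-up stand-in.

```python
def safe_getattr(obj, name, default=None):
    """getattr that treats any exception as "attribute not there"."""
    try:
        return getattr(obj, name, default)
    except Exception:
        return default


def isdescriptor(x):
    """Check if the object is some kind of descriptor."""
    return any(
        callable(safe_getattr(x, item))
        for item in ("__get__", "__set__", "__delete__")
    )


class BrokenProxy:
    """Stand-in for a proxy that raises on every attribute lookup."""

    def __getattr__(self, attr):
        raise TypeError("nothing registered")


proxy_is_descriptor = isdescriptor(BrokenProxy())
property_is_descriptor = isdescriptor(property())
print(proxy_is_descriptor, property_is_descriptor)
```

One small difference from the monkey-patch: this version keeps checking the remaining attribute names after a failed lookup instead of giving up at the first exception.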

Epilogue.

You may have noticed a long time ago, when you were reading the last few paragraphs over one second ago, that we didn’t mention where to monkey-patch. It turns out the perfect place to do this is the conf.py configuration file that every Sphinx-created directory has, which brings me to my final point. Please use Python as your configuration language. The single thing that saddens me most about Paste is the use of INI files. Though INI files have precedent, I’ve always hated them for doing something Python is perfectly capable of doing in an inferior syntax. You would think these web people would realize the dangers of having a mini-language for everything: Do Python, PHP, Perl, HTML, CSS, RSS, Atom, JavaScript, Flash, and Silverlight come to mind? No wonder web developers get paid so much. (A point first brought to my attention by Clay Allsopp, who’s an up-and-coming iPhone developer [iTunes link] you should check out.) What if you want to express data more complex than strings? What if you want to nest headers deeper than a one-level hierarchy? What if you want configuration inheritance? (For example, a common configuration for development and production machines almost always exists. In INI files, they get duplicated; if you have ever deployed Pylons across multiple environments, you will have a Siberia of a time trying to get this right.) To use Python-language configuration, just import the file into an alias. Or evaluate it with exec or execfile (the latter removed in Python 3), again into an aliased module. Not only will you show you trust the user to do powerful voodoo in the configuration when necessary, as we had to do above, you’ll be saving yourself a lot of time. By the way, Django got this one right with its settings.py. And knowing that Django got something right just makes me mad.

Joking! Not really.
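In case the Python-as-configuration pitch from the epilogue sounds abstract, here is a rough sketch of the exec approach with inheritance thrown in. The load_config function and the file names base.py and production.py are hypothetical; the files are written to a temporary directory only so the example runs on its own.

```python
import os
import tempfile


def load_config(*paths):
    """Execute config files in order; later files see and override earlier ones."""
    namespace = {}
    for path in paths:
        with open(path) as handle:
            code = compile(handle.read(), path, "exec")
        exec(code, namespace)
    # Drop __builtins__ and any private names the config defined.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "base.py")
    production = os.path.join(tmp, "production.py")

    with open(base, "w") as handle:
        handle.write("debug = True\n")
        handle.write("database = {'host': 'localhost', 'port': 5432}\n")

    # Production inherits everything from base and changes only what differs.
    with open(production, "w") as handle:
        handle.write("debug = False\n")
        handle.write("database = dict(database, host='db.example.com')\n")

    settings = load_config(base, production)

print(settings)
```

Nested data, real types, and inheritance all come for free, which is exactly what the INI file cannot give you.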

[(July 25, 2009) .]
