bor.borygmus

A programming weblog by Hao Lian. • A long walk through an angry forest. • A series of memory leaks brought on by senility.

Google App Engine’s datastore API continues a trend that the outstanding Python library SQLAlchemy (SA) started, namely to re-purpose the class, an object-oriented concept, as a declaration of a table. Take, for example, this SA snippet

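from sqlalchemy import Column, Integer, String
# declarative_base lived here in 2009; newer releases also expose it from sqlalchemy.orm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)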

which by all standards seems primitive. I have to call declarative_base? And what are those underscores doing in my pristine class? Can’t you pluralize the table for me? I have to sit down, I have a headache now. SQL is hard.

This is old hat for Ruby on Rails (released 2004), whose ActiveRecord and related APIs are so good that they don’t even need proper documentation. I’m assuming ActiveRecord’s ideas influenced SA (released 2006), albeit with the ideas vivisected, modularized, and heavily improved.

A good wrapper over SA is Elixir, which brings back some of that homemade declarative magic. Amusing side note: Elixir seems to have gone through a Rails-copycat phase

class Person(Entity):
    has_field('name', Unicode(255))
    has_many('addresses', of_kind='Address')

class Address(Entity):
    has_field('email', Unicode(128))
    belongs_to('person', of_kind='Person')

before regressing back to its TurboEntity roots.

class Person(Entity):
    name = Field(Unicode(255))
    addresses = OneToMany('Address')

class Address(Entity):
    email = Field(Unicode(128))
    owner = ManyToOne('Person')

The point is, SQLAlchemy is fundamentally different from ActiveRecord because it breaks the idea of an ORM into thoughtful and much more fundamental components. This is the key juju that has ignited a tiny bazaar of declarative SA layer libraries. But it’s not the only big difference between it and ActiveRecord.

You see, how these declarative layers work is another matter. For Ruby, it’s metaclasses, which is usually mind-boggling enough and is where the learning curve suddenly takes a right angle into a very un-sexy place. I think the problem most people have is “Why does it take so much code?” After all, you’re just messing with the boundary between class and object, static and instance. A good secondary question is “Couldn’t this be better done by hijacking the constructor?” You should absolutely distrust this sort of thing, whether it's metaclasses in Python or Ruby. It’s as suspicious as beautiful women in trench coats or typing newlines before opening braces.

Python, on the other, more refreshing hand, does it with descriptors. If you’ve never heard of those, it’s because tutorials don’t mention them and the two scary sections the documentation spends on the topic contain no examples. A lucky search might turn up Raymond Hettinger’s excellent how-to guide, which does a much better job. Python’s descriptors come from the conflict between types and classes (PEP 252) back in Python 2.2, which is by and large a non-issue these days. How many Python people who started during 2.4 or 2.5 (like me) can rattle off the differences between the two? If you’ve subclassed from object all your life, you probably can’t. You are doomed to repeat history. Bup bup; it’s too late.

The short version of descriptors is that attribute lookup transforms B.x into B.__dict__['x'].__get__(None, B) (Hettinger) using a few short lines in __getattribute__. Again, Hettinger:

def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v

And, similarly, it transforms obj.x into B.__dict__['x'].__get__(obj, B), where B is the class of obj.

No metaclasses. So let’s unpack those underscores and dots. __dict__ is the underlying lookup table for all objects. Classes are objects in Python (praise the lord), so they too have a lookup table. It contains what C++ and Java programmers would happily call static variables, variables you can access without having to create an instance of the class because the class itself is already an object instance of some deeper metaclass. Once we’re in __dict__, we have to check whether the value we found is a descriptor with a hasattr call. Assuming that’s successful, we’re guaranteed a __get__ by the descriptor protocol. And then the magic ends.
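To make the two transformations concrete, here is a minimal sketch written to run on a current Python. The names Loud, B, and b are made up for illustration; the descriptor simply hands back the arguments it was called with, so you can see that class access passes None and instance access passes the instance.

```python
class Loud(object):
    """A minimal (non-data) descriptor that reports how it was reached."""

    def __get__(self, obj, objtype=None):
        # obj is None for class access (B.x) and the instance for b.x.
        return (obj, objtype)


class B(object):
    x = Loud()


b = B()

# Class access: B.x behaves like B.__dict__['x'].__get__(None, B).
assert B.x == (None, B)
assert B.x == B.__dict__['x'].__get__(None, B)

# Instance access: b.x behaves like B.__dict__['x'].__get__(b, B).
assert b.x == (b, B)
assert b.x == B.__dict__['x'].__get__(b, B)

# The descriptor object itself still sits untouched in the class's __dict__.
assert isinstance(B.__dict__['x'], Loud)
```

Going through __dict__ directly is the only way to get at the descriptor object here, because plain attribute access always triggers __get__.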

We can model this as a conversation between you the programmer and the object:

  • You: Hello, object. Please get me your variable x.
  • Object: Yargh, it might be a descriptor. Let me interface with my class.
  • You: How long will this take?
  • Object: It’ll take as long as I want, motherfrigger.
  • Class: Hi, everybody. It looks like it is a descriptor. I’ll interface with it.
  • You: I’m going to get a soda.
  • Descriptor: OK, I’m calling __get__ and returning it. Back to you, class.
  • Class: Back to you, instance.
  • Object: Back to you, guy.
  • You miss it because you were getting a soda.

Which brings us to Google App Engine, which surprised and disappointed at least me by introducing another declarative layer and, furthermore, by not building on the work of SQLAlchemy, but I admit that van Rossum knows more than I do about Python. (Improbable, I know.) If you’ve seen one declarative Python ORM, you’ve seen them all, and GAE isn’t all that far from the rest. The only tricky parts are entity ancestors, entity groups, transactions, and keys, but that’s another post.

Unlike SA and SA derivatives, there’s no Unicode property. Instead the StringProperty takes both str and unicode instances (bytes and str for you Python 3000 fans). It assumes that all str instances are ASCII, which is tantamount to saying that you can ignore Unicode going into your datastore until the day somebody decides that a certain variable should be localized, or should otherwise hold a character outside ASCII, at which point the program will explode. Disclaimer: program may just throw an exception.

Of course, you and I know better. String instances can be in hundreds of different encodings. However, if you output charset=utf-8 in your Content-Type, you should get back Unicode request data, which should postpone the inevitable decoding error. The WebOb object that Google uses is generally fantastic at decoding GET and POST data, praise the paste for the paste is good. But if you’re like me, you’ll be unhappy the minute you hear that one type will take in both ASCII-encoded bytestrings and Unicode strings, so you’ll want to implement a subclass of StringProperty. Call it UnicodeProperty. (Here’s where you say: OK, sure.) Thanks.

Now, it turns out that the datastore always returns Unicode, so no problem there. We just want to make sure that everything that goes in is also Unicode. Reason 1: If a section of our code is not passing in Unicode, then we’re not operating in Unicode everywhere and that’s bad citizenship. Reason 2: Not every bytestring we pass in is guaranteed to be in ASCII. Might as well root out those places and insert some good old decode()s.
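Here is a small sketch of why Reason 2 bites and what a "good old decode()" at the boundary looks like. It is written to run on a current Python, where the bytestring type is spelled bytes. The helper to_text is hypothetical, and the UTF-8 default is an assumption about what your clients send, not something App Engine promises.

```python
# Bytes as they might arrive from outside the program: UTF-8, not ASCII.
raw = u'caf\u00e9'.encode('utf-8')


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytestrings once, at the boundary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text (or None); pass it through untouched


# The implicit "every bytestring is ASCII" assumption fails on this input.
try:
    raw.decode('ascii')
    ascii_worked = True
except UnicodeDecodeError:
    ascii_worked = False

assert ascii_worked is False

# Decoding explicitly with the right encoding gives real Unicode text.
assert to_text(raw) == u'caf\u00e9'
assert to_text(u'plain') == u'plain'
```

The point of funnelling everything through one helper is that the encoding decision is made in exactly one place, before anything touches the datastore.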

From reading Hettinger’s how-to and being aware of the three-method descriptor protocol, a talented reader might venture that overriding __set__ would be the way to go. And it is!

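from google.appengine.ext import db

class UnicodeProperty(db.StringProperty):
    def __set__(self, obj, value):
        # Reject bytestrings outright; None and unicode pass through.
        if type(value) == str:
            err = 'UnicodeProperty: "%s" not Unicode' % value
            raise Exception(err)
        # Defer to the base class so the datastore machinery still runs.
        db.StringProperty.__set__(self, obj, value)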

Note that it’s better to check for type equality here, since there are three possible types for value: None, str, and unicode. Raising a virgin Exception is also bad practice, but changing it is left as an exercise for the reader.

Note also that unless you call the base __set__, you’ll be left with a descriptor without all the Google datastore magic. So don’t leave that out. Using Python’s super is left as an exercise for the reader.

Now if you try to pass a bytestring containing an ASCII string, with the assumption that the bytestring will always be ASCII forever and ever, you will see an exception in your logs and in your shame, which should be vast and permanent.
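If you want to try the pattern without an App Engine SDK at hand, here is a self-contained sketch written for a current Python. StoredProperty is a made-up stand-in for a datastore property, not Google's implementation, and TextOnlyProperty is a hypothetical name. It also works through the two exercises: it raises TypeError instead of a bare Exception, and it reaches the base __set__ through super.

```python
class StoredProperty(object):
    """Stand-in for a datastore property; NOT App Engine's implementation."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access returns the descriptor itself
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class TextOnlyProperty(StoredProperty):
    """Rejects bytestrings, then defers to the base __set__."""

    def __set__(self, obj, value):
        if isinstance(value, bytes):
            raise TypeError('%s: %r is not Unicode' % (self.name, value))
        # Without this call the value would never be stored.
        super(TextOnlyProperty, self).__set__(obj, value)


class User(object):
    name = TextOnlyProperty('name')


user = User()
user.name = u'Hao'
assert user.name == u'Hao'

try:
    user.name = b'Hao'  # ASCII-only, but still a bytestring
    rejected = False
except TypeError:
    rejected = True

assert rejected
assert user.name == u'Hao'  # the rejected assignment changed nothing
```

Because the check runs before the base __set__, a rejected value never reaches storage, which is the same ordering the App Engine version relies on.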

Similarly, we can subclass Google App Engine’s TextProperty with

class UnicodeTextProperty(db.TextProperty):
    def __set__(self, obj, value):
        # Same check as UnicodeProperty, applied to the long-text type.
        if type(value) == str:
            err = 'UnicodeTextProperty: "%s" not Unicode' % value
            raise Exception(err)
        db.TextProperty.__set__(self, obj, value)

These are drop-in replacements for their blander counterparts. Witness this

class User(db.Model):
    bio = db.TextProperty(default='')
    name = db.StringProperty(required=True)
    blog = db.StringProperty(default='')

turn into this, turning the defaults into Unicode while you’re at it,

class User(db.Model):
    bio = UnicodeTextProperty(default=u'')
    name = UnicodeProperty(required=True)
    blog = UnicodeProperty(default=u'')

Hooray.

(January 2, 2009)

Declarative in SQLAlchemy was meant to be a simple way to define tables, mappers, and classes, not as some sort of overall “hide all the messy details from me please” type of approach. It keeps things explicit. To suggest that SQLAlchemy attempts to implement a “better rails” is sort of missing the point. Rails implements an Active Record pattern only, albeit well. SQLAlchemy supports an Active Record pattern through Declarative but also fully supports a Data Mapper pattern. Additionally it’s a full SQL Toolkit and not just an ORM.

Not to take anything away from Rails and ActiveRecord, but it won’t support this (http://techspot.zzzeek.org/?p=33) even without the ugly underscores corrupting your pristine class.

Empty, 2009.01.19 (2 am)

> To suggest that SQLAlchemy attempts to implement a “better rails” is sort of missing the point.

Where did I do this? I certainly didn’t mean to. I tried to explicitly suggest a possible influence based on ActiveRecord’s massive popularity and the rise of the ORM libraries around that time.

> Additionally it’s a full SQL Toolkit and not just an ORM.

Where did I do this?

Hao Lian, 2009.01.19 (9 am)

I think my comments came off in a manner I did not intend. (That will teach me to try and comment after 2AM). I was not criticizing your post; It’s very good and you’re very kind to SQLAlchemy. I was just trying to clarify what SQLAlchemy offers. Thank you for the nice writeup.

Empty, 2009.01.19 (10 am)

nice post. (nice comment interface too, the text field is invisible….) anyway. I can fill in some details on some of your points:

> which by all standards seems primitive. I have to call declarative_base?

the Base class stores the string names of classes and tables, so that you can say things like relation("Address") or ForeignKey("othertable.other_id"). You need to have a non-global variable to store that namespace if you ever want two applications to run in the same process without name collisions. (not sure what Elixir does for this, if anything…)

> And what are those underscores doing in my pristine class?

what, the double underscores ? go read some python code. granted we’re not supposed to use those in userspace libraries but im drawing upon my inner Ian Bicking for that one.

> Can’t you pluralize the table for me?

I’m assuming this is a joke.

> I’m assuming ActiveRecord ideas influenced the SA (released 2006)

Absolutely not. SA’s primary inspiration from the python point of view was SQLObject. Otherwise it was Fowler’s datamapper examples in POEAA. We also took some cues from Hibernate, though in retrospect those decisions should have been thought through more deeply (and at least one has been reversed).

From the little I’ve seen of ActiveRecord it doesn’t seem to have too many “ideas”, it’s all been done before (and has been done since, and will be done again, and again…). Whatever they’ve done on the Ruby side I’m sure is all very clever in that Ruby-like way.

> which surprised and disappointed at least me in introducing another declarative layer and furthermore not building on the work of SQLAlchemy

Google had me take a look at GAE before it was released and asked if I would build a SQLAlchemy emulation layer on top of it. I told them that it wouldn’t accomplish much, since GAE’s datastore is not relational in the slightest, and there would be pretty much zero chance of any GAE or SQLA application porting to the other without a rewrite in any case. A slightly different looking API at the declarative level is the least of your concerns if you’re porting from one to the other.

—mike bayer, 2009.01.19 (11 am)

> I’m assuming this is a joke.

The first two paragraphs are tongue-in-cheek.

> Absolutely not.

Darn.

> I told them that it wouldn’t accomplish much, since GAE’s datastore is not relational in the slightest, and there would be pretty much zero chance of any GAE or SQLA application porting to the other without a rewrite in any case.

My general disappointment was aimed at reusing code, not the ability to port from GAE to a regular SQL server. There’s still overlap between the two projects: the types system, the mapper, the sessions, and sqlalchemy.sql/GQL. I didn’t want SQLA to start supporting GAE; but the scratch creation of google.appengine.ext.db was hugely disappointing.

Hao Lian, 2009.01.19 (5 pm)
