Google App Engine’s datastore API is a continuation of a trend that the outstanding Python library SQLAlchemy (SA) started, namely to re-purpose the class, an object-oriented concept, as a declaration of a table. Take, for example, this SA declarative class:
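```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
```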
which by all standards seems primitive. I have to call declarative_base? And what are those underscores doing in my pristine class? Can’t you pluralize the table for me? I have to sit down, I have a headache now. SQL is hard.
This is old hat for Ruby on Rails (released 2004), whose ActiveRecord and related APIs are so good that they don’t even need proper documentation. I’m assuming ActiveRecord ideas influenced SA (released 2006), albeit vivisected, modularized, and heavily improved. Elixir, one of the declarative layers built on top of SA, even started out with an ActiveRecord-style syntax:
```python
class Person(Entity):
    has_field('name', Unicode(255))
    has_many('addresses', of_kind='Address')

class Address(Entity):
    has_field('email', Unicode(128))
    belongs_to('person', of_kind='Person')
```
Elixir later regressed to its TurboEntity roots:
```python
class Person(Entity):
    name = Field(Unicode(255))
    addresses = OneToMany('Address')

class Address(Entity):
    email = Field(Unicode(128))
    owner = ManyToOne('Person')
```
The point is, SQLAlchemy is fundamentally different from ActiveRecord because it breaks the idea of an ORM into thoughtful and much more fundamental components. This is the key juju that’s been able to ignite a tiny bazaar of declarative SA layer libraries. But it’s not the only big difference between it and ActiveRecord.
You see, how these declarative layers work is another matter. For Ruby, it’s metaclasses, which is usually mind-boggling enough and is where the learning curve suddenly takes a right angle into a very un-sexy place. I think the problem most people have is “Why does it take so much code?” After all, you’re just messing with the boundary between class and object, static and instance. A good secondary question is “Couldn’t this be better done by hijacking the constructor?” You should absolutely distrust this sort of thing, whether it’s metaclasses in Python or Ruby. It’s as suspicious as beautiful women in trench coats or typing newlines before opening braces.
Python, on the other, more refreshing hand, does it with descriptors. If you’ve never heard of those, it’s because tutorials don’t mention them and the two scary sections the documentation spends on the topic have no examples. A lucky search might turn up Raymond Hettinger’s excellent how-to guide, which does a much better job. Python’s descriptors came out of the effort to heal the split between types and classes (PEP 252) back in Python 2.2, a split that is by and large a non-issue these days. How many Python people who started on 2.4 or 2.5 (like me) can rattle off the differences between the two? If you’ve subclassed from object all your life, you probably can’t. You are doomed to repeat history. Bup bup; it’s too
The short version is that attribute lookup on a class transforms B.x into B.__dict__['x'].__get__(None, B) (Hettinger), using a few short lines in type.__getattribute__. Again, Hettinger:
```python
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
```
And, similarly, object.__getattribute__ transforms an instance lookup obj.x into type(obj).__dict__['x'].__get__(obj, type(obj)).
No custom metaclasses. So let’s unpack those underscores and dots. __dict__ is the underlying lookup table for most objects. Classes are objects in Python (praise the lord), so they too have a lookup table. It holds what C++ and Java programmers would happily call static variables: variables you can access without creating an instance of the class, because the class itself is already an instance of some deeper metaclass (type, by default). Once we’ve found the attribute in __dict__, a hasattr call checks whether it has a __get__ method, which is exactly what makes it a descriptor under the descriptor protocol. If it does, we call __get__ and return its result. And then the magic ends.
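To see both transformations from Hettinger’s guide in action, here is a minimal sketch written for Python 3. The Verbose descriptor and the class B are illustrative names, not part of any library; the descriptor simply reports which arguments its __get__ received.

```python
class Verbose:
    """A minimal non-data descriptor that reports how it was looked up."""

    def __get__(self, obj, objtype=None):
        # obj is None when the attribute is read from the class itself.
        if obj is None:
            return 'class access via %s' % objtype.__name__
        return 'instance access via %s' % type(obj).__name__


class B:
    x = Verbose()


b = B()

# B.x is rewritten as B.__dict__['x'].__get__(None, B).
assert B.x == B.__dict__['x'].__get__(None, B)

# b.x is rewritten as type(b).__dict__['x'].__get__(b, type(b)).
assert b.x == type(b).__dict__['x'].__get__(b, type(b))

print(B.x)  # class access via B
print(b.x)  # instance access via B
```

Passing None for the instance on class access is what lets a descriptor tell the two cases apart.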
We can model this as a conversation between you the programmer and the object:
- You: Hello, object. Please get me your variable x.
- Object: Yargh, it might be a descriptor. Let me interface with my class.
- You: How long will this take?
- Object: It’ll take as long as I want, motherfrigger.
- Class: Hi, everybody. It looks like it is a descriptor. I’ll interface with it.
- You: I’m going to get a soda.
- Descriptor: OK, I’m calling __get__ and returning it. Back to you, class.
- Class: Back to you, instance.
- Object: Back to you, guy.
- You miss it because you’re getting a soda.
Which brings us to Google App Engine, which surprised and disappointed me, at least, by introducing yet another declarative layer instead of building on the work of SQLAlchemy. I admit, though, that van Rossum knows more about Python than I do. (Improbable, I know.) If you’ve seen one declarative Python ORM, you’ve seen them all, and GAE isn’t all that far from the rest. The only tricky parts are entity ancestors, entity groups, transactions, and keys, but that’s another post.
Unlike SA and its derivatives, GAE has no Unicode property. Instead, StringProperty takes both str and unicode instances (bytes and str for you Python 3000 fans). It assumes that all str instances are ASCII, which is tantamount to saying that you can ignore Unicode going into your datastore until one day somebody decides that a certain variable should be localized or otherwise hold a character outside ASCII. At that point the program will explode. Disclaimer: program may just throw an exception.
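The ASCII assumption is easy to reproduce outside App Engine. This Python 3 sketch spells out the decode with an explicit .decode('ascii'); it calls no GAE code, and the sample name is made up. In Python 3 terms, bytes plays the role of Python 2’s str, and str plays the role of unicode.

```python
# Python 3: bytes is the old str, str is the old unicode.
name = 'Jos\u00e9'              # text, i.e. the old "unicode"
raw = name.encode('utf-8')      # bytes, i.e. the old "str": b'Jos\xc3\xa9'

# Pure-ASCII bytes decode as ASCII without complaint...
assert b'Jose'.decode('ascii') == 'Jose'

# ...until someone sneaks in a character outside ASCII.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('boom:', exc.reason)  # boom: ordinal not in range(128)

# Decoding with the encoding the bytes were really written in works.
assert raw.decode('utf-8') == name
```

The last line is the cure: decode at the boundary with the encoding you actually know, instead of trusting the bytes to be ASCII.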
Of course, you and I know better. A str instance can be in any of hundreds of different encodings. However, if you declare charset=utf-8 in your Content-Type header, you should get back Unicode request data, which should postpone the inevitable decoding error. The WebOb object that Google uses is generally fantastic at decoding GET and POST data; praise the paste, for the paste is good. But if you’re like me, you’ll be unhappy the minute you hear that one type will take both ASCII-encoded bytestrings and Unicode strings, so you’ll want to implement a subclass of StringProperty. Call it UnicodeProperty. (Here’s where you say: OK, sure.) Thanks.
Now, it turns out that the datastore always returns Unicode, so no problem there. We just want to make sure that everything that goes in is also Unicode. Reason 1: If a section of our code is not passing in Unicode, then we’re not operating in Unicode everywhere, and that’s bad citizenship. Reason 2: Not every bytestring we pass in is guaranteed to be ASCII. Might as well root out those places now and insert some good old decode()s.
From reading Hettinger’s how-to and knowing the three-method descriptor protocol (__get__, __set__, and __delete__), a talented reader might venture that overriding __set__ is the way to go. And it is!
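```python
from google.appengine.ext import db

class UnicodeProperty(db.StringProperty):
    """A StringProperty that refuses bytestrings (str) on assignment."""

    def __set__(self, obj, value):
        # value may be None, str, or unicode; only str is rejected.
        if type(value) == str:
            err = 'UnicodeProperty: "%s" not Unicode' % value
            raise Exception(err)
        # Hand off to the base descriptor so the datastore machinery runs.
        db.StringProperty.__set__(self, obj, value)
```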
Note that the check looks for str specifically rather than for “anything but unicode”, because there are three possible types for value: None, str, and unicode, and None has to get through. Raising a virgin Exception is also bad practice, but changing it is left as an exercise for the reader.
Note also that unless you call the base class’s __set__, you’ll be left with a descriptor without all the Google datastore magic. So don’t leave that out. Using Python’s super is left as an exercise for the reader.
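The same __set__ override can be tried without the App Engine SDK. In the Python 3 sketch below, StringProperty is a hypothetical stand-in written for this example, not GAE’s db.StringProperty; it just stores values on the instance under an underscored name. Because Python 3’s str is the old unicode, the check rejects bytes instead of str. The sketch also takes on both exercises: it delegates with super() and raises TypeError instead of a virgin Exception.

```python
class StringProperty:
    """Hypothetical stand-in for db.StringProperty (NOT the GAE class)."""

    def __init__(self, default=None):
        self.default = default

    def __set_name__(self, owner, name):
        # Python 3.6+: called when the owning class is created.
        self.attr = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor
        return getattr(obj, self.attr, self.default)

    def __set__(self, obj, value):
        # Store the value in the instance under the underscored name.
        setattr(obj, self.attr, value)


class UnicodeProperty(StringProperty):
    """Rejects bytes (Python 2's str); lets text and None through."""

    def __set__(self, obj, value):
        if isinstance(value, bytes):
            raise TypeError('UnicodeProperty: %r is not Unicode' % (value,))
        # Delegate to the base descriptor so its storage logic still runs.
        super().__set__(obj, value)


class User:
    name = UnicodeProperty(default='')


user = User()
user.name = 'Jos\u00e9'   # text: accepted
user.name = None          # None: accepted
try:
    user.name = b'Jose'   # bytes: rejected, even though it is pure ASCII
except TypeError as exc:
    print(exc)
```

Since None and text still pass, an optional property keeps working; only bytestrings are turned away, ASCII or not, which is the point.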
Now if you pass in a bytestring containing only ASCII characters, on the assumption that it will stay ASCII forever and ever, you will see an exception in your logs and in your shame, which should be vast and permanent.
Similarly, we can subclass Google App Engine’s TextProperty with
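```python
class UnicodeTextProperty(db.TextProperty):
    """A TextProperty that refuses bytestrings (str) on assignment."""

    def __set__(self, obj, value):
        # Let None and unicode through; reject str.
        if type(value) == str:
            err = 'UnicodeTextProperty: "%s" not Unicode' % value
            raise Exception(err)
        db.TextProperty.__set__(self, obj, value)
```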
These are drop-in replacements for their blander counterparts. Witness this
```python
class User(db.Model):
    bio = db.TextProperty(default='')
    name = db.StringProperty(required=True)
    blog = db.StringProperty(default='')
```
turn into this (converting the defaults to Unicode while you’re at it):
```python
class User(db.Model):
    bio = UnicodeTextProperty(default=u'')
    name = UnicodeProperty(required=True)
    blog = UnicodeProperty(default=u'')
```