Be Safe with Django 1.2


Transitioning to using Django 1.2 templates with AppEngine has been a real adventure, and this is just the second in a series of posts about my experience with it. I am glad to have the increased power that is in the newest version of the template language, but I have paid a steep price in hours spent making changes to be compatible with it.

For a good chunk of last night and spots of time here-and-there today, I have been going through my code and adding the |safe filter to variables that might contain HTML that needs to be rendered. Django 1.2 assumes that you want your variables to be escaped. While that is a good practice to prevent spammers and script kiddies from abusing your site, I always assume that I need to sanitize on input rather than output, and I consider all of my stored data to be safely renderable.

There are a couple of ways to prevent Django from escaping your variables. I have chosen to append the |safe filter. Why? Well, it is safer. As I said, I sanitize on input, but there’s always a chance that I will have overlooked something, and some persistent fool will figure out a way to game me. As long as I have to manually look at my templates, models and controller code before I decide that something is safe, I feel confident that my application is going to be OK.

If you are less risk averse -- or maybe just lazier -- than I am, you can get the same effect by simply wrapping the outermost level of your template hierarchy in an {% autoescape off %} ... {% endautoescape %} pair. To me, that is tempting fate and failure, but you are the best judge of your own liabilities.
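To make the difference concrete, here is a rough stand-in for what autoescaping does, written with Python 3's standard html module rather than with Django itself. The render_variable function and its safe flag are names I made up for illustration; in a real template you would write {{ comment|safe }} or use the autoescape block instead.

```python
import html

def render_variable(value, safe=False):
    """Mimic template output: escape the value unless it is marked safe."""
    text = str(value)
    return text if safe else html.escape(text)

comment = "<b>Nice post!</b>"
escaped = render_variable(comment)             # roughly {{ comment }} in Django 1.2
trusted = render_variable(comment, safe=True)  # roughly {{ comment|safe }}
```

The escaped version shows the markup as literal text in the browser; the trusted version lets the browser render it, which is only acceptable if the stored data really was sanitized on the way in.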

Most importantly, don’t get caught with a busted app after you innocently follow the directions on how to set a specific version of Django in your AppEngine application. This blog spent several days with a non-functioning Atom feed because I never really thought to check it after the upgrade.

Who really thinks about their feed once it’s working?



Fixing Custom Tags with Django 1.2

When I upgraded to version 1.4.2 of the AppEngine SDK, I was both intrigued and alarmed to see the warning message that I ought to be using use_library() to specify a version of Django that my application would use. On one hand, I was glad that I could switch to version 1.2, as it provides dramatically-improved if tags over 0.96. On the other hand, I knew that I would be in for a certain amount of pain because things were going to break. The first thing that broke for me was the call to template.register_template_library that my code uses to load a library of custom template tags. It crashed with the extraordinarily helpful stack trace:

File "C:\google_appengine\google\appengine\ext\webapp\template.py", line 269, in register_template_library
File "C:\google_appengine\lib\django_1_2\django\template\__init__.py", line 1049, in add_to_builtins
File "C:\google_appengine\lib\django_1_2\django\template\__init__.py", line 984, in import_library
   app_path, taglib = taglib_module.rsplit('.',1)
ValueError: need more than 1 value to unpack 

After a great deal of stepping through code in Wing IDE's superb debugger, I figured out that the register_template_library call had changed. Instead of taking an absolute path to the file to load, it expects a Python module-style path like the ones we use with an import statement.

The fix was simple. I created a directory called 'tags' and moved my custom tags library file (MVCTags.py) there. I also added an empty __init__.py file to make the directory a package. Finally, I changed the code to template.register_template_library('tags.MVCTags').
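The failing line in the traceback splits its argument on the last dot, so any name without a package prefix cannot be unpacked into two values. Here is a minimal reproduction that assumes nothing beyond that one line; the helper names are mine, not Django's.

```python
def split_taglib_path(taglib_module):
    """Split 'package.module' the same way the line in the traceback does."""
    app_path, taglib = taglib_module.rsplit('.', 1)
    return app_path, taglib

def is_valid_taglib_path(taglib_module):
    """Return True if the name can be split into a package and a module."""
    try:
        split_taglib_path(taglib_module)
        return True
    except ValueError:
        # rsplit returned a single element, so the unpacking failed.
        return False

dotted_ok = is_valid_taglib_path('tags.MVCTags')
bare_ok = is_valid_taglib_path('MVCTags')
```

A dotted name such as 'tags.MVCTags' splits cleanly, while a bare 'MVCTags' raises the same kind of ValueError that the stack trace reports.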

With that out of the way, I have moved to fixing the next big problem: a change to the way that template loaders work has broken many of the include statements in my templates. It looks like I am going to be writing a custom template loader, but figuring out how to get it into the collection of template loaders looks daunting.


Do Not Reinvent the Pagination Wheel
From the very day that Google released the first version of the AppEngine SDK until today, developers have been asking themselves and each other, “How can I handle paginating results from the datastore?” There were dozens of different naive attempts that failed because of the unique nature of the datastore. There were many approaches that solved only half the problem by offering forward-paging only. Still others offered good pagination while limiting the kinds of queries that could be performed.

It was an ugly and uncomfortable mess. It made everyone somewhat uneasy.

Google partially solved the problem with version 1.3.1 of the SDK which introduced query cursors, a simple, transparent and HTTP-friendly way to serialize and deserialize query states. They only provided a single-direction of query resumption, but it was a huge advance over the prior capabilities. A complete paging solution still required quite a bit of work.
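To show the shape of cursor-based paging, here is a toy sketch over an in-memory list. It is not the datastore API: a real cursor encodes a position in an index, whereas this stand-in simply serializes an offset, and every function name in it is hypothetical.

```python
import base64
import json

def encode_cursor(offset):
    """Serialize the resume position into a URL-safe string."""
    raw = json.dumps({"offset": offset}).encode("ascii")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(cursor):
    """Recover the resume position from a cursor string."""
    return json.loads(base64.urlsafe_b64decode(cursor).decode("ascii"))["offset"]

def fetch_page(items, page_size, cursor=None):
    """Return (page, next_cursor); next_cursor is None when nothing is left."""
    start = 0 if cursor is None else decode_cursor(cursor)
    page = items[start:start + page_size]
    end = start + len(page)
    next_cursor = encode_cursor(end) if end < len(items) else None
    return page, next_cursor

items = list(range(5))
page1, cursor1 = fetch_page(items, 2)
page2, cursor2 = fetch_page(items, 2, cursor1)
page3, cursor3 = fetch_page(items, 2, cursor2)
```

Each call hands back only a way to resume forward, which is why Previous buttons and numbered page links still take extra work.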

I began doing that work in order to provide a new project of mine with web-standard paging. I wanted Previous and Next buttons as well as links to each page of results. Using cursors greatly simplified the code, but I was still writing a lot of code, and time that would otherwise be spent creating a great user interface and improving usability was going to building ugly behind-the-scenes mechanisms.

Fortunately for me, the voice of experience rolled around inside my head and advised me, “Hit up Google. Make sure that you aren’t reinventing the wheel.” One well-constructed Google query later, and I found Ben Davies’s PagedQuery class. It had all of the features that I needed, and it used all of the techniques and strategies that a top-notch AppEngine engineer would apply. It was concisely-coded yet extensively commented. It was beautiful and free. It was the wheel that I nearly reinvented.

So, go ahead and ignore everything else that is out there related to AppEngine paging. Disregard even this very blog’s old posts on the subject. Ben Davies built what you want. Use it.

A Better Sharded Counter
My current AppEngine project was crying out for some counters to track site-wide instances of various models, and I recalled watching Brett Slatkin’s video about building highly-scalable web apps. A few seconds of quality Google time later, and I had the ShardCounter classes ready to go. The same code is available in several locations:

One thing that quickly caught my attention is that these classes only support incrementing, and while that makes sense for something like a primitive visit counter, it didn’t handle my needs very well at all. My initial attempt to simply copy the increment function and change the critical += to a -= was naive and doomed to failure, but a little tinkering with the way that counts are recorded gave me a nice working solution that completely preserves the desirable performance characteristics of this approach.

Here’s the code that I came up with. Please feel free to use it in your own projects.
from google.appengine.api import memcache
from google.appengine.ext import db
import random

# This code unabashedly stolen from Google
# http://code.google.com/appengine/articles/sharding_counters.html#counter_python

class GeneralCounterShardConfig(db.Model):
    """Tracks the number of shards for each named counter."""
    name = db.StringProperty(required=True)
    num_shards = db.IntegerProperty(required=True, default=20)

class GeneralCounterShard(db.Model):
    """Shards for each named counter"""
    name = db.StringProperty(required=True)
    "The name of the counter."
    plus = db.IntegerProperty(required=True, default=0)
    "The number of times that the counter has been incremented."
    minus = db.IntegerProperty(required=True, default=0)
    "The number of times that the counter has been decremented."

def get_count(name):
    """Retrieve the value for a given sharded counter.

    Parameters:
      name - The name of the counter
    """
    total = memcache.get(name)
    if total is None:
        total = 0
        for counter in GeneralCounterShard.all().filter('name = ', name):
            total += counter.plus
            total -= counter.minus
        memcache.add(name, str(total), 60)
    # The cached value is stored as a string, so always hand back an int.
    return int(total)

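def increment(name):
    """Increment the value for a given sharded counter.

    Parameters:
      name - The name of the counter
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        # Pick a random shard so that concurrent writers rarely collide.
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = GeneralCounterShard.get_by_key_name(shard_name)
        if counter is None:
            counter = GeneralCounterShard(key_name=shard_name, name=name)
        counter.plus += 1
        counter.put()
    db.run_in_transaction(txn)
    # Keep the cached total in step; this does nothing if the key is not cached.
    memcache.incr(name)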

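def decrement(name):
    """Decrement the value for a given sharded counter.

    Parameters:
      name - The name of the counter
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        # Record the decrement in its own tally rather than subtracting
        # from plus, so each shard only ever grows.
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = GeneralCounterShard.get_by_key_name(shard_name)
        if counter is None:
            counter = GeneralCounterShard(key_name=shard_name, name=name)
        counter.minus += 1
        counter.put()
    db.run_in_transaction(txn)
    # Keep the cached total in step. Note that memcache.decr will not take
    # a cached value below zero.
    memcache.decr(name)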

def increase_shards(name, num):
    """Increase the number of shards for a given sharded counter.
    Will never decrease the number of shards.

    Parameters:
      name - The name of the counter
      num - How many shards to use
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        if config.num_shards < num:
            config.num_shards = num
            config.put()
    db.run_in_transaction(txn)
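
If you want to convince yourself that separate plus and minus tallies add up correctly, here is a small in-memory simulation. It is only a sketch of the bookkeeping: ToyShardedCounter is a hypothetical class with no datastore, transactions or memcache behind it.

```python
import random

class ToyShardedCounter(object):
    """In-memory stand-in: each shard keeps separate plus and minus tallies."""

    def __init__(self, num_shards=20):
        self.shards = [{"plus": 0, "minus": 0} for _ in range(num_shards)]

    def increment(self):
        # Any shard will do; the total is the sum over all of them.
        random.choice(self.shards)["plus"] += 1

    def decrement(self):
        random.choice(self.shards)["minus"] += 1

    def get_count(self):
        return sum(shard["plus"] - shard["minus"] for shard in self.shards)

counter = ToyShardedCounter(num_shards=4)
for _ in range(10):
    counter.increment()
for _ in range(3):
    counter.decrement()
total = counter.get_count()
```

Because every write only ever adds one to a single tally on a single shard, it does not matter which shard a given increment or decrement lands on.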

A New Facet of Computing in the Cloud

I have been working with Google's AppEngine since its first day of release. Their vision was instantly compelling, and the birth of Microsoft's Azure platform only increased my feeling that even in its infancy, cloud computing is the best way for web entrepreneurs to create their businesses. These platforms give us a way to develop and deploy applications with two compelling advantages over alternative hosting options: low cost of entry and effortless scaling to accommodate success.
Over the past week, I completed some work for a client that showed me another important facet of the cloud computing revolution: services in the cloud. These are web services that can provide absolutely critical infrastructure points to enable the success of small, incipient web-based businesses.
In particular, I was working with Amazon Flexible Payment Service, creating a Python library to enable its use from within an AppEngine application. It is a beautiful, wonderfully-designed product, and working with it was a joy. It was readily apparent to me that any business could use it to quickly and almost effortlessly take care of their billing needs with very little middle-man overhead.
While I was investigating FPS, I took a look at the other AWS products. Everyone knows about EC2 and S3, both of which are integral parts of many exciting new web ventures, but I was surprised and delighted to discover CloudFront, a pay-as-you-go Content Delivery Network, and DevPay, a way for developers not only to bill users for provided services but also to pay for the Amazon Web Services products that they utilize to provide the service in the first place. There you have two huge obstacles to success and growth accounted for. CloudFront enables you to deliver your static content at very low cost and with minimal effort; DevPay provides for the management and collection of subscriptions or other fees while removing the step of getting billed by Amazon for the EC2, S3, SimpleDB, CloudFront or whatever other products are being used. Brilliant.
Over the coming days and weeks, I'm going to be writing a lot more about these and other services that enable entrepreneurs and small businesses to succeed on the web. I can scarcely contain my excitement at what I see as an explosion of new technologies that -- if used synergistically -- could completely reshape the way that businesses are created and run.


Announcing Sluggable-mixin


I've extracted the slug-related code from this blogging software and packaged it so that it should be easily-portable to any Google AppEngine application.  I hope that it will be useful for anyone who needs to add nice human and search engine-friendly URLs to their app.  It works like and is similar in spirit to my previously-released AppEngine mixin, taggable-mixin.

I'd like to encourage anyone who finds this useful -- or anyone who finds it to be a useless atrocity -- to leave questions, suggestions and feedback here.  I will answer as quickly as I can.


Now with slugs!

I've just integrated my new AppEngine Datastore mixin class, sluggable-mixin.  It adds the ability to associate a user-friendly slug with any datastore entity.  Posts here are now referenced by slug rather than their lengthy and meaningless datastore ID string.
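
For readers who have not met slugs before, here is a bare-bones sketch of turning a title into one. It is not the code inside sluggable-mixin; the slugify function below is a hypothetical helper that ignores uniqueness and non-ASCII titles.

```python
import re

def slugify(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    return slug.strip('-')

restful_slug = slugify("A Pattern for RESTful URLs")
slugs_slug = slugify("  Now with slugs!  ")
```

A real implementation also has to make sure that two posts with the same title do not end up with the same slug.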

I'm going to release it as open source as soon as I can polish it up a bit and write some good documentation.  I'm a stickler for good, thorough documentation and automated unit tests.  The tests are done, although they might not have as much coverage as I'd like.  I'm hoping to get sluggable-mixin released before the New Year begins.


A Pattern for RESTful URLs

I recently decided that I didn't like the way that URLs on the blog were formatted.  For example, the link to show the entry before this is:


and that is bad on a number of levels.  First, the post-specific data is the AppEngine Datastore ID of the entity that holds the post.  While it is usefully unique and a quick index to the data, it is also terribly ugly and utterly unhelpful to either human readers or search engines.  It needs to be a slug.  That's well-and-good, as I have been working on a Sluggable mixin class to go along with the two other tools in my CMS belt, Taggable and Commentable.  I'll write more about Sluggable when it is ready to be released.

Secondly, the ID is passed in to the showpost handler as a GET parameter, and I'd rather have it be more RESTful, something like:


or even


since I don't have Sluggable ready.  Now, it occurred to me that it would be reasonably easy to change the code up to have the RESTful-style URLs, but then I would be breaking any existing links to posts.  So, I needed to be able to switch over to the new style while keeping the old style available.  I came up with a pretty decent approach, I think.

The first step is that I needed to change the mapping in the WSGIApplication setup.  You'll notice that I have removed all of the other mappings for the sake of brevity, but it used to look like this:

def main():
    application = webapp.WSGIApplication([
        # Other mappings removed for brevity
        ('/showpost', ShowPost),
    ])

In order to handle the RESTful pattern, I changed to a regular expression:

def main():
    application = webapp.WSGIApplication([
        # Other mappings removed for brevity
        (r'^/showpost{1}(/.*)?', ShowPost),
    ])

That will match both the desired new format and the must-be-tolerated old format.  Now that the mapping is set up to call the correct function, I have to go about modifying the ShowPost function.  This is how it looks:
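
You can check the pattern outside of AppEngine with nothing but the re module. As I understand it, webapp anchors each mapping pattern at both ends, so the explicit "$" below stands in for that behavior; captured_suffix is a hypothetical helper of mine.

```python
import re

# Same pattern as in the mapping, plus a "$" to mimic webapp's anchoring.
SHOWPOST_PATTERN = re.compile(r'^/showpost{1}(/.*)?$')

def captured_suffix(path):
    """Return the optional trailing group, '' if absent, None if no match."""
    match = SHOWPOST_PATTERN.match(path)
    if match is None:
        return None
    return match.group(1) or ''

old_style = captured_suffix('/showpost')           # id arrives as ?id=...
new_style = captured_suffix('/showpost/my-post')   # id is part of the path
unrelated = captured_suffix('/showposts')
```

The parenthesized group is what the framework hands to the handler's get method as an extra positional argument, which is why the handler needs to accept one.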

class ShowPost(SmartHandler):
    def get(self):
        from post import Post
        postid = self.request.get('id')
        if postid is not None and len(postid) > 0:
            post = Post.get(postid)

Not bad, but modifying it to account for the new format while keeping the old format will be ugly, and I'll end up repeating the code in any other request-handling methods, so I'm going to abstract it a bit and put it into SmartHandler, the customized version of RequestHandler that I use.  I added the following instance method to the SmartHandler class:

def expects_request_id(self, *look_for):
    "Searches the request Uri for an embedded resource id."
    # First preference is to find it in the request Uri.  Assumption is
    # that it is the last element in a multi-element path.
    path_parts = self.request.path.split("/")
    # Empty elements are meaningless, so delete them
    cleaned_path_parts = []
    for each_part in path_parts:
        if len(each_part) > 0:
            cleaned_path_parts.append(each_part)
    found_id = None
    if len(cleaned_path_parts) > 1:
        found_id = cleaned_path_parts[-1]
    else:
        # There is only one element in the path, so we will look
        # for id info in the GET & POST arguments.  Candidate argument
        # names are passed in through *look_for
        for each_arg_name in look_for:
            if each_arg_name in self.request.arguments():
                found_id = self.request.get(each_arg_name)
    if found_id is None:
        raise NoIDFound
    else:
        self.requested_id = found_id
    return found_id

And I call it in ShowPost like this:

class ShowPost(SmartHandler):
    def get(self, *args):
        from post import Post
        try:
            self.expects_request_id("id")
            try:
                post = Post.get(self.requested_id)
                # code snipped for brevity
            except db.BadKeyError:
                # Render an error page here..."Sorry, but the post that you requested isn't there."
                pass
        except NoIDFound:
            # Render an error page: "Sorry, but when requesting a post, you have to specify the id of the Post."
            pass

You can see that expects_request_id has a declarative feel to it, and it seamlessly allows me to handle new-style and old-style URLs.  It assumes that any request id information is the last element in a multi-element path, and if it is a single-element path, it looks for a URL parameter that we pass in.  In this case, the parameter name is id, but it could be any string, and it could even be many different strings:

self.expects_request_id("id", "postid", "post")

would allow me to honor many different parameters.
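
If you would like to try the lookup logic without a running AppEngine handler, here is the same idea pulled out into a plain function. This is a sketch under my own assumptions: find_request_id is a hypothetical name, and a plain dictionary stands in for the request's GET and POST arguments.

```python
class NoIDFound(Exception):
    """Raised when neither the path nor the arguments carry an id."""

def find_request_id(path, arguments, *look_for):
    """Prefer the last element of a multi-element path, else a named argument."""
    parts = [part for part in path.split("/") if part]
    if len(parts) > 1:
        return parts[-1]
    found_id = None
    for name in look_for:
        # As in the method above, a later matching name overrides an earlier one.
        if name in arguments:
            found_id = arguments[name]
    if found_id is None:
        raise NoIDFound(path)
    return found_id

from_path = find_request_id('/showpost/abc123', {}, 'id')
from_args = find_request_id('/showpost', {'id': 'xyz'}, 'id', 'postid')
```

Calling it with a single-element path and no matching argument raises NoIDFound, which is the case that the handler turns into an error page.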

I hope that this pattern and this code are useful.  I'll be happy to answer any questions about it, and I'm always deeply grateful for any suggestions and comments.


New Release of Taggable-mixin

I have just uploaded significant changes to the taggable-mixin code base.  The new version has a markedly-simplified API and substantially-improved performance.  The cost of this goodness is that it is not backwards-compatible with the previous release, version 1.0.  Applications that have already started using Taggable-mixin will have to undertake conversions of their codebases and datastores.

Interested programmers should download the current code base and look through the documentation file, taggable.html.

I will answer any and all questions, comments and criticisms made here in the comments or sent to my email address.


Just released: taggable-mixin

I have just packaged up a bit of the code that is part of this blog, the part that manages tags, and released it as a Google Code-hosted open source project: taggable-mixin.

Taggable is a Python mixin class that can be added to any AppEngine Model class to give it the ability to have tags associated with it.  It does so without modifying the Model itself; the tags are stored in a completely separate Model of their own, and they are associated by Key.  The tags are managed efficiently, so they are never duplicated.  A single tag instance can be associated with any number of different model instances.
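
To illustrate the idea of keeping tags apart from the things that they tag, here is a toy in-memory version. It is not the taggable-mixin API: ToyTagStore, ToyTaggable and Post are hypothetical names, and a dictionary stands in for the separate tag Model.

```python
class ToyTagStore(object):
    """Holds each tag once, mapped to the keys of the entities that carry it."""

    def __init__(self):
        self.tagged_keys = {}  # tag text -> set of entity keys

    def add_tag(self, tag, entity_key):
        self.tagged_keys.setdefault(tag, set()).add(entity_key)

    def tags_for(self, entity_key):
        return sorted(tag for tag, keys in self.tagged_keys.items()
                      if entity_key in keys)

class ToyTaggable(object):
    """Mixin; expects the host class to provide a key attribute."""
    tag_store = ToyTagStore()

    def add_tag(self, tag):
        self.tag_store.add_tag(tag, self.key)

    @property
    def tags(self):
        return self.tag_store.tags_for(self.key)

class Post(ToyTaggable):
    def __init__(self, key):
        self.key = key  # the host model itself stores no tag data

first, second = Post('post-1'), Post('post-2')
first.add_tag('python')
second.add_tag('python')
first.add_tag('appengine')
```

Both posts share the single 'python' entry in the store, and neither Post instance holds any tag data of its own.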

I think that it is a pretty neat, clean and compact solution to the problem of tagging.  I'm very much looking forward to getting feedback from my fellow AppEngine developers, as I am still a relative newbie to Python.  I have learned a great deal about it since I began working with AppEngine, but I am hardly a veteran coder, familiar with Python idioms and such.  I also want this to be as valuable a contribution to the community as it can be, but unless I hear back from those using it, I will have only limited ideas about how it can be improved.


A few little changes...

Although nobody ever visits this blog, I'm going to pretend that I have a bevy of faithful readers for this post, so I can proudly announce that the site has undergone a cosmetic makeover and has benefited from some bugfixing.

I think that the new, purply-dark-pastel thing is much easier on the eyes than the old psychedelic orange look-and-feel.  It feels cool against my eyeballs rather than stabbing and searing hot.  That's a nice sensation.  Also, there are several largely-invisible bug fixes.  The About link at the very bottom now functions correctly.

The fact that the About link was previously broken brings up an important subtlety about coding for Google AppEngine on a Windows machine, as I do.  It seems that whatever flavor of UNIX (or is it Plan 9?  I read about it, but forgot) that AppEngine runs on has a filesystem that treats case as significant; about.html and About.html are two different files.  To Windows, however, they are the same file.  When my application went looking for About.html in my development environment, it found about.html, and it was happy, but when AppEngine went looking for About.html, it wasn't going to settle for anything other than that, exactly.

Fair enough, at least I know now.  One important point to make, though, is that when I changed the name of the file on my development system, adding a capital A, either Windows didn't actually change the name of the file at all or appcfg.py didn't upload the change to the production environment.  I'm not really sure that I can tell which is true.

It was fixed easily enough, however, by renaming the method that gets called when it gets a request for about.html.
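
One way to catch this kind of mismatch before deploying is to compare the requested name against the directory listing, which reports names with their real capitalization even on Windows. This is only a sketch, and exists_with_exact_case is a hypothetical helper rather than anything provided by the SDK.

```python
import os
import tempfile

def exists_with_exact_case(directory, filename):
    """True only if the directory lists an entry spelled exactly like filename."""
    return filename in os.listdir(directory)

# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "about.html"), "w").close()
found_lower = exists_with_exact_case(demo_dir, "about.html")
found_upper = exists_with_exact_case(demo_dir, "About.html")
```

Unlike os.path.exists, which follows the rules of the local filesystem, this check fails for About.html on any platform when the file on disk is named about.html.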

I continue to make slow, steady progress on this and other Google AppEngine applications.  I am currently modularizing another piece of this blog application, turning the code that handles comments into a mixin, like my taggable class.  I will release commentable as a Google Code project, too, and in the reasonably-near future.  Before I do, I'd like to look into Python's automated unit testing facilities, as I feel a bit guilty about releasing code that doesn't have good automated testing coverage.

Perhaps one of my entirely hypothetical readers can point me in the right direction, hmm?


Going back to MyKidsLibrary.com

I've been playing hard with AppEngine, Python and Django for a couple of weeks now, and I've managed to come out of it with a useful application: this blog.  While I'd hardly say that it is finished, as there are still some features that I'd like to add, I have climbed the steep -- and thus challenging, interesting, addictive -- part of the learning curve.  So, I can no longer justify continuing to blow off MyKidsLibrary.com.

MKL is my primary off-hours project, and it has been for almost a year now.  It's been in public beta for a couple months now, and I am just beginning to implement the larger features that were suggested by the very helpful and considerate testers.  Apparently, as it stands, it lacks the stickiness that is mandatory for a successful social/crowd-sourcing web site.  Last night, I designed the data models for the new capabilities, and I started getting that buzzy feeling of excitement that I had lost during the last few months of coding that preceded the start of beta.

Getting back into the Ruby on Rails groove was easy and enjoyable.  As cool as AppEngine is right now, Rails beats it hands-down for a productive development experience.  It was also nice to be back in NetBeans.  Komodo and the other IDEs that I played around with for Python were decent enough, but they felt immature, especially as I never was able to do any kind of step-through-the-code debugging with AppEngine and the tools that I had.  That seriously rubs me the wrong way; I've been coding far too long to be stuck with printf-style debugging.  It's bad enough that working with dynamically-typed languages seriously limits the code completion that is available.


Life is good (for a frugal New Englander software entrepreneur)

Legitimate Spring weather has finally arrived in New England, seemingly to stay.  The Red Sox have the best record in the AL; the Celtics look like they are poised to cruise through the Eastern conference, and the Bruins made a good show of it.

What's not to like?

Oh yeah, and I've been thinking recently about how being a software entrepreneur today is so much more interesting and fun and full of possibilities than it ever has been before, at least in my experience.  Why?  Because the barrier to entry is so low.  It was once the case that creating and selling an application required investment in compilers, distributable media, advertising, hosting, staffing, office space...a seemingly endless list of overhead-hangover-inducing rigmarole.

Today, I can create a revenue-generating application with nothing more than a laptop and a good idea.  There are excellent development tools that are free (NetBeans, Komodo Edit, Aptana, Eclipse, Visual Studio Express), hosting providers that are free or so cheap that they don't hurt (Google AppEngine, Amazon's EC2), powerful frameworks that speed development (too many to list, but Ruby on Rails and Django spring to mind for web applications and .NET/Mono for web and desktop) that are free.  There are free operating systems, and even free network access (at least some places, like the fabulous Boston Public Library).

I don't have to pay for any kind of company information infrastructure; Google Apps for Domains provides free email and calendar hosting, a word processor, spreadsheets and presentation builder all with built-in collaboration tools.  I don't have to pay for a sales and marketing staff (AdWords) or worry about a complex revenue model or collections (AdSense).

Of course, I am only writing here about the tools/frameworks/applications that I have directly used at one time or another.  I'm sure that there are dozens or even hundreds of other cost-saving/eliminating techniques and approaches being used by other like-minded developers.  It all adds up to there being no excuses not to try out your ideas, to form sudden companies and get to market immediately.


Still learning, still liking

I'm continuing to work on this blog software, and I am learning a lot about AppEngine, Python and Django.  I used Django's include function to refactor the rendering of posts.  That removed some code duplication.  Got to love that.  I feel like I've already done a full day's work, and it's only 10:30!

My inclination is next to look at extending Python's WSGIHandler further to make it more MVC-like.  I'd never paid much attention to that design pattern before I started using Ruby on Rails, but I'm quite addicted to it now, and doing the extra work -- and admittedly, it's not that much -- to do it the AppEngine way irks me.  I've already taken a step in that direction, subclassing RequestHandler to do all the work of creating a path to the appropriate template file and passing it to the render method, but I know that there's a lot more to do.  I also feel like I could refactor the handling of the dictionary that passes values to the template to reduce the repetitious setting of values that happens in each handler.


Live at Last!

Learning to create Google App Engine applications via building this simple blog has been hugely fun and not just a little bit frustrating.  Many, many thanks to Google's Marzia for so diligently helping me to figure out the missing piece that kept me from being able to upload this code.

I've come away from the experience feeling very enthusiastic about App Engine.  It's not going to replace Ruby on Rails as my web development platform of choice, but I have to admit that GAE makes it very, very easy to quickly build and deploy an application.  I managed to become a pretty competent Python programmer in just a few days, and I know enough about Django to build a small web application that observes the DRY principle.

I became very frustrated that Django is a whole separate language unto itself, so I had to learn Python and Django.  I suppose that I have become spoiled by Rails, which allows me to write the logic, data and presentation layers in one language.  It's so simple, beautiful and elegant.

That's not necessarily an argument against GAE, which it seems is going to support multiple languages and frameworks as it matures.  The combination of Ruby on Rails's power with Google's essentially infinite scalability would be a mighty and compelling thing.


Paginating Records in Google AppEngine

3/3/2001: I now consider the information in this post to be obsolete. More useful and up-to-date advice is available in the post Do Not Reinvent the Pagination Wheel.


In creating this blogging software, I have had to come to grips with finding a way to paginate content.  It's a relatively trivial exercise under most circumstances; it is a well-understood pattern, and it is actually built in to some of the popular frameworks.  AppEngine is a little different, and the nature of the Datastore actually makes it rather challenging to implement efficient, useful paging.  I've come up with a solution that I think makes for a good balance of functionality and AppEngine-friendliness.

The code and techniques included here are Open Source.  I do hope that if you choose to use this code in your own project that you'll comment here to share your feedback, suggestions and experiences.  Sharing means caring, guys.  For real.

This Paginator class depends on the Model that it will be paginating having an 'index' field, a unique value that is ordered with respect to how the pagination will occur.  For instance, here is the model definition for this blog's Comment entity:

class Comment(db.Model):
    """A Model for storing comments associated with another entity."""
    author = db.StringProperty(required=True, verbose_name="Author")
    "A text representation of the user who wrote the comment."
    body = db.TextProperty(required=True, verbose_name="Comment")
    "The text of the comment."
    added = db.DateTimeProperty(auto_now_add=True, verbose_name="Date Added")
    "The date that the comment was added, or created."

    index = db.IntegerProperty(required=True, default=0)
    "The index of the comment in the collection of comments for the parent entity."

Here, index increases every time a new comment is added; in fact, it mirrors added, always increasing.  However, index will always be unique.  It might not always be contiguous however, as a Comment can be deleted.  This function adds comments to the parent entity.  You can see how index is maintained:

def add_comment(self, author, body):
    "Add a new comment to this entity.  Returns the new comment object."
    new_comment = None
    def add_comment_txn():
        new_comment = Comment(parent=self, author=author, body=body, index=self.comment_index)
        # Save the comment and the updated counters together in the transaction.
        new_comment.put()
        self.comment_index += 1
        self.comment_count += 1
        self.put()
        return new_comment
    new_comment = db.run_in_transaction(add_comment_txn)
    # Invalidate the cached collection of records, so it will be regenerated
    # and re-loaded with the new record in it.
    return new_comment

Paginator comes into play in the function that gets a page of comments when the blog is requested to show a post:

def get_comments(self, index=0, count=5):
    "Return the comments attached to this entity."
    comments_paginator = Paginator(count, 'index')
    comments = comments_paginator.get_page(db.Query(Comment).ancestor(self), index, True)
    return comments

The only perhaps slightly non-obvious part is index.  Where does it come from?  How do I know which index to ask for?  Is index the page number?  The answer to those questions is a little bit of a chicken-and-egg situation.  You provide Paginator's get_page method with an index from a previous call, usually the next_index or prev_index value.  Usually, you'll get those values the first time by calling get_page with an index of None.  That will tell it to get the very first page of results, and then you will have access to the prev_index, next_index and curr_index values that can be fed back in to it.  The Paginator always looks for indexes relative to what is passed in, so if the requested index doesn't exist -- because it was deleted between calls -- it'll find the next one in the order.
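
Here is a toy version of that index bookkeeping over a plain sorted list, just to show how the three values relate and what happens when an index has been deleted. It is not the Paginator class: this get_page is a hypothetical stand-alone function that handles ascending order only and never touches the Datastore.

```python
import bisect

def get_page(indexes, page_size, start_index=None):
    """indexes is a sorted list of the index values that still exist.

    Returns (page, prev_index, curr_index, next_index).
    """
    # bisect_left lands on start_index, or on the next surviving index
    # if start_index itself has been deleted.
    pos = 0 if start_index is None else bisect.bisect_left(indexes, start_index)
    page = indexes[pos:pos + page_size]
    curr_index = page[0] if page else None
    next_pos = pos + page_size
    next_index = indexes[next_pos] if next_pos < len(indexes) else None
    prev_index = indexes[max(pos - page_size, 0)] if pos > 0 else None
    return page, prev_index, curr_index, next_index

existing = [0, 1, 2, 4, 5, 7, 8]           # 3 and 6 were deleted
first_page = get_page(existing, 3)         # start_index of None: first page
second_page = get_page(existing, 3, 3)     # index 3 is gone, so start at 4
last_page = get_page(existing, 3, 8)
```

Feeding the next_index from one call into the following call walks forward through the pages, and prev_index walks back.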

So, that should give you a pretty good idea of how the Paginator works.  Please post any questions or suggestions as a comment, and I'll see them and address them as best as I am able.  Here, then, is the actual Paginator code:

#Copyright 2008 Adam A. Crossland
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#    http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.

from google.appengine.ext import db
import copy

class PaginatedList(list):
    """An extended normal Python list with three additional properties used for
    pagination purposes:
    prev_index - the starting index of the previous page of entities;
    next_index - the starting index of the next page of entities;
    curr_index - the starting index of the current page of entities
    """
    def __init__(self, *args, **kw):
        list.__init__(self, *args, **kw)
        self.prev_index = None
        "The starting index of the previous page of entities"
        self.next_index = None
        "The starting index of the next page of entities"
        self.curr_index = None
        "The starting index of the current page of entities"

class Paginator:
    "A class that supports pagination of AppEngine Datastore entities."
    def __init__(self, page_size, index_field):
        self.page_size = page_size
        "The number of entities that constitute a 'page'"
        self.index_field = index_field
        "The name of the field in the Model that is a orderable index"

    def get_page(self, query=None, start_index=None, ascending=True):
        """Takes a normal AppEngine Query and returns paginated results.
        query - a Datastore Query object.  It must not have an order clause.
        start_index - the index of the first record in the desired page.  If the
            index is not known, or the first page is needed, None should be
            passed.
        ascending - True if the index column is to be ordered ascending; False
            should be passed for descending ordering.
        """
        fetched = None
        # I need to make a copy of the query, as once I use it to get the main
        # collection of desired records, I will not be able to re-use it to get
        # the next or prev collection.
        query_copy = copy.deepcopy(query)
        if ascending:
            # First, I will grab the requested page of entities and determine
            # the index for the next page
            filter_on = self.index_field + " >="
            fetched = PaginatedList(query.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index.
                fetched.curr_index = fetched[0].index
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = fetched[-1].index
                del fetched[-1]
            # Now, I will try to determine the index of the previous page
            filter_on = self.index_field + " <"
            previous_page = query_copy.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = previous_page[-1].index
        else:
            # Follow the same logical pattern as for ascending, but reverse
            # the polarity of the neutron flow
            filter_on = self.index_field + " <="
            fetched = PaginatedList(query.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index.
                fetched.curr_index = fetched[0].index
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need.  That is the
                # index of the first record of the next page.  Record it, and
                # delete the extra record from our collection.
                fetched.next_index = fetched[-1].index
                del fetched[-1]
            # Determine index of previous page
            filter_on = self.index_field + " >"
            previous_page = query_copy.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = previous_page[-1].index
        return fetched

Fixing the TemplateDoesNotExist error in AppEngine/Django 1.2

When I converted my blogging application from Django 0.96 templates to 1.2, I encountered three problems that set me back a few days: my custom template tag libraries stopped working, all of the output had its HTML content escaped, and all of my templates that used extends or include broke. I described how I resolved the first two issues in my two most recent posts, and I have been planning to cover the third issue, but I procrastinated.

Yesterday, a fellow AppEngine programmer posted a question on StackOverflow about the exact problem that I had fixed, so I bit the bullet and wrote up a lengthy answer describing my solution. I hope that someday soon I might have the time required to cover the subject matter more comprehensively, but for the time being, here is the question with my answer on StackOverflow.

You'll know that you care about this if your AppEngine application has suddenly started throwing TemplateDoesNotExist exceptions.


Another Wheel Not Reinvented: Full-text Search

If you spend enough time with people using AppEngine, trolling the newsgroups or answering questions in the google-app-engine tag on StackOverflow, you will begin to notice a collection of ‘holy grail’ problems and features that always come up. As near as I can tell, these include: really good pagination, uniqueness constraints in the datastore and full-text search. There are others, of course, but those three come up all the time, and finding solutions for them tends to be problematic because of certain inherent characteristics of the datastore.

The pagination issue has been addressed to my complete satisfaction by Ben Davies’ PagedQuery class; I have written about it in a previous post.

Full-text search is much trickier, and I don’t yet know of any true solution, but I think that for many applications, there is a free, simple and elegant alternative. You were probably looking at it just a second ago. Go back to the home page of this blog, and take a look at the Google Custom Search bar that now sits at the top. Give it a try.

I was able to create that Custom Search in less than 10 minutes, and I spent about 30 minutes getting it to look just the way that I wanted. Now, my blog is searchable through the very best means available: Google.

Certainly, there will be a substantial class of applications that won’t be able to take advantage of this terrific, free service in the same way that I have, but many will, and they should not even consider rolling their own indexing engine, search parser and presentation templates. Just don’t. Don’t be a control freak, and don’t go looking to solve the problem just because it is such a hard one. Use your time on better things like creating great services that solve real-world problems.


Going Go on Windows


I admit it: I love all things Google. Gmail, Google+, Google Music, Google Docs, AppEngine, Apps for Domains, AdSense, Analytics -- I use and enjoy them all. I was very excited when I first saw the announcement about Google’s new programming language Go, but I didn’t set about learning it right away. I was deterred by not having any particular project to build with it, and by the fact that it was not available for Windows, my platform of choice.

The first problem was solved by the addition of the Go runtime to the AppEngine platform. It was a perfect fit for a project that I had been working on with the idea of deploying it on node.js. I still think that I might build a backend for this program in Node, not only to have a backup to AppEngine but also to learn Node really well. The second problem has mostly been solved by the release of the official SDK for Windows. After downloading and installing it, I was immediately able to compile and link, but other tools like gomake and gotest did not work. Through a great deal of trial-and-error, I eventually settled on a solution that now works very nicely for me.

The biggest hurdle to overcome is that many of the Go tools assume the existence of the standard UNIX developer toolchain, such as make. There are many packages that offer ports of the UNIX tools to Windows: Cygwin, GnuWin32, DJGPP, Microsoft’s Windows Services for UNIX and MinGW, just off the top of my head. I tried all of them and eventually settled on the MSYS package from MinGW. While it doesn’t contain an entire development environment, it has bash, make and a terminal window (rxvt) that is a vast improvement over the stock DOS command window.

After installing MSYS, you will want to make some config changes to make things easier for yourself. In etc/fstab, you can map your long and complicated Windows paths to directories in your MSYS home directory. For instance, I like to map C:/Users/adam/Documents/Projects to /home/Adam/projects, so when I start up the MSYS shell, I can just cd projects and I will be where all my code lives. Also, you may want to modify the .bash_profile in your MSYS home directory such that it creates and exports the Go-related environment variables that will make the Go tools work correctly.

Once you have all that set up, you should be able to start up the MSYS terminal, cd to the directory where your Go project lives and use gomake and gotest to compile, link, install and run unit tests.

I have been having a great time learning Go. It is a wonderful language, and I really hope that it gains traction. I will be writing a lot more blog posts in the near future about it and some of the work that I am doing with it. I already have an open source Go project hosted on bitbucket: mtemplate.

AppEngine/Go Knowledge Resources

I have been spending a great many evenings working on a web application that will -- some day -- run on AppEngine's Go runtime. Go is a wonderful language, and programming with it is fun in a way that is increasingly uncommon. Still, it is a very new language and good references are rare. Useful documentation for the AppEngine/Go platform is even harder to come by; the so-called "Reference" for the Go interface to the datastore is little more than a few simple examples and a trivial list of API calls. It's disappointing, frustrating and unproductive.

Fortunately, the web is full of smart people who are willing to go out of their way to share what they have learned, and I am gradually discovering all of them. I will be listing them here for my own reference as well as to make the discovery process less difficult for my fellows.

Ugorji Nwoke's Blog is full of great ideas and thoughts about AppEngine/Go and AppEngine as a platform generally. Very smart, interesting guy.

Miek Gieben's Learning Go isn't specifically about AppEngine, but it is an invaluable reference for the Go language.

GoLang Tutorials has some very valuable guides for obtaining a better mastery of Go. If you find yourself feeling a bit confused by something that Go does a little differently than other programming languages (like interfaces) you may well find a well-written and informative post here that will clear the way for you.


So, it is a meager collection at the moment, but I will be adding to it every chance that I get. Also, I hope to be making my own blog a useful resource, as I have a couple of good techniques and ideas that I have come up with while working on my project.