July 22, 2010
One thing that quickly caught my attention is that these classes only support incrementing, and while that makes sense for something like a primitive visit counter, it didn’t handle my needs very well at all. My initial attempt to simply copy the increment function and change the critical += to a -= was naive and doomed to failure, but a little tinkering with the way that counts are recorded gave me a nice working solution that completely preserves the desirable performance characteristics of this approach.
Here’s the code that I came up with. Please feel free to use it in your own projects.
from google.appengine.api import memcache
from google.appengine.ext import db
import random
# This code unabashedly stolen from Google
# http://code.google.com/appengine/articles/sharding_counters.html#counter_python
class GeneralCounterShardConfig(db.Model):
    """Tracks the number of shards for each named counter."""
    name = db.StringProperty(required=True)
    num_shards = db.IntegerProperty(required=True, default=20)

class GeneralCounterShard(db.Model):
    """Shards for each named counter"""
    name = db.StringProperty(required=True)
    "The name of the counter."
    plus = db.IntegerProperty(required=True, default=0)
    "The number of times that the counter has been incremented."
    minus = db.IntegerProperty(required=True, default=0)
    "The number of times that the counter has been decremented."
def get_count(name):
    """Retrieve the value for a given sharded counter.
    Parameters:
      name - The name of the counter
    """
    total = memcache.get(name)
    if total is None:
        total = 0
        for counter in GeneralCounterShard.all().filter('name = ', name):
            total += counter.plus
            total -= counter.minus
        memcache.add(name, str(total), 60)
    # The cached value is stored as a string, so always hand back an int.
    return int(total)
def increment(name):
    """Increment the value for a given sharded counter.
    Parameters:
      name - The name of the counter
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = GeneralCounterShard.get_by_key_name(shard_name)
        if counter is None:
            counter = GeneralCounterShard(key_name=shard_name, name=name)
        counter.plus += 1
        counter.put()
    db.run_in_transaction(txn)
    memcache.incr(name)
def decrement(name):
    """Decrement the value for a given sharded counter.
    Parameters:
      name - The name of the counter
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        index = random.randint(0, config.num_shards - 1)
        shard_name = name + str(index)
        counter = GeneralCounterShard.get_by_key_name(shard_name)
        if counter is None:
            counter = GeneralCounterShard(key_name=shard_name, name=name)
        counter.minus += 1
        counter.put()
    db.run_in_transaction(txn)
    # Note: memcache.decr will not take a cached value below zero, so a
    # counter that can go negative may read high until the cached entry
    # expires and get_count recomputes the total from the shards.
    memcache.decr(name)
def increase_shards(name, num):
    """Increase the number of shards for a given sharded counter.
    Will never decrease the number of shards.
    Parameters:
      name - The name of the counter
      num - How many shards to use
    """
    config = GeneralCounterShardConfig.get_or_insert(name, name=name)
    def txn():
        if config.num_shards < num:
            config.num_shards = num
            config.put()
    db.run_in_transaction(txn)
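If it helps to see the plus/minus idea without the Datastore in the way, here is a minimal in-memory sketch. The Shard and ShardedCounter classes are hypothetical stand-ins that I made up for illustration; they are not part of the App Engine API, and there are no transactions or memcache here. The point is only that each shard keeps two tallies that never shrink, and the counter's value is the sum of plus minus minus across all shards.

```python
import random


class Shard(object):
    """One shard: separate tallies of increments and decrements."""

    def __init__(self):
        self.plus = 0
        self.minus = 0


class ShardedCounter(object):
    """In-memory stand-in for a named sharded counter (illustration only)."""

    def __init__(self, num_shards=20):
        self.shards = [Shard() for _ in range(num_shards)]

    def _pick(self):
        # Spread writes across shards at random, as the real code does.
        return random.choice(self.shards)

    def increment(self):
        self._pick().plus += 1

    def decrement(self):
        # Decrementing never subtracts from "plus"; it bumps "minus" instead.
        self._pick().minus += 1

    def get_count(self):
        return sum(shard.plus - shard.minus for shard in self.shards)


counter = ShardedCounter(num_shards=4)
for _ in range(10):
    counter.increment()
for _ in range(13):
    counter.decrement()
print(counter.get_count())  # prints -3
```

Because every write only ever adds one to a tally, it does not matter which shard a given increment or decrement lands on; the total comes out the same.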
December 09, 2008
I recently decided that I didn't like the way that URLs on the blog were formatted. For example, the link to show the entry before this is:
/showpost?id=aglhZGFtY2Jsb2dyLQsSCUJsb2dJbmRleCIJYWRhbWNibG9nDAsSBFBvc3QiC2FkYW1jYmxvZzE2DA
and that is bad on a number of levels. First, the post-specific data is the AppEngine Datastore ID of the entity that holds the post. While it is usefully unique and a quick index to the data, it is also terribly ugly and utterly unhelpful to either human readers or search engines. It needs to be a slug. That's well-and-good, as I have been working on a Sluggable mixin class to go along with the two other tools in my CMS belt, Taggable and Commentable. I'll write more about Sluggable when it is ready to be released.
Secondly, the ID is passed in to the showpost handler as a GET parameter, and I'd rather have it be more RESTful, something like:
/showpost/acts_as_urlnameable-instructions
or even
/showpost/aglhZGFtY2Jsb2dyLQsSCUJsb2dJbmRleCIJYWRhbWNibG9nDAsSBFBvc3QiC2FkYW1jYmxvZzE2DA
since I don't have Sluggable ready. Now, it occurred to me that it would be reasonably easy to change the code to use the RESTful-style URLs, but then I would be breaking any existing links to posts. So, I needed to be able to switch over to the new style while keeping the old style available. I came up with a pretty decent approach, I think.
The first step is that I needed to change the mapping in the WSGIApplication setup. You'll notice that I have removed all of the other mappings for the sake of brevity, but it used to look like this:
def main():
    application = webapp.WSGIApplication(
        [
            ('/showpost', ShowPost)
        ])
    wsgiref.handlers.CGIHandler().run(application)
In order to handle the RESTful pattern, I changed to a regular expression:
def main():
    application = webapp.WSGIApplication(
        [
            (r'^/showpost{1}(/.*)?', ShowPost)
        ])
    wsgiref.handlers.CGIHandler().run(application)
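A quick way to check a pattern like this outside of AppEngine is to try it with Python's re module. This is only a sketch: the captured helper is a hypothetical name of mine, and the trailing "$" is my assumption about how webapp anchors mapping patterns, so treat it as an approximation of what the framework does rather than a copy of it.

```python
import re

# The "$" appended here imitates end-anchoring by the framework; that is an
# assumption of this sketch, not something taken from the mapping itself.
SHOWPOST = re.compile(r'^/showpost{1}(/.*)?' + '$')


def captured(path):
    """Return (matched, captured_group) for a request path."""
    match = SHOWPOST.match(path)
    if match is None:
        return (False, None)
    return (True, match.group(1))


for path in ('/showpost', '/showpost/some-slug', '/showposts'):
    print(path, captured(path))
```

Two details are worth noticing. The {1} quantifier applies only to the final t, so it is harmless but redundant. And the captured group keeps its leading slash, which is why the handler's get method has to accept an extra positional argument.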
That will match both the desired new format and the must-be-tolerated old format. Now that the mapping is set up to call the correct handler, I have to go about modifying the ShowPost handler. This is how it looked before the change:
class ShowPost(SmartHandler):
    def get(self):
        from post import Post
        postid = self.request.get('id')
        if postid is not None and len(postid) > 0:
            try:
                post = Post.get(postid)
Not bad, but modifying it to account for the new format while keeping the old format will be ugly, and I'll end up repeating the code in any other request-handling methods, so I'm going to abstract it a bit and put it into SmartHandler, the customized version of RequestHandler that I use. I added the following instance method to the SmartHandler class:
def expects_request_id(self, *look_for):
    "Searches the request Uri for an embedded resource id."
    import string
    # First preference is to find it in the request Uri. Assumption is
    # that it is the last element in a multi-element path.
    path_parts = string.split(self.request.path, "/")
    # Empty elements are meaningless, so delete them
    cleaned_path_parts = []
    for each_part in path_parts:
        if len(each_part) > 0:
            cleaned_path_parts.append(each_part)
    found_id = None
    if len(cleaned_path_parts) > 1:
        found_id = cleaned_path_parts[-1]
    else:
        # There is only one element in the path, so we will look
        # for id info in the GET & POST arguments. Candidate argument
        # names are passed in through *look_for
        for each_arg_name in look_for:
            if each_arg_name in self.request.arguments():
                found_id = self.request.get(each_arg_name)
                break
    if found_id is None:
        raise NoIDFound
    else:
        self.requested_id = found_id
        return found_id
And I call it in ShowPost like this:
class ShowPost(SmartHandler):
    def get(self, *args):
        from post import Post
        try:
            self.expects_request_id("id")
            try:
                post = Post.get(self.requested_id)
                #
                # code snipped for brevity
                #
            except db.BadKeyError:
                # Render an error page here..."Sorry, but the post that you requested isn't there."
                pass
        except NoIDFound:
            # Render an error page: "Sorry, but when requesting a post, you have to specify the id of the Post."
            pass
You can see that expects_request_id has a declarative feel to it, and it seamlessly allows me to handle new-style and old-style URLs. It assumes that any request id information is the last element in a multi-element path, and if it is a single-element path, it looks for a URL parameter that we pass in. In this case, the parameter name is id, but it could be any string, and it could even be many different strings:
self.expects_request_id("id", "postid", "post")
would allow me to honor many different parameters.
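For anyone who wants to play with the lookup order without a running handler, here is a small standalone sketch of the same logic. The find_request_id function and the plain dict standing in for the request arguments are hypothetical; they exist only so that the rules can be exercised directly.

```python
class NoIDFound(Exception):
    """Raised when neither the path nor the arguments carry an id."""


def find_request_id(path, arguments, *look_for):
    """Return the id embedded in path, else the first matching argument.

    path      - the request path, e.g. "/showpost/some-slug"
    arguments - a dict standing in for the GET and POST arguments
    look_for  - candidate argument names, tried in order
    """
    # Drop the empty strings produced by leading or doubled slashes.
    parts = [part for part in path.split("/") if part]
    if len(parts) > 1:
        # Multi-element path: the id is the last element.
        return parts[-1]
    for name in look_for:
        if name in arguments:
            return arguments[name]
    raise NoIDFound(path)


print(find_request_id("/showpost/some-slug", {}, "id"))
print(find_request_id("/showpost", {"id": "abc123"}, "id"))
print(find_request_id("/showpost", {"post": "xyz"}, "id", "postid", "post"))
```

The path always wins over the arguments, so a request such as /showpost/some-slug?id=abc123 resolves to some-slug.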
I hope that this pattern and this code are useful. I'll be happy to answer any questions about them, and I'm always deeply grateful for any suggestions and comments.
December 07, 2008
A few posts ago, I promised to share my fail-proof instructions for installing and integrating the Ruby on Rails plugin called acts_as_urlnameable. Here is what I learned. Since I put this together, I have discovered some more issues that I need to find workarounds for, and I'll post about those later. For now, here's how you get it working for you in the vast majority of cases.
1. Install plugin:
- script/plugin install http://code.helicoid.net/svn/rails/plugins/acts_as_urlnameable/ will make a static copy of the source for you, or
- script/plugin install -x http://code.helicoid.net/svn/rails/plugins/acts_as_urlnameable/ will fetch a copy via SVN
2. Add acts_as_urlnameable to environment.rb if needed. If you define config.plugins, add urlnameable there.
3. For each model that will be urlnameable, add acts_as_urlnameable:
class Foo < ActiveRecord::Base
  acts_as_urlnameable :nameable_field
end
4. Add to each Model an override of to_param. This implementation differs from the suggested ones by continuing to provide the default numeric id for records that haven't been urlnameified yet. This should help you to avoid breaking existing functionality when adding this to an existing website:
def to_param
  if urlname and urlname.length > 0
    urlname
  else
    id
  end
end
5. Add a new method, smart_find. Again, this approach allows you to have mixed numeric and urlnamed ids. This will save a good deal of time when converting an existing Rails application to use acts_as_urlnameable. There are plenty of places in code where you won't care about having pretty, legible ids, such as hidden form fields. This limits how much of your existing code has to change:
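def self.smart_find(id)
  found_foo = nil
  if id.to_i > 0
    # We got a regular, old int id, so look it up as usual.
    # (A urlname that starts with digits, such as "3-pigs", also lands
    # here, because String#to_i parses leading digits.)
    found_foo = Foo.find(id)
  else
    # We got a string, a urlname id
    found_foo = Foo.find_by_urlname(id)
  end
  found_foo
end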
6. Add a migration to add the table and apply to existing rows:
class AddUrlnamesTable < ActiveRecord::Migration # :nodoc:
  def self.up
    create_table 'urlnames' do |t|
      t.column 'nameable_type', :string
      t.column 'nameable_id', :integer
      t.column 'name', :string
    end
    # For each Model to which acts_as_urlnameable will apply
    # and which has existing rows, add a loop like the following;
    # simply resaving each record will add the urlname data.
    for each_foo in Foo.find(:all)
      each_foo.save
    end
  end

  def self.down
    drop_table 'urlnames'
  end
end
7. In each controller that references one of the now-acts_as_urlnameable Models, change references to Foo.find to Foo.smart_find
8. In all of your views, check for links that use the pattern :id => @foo.id and replace them with :id => @foo. This will allow the to_param override to intelligently choose which id to provide.
That's the basic idea, and that should be enough to get anyone going with acts_as_urlnameable. During the course of integrating it into My Kids Library, I have discovered a number of circumstances that require some pretty sophisticated customization, and I will detail those in future posts. Until then, I am happy to answer any questions posted in the Comments section.
August 28, 2008
One of the many changes that Jessamyn suggested for MyKidsLibrary is that the URLs should be composed of meaningful text rather than just numbers. In addition to being more human-friendly, it is, apparently, an important search engine optimization technique.
Ruby on Rails likes to construct URLs that end with a numeric identifier that is used to look up a specific record in the database. It is an efficient, effective solution, and the software engineer side of me never considered why you'd have it be otherwise. I have come to think of URLs as being things that are as effectively meaningless and worthless to my brain as printouts of UNIX coredumps. I click on links, I bookmark pages, I never pay the slightest attention to URLs. I use tools -- browsers, bookmarking services -- to work with URLs just as I use tools to write software.
Once I decided to go about making the change, I set out to find who else had already done this work. The Rails ecosystem is vast and densely populated; I knew that there was but a very tiny chance that I'd actually have to start from scratch. Sure enough, a little work on Google revealed that there were many candidate solutions. I picked one that looked solid and set about integrating it into my project.
I'm not new to this; I have been a working, salary-earning software engineer for nearly two decades, so I should have been prepared for the documentation to suck. The documentation always sucks. The last time that I read really good, comprehensive documentation was when I was writing code for a VMS system, and I sat right next to the big orange wall. At least, I remember it being good; it's all so long ago that I might be remembering it in a somewhat nostalgic light.
I had to figure out a lot of things that weren't mentioned in the documentation, and while that's not the worst thing, it is still frustrating to see a useful, well-put-together package that stops just short of being perfect. And, really, they all do.
Open Source is invaluable, but in many respects, it reminds me of a Roadside Picnic.
And just to prove that I'm not a hypocritical dick, my next post will include extensive, failproof instructions for configuring and using the wonderful Rails plugin acts_as_urlnameable.
December 01, 2008
3/3/2001: I now consider the information in this post to be obsolete. More useful and up-to-date advice is available in the post Do Not Reinvent the Pagination Wheel.
In creating this blogging software, I have had to come to grips with finding a way to paginate content. It's a relatively trivial exercise under most circumstances; it is a well-understood pattern, and it is actually built in to some of the popular frameworks. AppEngine is a little different, and the nature of the Datastore actually makes it rather challenging to implement efficient, useful paging. I've come up with a solution that I think makes for a good balance of functionality and AppEngine-friendliness.
The code and techniques included here are Open Source. I do hope that if you choose to use this code in your own project that you'll comment here to share your feedback, suggestions and experiences. Sharing means caring, guys. For real.
This Paginator class depends on the Model that it will be paginating having an 'index' field, a unique value that is ordered with respect to how the pagination will occur. For instance, here is the model definition for this blog's Comment entity:
class Comment(db.Model):
    """A Model for storing comments associated with another entity."""
    author = db.StringProperty(required=True, verbose_name="Author")
    "A text representation of the user who wrote the comment."
    body = db.TextProperty(required=True, verbose_name="Comment")
    "The text of the comment."
    added = db.DateTimeProperty(auto_now_add=True, verbose_name="Date Added")
    "The date that the comment was added, or created."
    index = db.IntegerProperty(required=True, default=0)
    "The index of the comment in the collection of comments for the parent entity."
Here, index increases every time a new comment is added; in fact, it mirrors added, always increasing. Unlike added, however, index will always be unique. It might not always be contiguous, though, as a Comment can be deleted. The following method adds comments to the parent entity. You can see how index is maintained:
def add_comment(self, author, body):
    "Add a new comment to this entity. Returns the new comment object."
    new_comment = None
    def add_comment_txn():
        new_comment = Comment(parent=self, author=author, body=body, index=self.comment_index)
        new_comment.put()
        self.comment_index += 1
        self.comment_count += 1
        self.put()
        return new_comment
    new_comment = db.run_in_transaction(add_comment_txn)
    # Invalidate the cached collection of records, so it will be regenerated
    # and re-loaded with the new record in it.
    memcache.delete(self._comments_cache_key())
    return new_comment
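The difference between comment_index and comment_count is easy to miss, so here is a tiny in-memory sketch of the bookkeeping. The Post class and its delete_comment method are hypothetical; in particular, I am assuming that a delete lowers comment_count and leaves comment_index alone, which is what keeps index unique but allows gaps.

```python
class Post(object):
    """In-memory stand-in for the parent entity (illustration only)."""

    def __init__(self):
        self.comment_index = 0   # next index to hand out; never decreases
        self.comment_count = 0   # number of comments that currently exist
        self.comments = {}       # index -> comment body

    def add_comment(self, body):
        index = self.comment_index
        self.comments[index] = body
        self.comment_index += 1
        self.comment_count += 1
        return index

    def delete_comment(self, index):
        # Assumed behavior: the count drops, but the index is never reused.
        del self.comments[index]
        self.comment_count -= 1


post = Post()
for body in ("first", "second", "third"):
    post.add_comment(body)
post.delete_comment(1)
new_index = post.add_comment("fourth")
print(sorted(post.comments), post.comment_count)  # prints [0, 2, 3] 3
```

After the delete, index 1 is simply missing, and the new comment gets index 3 rather than filling the gap.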
Paginator comes into play in the function that gets a page of comments when the blog is requested to show a post:
def get_comments(self, index=0, count=5):
    "Return the comments attached to this entity."
    comments_paginator = Paginator(count, 'index')
    comments = comments_paginator.get_page(db.Query(Comment).ancestor(self), index, True)
    return comments
The only perhaps slightly non-obvious part is index. Where does it come from? How do I know which index to ask for? Is index the page number? The answer to those questions is a little bit of a chicken-and-egg situation. You provide Paginator's get_page method with an index from a previous call, usually the next_index or prev_index value. Usually, you'll get those values the first time by calling get_page with an index of None. That will tell it to get the very first page of results, and then you will have access to the prev_index, next_index and curr_index values that can be fed back in to it. The Paginator always looks for indexes relative to what is passed in, so if the requested index doesn't exist, because it was deleted between calls, it'll find the next one in the order.
So, that should give you a pretty good idea of how the Paginator works. Please post any questions or suggestions as a comment, and I'll see them and address them as best as I am able. Here, then, is the actual Paginator code:
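To make the index hand-off concrete, here is a minimal sketch of the same idea over a plain sorted list of integers instead of a Datastore query. The get_page function below is a hypothetical stand-in for illustration, it handles ascending order only, and it assumes a page size of at least one.

```python
def get_page(indexes, start_index, page_size):
    """Return (page, prev_index, next_index) for a list of unique indexes.

    start_index of None means "give me the first page".
    """
    ordered = sorted(indexes)
    if start_index is None:
        candidates = ordered
        before = []
    else:
        candidates = [i for i in ordered if i >= start_index]
        before = [i for i in ordered if i < start_index]
    # Fetch one extra item; if it exists, it is where the next page starts.
    window = candidates[:page_size + 1]
    next_index = window[page_size] if len(window) > page_size else None
    page = window[:page_size]
    # The previous page is the last page_size items before start_index,
    # and its starting index is the smallest of those.
    previous = before[-page_size:]
    prev_index = previous[0] if previous else None
    return page, prev_index, next_index


# Indexes 3 and 6 are missing, as if those comments had been deleted.
existing = [0, 1, 2, 4, 5, 7, 8]
print(get_page(existing, None, 3))
print(get_page(existing, 4, 3))
print(get_page(existing, 3, 3))
```

Asking for index 3, which no longer exists, returns the same page as asking for index 4, because the lookup is "the first index at or after the one requested".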
#Copyright 2008 Adam A. Crossland
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
#http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from google.appengine.ext import db
import copy

class PaginatedList(list):
    """An extended normal Python list with three additional properties used for
    pagination purposes:
    prev_index - the starting index of the previous page of entities;
    next_index - the starting index of the next page of entities;
    curr_index - the starting index of the current page of entities
    """
    def __init__(self, *args, **kw):
        list.__init__(self, *args, **kw)
        self.prev_index = None
        "The starting index of the previous page of entities"
        self.next_index = None
        "The starting index of the next page of entities"
        self.curr_index = None
        "The starting index of the current page of entities"
class Paginator:
    "A class that supports pagination of AppEngine Datastore entities."
    def __init__(self, page_size, index_field):
        self.page_size = page_size
        "The number of entities that constitute a 'page'"
        self.index_field = index_field
        "The name of the field in the Model that is an orderable index"

    def get_page(self, query=None, start_index=None, ascending=True):
        """Takes a normal AppEngine Query and returns paginated results.
        query - a Datastore Query object. It must not have an order clause.
        start_index - the index of the first record in the desired page. If the
            index is not known, or the first page is needed, None should be
            passed.
        ascending - True if the index column is to be ordered ascending; False
            should be passed for descending ordering.
        """
        fetched = None
        # I need to make a copy of the query, as once I use it to get the main
        # collection of desired records, I will not be able to re-use it to get
        # the next or prev collection.
        query_copy = copy.deepcopy(query)
        if ascending:
            # First, I will grab the requested page of entities and determine
            # the index for the next page
            filter_on = self.index_field + " >="
            fetched = PaginatedList(query.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index. It is read
                # through index_field rather than a hard-coded .index attribute.
                fetched.curr_index = getattr(fetched[0], self.index_field)
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need. That is the
                # index of the first record of the next page. Record it, and
                # delete the extra record from our collection.
                fetched.next_index = getattr(fetched[-1], self.index_field)
                del(fetched[-1])
            # Now, I will try to determine the index of the previous page
            filter_on = self.index_field + " <"
            previous_page = query_copy.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = getattr(previous_page[-1], self.index_field)
        else:
            # Follow the same logical pattern as for ascending, but reverse
            # the polarity of the neutron flow
            filter_on = self.index_field + " <="
            fetched = PaginatedList(query.filter(filter_on, start_index).order("-" + self.index_field).fetch(self.page_size + 1))
            if len(fetched) > 0:
                # The first row that we get back is the real index.
                fetched.curr_index = getattr(fetched[0], self.index_field)
            if len(fetched) > self.page_size:
                # We fetched one more record than we actually need. That is the
                # index of the first record of the next page. Record it, and
                # delete the extra record from our collection.
                fetched.next_index = getattr(fetched[-1], self.index_field)
                del(fetched[-1])
            # Determine index of previous page
            filter_on = self.index_field + " >"
            previous_page = query_copy.filter(filter_on, start_index).order(self.index_field).fetch(self.page_size)
            if len(previous_page) > 0:
                # The last record is the first record in the previous page.
                # Record it.
                fetched.prev_index = getattr(previous_page[-1], self.index_field)
        return fetched