Windows Azure Table Storage is a great way to store your data: partitions make it easy to scale and it’s very cheap (compared to Windows Azure SQL Database). But there are some downsides to choosing Table Storage: no indexes (except for the partition and row key), no full-text search and no sorting.
Well, actually you do get sorting. Within a partition, entities are stored in lexicographical order of their row key. Steve Marx wrote a blog post about this (over 4 years ago!): Using Numbers as Keys in Windows Azure. He generates a row key based on DateTime.UtcNow.Ticks to sort items in reverse chronological order. This is a great way to (ab)use the lexicographical order of entities to make sure your items are sorted newest first or oldest first. Many applications need this: a blog where you want the newest posts first, a CMS that should show the newest articles first, a todo list which shows the oldest items first, …
Before we look at the NuGet package we’ll take a quick look at why we need it.
The problem
Let’s take a look at an application you could build in Windows Azure. This would be a social application (duh) which stores blog posts from popular blogs like the Windows Azure blog. Now we will use Windows Azure Table Storage to store this information.
I use the SyndicationFeed class (comes with the System.ServiceModel assembly) to fetch the last 25 items from the Windows Azure blog:
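A minimal sketch of such a fetch, assuming a hypothetical BlogPost class with Title and PublishedOn properties and a feed URL supplied by the caller (no specific feed address is assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

// Hypothetical shape of a blog post record; the property names are assumptions.
public sealed class BlogPost
{
    public string Title { get; set; }
    public DateTimeOffset PublishedOn { get; set; }
}

public static class FeedReader
{
    // Loads the feed at feedUrl and returns its first `count` items.
    public static List<BlogPost> ReadLatest(string feedUrl, int count)
    {
        using (XmlReader reader = XmlReader.Create(feedUrl))
        {
            return ReadLatest(reader, count);
        }
    }

    // Same as above, but reads from an already opened XmlReader.
    public static List<BlogPost> ReadLatest(XmlReader reader, int count)
    {
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items
            .Take(count) // items are kept in the order the feed delivers them
            .Select(item => new BlogPost
            {
                Title = item.Title != null ? item.Title.Text : null,
                PublishedOn = item.PublishDate
            })
            .ToList();
    }
}
```

Calling FeedReader.ReadLatest(feedUrl, 25) would return at most 25 posts, in whatever order the feed delivers them.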
So if I get these blog posts and write them to the console this is what I’ll get:
The font is rather small, but if you take a closer look you’ll see that items arrive in reverse chronological order. This means that I’ll see a blog post which was posted today as the first item, and a blog post of a month ago as the last item. Let’s try to store this in Table Storage.
So I created a class BlogPostEntity which inherits from TableEntity, mapped the BlogPost records to this entity and stored them in Table Storage. As you can see I’m using an index as the RowKey.
Using a sequential integer as the RowKey is a poor choice for this application, for two reasons. Once I start getting more users / processes I will see concurrency issues popping up (users trying to add records with the same RowKey). And because row keys are compared as strings, the entities are stored in lexicographical order rather than numeric order: an entity with RowKey 19 will come before an entity with RowKey 2.
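The ordering problem is easy to reproduce without Table Storage. The sketch below sorts keys as plain strings (an ordinal comparison is assumed to stand in for the service's ordering) and shows that zero-padding to a fixed width makes string order agree with numeric order. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowKeyOrdering
{
    // Sorts keys as strings, character by character.
    public static List<string> SortLexicographically(IEnumerable<string> keys)
    {
        return keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
    }

    // Pads a non-negative index with leading zeros to a fixed width.
    public static string Pad(int index, int width = 10)
    {
        return index.ToString("D" + width);
    }
}
```

Sorting "1", "2" and "19" this way yields "1", "19", "2". Sorting the padded forms yields "0000000001", "0000000002", "0000000019". Padding fixes the order, but not the concurrency issue.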
The same happens if we use a Guid as the RowKey. While this fixes the concurrency issue (the chance that the same Guid is generated twice is very small, so we assume it’s unique), it still doesn’t store the entities in reverse chronological order:
There’s a package for that
There are a few blog posts explaining how you can sort entities in (reverse) chronological order in Table Storage but this can get a little heavy. You will need to work with DateTime.UtcNow.Ticks or DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks, add zero-padding, use CompareTo when querying for items, … That’s why I created the following NuGet package to make things easier:
PM> Install-Package WindowsAzure.ChronoTableStorage
Creating entities
I can now update my code and use the RowKey.CreateReverseChronological method (by passing it the date the blog post was published). This will generate a row key which will make sure the items are sorted in reverse chronological order:
Now this is looking much better: the items are stored in reverse chronological order. When I query this table I will always receive the newest posts first, which is exactly what I need for my application.
Internally the row key is generated like this:
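A minimal sketch of the idea using only the .NET base class library. This is not the package's actual source: the class name, the 19-digit zero-padding and the method signatures are assumptions made for illustration.

```csharp
using System;

public static class ChronoKeySketch
{
    // DateTime.MaxValue.Ticks (3155378975999999999) has 19 digits,
    // so every tick value is padded to that width.
    private const int TickDigits = 19;

    // Older dates get smaller keys, so they sort first.
    public static string Chronological(DateTimeOffset date, string separator = "-", string suffix = null)
    {
        return Format(date.UtcTicks, separator, suffix);
    }

    // Newer dates get smaller keys, so they sort first.
    public static string ReverseChronological(DateTimeOffset date, string separator = "-", string suffix = null)
    {
        return Format(DateTime.MaxValue.Ticks - date.UtcTicks, separator, suffix);
    }

    private static string Format(long ticks, string separator, string suffix)
    {
        // A random Guid keeps keys unique when two entities share the same tick value.
        return ticks.ToString("D" + TickDigits) + separator + (suffix ?? Guid.NewGuid().ToString());
    }
}
```

With these helpers, a key built from a later date compares lower than a key built from an earlier date in the reverse chronological case, and higher in the chronological case.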
The row key will contain the ticks, a separator (the default separator is a dash, but you can change it) and a suffix (by default this is a Guid, but you can also change this). Now there will be times where you will want to sort items in chronological order (a task list for example). In that case you will only need to generate a row key using the RowKey.CreateChronological method:
As you can see the CreateChronological method will give you the opposite result (old items first):
Query based on dates and date ranges
Our entities are stored in chronological or reverse chronological order, which is great. Typically you would show the last 10 records in a table and use continuation tokens to support paging. That way, your user will be able to navigate through the items in a (reverse) chronological order.
But what if you want to show today’s items? Or all records from last week? Maybe records created on a specific date?
Instead of using TableQuery.GenerateFilterCondition you’ll be using ChronologicalTableQuery.GenerateFilterCondition to generate a condition based on a (reverse) chronological row key. If your entities are sorted in chronological order you’ll be using the QueryDateChronologicalComparisons comparison enum and if you’re storing them in reverse chronological order you simply use the QueryDateReverseChronologicalComparisons enum. Both enums allow you to use the following comparisons:
Using TableQuery.CombineFilters you can combine different comparisons, which allows you to query for all items within a date range or entities for a specific day.
Finally, if you’re still using the WCF Data Services implementation you can simply use one of the Where extension methods:
Final note: All dates passed to the RowKey class or used in queries will be converted to UTC with the DateTimeOffset class.
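To see why a date range maps onto two row key comparisons, here is a hand-rolled sketch for reverse chronological keys. It does not use the package's ChronologicalTableQuery. It assumes keys of the form "<19-digit reverse ticks>-<suffix>" and builds the filter as a plain OData string, and all names in it are hypothetical.

```csharp
using System;

public static class ReverseChronoRange
{
    // Builds a key in the assumed "<reverse ticks>-<suffix>" format.
    public static string Key(DateTimeOffset date, string suffix)
    {
        return Prefix(ReverseTicks(date)) + "-" + suffix;
    }

    // Filter matching entities with from <= date < to.
    public static string BuildFilter(DateTimeOffset from, DateTimeOffset to)
    {
        return "(RowKey ge '" + Lower(to) + "') and (RowKey lt '" + Upper(from) + "')";
    }

    // Filter matching all entities of the UTC day that contains `day`.
    public static string BuildFilterForDay(DateTimeOffset day)
    {
        var start = new DateTimeOffset(day.UtcDateTime.Date, TimeSpan.Zero);
        return BuildFilter(start, start.AddDays(1));
    }

    // In-memory equivalent of the filter, handy for checking the bounds.
    public static bool IsInRange(string rowKey, DateTimeOffset from, DateTimeOffset to)
    {
        return string.CompareOrdinal(rowKey, Lower(to)) >= 0
            && string.CompareOrdinal(rowKey, Upper(from)) < 0;
    }

    // Reverse keys shrink as dates grow, so the later date gives the lower bound.
    // Adding one tick excludes `to` itself and includes `from` itself.
    private static string Lower(DateTimeOffset to) { return Prefix(ReverseTicks(to) + 1); }
    private static string Upper(DateTimeOffset from) { return Prefix(ReverseTicks(from) + 1); }

    private static long ReverseTicks(DateTimeOffset date)
    {
        return DateTime.MaxValue.Ticks - date.UtcTicks;
    }

    private static string Prefix(long reverseTicks)
    {
        return reverseTicks.ToString("D19");
    }
}
```

Because newer entities have smaller reverse keys, the end of the date range becomes the lower bound of the key range and the start becomes the upper bound.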
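As a small illustration of what that normalization means (a sketch with hypothetical helper names, not the package's code): two values that describe the same instant with different offsets produce the same UTC ticks, and therefore the same key prefix.

```csharp
using System;

public static class UtcNormalization
{
    // UtcTicks is independent of the offset the value was created with.
    public static long ToUtcTicks(DateTimeOffset date)
    {
        return date.UtcTicks;
    }

    // A DateTime of kind Utc gets a zero offset; Local and Unspecified
    // kinds are interpreted as local time by the DateTimeOffset constructor.
    public static long ToUtcTicks(DateTime date)
    {
        return new DateTimeOffset(date).UtcTicks;
    }
}
```

For example, 12:00 at offset +02:00 and 10:00 at offset +00:00 on the same day yield identical tick values.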
The code (with the examples) is available on GitHub.
Enjoy!