Both web and worker roles have a variety of places to store data in the Azure cloudscape. Since processing large amounts of data is likely the raison d’être of cloud applications in the first place, understanding where that data can be stored (and the trade-offs of each option) pays off when designing cloud applications.
What follows is more of a personal brain dump based on my experience with Azure. I don’t include performance metrics because I expect them to change continually and to vary with the application in question. I recommend collecting your own performance metrics in the design phase to give further weight to your design choices.
Local Memory (per Instance)
Obviously the fastest place to cache your data, but sadly quite limited in size, depending on the size of the Azure instances that you choose (and pay for).
It is also, sadly, not shareable between multiple instances running in the same web role. Storing session state on one machine helps you very little when the next request lands on the next machine in your web farm.
Local Storage (per Instance)
Storing to disk has exactly the benefits and trade-offs that one might expect: far greater capacity than local memory, but far slower reads and writes. Where the application allows it, asynchronous writes can ameliorate some of the performance penalty. Again, local storage is not shareable between instances.
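To make the asynchronous write idea concrete, here is a minimal sketch in Python rather than anything Azure-specific. The `AsyncDiskWriter` class and the `demo_async_write` helper are hypothetical names of my own, and a temporary directory stands in for an instance's local storage. The caller hands off the data and carries on, while a background thread does the slow disk work.

```python
import os
import queue
import tempfile
import threading


class AsyncDiskWriter:
    """Queues writes and performs them on a background thread (illustrative only)."""

    def __init__(self, directory):
        self.directory = directory
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, name, data):
        # Returns immediately; the actual disk write happens on the worker.
        self._queue.put((name, data))

    def flush(self):
        # Block until every queued write has been handed to the file system.
        self._queue.join()

    def _run(self):
        while True:
            name, data = self._queue.get()
            try:
                with open(os.path.join(self.directory, name), "wb") as f:
                    f.write(data)
            finally:
                self._queue.task_done()


def demo_async_write():
    # A temporary directory stands in for per-instance local storage (assumption).
    with tempfile.TemporaryDirectory() as scratch:
        writer = AsyncDiskWriter(scratch)
        writer.write("report.bin", b"payload")
        writer.flush()
        with open(os.path.join(scratch, "report.bin"), "rb") as f:
            return f.read()
```

The trade-off is that a write still sitting in the queue is lost if the instance is recycled, so this only suits data that can be rebuilt from elsewhere.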
SQL Azure
Since web roles with multiple instances often do need a place to share session data, the database represents a logical place. The ACID properties of a relational database mean that even when individual instances of your application die and are recycled, the user should see no difference in their session from the remaining instances that are still serving requests.
However, the size constraints of SQL Azure databases (and the expense involved) mean that not every piece of data should simply be bundled into the database. Large blobs of data are certainly good candidates for blob storage, but their timestamps and other metadata might be stored in the database to speed up retrieval and to integrate with stored procedure logic.
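A rough sketch of that split, with loud assumptions: an in-memory SQLite database stands in for SQL Azure, a plain dict stands in for blob storage, and the table and function names (`blob_meta`, `save_blob`, `blobs_updated_since`) are hypothetical. The point is only that the bulky payload and its small, queryable metadata live in different places.

```python
import sqlite3
import time

# Stand-in for blob storage: a plain dict keyed by blob name (assumption).
blob_store = {}


def open_metadata_db():
    # In-memory SQLite stands in for the relational database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blob_meta ("
        " name TEXT PRIMARY KEY,"
        " size_bytes INTEGER NOT NULL,"
        " updated_at REAL NOT NULL)"
    )
    return db


def save_blob(db, name, payload, now=None):
    # The large payload goes to the blob store...
    blob_store[name] = payload
    # ...while only its small metadata row goes to the database.
    updated_at = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO blob_meta (name, size_bytes, updated_at)"
        " VALUES (?, ?, ?)",
        (name, len(payload), updated_at),
    )
    db.commit()


def blobs_updated_since(db, cutoff):
    # Answered entirely from metadata; no blob is touched.
    rows = db.execute(
        "SELECT name FROM blob_meta WHERE updated_at > ? ORDER BY name",
        (cutoff,),
    )
    return [name for (name,) in rows]
```

Questions like "what changed since yesterday?" can then be answered without paying for a single blob transaction.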
Blob Storage
Blob storage is the cheapest way to store large amounts of data “in the cloud”, although the current offering is not especially compelling, performance-wise. Since you pay per transaction, it is also a poor place to put things that you will be accessing frequently.
Blob storage is like the “stacks” at your local library: infrequently accessed things can be stored at low cost for long periods of time, and brought into the forward caches as appropriate.
Windows Azure AppFabric Caching
An alternative place to share session state between instances, AppFabric caching sounds good in theory but suffers a little in practice. The idea is that it represents a high-speed storage location that is easily shared between instances. The reality is that it will shut you off if you use it too much, and the performance does not seem especially compelling. It is, however, considerably faster than blob storage.
Putting it all together
A well-designed Azure application will take into account the strengths and weaknesses of each available storage option and use each accordingly.
Often data seems to move like a waterfall across the different tiers. If an application requires some particular data, it might first ask a centralised and reliable store such as the SQL database where that data is located and what its latest timestamp is. Then it might check its various stores for the blob’s handle and timestamp: first local memory, then local storage, then the AppFabric cache and finally blob storage. When it finds its data it might refresh the other (empty or invalidated) caches that failed it along the way.
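The waterfall can be sketched in a few lines of Python. This is an illustration under assumptions, not an Azure API: each `Tier` is a dict-backed stand-in for a real store, `fetch` is a hypothetical helper, and `latest_timestamp` is whatever the authoritative store (the SQL database above) reports for the key.

```python
class Tier:
    """One storage tier; a dict stands in for the real store (assumption)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # key -> (timestamp, data)

    def get(self, key):
        return self.entries.get(key)

    def put(self, key, timestamp, data):
        self.entries[key] = (timestamp, data)


def fetch(key, latest_timestamp, tiers):
    """Walk the tiers fastest-first and backfill the ones that missed.

    A tier counts as a hit only if it holds the key at latest_timestamp;
    an older copy is treated as invalidated.
    """
    missed = []
    for tier in tiers:
        entry = tier.get(key)
        if entry is not None and entry[0] == latest_timestamp:
            # Refresh every faster tier that was empty or stale.
            for stale in missed:
                stale.put(key, latest_timestamp, entry[1])
            return entry[1], tier.name
        missed.append(tier)
    raise KeyError(key)
```

With tiers ordered memory, disk, cache, blob, the first request for a key falls all the way through to blob storage and the next one is served from memory.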
Other times a backend process might invalidate some piece of non-SQL data, update both blob and cache storage, and then send a message to each role instance that the data has changed. Each role instance in turn can then query the cache service for the data, and fall back on blob storage if the cache drops the ball. The role instance might store that data in local memory and local storage and simply assume that the data remains valid until a notification to the contrary is received.
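Here is a small sketch of that push-style flow, again with stand-ins: dicts play the cache service and blob storage, a direct method call plays the message sent to each instance, and `RoleInstance` and `publish` are hypothetical names.

```python
class RoleInstance:
    """Keeps a local copy and trusts it until notified of a change."""

    def __init__(self, cache_service, blob_storage):
        self.cache_service = cache_service  # dict stand-in (assumption)
        self.blob_storage = blob_storage    # dict stand-in (assumption)
        self.local = {}

    def read(self, key):
        # No timestamp check here: the local copy is assumed valid.
        if key not in self.local:
            self.on_changed(key)
        return self.local[key]

    def on_changed(self, key):
        # Prefer the shared cache; fall back on blob storage if it misses.
        value = self.cache_service.get(key)
        if value is None:
            value = self.blob_storage[key]
        self.local[key] = value


def publish(key, value, cache_service, blob_storage, instances):
    # Backend side: update both stores first, then notify every instance.
    blob_storage[key] = value
    cache_service[key] = value
    for instance in instances:
        instance.on_changed(key)
```

The weak point of this design is a lost notification: an instance that never hears about a change will keep serving its old copy indefinitely.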
But as with all advice, your mileage will vary ;)