Saturday, July 30, 2011

To The Cloud!

First of all, what a great job Marketing has done with The Cloud.  Nobody knows what the hell it means, but everyone seems motivated to get on the cloud, somehow, anyhow.  At the heart of the cloud is a big secret that is easily glossed over  – the cloud is nothing new at all, just more of the same - big servers, centralised access, the same ideas that computing has been using from day one (there used to be mainframes that took up an entire office suite, now modern computers live in giant warehouses out in the sticks.  That right there is 50 years of Progress).

Aside from the brief and increasingly irrelevant “PC revolution” (the idea that large clusters of small machines would be cheaper than one large mainframe – an idea carefully promulgated by Microsoft and other interested parties), computing has always been a big-business kind of thing, and the big-business efficiency gains of centralisation and commodification are the driving forces behind the push for cloud services.

Anyway, I too drank the Kool-Aid and moved Ask Bluey to the cloud this year, from its previous home on shared hosting.  Shared hosting was actually surprisingly efficient and cost effective: it was able to accommodate thousands of visitors a day, although it was most definitely grinding to a halt once you threw in web crawlers, which accounted for around half of the traffic to the website.

The promise of being able to scale out Ask Bluey as traffic ebbed and flowed was hard to resist, but the reality is that architecting for the cloud is really quite difficult to get right (although the end result is “better”, in the same way that truly cross-platform C++ code feels “better” than code with implicit assumptions about bit alignment, or god forbid, non-standard compiler extensions).

The first big mistake I made was buying into the NoSQL movement’s claim that the future of databases lay in non-relational databases.  The idea is that you still have tables to put your structured data into, but that you have to manage the keyed links between these tables yourself (while a SQL database enforces referential integrity on your behalf).  What you gain is that these tables can be partitioned across machines, which means that they scale out better.  Maybe.
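To make the "manage the links yourself" point concrete, here is a minimal sketch.  It uses Python's built-in sqlite3 module as a stand-in for a SQL database and two plain dicts as a stand-in for non-relational tables.  The table names, column names and the add_question helper are all made up for illustration; none of this is the actual Ask Bluey schema.

```python
import sqlite3

# SQL side: the database enforces the link between the two tables for us.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE questions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " title TEXT)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'bluey')")
conn.execute("INSERT INTO questions (user_id, title) VALUES (1, 'first question')")


def sql_rejects_orphan():
    """Return True if the database refuses a question for a missing user."""
    try:
        conn.execute("INSERT INTO questions (user_id, title) VALUES (99, 'orphan')")
    except sqlite3.IntegrityError:
        return True
    return False


# Key-value side: two plain dicts stand in for two non-relational tables.
users = {1: {"name": "bluey"}}
questions = {}


def add_question(question_id, user_id, title):
    """Hand-written check that a SQL foreign key would have done for free."""
    if user_id not in users:
        raise KeyError(f"no such user: {user_id}")
    questions[question_id] = {"user_id": user_id, "title": title}
```

Forget the check in add_question just once and you have an orphaned row that nothing will ever complain about, which is exactly the bookkeeping the SQL database was doing quietly all along.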

What I didn’t realise is that the “table storage” that the cloud providers offer is not NoSQL, but some weird hybrid of blob storage with two keys rather than one.  While it would certainly be possible to use table storage instead of a SQL database, it is not a good idea for two reasons.  First of all, the performance is not very good.  I found that a table lookup took twice as long as a blob lookup, which is itself an order of magnitude slower than a retrieval from local disk.  When you have to retrieve from multiple tables and deal with “upsert” semantics, the performance penalties are cumulative and prohibitive.

The second reason to prefer SQL over table storage is that table storage bills you for every transaction.  Transactions are billed at one cent per block of 10,000, so they sound so cheap that you don’t have to think too hard about them.  But they can add up!  In one month my “table storage” architecture blew $250 on these seemingly insignificant transactions.  Once I had moved back to SQL, the same usage pattern cost $10 per month, as SQL databases are billed on instance size rather than per transaction.
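For a sense of scale, here is a rough back-of-the-envelope sketch of what that bill implies.  It assumes the rate is one cent per 10,000 transactions, that the whole $250 was transaction charges, and that the month had 30 days; the function names are just for illustration.

```python
from fractions import Fraction

TRANSACTIONS_PER_BLOCK = 10_000
PRICE_PER_BLOCK = Fraction(1, 100)  # dollars; assumes one cent per block


def transactions_for_bill(dollars):
    """How many transactions a given bill pays for at the assumed rate."""
    blocks = Fraction(dollars) / PRICE_PER_BLOCK
    return int(blocks * TRANSACTIONS_PER_BLOCK)


def bill_for_transactions(count):
    """The bill in dollars for a given number of transactions."""
    return float(Fraction(count, TRANSACTIONS_PER_BLOCK) * PRICE_PER_BLOCK)


monthly_transactions = transactions_for_bill(250)      # 250,000,000
seconds_per_month = 30 * 24 * 60 * 60                  # assumes a 30-day month
per_second = monthly_transactions / seconds_per_month  # roughly 96 per second
```

Under those assumptions $250 buys 250 million transactions, or somewhere near 96 of them every second around the clock, which is easy to reach when a single page view touches several tables and web crawlers account for half the traffic.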

Table storage still has a place in a cloud based architecture, but it only really makes sense in a few limited scenarios.  An ideal usage would be an archive of tweets on a per user basis.  The partition key is the user, and the row key is the date.  By keeping these large wads of data out of the database, the SQL instance size is kept down, and the performance penalty becomes less important because of the “archive” aspect.  Obviously nobody wants their entire Twitter history every single day, but it might be useful to get once or twice a year (the most recent tweets could be in the SQL database).
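The two-key layout described above can be sketched with nothing more than nested dicts.  This is an in-memory toy, not any cloud provider's API; the TweetArchive class and its method names are invented for illustration, and a real archive would probably want a finer-grained row key than a bare date so that a user can have more than one tweet per day.

```python
from collections import defaultdict
from datetime import date


class TweetArchive:
    """Toy two-key store: partition key = user, row key = ISO date string."""

    def __init__(self):
        # One dict of rows per partition (per user).
        self._partitions = defaultdict(dict)

    def upsert(self, user, day, text):
        """Insert the row, or overwrite it if the same two keys already exist."""
        self._partitions[user][day.isoformat()] = text

    def get(self, user, day):
        """Point lookup by both keys; returns None if the row is missing."""
        return self._partitions.get(user, {}).get(day.isoformat())

    def history(self, user, start, end):
        """All rows for one user between two dates inclusive, oldest first."""
        rows = self._partitions.get(user, {})
        lo, hi = start.isoformat(), end.isoformat()
        # ISO date strings sort in chronological order, so a string compare works.
        return [(key, rows[key]) for key in sorted(rows) if lo <= key <= hi]


archive = TweetArchive()
archive.upsert("bluey", date(2011, 7, 30), "To the cloud!")
archive.upsert("bluey", date(2011, 1, 2), "Still on shared hosting")
year_2011 = archive.history("bluey", date(2011, 1, 1), date(2011, 12, 31))
```

Because every query names the user up front, each user's history can live on a different machine without any query ever needing to span two of them, which is what makes this shape a comfortable fit for partitioned storage.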

In this way table storage becomes ancillary to a good old fashioned SQL database, and again nothing has really changed in any aspect of computing, despite the hype.  In some ways this is reassuring ; )
