Everybody was keyed up, waiting for Blaine Cook of Twitter to talk, and he finally did. It wasn't quite what we expected (well, what were we expecting?), but it was nonetheless educational and really sort of surprising.
Cook used a long list of Twitter's mistakes and weaknesses to introduce us to Starling, his alternative to platforms based on Erlang and other technologies -- which he asserts would have been a general nightmare.
Some stats on Twitter:
* 600 requests per second, growing fast
* 180 Rails instances via Mongrel, growing fast
* 1 MySQL database server + 1 slave
* 30-odd processes for misc. jobs
* 8 Sun X4100s
* A community of users, growing fast
Cook quickly followed these numbers with a big speech about why growing Web 2.0 enterprises should not behave the way Twitter has.
To illustrate that admonition, he offered some obvious (but easily overlooked) wisdom: test everything, install an exception notifier and logger, and think out your analytics strategy. With a wry smile he noted that Google Analytics doesn't give you any analytics when your pages don't load, a major consideration when your business model depends almost entirely on asynchronous content loading.
Also, the Action Cache Plugin becomes a young enterprise's best friend when a site is growing too quickly to adequately support its users. With Action Cache, at least you know your cached pages will render properly if your site crashes.
"There's nothing worse," Cook notes, "than serving users the information they're not looking for." Unfortunately, few people go a-looking for an error page.
Don't try being fancy. Working, simple functionality is better.
Denormalize a lot. This leads to faster queries, simpler adds and deletes, and generally better performance. No caching of individual ActiveRecord objects is necessary, a big boon for Twitter.
And don't make the following icky mistakes:
* bob.friends.map(&:email) works great when Bob only has 3 friends. When Bob has 3,000, not so good.
* status.count() stops working when there are millions of rows -- your site will effectively shut down.
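The recap doesn't spell out why the first call hurts. One plausible reading (an assumption, not something Cook is quoted as saying) is that it builds a full object for every friend just to read a single field. Here is a minimal, self-contained sketch of that idea in plain Ruby. `FRIEND_ROWS` and the `Friend` class are hypothetical stand-ins, not Twitter's models or the ActiveRecord API.

```ruby
# Hypothetical rows, standing in for what a friends table might return.
FRIEND_ROWS = (1..3_000).map do |id|
  { id: id, name: "user#{id}", email: "user#{id}@example.com" }
end

# Hypothetical stand-in for a full model object; it counts how many get built.
class Friend
  @built = 0
  class << self
    attr_accessor :built
  end

  attr_reader :email

  def initialize(row)
    self.class.built += 1
    @id = row[:id]
    @name = row[:name]
    @email = row[:email]
  end
end

# Approach 1: build every friend object, then read one attribute from each.
emails_via_objects = FRIEND_ROWS.map { |row| Friend.new(row) }.map(&:email)

# Approach 2: pull only the field that is needed; no objects are built.
emails_via_column = FRIEND_ROWS.map { |row| row[:email] }

puts "objects built: #{Friend.built}"
puts "same emails either way: #{emails_via_objects == emails_via_column}"
```

Both approaches return the same emails. The difference is the work done along the way, which grows with the size of the friend list.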
Presently Twitter's working on better partitioning. Partitioning is tricky for social apps, particularly when some users have thousands of friends. You can't guarantee all your operations will be read-only, which is risky: if your slaves get written to, your site is down the drain.
Twitter makes extensive use of MemCache -- a Ruby client library for the popular memcached distributed object cache, which does pretty much all their caching. If you need a stat count, for example, just write a simple stat count into MemCache. status.count, a more popular option, isn't scalable, at least not at Twitter's rate. When your site gets big you might pay dearly for lack of foresight.
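As a rough illustration of the "write a simple stat count into the cache" idea, here is a sketch that bumps a counter on every write and reads it back instead of counting rows. `FakeCache` is a hypothetical in-process stand-in, not the real MemCache client, and `StatusStore` and the `"status_count"` key are made up for the example.

```ruby
# Hypothetical in-process stand-in for a cache client.
# Assumption: a missing key starts at 0 here; a real memcached
# server expects the counter to exist before it is incremented.
class FakeCache
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { @store[key] }
  end

  def incr(key, by = 1)
    @mutex.synchronize { @store[key] = @store.fetch(key, 0) + by }
  end
end

# Hypothetical status store that keeps a running count in the cache.
class StatusStore
  COUNT_KEY = "status_count"

  def initialize(cache)
    @cache = cache
    @rows = [] # stands in for the statuses table
  end

  def create(text)
    @rows << text
    @cache.incr(COUNT_KEY) # one cheap increment per write
  end

  # Reads the running total instead of counting every row.
  def count
    @cache.get(COUNT_KEY) || 0
  end
end

store = StatusStore.new(FakeCache.new)
%w[hello world again].each { |text| store.create(text) }
puts store.count
```

The trade-off is that the counter can drift if a write succeeds and the increment doesn't, so a real setup would likely want an occasional reconciliation against the database.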
To wrap up the presentation, Cook walked through the different platforms Twitter tried before arriving at an elegant solution. They started with DRb -- easy, fairly fast (client can sit on another machine, shared object, persistent thread). Unfortunately it's also flaky, with zero redundancy, and it's tightly coupled.
No major solutions really improve on DRb, so they explored other options including Rinda, ActiveMQ, RabbitMQ and MySQL + Lightweight Locking (LiveJournal's method).
RabbitMQ was touted as the most viable option, but when push came to shove, troubleshooting documentation was virtually non-existent. "Plus it's ugly," Cook adds.
Enter Starling, a platform written entirely for and by Twitter. Starling consists of 200 lines of Ruby code and is handling 4,000 transactional messages per second. Its first pass was written in four hours and it speaks MemCache. Talk about elegant. Cook thought so.
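To give a feel for what a queue that "speaks MemCache" might look like from the client side, here is a toy sketch. It assumes a cache-style set appends a message to a named queue and a get pops the oldest one. `TinyQueueServer` is hypothetical and in-process only; it is not Starling's code and it skips the network protocol entirely.

```ruby
# Hypothetical toy message queue with a cache-like set/get interface.
class TinyQueueServer
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
    @mutex = Mutex.new
  end

  # "set" appends a message to the named queue.
  def set(name, message)
    @mutex.synchronize { @queues[name] << message }
    true
  end

  # "get" removes and returns the oldest message, or nil when the queue is empty.
  def get(name)
    @mutex.synchronize { @queues[name].shift }
  end
end

server = TinyQueueServer.new
server.set("deliveries", "status 1")
server.set("deliveries", "status 2")

puts server.get("deliveries")
puts server.get("deliveries")
puts server.get("deliveries").inspect
```

The appeal of this shape is that any existing cache client can act as a producer or consumer without a new library, which may be part of why so little code was needed.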
Teasing the crowd a little, Cook suggested that Starling may be released as an open source solution, but stopped short of making any promises.
One major component of Twitter's ongoing success (or survival) is their zero-tolerance policy on abuse. When a user makes a thousand friends in a day, Twitter just shuts them down -- no questions, no second chances.
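A rule like that could be sketched as a per-user, per-day counter with a hard cutoff. The snippet below is only an illustration under assumptions: `FriendRateLimiter` and its method names are invented here, and the thousand-a-day threshold is simply the figure quoted above, not a documented Twitter setting.

```ruby
require "date"

# Hypothetical limiter: suspends a user who adds too many friends in one day.
class FriendRateLimiter
  def initialize(daily_limit)
    @daily_limit = daily_limit
    @counts = Hash.new(0) # [user_id, day] => friend adds that day
    @suspended = {}
  end

  # Returns true if the add was accepted, false if the user is already suspended.
  def record_friend_add(user_id, day)
    return false if suspended?(user_id)

    key = [user_id, day]
    @counts[key] += 1
    # Reaching the limit suspends the account: no second chances.
    @suspended[user_id] = true if @counts[key] >= @daily_limit
    true
  end

  def suspended?(user_id)
    @suspended.key?(user_id)
  end
end

limiter = FriendRateLimiter.new(1_000)
today = Date.today
1_000.times { limiter.record_friend_add("spammer", today) }
puts limiter.suspended?("spammer")
puts limiter.record_friend_add("spammer", today)
```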
One bad seed just ain't worth website apocalypse. Check Blaine Cook out on Twitter. You can also peep his whole presentation here.