This year for Drupalcon, the organizers felt a little cross-pollination was in order. Several talks don't feature Drupal at all. Instead, they cover topics that the Drupal community can learn from.
One of those talks was offered by Ben Sandofsky (@sandofsky) of Twitter.
Enter the Twitterverse
The story of Twitter is a matter of scale. There are 107 million Twitter accounts, and the number is climbing. These accounts generate 50 million tweets per day and, through interaction with the site and clients, 3 billion API requests a day. That's not a typo. Three billion, representing 75% of Twitter's traffic.
As anyone familiar with the infamous Fail Whale knows, the folks at Twitter have learned a lot of rough lessons along the way as their real-time messaging service grew in popularity. The solutions revolve around one central concept: removing bottlenecks.
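To put those daily totals in perspective, here is a quick back-of-the-envelope conversion to per-second averages. It is only a sketch: the variable names are ours, the rates are plain averages over a 24-hour day (real traffic is burstier than that), and the total traffic figure assumes the 75% share is measured in requests.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures quoted in the talk
tweets_per_day = 50_000_000
api_requests_per_day = 3_000_000_000
api_share_of_traffic = 0.75  # assumption: share is counted in requests


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


tweets_per_second = per_second(tweets_per_day)
api_requests_per_second = per_second(api_requests_per_day)

# If API calls are 75% of all traffic, total traffic is API / 0.75
total_requests_per_day = api_requests_per_day / api_share_of_traffic

print(f"Average tweets per second:       {tweets_per_second:,.0f}")
print(f"Average API requests per second: {api_requests_per_second:,.0f}")
print(f"Implied total requests per day:  {total_requests_per_day:,.0f}")
```

On average, that works out to roughly 579 tweets and about 34,700 API requests every second, with an implied total of around 4 billion requests a day.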
Hardware Bottlenecks
At one point, Twitter was a cloud-based service. However, the cloud simply couldn't offer the level of performance that they needed as they grew, so Twitter's servers all now run out of a single data center.
Human Capacity Bottlenecks
Twitter has 175 employees. Among those, the devs follow an agile model with one-week sprints and code in pairs. Pair programming has cut down on the time lost to relatively simple bugs like typos, and it helps them avoid issues like memory leaks, since there are always two sets of eyes on the code.
From there, the pairs are integrated into teams, which work on features. Since the teams are distributed, they use a web chat product called Campfire to coordinate things, choosing it for its logging and file upload features and because being browser-based means people can access it when necessary without having to go and get an IRC client. As much as possible, they're trying to remove situations where individuals or teams become bottlenecks.
Process Bottlenecks
The Twitter code repository is managed through Git. There are dozens of active branches, with the farthest-out branches typically relating to features, which feed into team branches, which in turn feed into the master branch, and so on. The smarter the logic for controlling branch syncs and merges, the fewer bottlenecks exist.
For issue tracking they use Pivotal Tracker. Sandofsky says he finds that sizing an issue in points rather than units of time generates more accurate estimates of how long it will take to get something done.
Software Bottlenecks
While he went heavily into how they handle incoming API calls and tweets, I'm going to break this down into some basics. Anyone familiar with the Fail Whale knows that Twitter has had a number of growing pains in trying to handle the massive traffic the network can generate. Navigating the rough waters of those pains has led the company down a heavily distributed path that involves:
- Pre-processing incoming messages through business logic that checks to make sure the tweet is within the 140 character limit, the user is authenticated and the account isn't offline for spam or other behavioral issues
- Handing off the message to a brand new instance of a queue
- Handing the message off to a brand new instance of a worker process to finish the job
This distributed workflow is designed to move things through as quickly as possible. Sandofsky says that there are some trade-offs for the speed, such as timelines that might show tweets in a different order than they actually arrived, but they feel that, given the performance benefits, a few tweets out of order are worth it.
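For readers who think better in code, the three steps can be sketched roughly as follows. This is not Twitter's implementation: the `Account` fields, the function names and the use of a single in-process queue with worker threads are all assumptions made for illustration, standing in for the separate queue and worker instances described in the talk. Only the 140-character limit and the three checks come from the presentation.

```python
import queue
import threading
from dataclasses import dataclass

MAX_TWEET_LENGTH = 140  # limit mentioned in the talk


@dataclass
class Account:
    """Hypothetical account record, for illustration only."""
    name: str
    authenticated: bool = True
    suspended: bool = False  # stands in for "offline for spam or other issues"


def validate(account: Account, text: str) -> bool:
    """Step 1: cheap business-logic checks before anything is queued."""
    return (
        len(text) <= MAX_TWEET_LENGTH
        and account.authenticated
        and not account.suspended
    )


def worker(inbox: queue.Queue, delivered: list, lock: threading.Lock) -> None:
    """Step 3: pull messages off the queue and finish the job."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        with lock:
            delivered.append(item)  # placeholder for the real delivery work


def run_pipeline(submissions, num_workers: int = 3) -> list:
    """Validate each submission, queue the valid ones, let workers drain them."""
    inbox: queue.Queue = queue.Queue()
    delivered: list = []
    lock = threading.Lock()

    threads = [
        threading.Thread(target=worker, args=(inbox, delivered, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Step 2: hand validated messages off to the queue
    for account, text in submissions:
        if validate(account, text):
            inbox.put((account.name, text))

    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return delivered


if __name__ == "__main__":
    subs = [
        (Account("alice"), "Hello from Drupalcon"),
        (Account("bob"), "x" * 141),                  # too long, rejected
        (Account("carol", suspended=True), "spam"),   # suspended, rejected
    ]
    print(run_pipeline(subs))
```

Because several workers drain the queue at once, the order of the delivered list is not guaranteed to match the order of submission, which is the same trade-off Sandofsky described for timelines.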