What SXSW is to the user experience (UX) and design geeks who make the Internet useful, usable and desirable, the Cultivate and Velocity conferences are to the teams of geeks who literally make the Internet function. Both conferences, produced by O'Reilly, were held this week in New York City. The week began with the warm and fuzzy, culture-focused Cultivate (which actually kicked off with a morning yoga session). Days two and three, at Velocity, were not so warm or fuzzy. Velocity is designed for the hard-core web technology geek, and its tag line says it all: "Building a Faster, Stronger Web."

Much of the content at the Velocity conference focused on the past: what has already been built and done, such as creating a vibrant startup culture, using performance management tools, and adopting DevOps culture and practices.

Courtney Nash, the conference co-chair, spoke about the future. The next wave, she said, would be dominated by three themes: resiliency, concurrency and adaptability.

Richard Cook, professor of healthcare systems safety at the Kungliga Tekniska Högskolan (the Royal Institute of Technology) in Stockholm, kept the crowd befuddled and slack-jawed with a keynote that synthesized complex dynamic enterprise systems theory and made it relevant in less than 20 minutes. He opened by explaining that resiliency is necessary to keep systems safe, so much so that lives around the world depend on it.

After a few quick examples from the medical, public safety, energy, transportation and military communities, his talk took off at near light speed to explain three complex forces that concurrently apply pressure to all complex enterprise systems:

  • Economic failure
  • Unacceptable workload
  • Acceptable performance


Interaction of the operations, financial and safety boundaries. Adapted with permission from “Going Solid” by R. Cook and J. Rasmussen.

These three forces are always acting together, pushing complex critical systems to operate at peak capacity. As a result, a system's operating point moves in a way that resembles "Brownian motion," the mathematical model of the random movement of small particles. Propelled by economic pressure and human workload, systems continuously "normalize deviance" until they "flirt with the margin of disaster."


Flirting with the margin and marginal creep. Adapted with permission from “Going Solid” by R. Cook and J. Rasmussen.
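To make the drift described above a little more concrete, here is a toy Python sketch. It is our own illustration, not Cook's or Rasmussen's model: a one-dimensional random walk in which `x` is the distance between the operating point and the acceptable-performance boundary, a steady `drift` stands in for economic and workload pressure, and Gaussian `noise` supplies the Brownian-like wandering. The optional `margin` and `pushback` parameters loosely represent humans who notice the operating point getting too close and push it back. All names and parameter values are hypothetical.

```python
import random
from typing import Optional


def steps_until_boundary(start: float = 10.0, drift: float = 0.2,
                         noise: float = 1.0, margin: float = 0.0,
                         pushback: float = 0.0, max_steps: int = 10_000,
                         seed: Optional[int] = None) -> Optional[int]:
    """Toy 1-D model (hypothetical units): x is the distance between the
    operating point and the acceptable-performance boundary. Returns the
    first step at which x reaches 0, or None if it never does."""
    rng = random.Random(seed)
    x = start
    for step in range(1, max_steps + 1):
        # Economic and workload pressure: a steady push toward the boundary.
        x -= drift
        # Random fluctuation: the Brownian-like component of the motion.
        x += rng.gauss(0.0, noise)
        if x <= 0.0:
            return step  # the operating point crossed the boundary
        # Optional human counter-pressure: operators who see the operating
        # point inside the warning margin push it back toward safety.
        if x < margin:
            x += pushback
    return None


if __name__ == "__main__":
    trials = 500
    for label, kwargs in [("no pushback", {}),
                          ("with pushback", {"margin": 3.0, "pushback": 2.0})]:
        hits = [steps_until_boundary(seed=i, **kwargs) for i in range(trials)]
        crossed = [h for h in hits if h is not None]
        print(f"{label}: {len(crossed)}/{trials} runs crossed the boundary")
```

With a positive drift and no pushback, the walk almost always reaches the boundary eventually; adding pushback keeps it hovering near the margin for longer. That is a rough analogue of "flirting with the margin," not a quantitative claim about real systems.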

With this understanding of the dynamics, it becomes apparent that the salient question is not "Why do systems fail?" but rather "Why don't they fail more often? Why do systems succeed as much as they do?" The answer: humans monitor the system's operating point, raise alerts, anticipate trouble and learn from what they see. These four activities are what create resilience.

Beyond Resiliency

Dave Zwieback, VP of engineering at Next Big Sound, spoke about resiliency and what lies beyond it. His talk drew on "antifragility," a term recently coined by Nassim Nicholas Taleb, the Lebanese American essayist, scholar and statistician whose work focuses on problems of randomness, probability and uncertainty. The term seems primed to take off in the web technology community.

Taleb and Zwieback have chosen the new term for a reason. Resiliency evokes a memory-foam mattress: it bends under pressure and returns to its original shape when the pressure is removed. Antifragility goes further, describing systems that can do more than merely bounce back.

Antifragile systems gain strength from the pressure applied to them; examples include systems that feed off and leverage the universal force of entropy.

Zwieback opened unconventionally by quoting Sidney Dekker, a professor, pilot and best-selling author on human factors and safety: "What you call root cause is simply the place where you stop looking any further."

Zwieback then argued that the actual root cause of every problem and outage you have experienced or will experience is already known. His thesis: all systems are made up of multiple parts, which makes them "compounded." All compounded things are impermanent, because they are at the mercy of time and decay. Impermanence, then, is the root cause.

This doesn't mean that you should stop seeking a deeper understanding of why failures happen. It means that if you really want to minimize failures and outages, you should adopt a model that thrives on chaos and entropy. Zwieback looked to Google and BitTorrent for concrete examples:

  • In July 2012, a large segment of the web suffered outages because software failed to account for a leap second. The problem hit so much of the web because the very tool used to create stability (standardized toolsets) became the medium that allowed the smallest bug, what Taleb and Zwieback call a "Black Swan" event, to take them all out of commission.
  • BitTorrent demonstrates the other side of the coin. In the traditional web world, a Black Swan event, such as everyone in the world requesting the same file at once, would cause mass chaos and crashes. In BitTorrent, however, the same event makes the file maximally available and renders it nearly impossible to delete.

A Force In The Universe

A naturally occurring example of antifragility in action can be seen in attempts by governments or other organizations to ban books or discredit ideas. The more effort that goes into banning a book or discrediting an idea, the better known the book or idea becomes. Zwieback noted that 46 of the 20th century's top 100 novels were targets of ban attempts.

So now the only question that remains is: How do I get this article banned?