Back On The Market!

After a year spent writing my book, working on Aura, speaking at conferences and user groups, advising startups, and proposing new design patterns, I am back on the market.

I’ve been writing PHP code since 1999, and in that time I’ve been everything from a junior developer to a VP of Engineering. If you have a PHP codebase that requires some attention, especially a legacy app that needs to be modernized, I’m your man. I’m also excellent as a leader, mentor, manager, and architect, on both small and large teams.

Resume and references available on request. Contact me by email (pmjones88 at gmail) or on Twitter @pmjones if you want to talk!

UPDATE (Tue 19 Aug): Well that was quick. I’m off the market again, and looking forward to productive efforts with my new employer. My deepest gratitude to everyone who expressed interest; I am truly humbled by the whole experience. Thank you to all.

Complex Systems and Normal Accidents

One of my favourite sections of the book was Harford’s discussion of accidents. Most of the problems Harford examines in the book are complex and “loosely coupled”, which allows experimentation with failure. But what if the system is tightly coupled, meaning that failures threaten the survival of the entire system? This concept reminded me of work by Robert May, which undermined the belief that increased network complexity led to stability.

The concept of “normal accidents”, taken from a book of that title by Charles Perrow, is compelling. If a system is complex, things will go wrong. Safety measures that increase complexity can increase the potential for problems. As such, the question changes from “how do we stop accidents?” to “how do we mitigate their damage when they inevitably occur?” This takes us to the concept of decoupling. When applied to the financial system, can financial institutions be decoupled from the broader system so that we can let them fail?

(Emphasis mine.) via Harford’s Adapt: Why Success Always Starts with Failure.
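
That decoupling question maps directly onto software architecture. One common way to decouple a component so that it can fail without dragging everything else down with it is a circuit breaker: after repeated failures, stop calling the dependency and fall back to a degraded answer instead. Here is a minimal sketch in PHP; the CircuitBreaker class, the pricing example, and the fallback value are my own illustration, not anything from Harford or Perrow.

```php
<?php
// A minimal circuit-breaker sketch illustrating decoupling: when a
// dependency fails repeatedly, we "trip" the breaker and fail fast
// with a fallback instead of letting the failure propagate through
// the whole system. All names here are hypothetical.

class CircuitBreaker
{
    private int $failures = 0;
    private ?int $openedAt = null;

    public function __construct(
        private int $maxFailures = 3,
        private int $retryAfterSeconds = 30,
    ) {
    }

    /**
     * Run $operation; on repeated failure, short-circuit to $fallback.
     */
    public function call(callable $operation, callable $fallback): mixed
    {
        if ($this->isOpen()) {
            return $fallback(); // breaker is open: fail fast
        }

        try {
            $result = $operation();
            $this->failures = 0; // success resets the breaker
            return $result;
        } catch (Exception $e) {
            $this->failures++;
            if ($this->failures >= $this->maxFailures) {
                $this->openedAt = time(); // trip: stop hammering it
            }
            return $fallback();
        }
    }

    private function isOpen(): bool
    {
        if ($this->openedAt === null) {
            return false;
        }
        if (time() - $this->openedAt >= $this->retryAfterSeconds) {
            $this->openedAt = null; // cool-down over: allow a retry
            return false;
        }
        return true;
    }
}

// Usage: let a hypothetical pricing service fail without taking us down.
$breaker = new CircuitBreaker();
$price = $breaker->call(
    fn () => throw new RuntimeException('pricing service is down'),
    fn () => 9.99, // cached/default value while the dependency is failing
);
echo $price, PHP_EOL; // 9.99
```

The design choice that matters is the fallback: a tightly coupled system has no answer at all when the dependency dies, while the decoupled one limps along in degraded mode.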

Blogger outage makes case against cloud-only

Earlier this week, Google rolled out a maintenance release for its Blogger service. Something went terribly wrong, and its Blogger customers have been locked out of their accounts for more than a day. Google’s engineers have been frantically working to restore service ever since, although they haven’t shared any details about the problem.

That’s nearly 48 hours of downtime, and counting. Overnight updates promise “We’re making progress” and “We expect everything to be back to normal soon.”

Google has owned and operated Blogger since 2003. It’s not like they’re still trying to figure out how to integrate the service into their operation. If it can happen at Blogger, why can’t it happen with another Google service?

This, to me, is the strongest possible argument against putting everything you own in the cloud. If your data matters, you need a hybrid strategy, with local storage and local content creation and editing tools. If your local storage fails, you can grab what you need from the cloud. If your cloud service fails, you’ve still got it locally. But if you rely just on the cloud, you’re vulnerable to exactly this sort of failure.

via Google’s Blogger outage makes the case against a cloud-only strategy | ZDNet.
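
For what it’s worth, the hybrid strategy the article describes is cheap to implement. Here is a rough PHP sketch of the idea: every write goes to local disk first, the cloud copy is best-effort, and reads fall back from one side to the other. The CloudStorage interface and the FakeCloud stand-in are hypothetical names of my own, not any real SDK.

```php
<?php
// A rough sketch of a hybrid local+cloud store: write locally first,
// mirror to the cloud when possible, and read from whichever copy is
// available. The interface and classes here are illustrative only.

interface CloudStorage
{
    public function put(string $key, string $contents): void;
    public function get(string $key): ?string;
}

class HybridStore
{
    public function __construct(
        private string $localDir,
        private CloudStorage $cloud,
    ) {
    }

    public function save(string $key, string $contents): void
    {
        // Local copy first: content creation never depends on the cloud.
        file_put_contents($this->localDir . '/' . $key, $contents);

        // Cloud copy is best-effort; a cloud outage must not block us.
        try {
            $this->cloud->put($key, $contents);
        } catch (Exception $e) {
            error_log("cloud sync failed for {$key}: {$e->getMessage()}");
        }
    }

    public function load(string $key): ?string
    {
        $path = $this->localDir . '/' . $key;

        // Local disk failed or file missing? Grab it from the cloud.
        if (is_readable($path)) {
            return file_get_contents($path);
        }
        return $this->cloud->get($key);
    }
}

// A trivial in-memory stand-in so the sketch runs end to end.
class FakeCloud implements CloudStorage
{
    private array $data = [];

    public function put(string $key, string $contents): void
    {
        $this->data[$key] = $contents;
    }

    public function get(string $key): ?string
    {
        return $this->data[$key] ?? null;
    }
}

$store = new HybridStore(sys_get_temp_dir(), new FakeCloud());
$store->save('post-draft.txt', 'Blogger is down again...');
echo $store->load('post-draft.txt'), PHP_EOL;
```

The key property is that neither side is a single point of failure: a Blogger-style outage degrades the cloud copy, not your ability to write and read your own posts.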

Disaster Rituals

Combine How Complex Systems Fail with Fooled By Randomness and throw in some organizational behavior models, and you get the human response to unforeseen disaster. We think we can prevent future disaster, somehow, by going through a particular set of rituals. Then Malcolm Gladwell asks, way back in 1996:

But what if the assumptions that underlie our disaster rituals aren’t true? What if these public post mortems don’t help us avoid future accidents? Over the past few years, a group of scholars has begun making the unsettling argument that the rituals that follow things like plane crashes or the Three Mile Island crisis are as much exercises in self-deception as they are genuine opportunities for reassurance. For these revisionists, high-technology accidents may not have clear causes at all. They may be inherent in the complexity of the technological systems we have created.

I think there are lessons here for, among other things, the BP oil spill. As with most of Gladwell, it’s worth your time to read the whole thing.

How Complex Systems Fail

The paper How Complex Systems Fail by Richard Cook should be required reading for anyone in programming or operations. Hell, it should be required reading for most everyone. You should read the whole paper (it’s very short at under five pages), but here are the main points:

  1. Complex systems are intrinsically hazardous systems.
  2. Complex systems are heavily and successfully defended against failure.
  3. Catastrophe requires multiple failures – single point failures are not enough.
  4. Complex systems contain changing mixtures of failures latent within them.
  5. Complex systems run in degraded mode.
  6. Catastrophe is always just around the corner.
  7. Post-accident attribution to a ‘root cause’ is fundamentally wrong.
  8. Hindsight biases post-accident assessments of human performance.
  9. Human operators have dual roles: as producers & as defenders against failure.
  10. All practitioner actions are gambles.
  11. Actions at the sharp end resolve all ambiguity.
  12. Human practitioners are the adaptable element of complex systems.
  13. Human expertise in complex systems is constantly changing.
  14. Change introduces new forms of failure.
  15. Views of ‘cause’ limit the effectiveness of defenses against future events.
  16. Safety is a characteristic of systems and not of their components.
  17. People continuously create safety.
  18. Failure free operations require experience with failure.

Points 2 and 17 are especially interesting to me. The point is not merely that people can build complex systems that work; it is that people must actively and continuously prevent those systems from failing. Once you stop maintaining the system, it begins to fail. It sounds like thermodynamics: without a constant input of energy from people, the mostly-orderly complex system descends into increasing disorder and failure. The toy simulation below makes that analogy concrete.
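
Here is a toy PHP simulation, entirely my own construction: each tick has some chance of introducing a latent fault, and the only thing standing between the system and accumulated failure is an operator actively repairing faults every tick. The probabilities are arbitrary; the point is only the direction of drift.

```php
<?php
// A toy simulation of the "thermodynamics" point above: each tick,
// random latent faults accumulate; only continuous operator repair
// keeps the fault count from growing. The numbers are arbitrary,
// chosen only to make the drift visible.

function simulate(bool $operatorsActive, int $ticks = 50): int
{
    $latentFaults = 0;

    for ($i = 0; $i < $ticks; $i++) {
        // Entropy: every tick has a 40% chance of a new latent fault.
        if (mt_rand(1, 100) <= 40) {
            $latentFaults++;
        }

        // People continuously create safety (point 17): active
        // operators find and fix faults before they can combine.
        if ($operatorsActive && $latentFaults > 0) {
            $latentFaults--;
        }
    }

    return $latentFaults;
}

mt_srand(42); // fixed seed so both runs see comparable "luck"
printf("with maintenance:    %d latent faults\n", simulate(true));
mt_srand(42);
printf("without maintenance: %d latent faults\n", simulate(false));
```

Run it and the maintained system ends with zero latent faults, while the unmaintained one keeps roughly forty percent of the faults its fifty ticks produced. Stop paying the energy cost, and the disorder piles up.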