Setting the scene
I think I’d enjoy building a bunker. I’d certainly enjoy having a bunker. I’m not very good at digging holes though – my back hurts for days if I even spend too long in the garden. The software equivalent of a bunker though? That sounds like something I could do.
Recently I’ve had a bit of a fascination with CouchDB. Every now and then I go take another read through the replication and conflict resolution documentation, and I just… enjoy it. It’s like technical poetry. Simple, isolated concepts coming together cleanly to form a cohesive system with well-defined failure modes and fallback processes. Beautiful. CouchDB is a bit of a tragedy though – it does its job quite well, but its operating model of conflict resolution and highly-asynchronous replication is so foreign that it’s often misapplied. It doesn’t help that one of its highest-profile implementations (the npm registry) was historically plagued with issues stemming from that misapplication. I think this has led to it receiving far less positive attention than it deserves.
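To make the "well-defined failure modes" idea a little more concrete, here is a rough sketch of how a deterministic conflict winner can be chosen. It follows the rule as I understand it from the CouchDB replication documentation: prefer non-deleted revisions, then the longest revision history, then the highest `_rev` in ASCII order. The `Revision` type and `pick_winner` function are names made up for this illustration, not part of CouchDB, and using the numeric `_rev` prefix as the history length is a simplifying assumption.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    """A leaf revision of one document (hypothetical type for this sketch)."""
    rev: str               # CouchDB-style "_rev", e.g. "3-a1b2c3"
    deleted: bool = False  # True if this leaf is a deletion


def generation(rev: str) -> int:
    """Return the numeric prefix of a "_rev" string (how many edits deep it is)."""
    return int(rev.split("-", 1)[0])


def pick_winner(leaves: list[Revision]) -> Revision:
    """Pick one leaf using only the data itself, so every replica agrees.

    Ordering: live revisions beat deleted ones, deeper histories beat
    shallower ones, and ties fall back to plain string comparison.
    """
    return max(leaves, key=lambda r: (not r.deleted, generation(r.rev), r.rev))
```

The point is that no coordination is needed: two replicas holding the same set of leaves will pick the same winner on their own. The losing revisions are not thrown away; they stay around as conflicts for the application to resolve when it gets to them.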
I’ve also been really enjoying SQLite – though I’ve had SQLite close at hand for quite some time now. I have a few programs using it that have survived the gauntlet of production, and I’ve come to appreciate it even more over the years. It simply does not die. Energizer bunny of databases. Sure, you can make it die, but if you’re colouring inside the lines it just will not let you down. Frightfully simple to take care of too! You want to take a backup from production? scp that bad boy down to your laptop at 4:45 on a Friday and you’ll be out the door before 5, with time to put your cup in the dishwasher on the way out. Want regular backups? Wire that scp command up to cron. Don’t want to copy 100MB every few minutes? rsync to the rescue. SQLite plays well with others.
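One wrinkle with copying a live database file is that a copy taken while something is writing can catch the file mid-transaction. A minimal sketch of one way around that, assuming Python 3.7 or newer: take a consistent snapshot with SQLite's online backup API first, then point scp or rsync at the snapshot instead of the live file. The `snapshot` function and the paths in the usage note are hypothetical.

```python
import sqlite3


def snapshot(src_path: str, dest_path: str) -> None:
    """Copy a (possibly in-use) SQLite database into dest_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup wraps SQLite's online backup API (Python 3.7+).
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Something like `snapshot("/srv/app/app.db", "/srv/backups/app-snapshot.db")` could run from cron, with the rsync job picking up the snapshot file afterwards.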
One more paragraph of setup - stick with me.
Get to the point
Enough stories. Time for me to tie my rambling together.
The common thread here is that these things are reliable. In some ways they’re a little more work to get running, but like a toothbrush in the vacuum of space, you give them a nudge in the right direction and they’ll just keep going. I’ve found this to be a refreshing change of pace.
In a strange turn of luck, this coincided with me reading Nikita Prokopov’s article, “Software Disenchantment”. He focuses mostly on runtime efficiency, but he does raise complexity and reliability as important and closely-related topics.
You might be familiar with “prepping” – this is a hobby/lifestyle based on preparedness. Generally this is preparedness for disaster resulting in, let’s say, interruptions to societal function. These potential disasters can range from the outlandish, like a zombie virus outbreak, to the more common, like natural disasters or extended utility outages. I’ve always thought this was pretty cool. Stockpiling canned food, learning survival skills, determining a potential hierarchy of the neighbourhood folk and planning how you might claw your way to the top of it, Mad Max style… just the usual stuff.
This brings me to my opening statement – I think I’d enjoy building a bunker. I’m pretty sure my real estate agent would be mad if I dug fifty cubic meters of soil out from my yard to install an underground shipping container stronghold though, so I’ll have to start smaller. I want to go all-in on engineering for failure. For me, this means thinking more carefully about the software I build, and how I can prepare it for various failure modes.
Okay. Down to business. I had a bit of a think about the failure modes of my own software, and I’ve decided to categorise them as follows. I’ve ordered them roughly from highest to lowest urgency.
- Network – the most obvious weakness is network connectivity. Few of the things I build are useful without an internet connection, and most of them are pretty unpleasant to use if that connection isn’t rock-solid broadband. This is not inherent to their functionality though – just their implementations. I’d like to make sure my software functions in a local capacity for extended periods with reduced connectivity, and for reasonable periods with no connectivity.
- Electricity – more specifically, not enough electricity. Especially in the case of software I run at home or onsite, I want to prevent a blackout from being a big deal. Ideal case would be if a power outage of less than a few hours did not interrupt operation at all. Stretch goal: reduce baseline energy requirements such that an application will run (perhaps with reduced functionality) from batteries for an extended period.
- Hardware – alternatively “too much electricity”. Let’s say there’s a lightning strike and it literally explodes the computer running Important Business Application 2000. This should not be fatal. It should be annoying at worst, and almost unnoticeable at best. This means redundancy, in the form of geographically separate installations.
- Stability – think platform stability rather than runtime. Though “don’t crash” stability is equally important, I’m lucky not to have too many problems there. I’d just like to make sure that my software continues operating with as little maintenance and upkeep as possible. I shouldn’t have to adapt my software too often to account for platform changes. The only way I can think of to do this is to choose which platform features I use carefully, and conservatively.
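As a small illustration of the network item above, here is a minimal sketch of a local "outbox": work is written to SQLite on disk first, and delivery over the network is attempted later, whenever a connection happens to be available. Everything here is an assumption for the sake of the example: the `Outbox` class, its table layout, and the `send` callback are invented names, and a real application would need to decide how to handle duplicates, since this approach delivers each item at least once rather than exactly once.

```python
import sqlite3
from typing import Callable


class Outbox:
    """Durable local queue: writes land on disk first, the network comes later."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload: str) -> None:
        """Record work locally; this succeeds with no connectivity at all."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def flush(self, send: Callable[[str], None]) -> int:
        """Deliver queued items oldest-first; stop at the first network failure."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                send(payload)
            except OSError:
                break  # still offline: keep the rest for the next attempt
            # Only forget an item once it has actually been delivered.
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of items still waiting to be delivered."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The application calls `enqueue` whenever it has something to send and calls `flush` on a timer, passing in whatever function actually talks to the network. Python's `ConnectionError` and socket timeouts are subclasses of `OSError`, so a dropped connection simply leaves the remaining items queued until the next attempt.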
I’d love to hear from anyone with similar thoughts. Have you done anything to
analyse the failure modes of your projects? How would you plan to address them?
Is there anything obvious I’ve missed? Tweet at me on Twitter at
@deoxxa, or send
hate fan mail to