Failure nets : the ‘dark matter’ of complex systems

A thought:

Complex systems are prone to fail and, knowing that, we build into them every manner of defense against such outcomes. If we consider complex systems as networks, with the number and variety of nodes of any type being some sort of measure of overall complexity, the nodes, edges and paths all hold a binary potential: to push the system towards correct/expected behaviour, or failure. In the absence of design, each of these possible points or paths through the system would have an equal likelihood of an expression leading to one of these two outcomes, so that there would be equal opportunities for failure and success. In designed systems, we stack the odds to lower the overall probability of actual failure.
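To put rough numbers on "stacking the odds", here is a minimal sketch of a toy model. It assumes a single path through the system and, hypothetically, that every node behaves correctly or fails independently of the others, which is exactly the assumption the rest of this post calls into question. The function names and the probabilities are illustrative only.

```python
from math import prod

def path_success(node_probs):
    """Probability that every node on a path behaves as expected,
    assuming (hypothetically) that nodes fail independently."""
    return prod(node_probs)

def with_redundancy(p, copies):
    """Probability that at least one of `copies` identical nodes works."""
    return 1 - (1 - p) ** copies

# Ten nodes in series, three ways of (not) designing them.
undesigned = path_success([0.5] * 10)   # each node is a coin flip
designed = path_success([0.99] * 10)    # odds stacked at each node
hardened = path_success([with_redundancy(0.99, 2)] * 10)  # each node duplicated

print(f"undesigned: {undesigned:.4f}")
print(f"designed:   {designed:.4f}")
print(f"hardened:   {hardened:.4f}")
```

Even in this generous model the designed system never reaches a success probability of 1. The failure paths are thinned out, not removed.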

I think we tend to forget about this ‘alter network’ and its ‘failure processing’; it bites us when we least expect it. It can be thought of as a sort of inverse of the understood, ‘as-designed’ system. It anti-operates / anti-runs alongside the latter, and is inextricably tangled with it. The best we can do is siphon power away from it.

A non-dualistic approach to system design & failure analysis

This way of seeing a system suggests that we might benefit from modelling incorrect behaviour in pretty much the same way that we model correct/required behaviour. We have a tendency to view correct behaviour as a synergy of positive, stable interactions between functioning subsystems. When it comes to faults, however, we like to think of these as somehow singular, restricted to a specific source or cause. In other words, failure is not seen as an equally concerted effort of different moving parts. We assume that failure has a fixed locus, and that faults are simply to be located and rooted out. And naturally, we look inside our system boundary first, as if the failure network ever had any interest in, or knowledge of, that boundary. Corollary: since the system boundary is for all intents and purposes a designed artifact, it follows that only the designed (required) system is aware of it. The failure system itself “couldn’t care less” about the big box around your system diagram.

Failures have their own synergy

I think that failures have a certain synergy: faults interact and accumulate, the same way that the slightly-more-fortuitous operations within the system do. And here I’m not just referring to the simplistic notion of ‘chain reactions’ or domino-effects that lead to catastrophe. It’s true that it never rains but it pours, but far more mundane interactions attend malfunctions in general. It follows that faults can also ‘accidentally’ dampen their own systemic ‘failure signal’, or cancel each other out. In this way, lesser failures can go un-manifested (and unnoticed), leading to surprise and shock when they finally reveal themselves as part of a larger fiasco.
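A minimal sketch of two faults cancelling each other out, using a made-up two-stage measurement chain. The sensor, the calibration stage, the fault sizes and the tolerance are all hypothetical; the only point is that an end-to-end check can pass while both stages are broken, and start failing the moment one of them is repaired.

```python
def read_sensor(true_value, sensor_fault=0.0):
    """Hypothetical sensor: a fault adds a constant bias to the reading."""
    return true_value + sensor_fault

def calibrate(reading, calibration_fault=0.0):
    """Hypothetical calibration stage: a fault subtracts a constant bias."""
    return reading - calibration_fault

def end_to_end_ok(true_value, sensor_fault, calibration_fault, tolerance=0.5):
    """End-to-end check: is the final output within tolerance of the truth?"""
    output = calibrate(read_sensor(true_value, sensor_fault), calibration_fault)
    return abs(output - true_value) <= tolerance

# Both stages faulty: the biases cancel and the check passes.
both_faults = end_to_end_ok(20.0, sensor_fault=2.0, calibration_fault=2.0)

# Repair only the sensor: the calibration fault now shows through.
one_repaired = end_to_end_ok(20.0, sensor_fault=0.0, calibration_fault=2.0)

print(both_faults, one_repaired)
```

From the outside, the repair looks like the cause of the failure, which is one way a lesser fault stays hidden until it surfaces as part of something larger.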

Even then, they might remain invisible. They can hide under cover of transient environmental conditions (again, the irreverence toward the system boundary)… and cause the failure analyst to submit a ‘no fault found’ (NFF) verdict. At best, the failure network allows one or two ‘faulty nodes’ to be sacrificed (found and excised) during failure analysis, while the ‘failure network’ itself remains intact.

Failures at the human scale

Engineers may nibble at the edges of failure, trying to get a grip on it… but up and down the country, organisations made up of humans and machines, along with myriad processes that unfold at human speed, fail spectacularly every day, and the failure analyses – where they take place at all – are laughable.

In the business world failure analysis takes place as a self-cannibalising 2-pronged attack:

  • Divergence analysis (“this was the target, we missed it by this much; what gives?!”)
  • Firing of staff / Resigning of staff (“somebody’s gotta take the fall”).

Very rarely are the subtleties of personality types, lines of command, control and communication, team dynamics, speed of growth, competitor activity, recent events, timing of events, spread of talent and so on taken into consideration. The ‘alter network’ is made of divergent aspirations, watercooler talk, gossip, egos, shortcuts taken by adapting (and adaptable) humans when presented with unclear tasks, lack of direction, mismatched skillsets and so on. These combine synergetically to create a failure network / system that chugs along happily: the expected organism, only in flipmode.

Without the kind of planning ahead that we bring to bear on engineering projects, without stacking the odds in favour of the required system behaviour, we can end up with a failure signal propagating through an organisation. Human intuition detects this, but it is also human to do nothing, or not know what to do, until that magical coincidence of badly-done tasks yields a sufficiently botched outcome.

One thing is for sure: we refuse to pay attention to, or study, failure as a system / a holistic force in its own right. Perhaps we fear that to do so might give it too much power.

***

image: tweaked tiny portion of the circuit diagram for the Z80 microprocessor.

0 Replies to “Failure nets : the ‘dark matter’ of complex systems”

  1. I dunno if I buy the basic premise. Networks are designed (or alternatively, evolve) with a certain cohesion because their disparate elements, working together, accomplish some function that promotes system persistence. (You keep your computer around because it runs payroll without fucking up, for e.g.; an evolved brain integrates metabolic and sensory processes in a way that keeps the organism going.) Faults, in contrast, are self-defeating; some random glitch causes the system to crash, the glitchy elements along with it. There’s no up side, no payoff that would promote cooperation among faults. They’re just random kamikaze sand in the gears; how can that form any kind of “network”, anti or otherwise?

    Your office-politics comparison kind of provides an example of this, but it doesn’t ring true because the individual nodes (i.e., the people) are pretty self-contained systems in their own right, pursuing their own agendas. Petty political infighting may fuck up the social structure, but at least some of the people within that metasystem benefit, so you can see how those kinds of destructive dynamics would persist. But I don’t see an analogy to the more conventional silicon networks of which you speak.

    Of course, I’m not a compsci guy by any means. What am I missing here?

  2. [updated]
    howdy, giant squid.

    Faults, in contrast, are self-defeating; some random glitch causes the system to crash, the glitchy elements along with it. There’s no up side, no payoff that would promote cooperation among faults.

    I didn’t say faults cooperated, I said that they interacted and accumulated.

    Faults are not always “self-defeating”: e.g. resonance in a badly designed bridge, coupled with unfortunate environmental coincidences (a windy day), creates a positive-feedback highway (haha) to destruction. A better example of a network of faults: consider the Denver airport baggage handling system, described thus:

    “…bags were misloaded, were misrouted, or fell out of telecarts, causing the system to jam. The baggage system continued to unload bags even though they were jammed on the conveyor belt, because the photo eye at this location could not detect the pile of bags on the belt and hence could not signal the system to stop. The baggage system also loaded bags into telecarts that were already full. Hence, some bags fell onto the tracks, again causing the telecarts to jam…”

    You can read a very basic account of interacting faults here.

    I dunno if I buy the basic premise. Networks are designed…

    We just design a lot of them nowadays. Perhaps you mean a network kind of ‘has an agenda’, and there’s no way a collection of faults could have one. My basic premise is that we should model faults as if they had an agenda… as if they were working together in concert. Because when they finally manifest themselves, it’s almost as if they’ve been in cahoots with each other all along.

    There is a holism to failure; inter-dependencies across time and space are the linkages of the network of which I speak. It is not a sentient network…
