Fault Trees vs Failure Modes Analysis

FMA, or FMEA (failure modes and effects analysis), is the old-fashioned way of hunting down all the ways in which things can go pear-shaped, in the hope that such events can be avoided. I was reminded of it when I came across an article using fault trees to illustrate why modern disk drives (capacious as they’ve become) are more prone to failure. But I’m not gonna be talking about disk drives…

A brief overview of FMEA

In short: F=the failure, M=the mode, E=the effects, A=Analysis.

FMEA is basically a proactive approach, and it does not concern itself with functional design or physical realities of the system whose SNAFUs it’s trying to prevent. The whole idea is to brainstorm on all the possible ways in which things can go wrong. Example: If I tell you I’m building an atomic kettle, you should immediately be listing things like: nuclear meltdown, scalding of user, water remains cold, etc etc. These are failure modes of an atomic kettle, regardless of how (or, God help us, why) one is even built.

Of course, it’s generally not enough to just list all the horrible scenarios; you need to sift through them and, in the case of FMECA (failure modes, effects and criticality analysis), identify the items that are mission-critical, safety-critical, and so on.

The process may be as simple as a tabular listing of all the failure modes and effects, or as involved as a full FMA lifecycle with several teams. The deliverables of the process are the FMEA documents and the recommendations they make, which are supposed to feed back into (primarily) the design phase of a larger SDLC. For in-depth info, check out http://fmea-fmeca.com/index.html
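To make the "simple tabular listing" end of that spectrum concrete, here is a minimal Python sketch of a worksheet for the atomic kettle. The field names, the 1 to 10 scores, the severity-times-likelihood criticality figure and both cut-off values are assumptions made up for illustration; real FMECA schemes define their own scales.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str        # the part or function that fails
    mode: str        # how it fails
    effect: str      # what the user or system experiences
    severity: int    # 1 (nuisance) .. 10 (catastrophic), hypothetical scale
    likelihood: int  # 1 (rare) .. 10 (frequent), hypothetical scale

    @property
    def criticality(self) -> int:
        # Illustrative score only: severity multiplied by likelihood.
        return self.severity * self.likelihood


def critical_items(rows, score_threshold=30, severity_floor=9):
    """Flag rows that score high overall or are severe enough on their own."""
    flagged = [
        r for r in rows
        if r.criticality >= score_threshold or r.severity >= severity_floor
    ]
    # Worst overall score first.
    return sorted(flagged, key=lambda r: r.criticality, reverse=True)


# Invented scores for the failure modes listed earlier.
kettle = [
    FailureMode("reactor core", "nuclear meltdown", "kitchen evacuated", 10, 1),
    FailureMode("spout", "spits boiling water", "user is scalded", 7, 5),
    FailureMode("heating loop", "no heat transfer", "water remains cold", 3, 4),
]

for row in critical_items(kettle):
    print(f"{row.criticality:>3}  {row.item}: {row.mode} -> {row.effect}")
```

Note that nothing in the table says how the kettle works; the rows only record what could go wrong and how bad it would be, which is the point of the exercise.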

FTA

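Fault tree analysis (FTA), on the other hand, is like drawing a flowchart describing all the things that can go wrong, with individual faults combined through Boolean AND/OR gates until they lead up to a top-level failure.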

It’s not as ‘unconcerned’ with the details as FMA is, but it might be a friendlier way forward as it is a graphical representation of a system’s faults.

The cool thing about FTA is that if you were to take real-life measurements of how often each fault actually occurs, you could assign a probability to each event in the tree and, through the Boolean logic, to each path leading up to the top-level failure. Those numbers could then feed a simulation program that predicts outcomes from a given set of unfavorable initial conditions.
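As a rough sketch of that idea, the Python below models a tiny fault tree for the "water remains cold" failure and estimates how often it happens, once analytically and once by simulation. The tree shape, the event names and every probability are invented for illustration, and the analytic formula assumes the basic events are independent and that each appears only once in the tree.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Event:
    name: str
    p: float  # probability of this basic fault (made-up, not measured)


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", Event]]


def probability(node) -> float:
    """Top-event probability, assuming independent, non-repeated events."""
    if isinstance(node, Event):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return math.prod(ps)
    if node.kind == "OR":
        return 1.0 - math.prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind}")


def occurs(node, rng) -> bool:
    """Simulate one trial: does the fault at this node occur?"""
    if isinstance(node, Event):
        return rng.random() < node.p
    results = [occurs(child, rng) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Water stays cold if there is no power, OR if the thermostat sticks
# AND the backup heater also fails.
tree = Gate("OR", [
    Event("no power", 0.01),
    Gate("AND", [
        Event("thermostat stuck", 0.05),
        Event("backup heater failed", 0.10),
    ]),
])

exact = probability(tree)

rng = random.Random(42)  # fixed seed so runs are repeatable
trials = 100_000
estimate = sum(occurs(tree, rng) for _ in range(trials)) / trials

print(f"analytic:  {exact:.5f}")
print(f"simulated: {estimate:.5f}")
```

With these numbers the analytic result is 1 - (0.99 × 0.995) = 0.01495, and the simulated figure should land close to it. The simulation earns its keep once the tree gets too tangled for the simple product formulas, for example when the same basic event feeds more than one gate.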

Comparison

FMEA can be used to inform very low-level design and development processes, but it’s tempting to just use it for a top-down, high-level view of the system. I also like that the ‘how’ of a failure mode is largely irrelevant. This allows a third party to carry out FMEA objectively, without intimate knowledge of the system.

FTA, by relying on Boolean logic, does impose some concern with system details; at least that’s my personal take on it. But its visual format and logical underpinnings make it easier to automate, update and analyse. (Of course, FMA has its own automated tools too.)

erm.. Back to disk drives

If you’re wondering, here’s the original article that started me thinking about this post: http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=506&page=1
