Friday, 28 November 2014

Book of the month: Managing Maintenance Error (A Practical Guide) by Reason and Hobbs

About the authors

James Reason is Emeritus Professor of Psychology at the University of Manchester and one of the best-known names in the field of human factors (he came up with the Swiss cheese model of system failure).
Alan Hobbs is a Senior Research Associate at NASA's Human Systems Integration Division with a background as a human performance investigator at the Australian Bureau of Air Safety Investigation.


Who should read this book?

The authors' target readership is "people who manage, supervise or carry out maintenance activities in a wide range of industries." Simulation technicians and managers may therefore find useful advice on maintaining sophisticated equipment such as mannequins and audiovisual systems. The book will also appeal to human factors enthusiasts, as it explores the unfamiliar world of routine maintenance rather than the world of the crisis, which is more familiar to simulation educators. It will also interest anybody who is involved in creating a "safety culture" or in analysing errors. Lastly, surgeons and other healthcare professionals who carry out maintenance-style tasks may enjoy this book. The authors talk of performing tasks in "poorly lit spaces with less-than-adequate tools, and usually under severe time pressure"; this may strike a chord with some...

In summary

The authors' main argument is that maintenance error is a major, but under-investigated, cause of system failure. As automation increases, the maintenance of automated systems, which is still carried out primarily by human beings, can lead to failure by omission (failing to stop a fault) or by commission (introducing a fault).

The book consists of 12 chapters:

  1. Human Performance Problems in Maintenance
  2. The Human Risks
  3. The Fundamentals of Human Performance
  4. The Varieties of Error
  5. Local Error-provoking Factors
  6. Three System Failures and a Model of Organizational Accidents
  7. Principles of Error Management
  8. Person and Team Measures
  9. Workplace and Task Measures
  10. Organizational Measures
  11. Safety Culture
  12. Making it Happen: The Management of Error Management

In a bit more detail

Chapter 1: Human Performance Problems in Maintenance
Reason and Hobbs make the case for the importance of maintenance error and its effects (citing Apollo 13, Three Mile Island, Bhopal, Clapham Junction and Piper Alpha). They give us the "good news" that errors are generally not random. Instead, if one looks, one can find "systematic and recurrent patterns" and error traps.

Chapter 2: The Human Risks
The authors inform us that errors are "universal… unintended… (and) merely the downside of having a brain". They argue against trying to change the human condition (human nature) and ask organisations to focus their efforts on changing the conditions in which people work. They also tell us that errors should be expected and are, in many cases, foreseeable.

Chapter 3: The Fundamentals of Human Performance
(Image: Using the activity space to define the 3 performance levels, p.29)
This chapter details human performance and its limitations in terms of attention, vigilance, fatigue and stress. The authors also mention the "paradox of expertise", where highly skilled people can no longer describe what they are doing. (Teaching your teenage son or daughter how to drive might come to mind for some.) The authors explain automatic and conscious control modes (Kahneman's System 1 and System 2) and Rasmussen's skill-/rule-/knowledge-based performance taxonomy, and then combine them into a useful diagram. Reason and Hobbs also lay out the principal stages of skill acquisition and show how fatigue and stress cause us to revert to more effortful ways of performing. The Yerkes-Dodson "inverted U-curve" is also explained.

Chapter 4: The Varieties of Error
Reason and Hobbs' definition of error is:
"An error is a failure of planned actions to achieve their desired goal, where this occurs without some unforeseeable or chance intervention."
They divide error into 3 types:

  1. Skill-based error (failure at the action stage): may be a recognition failure, a memory failure or a slip
  2. Mistake (failure at the planning stage): may be rule-based (involving an incorrect assumption or a bad habit) or knowledge-based (involving failed problem-solving or a lack of system knowledge)
  3. Violation: may be routine, thrill-seeking or optimising, or situational
They then go on to look at the major types of unsafe acts that occur in maintenance.

Chapter 5: Local Error-provoking Factors
Reason and Hobbs argue that, although there are many possible local factors, only a few are implicated in the majority of maintenance errors:

  • Documentation
  • Housekeeping and Tool Control
  • Coordination and Communication
  • Tools and Equipment
  • Fatigue
  • Knowledge and Experience
  • Bad Procedures
  • Procedure Usage
  • Personal Beliefs
The authors also provide us with a useful diagram showing the link between errors (Ch4) and contributing factors (Ch5). Thus, slips are linked with equipment deficiencies (too many similar-looking dials), while knowledge errors are linked with inadequate training.

Chapter 6: Three System Failures and a Model of Organizational Accidents
In this chapter Reason and Hobbs introduce us to latent conditions (which they compare to dormant pathogens in the human body) and active failures. They then analyse three incidents involving maintenance: the crash of an Embraer 120 aircraft in 1991, the Clapham Junction railway collision in 1988 and the Piper Alpha oil and gas platform explosion, also in 1988. They end the chapter by discussing system defences, which serve either to detect errors or to increase the system's resilience.

Chapter 7: Principles of Error Management
Reason and Hobbs provide us with a set of guiding principles of error management, including:
  • Human error is both universal and inevitable
  • Errors are not intrinsically bad
  • You cannot change the human condition, but you can change the conditions in which humans work
  • The best people can make the worst mistakes
  • Errors are Consequences rather than Causes
They complete the chapter by explaining that error management has 3 components: 1) error reduction, 2) error containment and 3) managing the first 2 so that they continue to work.

Chapter 8: Person and Team Measures
(Image: Red flag signals)
In this chapter Reason and Hobbs discuss error management strategies directed at the person and the team. These include providing people with knowledge about human performance and "red flags" that should alert the individual to the potential for error. These could perhaps be considered with reference to Reason's 3-bucket model of person, task and context. For example, if you are tired, carrying out an unfamiliar task and constantly being interrupted, the risk of making a mistake is high.
The authors also stress the importance of the unseen mental changes required before behavioural changes (e.g. breaking a bad habit) are evident, and that these take time.

Chapter 9: Workplace and Task Measures
This chapter looks at environmental and task factors implicated in errors, including fatigue, task frequency, and equipment and environment design.

Chapter 10: Organizational Measures
In this chapter, the authors look at how reactive outcome measures and proactive process measures can be used to look for systemic and defensive weaknesses. They also explain how trust and convenience of reporting are essential in order to develop a safety culture. The Maintenance Error Decision Aid (MEDA) is used to show how information regarding events can be gathered and used to identify failed defences as well as potential solutions, while Managing Engineering Safety Health (MESH) is provided as an example of a proactive process measure.

Chapter 11: Safety Culture
Reason and Hobbs call this the "most important chapter in the book. Without a supportive safety culture, any attempts at error management are likely to have only very limited success." They divide safety culture into 3 sub-components:

  1. Reporting culture (the most important prerequisite for a learning culture)
  2. Just culture
  3. Learning culture
They discuss how it is very difficult, if not impossible, to change people's values but much easier to change their practices. They use smoking as an example of a practice that has changed because of a change in controls. Exhortations to stop smoking on national TV made little difference, but banning smoking in public places has had a much greater effect. This chapter also introduces us to some tests for determining culpability: the foresight test (would an average person have predicted that the behaviour was likely to lead to harm?) and the substitution test (could an average person have made the same mistake?).

Chapter 12: Making it Happen: The Management of Error Management
The authors discuss Safety and Quality Management systems and the difference between quality assurance and quality control. (Quality assurance ensures quality is engineered into the product at every stage; quality control is about testing the end product, when it is often too late to rectify mistakes.) They also discuss organisational resilience which, they say, is the result of three Cs: commitment, competence and cognisance.

I haven't got time to read 175 pages!

The paragraph entitled "Looking ahead" on page 17 provides an overview of the book. In addition, reading through the useful summary at the end of each chapter will tell you whether that chapter is worth reading in detail. Personally, I found Chapter 7: Principles of Error Management particularly informative, as it covered or put into words some concepts I had not seen elsewhere, such as "Errors are Consequences rather than Causes."

What's good about this book?

In the Preface the authors state their intention to "avoid psychobabble" and they are true to their word. Also, some useful concepts (e.g. vigilance decrement (p.24), error cascade (p.43), latent conditions (p.77), 5 stages in breaking a bad habit (p.109)) are explained and placed within the wider context of error.

The summaries at the end of every chapter are quick to read but sufficiently detailed to act as an aide-mémoire. 

Although this is a book about "human" error, Reason and Hobbs underline the fact that people are often the solution to problems and that if we had evolved to be "super-safe" and risk averse we probably would not exist as a species. ("What do you mean, you want to leave the cave?")

Lastly, the authors use real-world examples to illustrate the theory. They also provide practical techniques for tackling error, e.g. "ten criteria for a good reminder" (p.131), while stressing that there is no one best way and that people are capable of devising their own solutions.

What's bad about this book?

Nothing… I honestly cannot find fault with this book. It may not be relevant to everyone, but otherwise it is worth the time spent with it.

Final thoughts

One would hope that a book co-authored by Reason would be a good read, and this book does not disappoint. For the human factors expert there is perhaps nothing new here, but for the rest of us it is well worth reading.
