Friday, 28 November 2014

Book of the month: Managing Maintenance Error (A Practical Guide) by Reason and Hobbs

About the authors

James Reason is Emeritus Professor of Psychology at the University of Manchester and one of the best-known names in the field of human factors (he came up with the Swiss cheese model of system failure).
Alan Hobbs is a Senior Research Associate at NASA's Human Systems Integration Division with a background as a human performance investigator at the Australian Bureau of Air Safety Investigation.


Who should read this book?

The authors' target readership is "people who manage, supervise or carry out maintenance activities in a wide range of industries." Simulation technicians and managers may therefore find some useful advice about maintenance of sophisticated equipment such as mannequins and audiovisual systems. The book will also appeal to human factors enthusiasts as it explores the unfamiliar world of the routine (maintenance) rather than the more familiar world (to simulation educators) of the crisis. The book will also be of interest to anybody who is involved in creating a "safety culture" or in analysing errors. Lastly, surgeons and other healthcare professionals who carry out maintenance-style tasks may enjoy this book. The authors talk of performing tasks in "poorly lit spaces with less-than-adequate tools, and usually under severe time pressure"; this may strike a chord with some...

In summary

The authors' main argument is that maintenance error is a major, but under-investigated, cause of system failure. As automation increases, the maintenance of the automated systems, which is still primarily carried out by human beings, can lead to failure by omission (not stopping a fault) or commission (introducing a fault).

The book consists of 12 chapters:

  1. Human Performance Problems in Maintenance
  2. The Human Risks
  3. The Fundamentals of Human Performance
  4. The Varieties of Error
  5. Local Error-provoking Factors
  6. Three System Failures and a Model of Organizational Accidents
  7. Principles of Error Management
  8. Person and Team Measures
  9. Workplace and Task Measures
  10. Organizational Measures
  11. Safety Culture
  12. Making it Happen: The Management of Error Management

In a bit more detail

Chapter 1: Human Performance Problems in Maintenance
Reason and Hobbs make their case about the importance of maintenance error and its effects (including Apollo 13, Three Mile Island, Bhopal, Clapham Junction and Piper Alpha). They give us the "good news" that errors are generally not random. Instead, if one looks, one can find "systematic and recurrent patterns" and error traps.

Chapter 2: The Human Risks
The authors inform us that errors are "universal… unintended… (and) merely the downside of having a brain". They argue against trying to change the human condition (human nature) and ask organisations to focus their efforts on changing the conditions in which people work. They also tell us that errors are and should be expected and, in many cases, foreseeable.

Chapter 3: The Fundamentals of Human Performance
Fig: Using the activity space to define the 3 performance levels (p.29)
This chapter details human performance and its limitations in terms of attention, vigilance, fatigue and stress. The authors also mention the "paradox of expertise", where highly skilled people can no longer describe what they are doing. (Teaching your teenage son or daughter how to drive the car might come to mind for some.) The authors explain automatic and conscious control modes (Kahneman's system 1 and system 2) and Rasmussen's knowledge-/rule-/skill-based performance taxonomy, and then combine them into a useful diagram. Reason and Hobbs also lay out the principal stages in skill acquisition and show how fatigue and stress cause us to revert to more effortful ways of performing. The Yerkes-Dodson "inverted U-curve" is also referred to and explained.

Chapter 4: The Varieties of Error
Reason and Hobbs' definition of error is:
"An error is a failure of planned actions to achieve their desired goal, where this occurs without some unforeseeable or chance intervention."
They divide error into 3 types:

  1. Skill (failure at action stage): may be a recognition failure, memory failure or slip
  2. Mistake (failure at planning stage): may be rule-based (involving incorrect assumption or bad habit) or knowledge-based (involving failed problem-solving or lack of system knowledge)
  3. Violation (may be routine, thrill-seeking or optimising, and situational)
They then go on to look at the major types of unsafe acts that occur in maintenance. 

Chapter 5: Local Error-provoking Factors
Reason and Hobbs argue that, although there are many possible local factors, there are only a few which are implicated in the majority of maintenance errors:

  • Documentation
  • Housekeeping and Tool Control
  • Coordination and Communication
  • Tools and Equipment
  • Fatigue
  • Knowledge and Experience
  • Bad Procedures
  • Procedure Usage
  • Personal Beliefs
The authors also provide us with a useful diagram showing the link between errors (Ch4) and contributing factors (Ch5). Thus, slips are linked with equipment deficiencies (too many similar-looking dials), while knowledge errors are linked with inadequate training.

Chapter 6: Three System Failures and a Model of Organizational Accidents
In this chapter Reason and Hobbs introduce us to latent conditions (which they compare to dormant pathogens in the human body) and active failures. They then go on to analyse three maintenance-involved incidents: the crash of an Embraer 120 aircraft in 1991, the Clapham Junction railway collision in 1988 and the Piper Alpha oil and gas platform explosion, also in 1988. They end the chapter by talking about system defences, which serve either to detect errors or to increase the system's resilience.

Chapter 7: Principles of Error Management
Reason and Hobbs provide us with a set of guiding principles of error management, including:
  • Human error is both universal and inevitable
  • Errors are not intrinsically bad
  • You cannot change the human condition, but you can change the conditions in which humans work
  • The best people can make the worst mistakes
  • Errors are Consequences rather than Causes
They complete the chapter by explaining that error management has 3 components: 1) error reduction, 2) error containment and 3) managing the first 2 so that they continue to work.

Chapter 8: Person and Team Measures
Fig: Red flag signals
In this chapter Reason and Hobbs discuss error management strategies directed at the person and the team. This includes providing people with knowledge about human performance and "red flags" which should alert the individual to the potential for error. These could perhaps be considered with reference to Reason's 3 bucket model of person, task, context: if you are tired, carrying out an unfamiliar task and constantly being interrupted, the risk of making a mistake is high.
The authors also stress the importance of the unseen mental changes required before behavioural changes (e.g. breaking a bad habit) are evident, and that these take time.

Chapter 9: Workplace and Task Measures
This chapter looks at environmental and task factors implicated in errors, including fatigue, task frequency, and equipment and environment design.

Chapter 10: Organizational Measures
In this chapter, the authors look at how reactive outcome measures and proactive process measures can be used to look for systemic and defensive weaknesses. They also explain how trust and convenience of reporting are essential in order to develop a safety culture. The Maintenance Error Decision Aid (MEDA) is used to show how information regarding events can be gathered and used to identify failed defences as well as potential solutions, while Managing Engineering Safety Health (MESH) is provided as an example of a proactive process measure.

Chapter 11: Safety Culture
Reason and Hobbs call this the "most important chapter in the book. Without a supportive safety culture, any attempts at error management are likely to have only very limited success." They subdivide the safety culture into 3 sub-components:

  1. Reporting culture (the most important prerequisite for a learning culture)
  2. Just culture
  3. Learning culture
They discuss how it is very difficult or impossible to change people's values but much easier to change practices. They use smoking as an example of a practice which has changed because of a change in controls. Exhortations to stop smoking on national TV made little difference, but banning smoking in public places has had a much greater effect. This chapter also introduces us to some tests for determining culpability: the foresight test (would an average person have predicted that the behaviour was likely to lead to harm?) and the substitution test (could an average person have made the same mistake?).

Chapter 12: Making it Happen: The Management of Error Management
The authors discuss Safety and Quality Management systems and the difference between quality assurance and quality control. (Quality assurance ensures quality is engineered into the product at every stage; quality control is about testing the end product, when it's often too late to rectify mistakes.) They also discuss organisational resilience which, they say, is a result of three Cs: commitment, competence and cognisance.

I haven't got time to read 175 pages!

The paragraph entitled "Looking ahead" on page 17 provides an overview of the book. In addition, reading through the useful summary at the end of each chapter will tell you if that chapter is worth reading in detail. Personally I found Chapter 7: Principles of Error Management particularly informative as it covered or put into words some concepts I had not yet seen elsewhere, such as "Errors are Consequences rather than Causes."

What's good about this book?

In the Preface the authors state their intention to "avoid psychobabble" and they are true to their word. Also, some useful concepts (e.g. vigilance decrement (p.24), error cascade (p.43), latent conditions (p.77), 5 stages in breaking a bad habit (p.109)) are explained and placed within the wider context of error.

The summaries at the end of every chapter are quick to read but sufficiently detailed to act as an aide-mémoire. 

Although this is a book about "human" error, Reason and Hobbs underline the fact that people are often the solution to problems and that if we had evolved to be "super-safe" and risk averse we probably would not exist as a species. ("What do you mean, you want to leave the cave?")

Lastly, the authors use real-world examples to illustrate the theory. They also provide practical techniques for tackling error, e.g. "ten criteria for a good reminder" (p.131), while stressing that there is no one best way and that people are capable of devising their own solutions.

What's bad about this book?

Nothing… I honestly cannot find fault with this book. It may not be relevant to everyone, but otherwise it is worth the time spent with it.

Final thoughts

One would hope that a book co-authored by Reason would be a good read and this book does not disappoint. For the human factors expert perhaps there is nothing new here, but for the rest of us it is worth reading.

Monday, 24 November 2014

Human factors and the missing suitcase (by M Moneypenny)

The 2014 ASPiH conference took place at the East Midlands Conference Centre in Nottingham. The conference hotel was located a stone's throw away. The free Wi-Fi, clean rooms and provision to print out your boarding cards made staying at this award-winning establishment a nice experience. Until the missing suitcase, that is…

A timeline of events

Like many hotels, the Orchard Hotel offered a luggage storage facility. I handed in my suitcase and was given a small paper tab; the number on it matched the tag placed on my luggage. For additional security my name was written on the luggage tag (Fig 1).

Fig 1: Ironic luggage tag

The suitcase was then taken to a storage area, to be collected at the end of the conference. So far, so normal…

At the end of the conference I wandered over to the hotel reception, luggage tab in hand, and was slightly dismayed to find that all the suitcases had been placed in the hotel lobby. "Not great security", I thought. My dismay turned into slight panic when I couldn't find my suitcase amongst the twenty or so that were left. Where was my "Very Important Package"? I asked the front of house manager, who was standing at reception, and she went off to look for it. After about ten minutes she returned to tell me that those were all the suitcases from the conference and was I sure it wasn't there? I was sure… At this stage there was only one suitcase left (which bore only a fleeting resemblance to mine) and (by looking inside it) the front of house manager was able to identify the owner.

Fig 2: @TheRealAlMay springs into action
With the power of social media (Fig 2; thanks for the RTs) and Google, we were able to obtain contact details of the supposed lapse-maker. By the time I touched down in Scotland there was an apologetic email in my inbox. The other person had a similar suitcase at home and had been distracted looking for their coat. They hadn't realised they had the wrong suitcase until they opened it up to do the washing… (No comment).

After a couple of unreturned phone calls I managed to speak to the general manager (GM) of the hotel the next day, to find out how they would endeavour to return the suitcase to me. To my surprise the GM told me that had this been their "fault" they would've made sure a courier had picked it up and returned it to me, but because it wasn't their responsibility they would be willing to pay 50% of the cost. I did my best to explain that if the suitcase had not been placed in the foyer (and what was the point of the luggage tag system anyway?) then it wouldn't have been taken in error. After a polite discussion the GM asked me to leave it with him.

Thankfully my suitcase (and the laptop inside) arrived the next day and I could get back to writing my MD, blog, etc.

Human factors

  1. The luggage tag system I: This is a relatively robust system if the "rules" are followed. You get your tab, you go back with your tab, hand it to the receptionist and tell them your name (as an additional check) and he/she gets your suitcase, having checked the tab and your name against the tag.
  2. The luggage tag system II: This is a very slow system, especially when over 200 delegates want to pick up their luggage at the same time, which is why the luggage was placed in the foyer for people to "pick your own".
  3. The lapse: It's the end of a long (but engaging) day, you want to catch the train and get back to your family. There is a bit of a problem with finding your coat but you've got your suitcase and you're rushing out to the taxi. (Would the error pass Reason's substitution test? It sure would.)
  4. Blaming the sharp end: The hotel general manager was very keen to point out that this person had walked off with my suitcase and that they (the hotel) were not at fault. Blaming the person at the sharp end is a symptom of poor organisational culture.

Lessons learned

My suitcase now has a very distinctive red and white ribbon to make it look more "unique". Unfortunately this probably also makes it stand out more for opportunistic thieves…