Friday, 31 May 2013

Book of the month: Inviting Disaster (Lessons from the edge of technology) by James R Chiles

"Inviting Disaster" has a number of praise quotes on the first page; the quote that made this book seem relevant to a simulation and human factors audience was:
"The lessons are clear and disconcerting: a sequence of minor human errors when combined with elementary technical problems can lead to unimaginable catastrophe."
From a human factors perspective the book gets off to a bad start but redeems itself in later chapters. The 'Special Introduction' to the paperback edition deals with the 9/11 terrorist attacks on the Twin Towers in New York. It diverges from the rest of the book by not focusing on human error and complexity. Chiles could have discussed the attackers' movements and training in the US and considered the lack of communication between federal agencies (human errors) and the difficulty of tracking thousands of potential suspects (complexity). Instead, this special introduction looks at the towers' design, construction and evacuation. With regard to these factors, there was no sequence of minor errors but rather, perhaps, a failure to comprehend that airplanes might be used as terrorist weapons.

After the disappointment of the special introduction, the book begins to redeem itself with the first chapter. Chiles describes the sinking of the Ocean Ranger oil rig off the Canadian coast in 1982. The specifics are unfamiliar but the types of error lining up to cause the sinking are familiar ones. They include:

  1. Equipment design

    1. The eight legs which support the oil platform are so big that the designers decided to use the space inside them for storage and other functions, such as the ballast control room (see below). Unfortunately the legs are also closer to the water than the platform, which means they are more likely to be hit by a wave. Also, because the legs are being used for storage, the space inside them could flood during a storm. There is a particular risk of flooding of the four corner legs because they have five-foot holes in them for bringing wire ropes and chains onboard. The conditions required for water to enter these holes (severe tilting of the platform in bad weather) are considered an 'impossibility'. Additionally there is no alarm to warn the crew that water is entering the rig legs.
    2. The ballast control room allows the operator to control the pumps which move the seawater ballast between the ballast tanks and the ocean. There are 16 tanks but only three pumps, which means that valves have to be set to certain positions to make sure the correct pump is connected to the correct tank. Under crisis conditions it might prove difficult to work out which pump is serving which tank.
    3. The ballast control panel is not protected against seawater. There is a cutoff switch to stop electrical short-circuits from opening and closing the valves automatically; however, this switch also disables all the electrical displays. Once the cutoff switch is active there is no way of knowing the valve positions, pump state or depth of ballast within the tanks.
  2. Training

    1. In order to qualify for the job of ballast control operator, a worker had to spend "several hours each day hanging about the control room and watching over the ballast operator's shoulder" (p.27). There had been a training programme for new operators but this had ended some time previously. Lack of training means that the ballast control operators do not know how the pumps work, or that the pumps will not work if the tank to be pumped is much lower than the pumping chamber (as occurs when the rig is listing).
    2. There was no training for evacuation during a storm. Evacuation drills would occasionally consist of counting the people on the rig, at other times they would lower the lifeboats in calm conditions.
  3. Human error
    1. The glass portholes to the ballast control room can be protected by steel storm covers. These were not put in place before the storm.
    2. The most knowledgeable ballast control operator on the rig found out about an emergency workaround which would allow the pump valves to be controlled without electrical power. This undocumented and untested emergency procedure would allow the control operator to open specific valves. Unfortunately the ballast control operator thought that the procedure would close them instead.

During the storm on 14 February 1982, a powerful wave smashes one of the ballast control room portholes and seawater splashes onto the control panel. Because the equipment is short-circuiting, the cutoff switch is activated. Although this is not ideal, the rig is in a safe, 'holding' position: the valves to the tanks are all closed, the anchors holding the rig are secure and the rig is level in the water. In this condition, the rig would not have sunk.

Unfortunately the crew decide at some stage to reconnect the power to the control panel. Electrical short-circuits, and possibly trial-and-error attempts by the crew, allow water to enter the bow section via open valves. The emergency workaround is used at some stage, which leads to additional valves opening. Eventually the rig is tilting to such an extent that the 'impossible' event occurs: water enters the five-foot holes in the corner leg on the port bow. As more water enters the rig, the listing worsens until even small waves can reach the holes.

At 1:05 AM the rig's support vessel is radioed to approach the rig, which is now being abandoned. Unfortunately the 80 mph winds cause the lifeboats to smash against the side of the rig and crack open. Additionally, once the lifeboats reach the water, the ropes securing them to the rig can only be released when there is no tension on the rope. This means that the lifeboats cannot get clear of the rig and continue to smash into it. When the support vessel reaches the rig at 2:00 AM only one lifeboat is still afloat. The eight men inside must have thought they were saved. Unfortunately, as they try to climb onto the support vessel's deck, they tumble from the pitching lifeboat and are swept away. The support vessel has no gear to drag the men onboard and the men are too immobilised by cold to climb into the support vessel's life rafts. A total of 84 people die.

The subsequent chapters of Chiles' book detail events which are varied in time, place and industry, but similar in causation. They include the shutting down of the wrong engine on a Boeing 737, the Three Mile Island nuclear accident, the R.101 airship, the Challenger disaster and more. The in-depth analyses of the various incidents are one of the book's great assets but also its major weakness. At times the description of an incident is so long that the reason for mentioning it in the first place is forgotten. The chapters also lack a logical progression, so that at times the story seems to drift from one disaster to the next without an underlying structure.

However, Chiles does offer some suggestions for reducing human error. In a reference to unknown unknowns he states:
"When meeting a new system, people need time to know its workings under good conditions and bad. The most dangerous time is when the operators don't know what they don't know" (p.39).
Chiles also describes the benefits of a "fresh pair of eyes" in relation to the Three Mile Island accident (p.61) and the importance of situational awareness and communication.

Most importantly, Chiles mentions the characteristics of a high-reliability organisation (HRO) (p.62):
  • A priority on safety from top to bottom
  • Deep redundancy so the inevitable errors or malfunctions are caught in time
  • A structure that allows key decisions at all levels
  • A premium on learning lessons from trials and errors 
  • Workers who keep their skills sharp with practice and emergency drills
It is in relation to this last point that simulation can help to build an HRO. Simulation allows repeated, coached practice of routine and emergency drills.

In summary, read this book for its interesting stories of disasters and easy prose, but if you are looking for an in-depth discussion of human factors or human error then skip this one.
